Posts Tagged XenServer
I’m not big on “end of year” posts or predictions, but, lacking any other ideas, I thought I’d write down some random thoughts about technology going through my head as this year draws to an end.
All Flash Array Dominance
I’m not buying the hype surrounding all-flash arrays (AFAs). Certainly there are legitimate use cases, and they’ll be deployed more in the near future than they have been in the past, but the coming dominance of all-flash arrays has, I think, been greatly exaggerated. It’s clear that the main problem these arrays are trying to solve is the extreme performance demands of some applications, and I just think there are much better ways to solve this problem (e.g. local disk, convergence, local flash, RAM caching, etc.) in most scenarios than purchasing disparate islands of SAN. And many of the things that make an AFA so “cool” (e.g. in-line dedupe, compression, no RAID, etc.) would be even cooler if the technology could be incorporated into a hybrid array. The AFA craze feels very much like the VDI craze to me: lots of hype about how “cool” the technology is, but in reality a niche use case. Ironically, VDI is the main AFA use case.
The Emergence of Convergence
This year has seen a real spike in interest in and deployment of converged storage/compute software and hardware, and I’m extremely excited for this technology going into 2014. With VMware VSAN being GA in 2014, I expect that interest and deployment to rise to even greater heights. VSAN has some distinct strategic advantages over other converged models that should really make the competition for this space interesting. Name recognition alone is getting them a ton of interest. Being integrated with ESXi gives them an existing install base that already dominates the data center. In addition, its sheer simplicity and availability make it easy for anyone to try out. Pricing still hasn’t been announced, so that will be the big thing to watch for in 2014 with this offering, that and any new enhancements that come with general availability. In addition to VSAN, EMC’s ScaleIO is another more ‘software-based’ rather than ‘appliance-based’ solution that is already GA and that I’m looking forward to seeing more of in 2014. Along with VMware and EMC, Nutanix, SimpliVity, Dell, HP, VCE, et al. all have varying “converged” solutions as well, so this isn’t going away any time soon. With this new wave of convergence products and interest, expect all kinds of new tech buzzwords to develop! I fully expect and predict “Software Defined Convergence” will become mainstream by the end of the year!
Random convergence links:
Duncan Epping VSAN article collection – http://www.yellow-bricks.com/virtual-san/
Scott Lowe – http://wikibon.org/wiki/v/VMware_VSAN_vs_the_Simplicity_of_Hyperconvergence
Cormac Hogan looks at ScaleIO – http://cormachogan.com/2013/12/05/a-closer-look-at-emc-scaleio/
Good look at VSAN and All-Flash Array performance – http://blogs.vmware.com/performance/2013/11/vdi-benchmarking-using-view-planner-on-vmware-virtual-san-part-3.html
Chris Wahl musing over VSAN architecture – http://wahlnetwork.com/2013/10/31/muse-vmwares-virtual-san-architecture/
The Fall of XenServer
As any reader of this blog knows, I used to be a huge proponent of XenServer. However, things have really gone downhill after 5.6 in terms of product reliability. So much so that I really have a hard time recommending it at all anymore. ESXi was always at the top of my list but XenServer remained a solid #2. Now it’s a distant 3rd in my mind behind Hyper-V. I’ll grant that there are many environments successfully and reliably running XenServer, I have built quite a few myself, but far too many suffer from bluescreen server crashes and general unreliability to be acceptable in many enterprises. The product has even had to be pulled from the site to prevent people from downloading it while bugs were fixed. I’ve never seen so many others express similar sentiments about this product as I have seen this past year.
Random CTP frustration with XenServer:
Random stuff I’m reading
Colin Lynch has always had a great UCS blog, and his two latest posts are prime examples. Best UCS blog out there, in my opinion:
“UCS Manager 2.2 (El Capitan) Released”
“Under the Cisco UCS Kimono”
I definitely agree with Andre here! Too many customers don’t take advantage of CBRC and it’s so easy to enable:
“Here is why your Horizon View deployment is not performing to it’s max!”
Great collection of links and information on using HP’s Moonshot ConvergedSystem 100 with XenDesktop by Dane Young:
“Citrix XenDesktop 7.1 HDX 3D Pro on HP’s Moonshot ConvergedSystem 100 for Hosted Desktop Infrastructure (HDI)”
In the end, this post ends up being an “end of year” post with a few predictions. Alas, at least I got the “random” part right…
For today’s post I’d like to introduce the first guest blogger to post on speakvirtual.com, Jamie Lin! Jamie has been working in the IT industry for a long time and has a ton of knowledge across a broad spectrum of technologies. Jamie and I co-wrote the below post and I anticipate him contributing more content in the future.
What is it?
With the advent of XenServer 5.6 SP2 and XenDesktop 5 SP1, Intellicache became a configurable and supported feature of the Citrix VDI stack. You can use Intellicache with the combination of XenServer and XenDesktop Machine Creation Services (MCS). The basic idea behind Intellicache is that it allows you to take some of the pressure off of your shared storage by offloading IO onto host local storage. As discussed before on this site, IO in VDI environments has historically been one of the biggest and most overlooked technical challenges in any VDI rollout. With Intellicache, Citrix has sought to help alleviate this issue. See below for more on how this works and for some additional considerations you won’t find in the documentation.
How does it work?
The folks over at the Citrix blogs have already done an excellent job explaining how Intellicache works, so we’ll try not to repeat too much here. At a fairly basic level, the offloading of IO is achieved by caching blocks of data that virtual desktops read from shared storage onto host local storage. So if Intellicache is enabled and a Windows 7 VM boots on a particular XenServer host, the roughly 200MB accessed by the operating system during the boot process is cached on the host’s local storage. Subsequent VMs that boot on that host will then read these blocks from local storage instead of the SAN. In addition, if you are using non-persistent images, your writes will occur exclusively on local storage as well. Persistent (aka “Dedicated”) images will write to both local and shared storage. I think this image from the Citrix blog sums it up nicely:
You might also be wondering about storage space and what happens when you run out of room on your local storage. With both read and write caches happening on local storage, this is bound to happen from time to time. Luckily, Intellicache has taken this into account and will seamlessly fail back to shared storage in the event the local storage runs out of space. For more on “how it works”, see the link above or read more here.
How to enable Intellicache
This CTX article explains the process of enabling Intellicache quite nicely. Basically, it’s a two-step process. The first step occurs during the installation of XenServer itself, where you select “Enable thin provisioning (Optimized storage for XenDesktop)”. This option changes the default local storage type from LVM to EXT3. The next step occurs after the installation of XenDesktop, when you create a connection to your host infrastructure. There is a checkbox that says “Use IntelliCache to reduce load on the shared storage device”. Selecting this checkbox changes the virtual disk parameter “allow-caching” to “true” for any virtual desktop created from that particular catalog. You can verify this by issuing the command “xe vdi-param-list uuid=<VDI_UUID>”:
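For reference, a minimal check from the XenServer host console might look like this (the grep filter is just for readability, and <VDI_UUID> must be replaced with a real disk UUID from your environment):

```shell
# Run on the XenServer host console. Replace <VDI_UUID> with a real
# UUID taken from `xe vdi-list`.
xe vdi-param-list uuid=<VDI_UUID> | grep allow-caching
# A cached desktop's disk should show a line like:
#   allow-caching ( RW): true
```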
You can also use the performance graphs to see Intellicache in action. In the performance tab, verify that “Intellicache Hits”, “Intellicache Misses” and “Intellicache Size” are all selected. If they are, you will be able to monitor its usage as shown below:
While we’re uncertain whether Citrix will support this, it is also possible to enable or disable Intellicache on a per-VM basis. You do this with the following command: “xe vdi-param-set uuid=<VDI_UUID> allow-caching=true”. You can then use the param-list command to view the parameters of that virtual disk and confirm that “allow-caching” is set to true. As the VM starts to utilize Intellicache, you’ll see Intellicache hits and misses for it appear in the performance tab.
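A sketch of that per-VM toggle (again, <VDI_UUID> is a placeholder for the virtual disk of the desktop in question):

```shell
# Enable Intellicache for a single virtual disk
# (use allow-caching=false to disable it again).
xe vdi-param-set uuid=<VDI_UUID> allow-caching=true

# Confirm the change took effect.
xe vdi-param-list uuid=<VDI_UUID> | grep allow-caching
```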
While this may appear a bit complicated, it is important to note that the only things necessary to implement Intellicache are selecting the Thin Provisioning option during the XenServer install and selecting the checkbox when creating the catalog in XenDesktop. These command-line options merely give you more granular control over configuring Intellicache and let you see what it’s doing “under the hood”.
According to the XenServer Installation Guide, when you use Intellicache, “The load on the storage array is reduced and performance is enhanced”. Given that VDI IO is such a concern for most deployments, shouldn’t we just be enabling Intellicache all the time? Our answer is “no”. While Intellicache does take IO pressure off of your shared storage array, you now have another IO concern to consider: IO on local storage. Remember what we said earlier about Intellicache failing back to shared storage if you run out of disk space on local storage? Well, what happens if your local storage can’t handle the IO being generated by the virtual desktops on your host? Will it fall back to shared storage? The answer is no! There is no built-in safeguard to prevent your VMs from using too much IO on local storage and thus creating bad performance for any VM utilizing that host’s cache for reads and writes. This all but makes local SSDs an absolute necessity, particularly in blade environments where most vendors provide only two slots for local storage per blade. Given that most environments use N+1 redundancy for their host infrastructure, your local disks need to be able to handle the IO for the number of VMs that can reside on two hosts!
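As a back-of-the-envelope sketch of that sizing concern (every figure below is an illustrative assumption, not a measurement from any real deployment):

```shell
#!/bin/sh
# Rough local-storage IOPS sizing for an Intellicache host.
# All figures are illustrative assumptions.
VMS_PER_HOST=75      # desktops per host (assumed)
IOPS_PER_VM=12       # steady-state IOPS per desktop (assumed)
FAILOVER_HOSTS=2     # N+1: a failed host's VMs land on a survivor,
                     # so one host may carry two hosts' worth of VMs

REQUIRED_IOPS=$((VMS_PER_HOST * IOPS_PER_VM * FAILOVER_HOSTS))
echo "Local storage must sustain roughly ${REQUIRED_IOPS} IOPS"
# 75 VMs * 12 IOPS * 2 hosts' worth = 1800 IOPS -- far beyond what a
# two-spindle mirror in a blade can deliver, hence the SSD requirement.
```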
There is another concern here as well that, as far as we can tell, is completely undocumented by Citrix. When you use Intellicache, non-persistent VMs cannot XenMotion! This makes complete sense when you think about it: how could a VM live migrate to another host when its write differentials reside on the local storage of the host it is leaving? (The “Migrate to Server” option isn’t even present on these VMs.) What makes this so interesting is that it appears not to be mentioned by Citrix anywhere. It’s not in the installation guides, we couldn’t find it on eDocs, and their blog on Intellicache only mentions XenMotion in regard to dedicated desktops! This means you cannot perform any type of host maintenance that requires downtime while there are running non-persistent (aka “pooled”) desktops present on the host. Notice that we said “running”, not “in-use”, for a VM can still be running even though no one is using it. This caveat alone will be a deal-breaker for many considering the use of Intellicache and is something Citrix should have documented more openly.
With this post we wanted to give a broad overview of how Intellicache works and some general considerations before deploying XenDesktop with Intellicache. As we’ve seen, planning for local host IO capacity becomes paramount with the use of Intellicache, and VM mobility is reduced. As it stands now, Intellicache use-case scenarios will be fairly limited, and more features and configurable granularity need to be built into the system before broader adoption can occur. We’ll dig deeper into Intellicache in future posts; in the meantime, let us know what you think!
This edition is the last in the series. I’ve created a new XenReference page where I will keep the most recent version of the XenApp, XenDesktop and XenServer cards. My goal is to keep these in sync with the major releases of each product (e.g. XenServer 6.0, 6.5, 7.0, etc.). Each XenReference version will probably be released sometime after the exam comes out for the newer version. Now that I’ve had a chance to run through this process three times, I will be implementing some changes in future versions.
Future versions of the XenReference cards will be less exam-centric and less “wordy”. As the purpose of the card is to be a reference, the goal of future versions will be to feature those things that you have to know to manage a XenApp/XenDesktop/XenServer environment but may not always be able to easily remember: things like the maximum amount of memory a host can support, the version of Windows required, the version of SQL required, commands, ports, etc. As always, please feel free to share any feedback in the comments section!
The statement “there is no technical benefit to memory overcommitment” is usually met with universal scorn, righteous indignation and a healthy dose of boisterous laughter. At second glance, however, the statement is not quite so absurd. Overcommitting memory does not make your VMs faster, it doesn’t make them more highly available and it doesn’t make them more crash-resistant. So, what is memory overcommitment good for? The sole goal of memory overcommitment is to put more VMs on each host. This saves you money. Thus, the benefit that memory overcommitment provides is a financial benefit.
Are there other ways to attain this financial benefit?
Memory overcommitment is one of the main reasons people choose the ESX hypervisor. If the goal of memory overcommitment is to save money and there are other ways to attain these cost savings on other hypervisors without utilizing memory overcommitment, does that change the hypervisor landscape at all? Before delving into that question, let’s first see if there is a way to save as much money without using memory overcommitment.
One way around this I’ve heard suggested is to just increase the memory in your hosts. So, if you had an ESX host with 10GB of memory and were 40% overcommitted, then you could use XenServer or Hyper-V with the same number of VMs, but each host would have 14GB of memory. This does not seem fair to me, as you could also add 4GB more to your ESX host and achieve even more cost savings. However, you can only add so much memory before becoming CPU-bound, right? I’m not referring to CPU utilization but to the number of vCPUs you can overcommit before running into high CPU Ready times. Let’s use my earlier example. You have fourteen 1 vCPU/1GB VMs on a 4 pCPU/10GB ESX host. You want to put more VMs per host, so you increase your host memory to 20GB. You now try putting twenty-eight 1 vCPU/1GB VMs on the host. This is now twice the number of vCPUs for the same number of pCPUs, and let’s say your CPU Ready times are around 5%-10%. Adding more VMs to this host, regardless of how many memory slots you have available, would adversely impact performance, so you have a ceiling of around 28 VMs per host.
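The arithmetic from the example above can be sketched as:

```shell
#!/bin/sh
# vCPU overcommit ceiling from the scenario above.
PCPUS=4          # physical cores in the host
VCPUS_PER_VM=1   # vCPUs per desktop VM
VMS=28           # VMs at the point where CPU Ready climbs to 5%-10%

RATIO=$((VMS * VCPUS_PER_VM / PCPUS))
echo "vCPU:pCPU overcommit ratio is ${RATIO}:1"
# At 28 single-vCPU VMs on 4 pCPUs the ratio is 7:1; adding memory
# beyond this point buys nothing, because CPU Ready time, not RAM,
# has become the ceiling.
```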
Knowing this number, couldn’t you then size your hosts at 4 CPUs and 30GB of RAM on XenServer or Hyper-V and save just as much money as with ESX? And this is only one way to recoup the financial benefits overcommitment provides. If you already have Windows or Citrix products, you might already own these hypervisors from a licensing perspective, and it might not save you money to go out and buy another hypervisor. Also, some hypervisors (like XenServer) are licensed by server count rather than socket count (like ESX), so you could potentially save a lot of money by using them. In any of these cases, an in-depth analysis of your specific environment will have to be done to ensure you’re making the most cost-effective decision.
Of course, memory overcommitment is not the only reason you choose a hypervisor. There are many other factors that still have to be considered. But given this discussion, should memory overcommitment be one of those considerations? I think once you realize what memory overcommitment is really good for, it becomes less of a factor in your overall decision-making process. Does this realization change the hypervisor landscape at all? As I mentioned earlier, memory overcommitment is a major sell for ESX. If you can attain the financial benefit of memory overcommitment without overcommitting memory, then I think this does take a bite out of the VMware marketing machine. That said, ESX is still the #1 hypervisor out there in my opinion, and I would recommend it for the vast majority of workloads, but not necessarily because of memory overcommitment. There is room for other hypervisors in the datacenter, and once people realize what memory overcommitment “is” and what it “isn’t” and really start analyzing the financial impact of their hypervisor choices, I think you’ll see some of these other hypervisors grabbing more market share.
Thoughts? Rebuttals? Counterpoints?
With the release of XenServer 5.6, Citrix included memory overcommitment as a feature of its hypervisor. Memory overcommitment has long been the staple of the ESX hypervisor so I thought I’d do an overview of the memory reclamation techniques employed by each hypervisor to determine if the field of play has been leveled when it comes to this feature.
Transparent Page Sharing (TPS) is basically virtual machine memory de-duplication. A service running on each ESX host scans the contents of guest physical memory and identifies potential candidates for de-duplication. After potential candidates are identified, a bit-by-bit comparison is done to ensure that the pages are identical. If the match succeeds, the guest-physical to host-physical mapping is changed to point at the shared host-physical page. The redundant pages are then reclaimed by the host. If a VM attempts to write to a shared host-physical page, a “Copy on Write” procedure is performed. As the name implies, when a write attempt is made the hypervisor makes a copy of that page and re-maps the guest-physical memory to this new page before the write actually happens.
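As a rough illustration of the candidate-identification step (this is an analogy using file blocks and md5 hashes, not what ESX actually runs; TPS works on guest-physical pages in host RAM and follows the hash match with a bit-by-bit compare):

```shell
#!/bin/sh
# Analogy: hash fixed-size 4KB "pages" of a file and count duplicates,
# the way TPS hashes memory pages to find sharing candidates.
f=/tmp/tps_demo.bin
dd if=/dev/zero    of="$f" bs=4096 count=8 2>/dev/null   # 8 identical pages
dd if=/dev/urandom bs=4096 count=8 2>/dev/null >> "$f"   # 8 unique pages

total=$(( $(wc -c < "$f") / 4096 ))
unique=$(
  i=0
  while [ "$i" -lt "$total" ]; do
    dd if="$f" bs=4096 skip="$i" count=1 2>/dev/null | md5sum
    i=$((i + 1))
  done | sort -u | wc -l
)
echo "pages=$total unique=$unique shareable=$((total - unique))"
# The 8 zero-filled pages collapse to a single hash, so 7 of the 16
# pages are sharing candidates and could be reclaimed.
rm -f "$f"
```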
One interesting characteristic of this feature is that, unlike all the other memory reclamation techniques, it’s always on. This means that even well before you start running into any low memory conditions the hypervisor is de-duplicating memory for you. This is the hallmark memory overcommitment technique for VMware. With this feature, you can turn on more VMs before contention starts to occur. All the other techniques only kick in when available memory is running low or after there is actual contention.
Ballooning is achieved via a device driver (the balloon driver) included by default with every installation of VMware Tools. When the hypervisor is running low on memory, it sets a target balloon size, and the balloon driver “inflates”, creating artificial memory pressure within the VM and causing the operating system to either pin memory pages or push them to the page file. Pinned pages are pages the operating system has identified as free or no longer in use; they are “pinned” to prevent them from being paged out, and their backing host-physical memory can be reclaimed. If any of these pages is accessed again, the host simply treats it like any other VM memory allocation and allocates a new page for the VM. If the host is running particularly low on memory, the balloon driver may need to inflate even more, causing the guest OS to start paging memory.
If TPS and Ballooning can’t free up enough memory, memory compression kicks in. Basically, any page that can be compressed by at least 50% will be compressed and put into a “compression cache” located within VM memory. The next time the page is accessed a decompression occurs. Memory compression was new to ESX 4.1 and without this feature in place these pages would have been swapped out to disk. Of course, decompressing memory still in RAM is much faster than accessing that page from disk!
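As a loose analogy of the “compresses by at least 50%” test (ESX uses its own in-kernel compressor on memory pages; gzip on a file here is purely illustrative):

```shell
#!/bin/sh
# Analogy: would this 4KB "page" qualify for the compression cache?
# A page qualifies if it shrinks to half its size or less.
dd if=/dev/zero of=/tmp/page.bin bs=4096 count=1 2>/dev/null
orig=$(wc -c < /tmp/page.bin)
comp=$(gzip -c /tmp/page.bin | wc -c)
if [ $((comp * 2)) -le "$orig" ]; then
  echo "page compresses to ${comp} bytes: kept in compression cache"
else
  echo "page does not reach 50% compression: swapped to disk instead"
fi
rm -f /tmp/page.bin
```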
Memory swapping – the last resort! When memory contention is really bad and TPS, Ballooning and Compression haven’t freed up enough memory, the hypervisor starts actively swapping memory pages out to disk. With Ballooning, it was the guest OS that decided what was pinned and what was swapped out to disk; what makes hypervisor swapping so potentially devastating to performance is that the hypervisor has no insight into which pages are the “best” ones to swap. At this point memory needs to be reclaimed fast, and the hypervisor could very well be swapping out active pages; accessing those pages from disk will cause a noticeable performance hit. Needless to say, you want to size your environment such that swapping is rare.
As of version 5.6, XenServer has a mechanism that allows for memory overcommitment on XenServer hosts. XenServer DMC (Dynamic Memory Control) works by proportionally adjusting the memory available to running virtual machines based on pre-defined minimum and maximum memory values. The range between the dynamic minimum and dynamic maximum values is known as the Dynamic Memory Range (DMR). The dynamic maximum represents the most memory available to your VM, and the dynamic minimum represents the least memory that could be available to your VM when there is memory contention on the host. Running VMs will always run at the dynamic maximum until there is contention on the host. When that happens, the hypervisor proportionally inflates a balloon driver in each VM where you’ve configured DMC until it has reclaimed enough memory to run effectively. DMC could be thought of, then, as a configurable balloon driver.
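As a sketch of that proportional adjustment (the linear formula below is our reading of “proportionally”, not Citrix’s published algorithm, and the memory values are made up):

```shell
#!/bin/sh
# Sketch of DMC's proportional squeeze between dynamic min and max.
# PRESSURE is the fraction of each VM's dynamic range the host needs
# back, in percent (an assumed model of "proportionally").
DYN_MIN=512    # MB, dynamic minimum
DYN_MAX=1024   # MB, dynamic maximum
PRESSURE=50    # host reclaims 50% of the dynamic range

TARGET=$((DYN_MAX - (DYN_MAX - DYN_MIN) * PRESSURE / 100))
echo "VM memory target: ${TARGET} MB"
# 1024 - (512 * 50 / 100) = 768 MB. With no contention (PRESSURE=0),
# the VM stays at its dynamic maximum of 1024 MB.
```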
As the diagram clearly shows, ESX employs a much broader and more diverse set of mechanisms to achieve memory overcommitment than XenServer. So while it’s technically possible to overcommit memory on both ESX and XenServer, I think it’s clear that ESX is the hypervisor of choice where memory overcommitment is concerned. The “Virtual Reality” blog over at VMware recently posted an interesting article on this that also compares Hyper-V; you can read it here. For further reading I’d recommend Xen.org’s article on DMC, the XenServer admin guide, or TheGenerationV blog for XenServer DMC. For ESX, Hypervizor.com has many excellent VMware-related diagrams, and their diagram on VMware memory management is no exception. In addition, the link I directed you to earlier is also very informative and contains other links for further reading.