This edition is the last in the series. I’ve created a new XenReference page where I will keep the most recent version of the XenApp, XenDesktop and XenServer cards. My goal is to keep these in sync with the major releases of each product (e.g. XenServer 6.0, 6.5, 7.0, etc.). Each XenReference version will probably be released sometime after the exam comes out for the newer version. Now that I’ve had a chance to run through this process three times, I will be implementing some changes in future versions.
Future versions of the XenReference cards will be less exam-centric and less “wordy”. As the purpose of the card is to be a reference, the goal of future versions will be to feature those things that you have to know to manage a XenApp/XenDesktop/XenServer environment but may not always be able to easily remember: things like the maximum amount of memory a host can support, the version of Windows required, the version of SQL required, commands, ports, etc. As always, please feel free to share any feedback in the comments section!
This version has an updated look-and-feel from the XenReference card I created for XenApp. I used the same methods to create this card as I did the last one. The XenDesktop 5 Administration exam prep guide was used as a basic template for the content that went into the card, with minor variations. This is why the “XenServer” section is so short in this card; very few requirements were listed in the prep guide. The purpose of this card is to be a quick reference guide for basic XenDesktop 5 information and a useful tool for studying for the “XenDesktop 5 Administration” (1Y0-A19) exam. Feel free to comment with any suggestions on improving this document.
Stay tuned for the XenReference for XenServer card in the coming weeks…
Citrix Provisioning Server (PVS) has been a vital component in the Citrix technology stack for years. Allowing for the rapid provisioning of machines through OS streaming, it has been the bedrock provisioning mechanism for XenDesktop and is also used in provisioning XenApp servers and streaming to physical endpoints. Even though PVS provides all these benefits and has been so integral to various Citrix technologies, its days are clearly numbered. Fundamentally, streaming an OS over the network is inferior to provisioning machines and delivering the OS locally in some way. As an example, technologies like Machine Creation Services (MCS) can be used to provision an OS without the additional streaming component. And while MCS’s initial scalability numbers were lower than those of PVS, and it is currently limited to the XenDesktop technology stack, MCS is new, its scalability estimates are improving all the time, and there’s no reason to think it can’t or won’t be integrated with other Citrix products. Indeed, there has been talk for years of merging XenDesktop itself with other Citrix products. So, what other possible reasons will there be for holding onto PVS in the future?
- “PVS can use the caching capabilities inherent to the local OS, this reduces IOPS”
When a target device boots up or accesses portions of the base image, those portions of the OS are cached in RAM on the PVS server. Subsequent attempts by additional target devices to access those portions of the OS will be read from RAM, thereby reducing the amount of IOPS required on the backend storage. Since IOPS are one of the biggest concerns for VDI deployments, this has been a major selling point for PVS. However, with the rise in popularity of VDI over the past couple of years, storage vendors have really focused on optimizing their arrays for IOPS, with many having terabytes of caching capabilities in them. So, if you now have enough RAM to cache at the storage level, is there really much benefit in being able to cache at the OS level? In addition to that, you have emerging technologies like IntelliCache and whole distributed storage models being developed for VDI that should make IOPS less of a concern in the future.
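To put a rough number on that caching benefit, here is a toy calculation. Every figure in it (desktop count, per-desktop read rate, hit ratio) is hypothetical and chosen only to illustrate the arithmetic, not measured from any real PVS deployment:

```python
# Illustrative effect of PVS server-side RAM caching on backend read IOPS.
# All numbers here are hypothetical, chosen only to show the arithmetic.

def backend_read_iops(desktops, reads_per_desktop, cache_hit_ratio):
    """Reads served from the PVS server's RAM cache never reach storage."""
    total_reads = desktops * reads_per_desktop
    return total_reads * (1 - cache_hit_ratio)

# 100 desktops each issuing 10 read IOPS against a shared base image:
# a 75% cache hit ratio leaves only a quarter of the reads for the array.
print(backend_read_iops(100, 10, 0.75))  # 250.0
```

The point of the post stands either way: if the storage array itself has terabytes of cache, the same reads get absorbed one layer down, and the PVS-level cache adds little.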
- “MCS will never be able to deliver an OS to a physical endpoint”
This is true. You will never be able to use a locally delivered OS solution for remote endpoints. However, what is the purpose of streaming an OS to physical endpoints? Two use-cases come to mind. The first involves streaming the OS to desktop PCs outside the datacenter. Companies usually choose this option as a first step into the VDI world. It’s cheap because it uses existing hardware, and it gives you the single-image management and security benefits of VDI without purchasing thin clients, hypervisors and backend storage arrays. But the important thing to point out here is that this is usually just a stepping stone towards much more robust VDI rollouts. Once their currently functioning PCs reach end of life, these companies start to replace them with thin clients and are more willing to invest in hypervisors and backend storage rather than a hardware refresh, thus eliminating the need to stream the OS over the network. The use-case for this in the future will become extremely “niche” as companies move away from purchasing fat clients as a standard. The second use-case involves streaming to blade PCs. This is usually done when high-performance desktops are a “must”. Like the previous use-case we examined though, there is limited need for this today, and as hypervisors continue to advance, there will soon be very little reason, if any, why a desktop cannot be run as a virtual machine and still expect optimal performance.
Now don’t get me wrong, PVS today is still a great solution and should be the main provisioning mechanism for most XenDesktop deployments. For the reasons listed above however, the next few years should see PVS use-cases diminishing rapidly. MCS or some future locally delivered OS solution will take its place.
Introducing the XenReference card for XenApp!
Click on either of the images to access the full PDF. Forbes Guthrie has been producing excellent “vReference” cards for VMware for a long time now and I’ve always thought something similar was needed for Citrix as well. The purpose of the card is to be a quick reference guide for basic XenApp 6 information and a useful tool for studying for the “Basic Administration for Citrix XenApp 6” (1Y0-A18) exam. I created the categories based off the XenApp 6 exam prep guide and used Citrix eDocs to create some of the content.
I fully intend to create updated versions of this document with more content and a more polished presentation. If anyone has any feedback, let me know. Content or category suggestions/alterations/deletions would be helpful as well as any suggestions regarding formatting. If you have an opinion on how to improve the document in any way, I’d love to hear it!
Stay tuned for the XenReference cards for XenDesktop and XenServer as well…
The statement, “there is no technical benefit to memory overcommitment” is usually met with universal scorn, righteous indignation and a healthy dose of boisterous laughter. However, at second glance this statement is not quite so absurd. Overcommitting memory does not make your VMs faster, it doesn’t make them more highly available and it doesn’t make them more crash-resistant. So, what is memory overcommitment good for? The sole goal of memory overcommitment is to put more VMs per host. This saves you money. Thus, the benefit that memory overcommitment provides is a financial benefit.
Are there other ways to attain this financial benefit?
Memory overcommitment is one of the main reasons people choose the ESX hypervisor. If the goal of memory overcommitment is to save money and there are other ways to attain these cost savings on other hypervisors without utilizing memory overcommitment, does that change the hypervisor landscape at all? Before delving into that question, let’s first see if there is a way to save as much money without using memory overcommitment.
One way around this I’ve heard suggested is to just increase the memory in your hosts. So, if you had an ESX host with 10GB of memory and were 40% overcommitted, then you could use XenServer or Hyper-V with the same amount of VMs but each host would have 14GB of memory. This to me does not seem fair, as you could also add 4GB more to your ESX host and achieve even more cost savings. However, you can only add so much memory before becoming CPU-bound, right? I’m not referring to CPU utilization but to the amount of vCPUs you can overcommit before running into high CPU Ready times. Let’s use my earlier example. You have 14 1 vCPU/1GB VMs on a 4 pCPU/10GB ESX host. You want to put more VMs per host, so you increase your host memory to 20GB. You now try putting 28 1 vCPU/1GB VMs on the host. This is now twice the amount of vCPUs to the same amount of pCPUs, and let’s say your CPU Ready times are around 5%-10%. Adding more VMs to this host, regardless of how many more memory slots you have available, would adversely impact performance, so you have a ceiling of around 28 VMs per host.
Knowing this number, couldn’t you then size your hosts for 4 CPUs and 30GB of RAM on XenServer or Hyper-V and then be saving just as much money as with ESX? And this is only one way to recoup the financial benefits overcommitment provides you. If you already have Windows or Citrix products you might already own these hypervisors from a licensing perspective, and it might not save you money to go out and buy another hypervisor. Also, some hypervisors (like XenServer) are licensed by server count and not socket count (like ESX), so you could potentially save a lot of money by using these hypervisors. In any of these cases, an in-depth analysis of your specific environment will have to be done to ensure you’re making the most cost-effective decision.
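The back-of-the-envelope sizing above can be sketched in a few lines of Python. The 7:1 vCPU:pCPU ceiling is an assumption lifted from the 28-VM example (28 vCPUs on 4 pCPUs); in practice your real ceiling is whatever your observed CPU Ready times will tolerate, so treat every default here as a placeholder:

```python
# Rough host-sizing sketch based on the example in the text. The vCPU:pCPU
# ceiling is an assumption taken from the 28-VM example, not a fixed rule.

def max_vms_per_host(pcpus, host_gb, vcpus_per_vm=1, gb_per_vm=1,
                     vcpu_ratio_ceiling=7.0):
    """Return whichever limit (CPU or memory) caps the VM count first."""
    cpu_bound = int(pcpus * vcpu_ratio_ceiling / vcpus_per_vm)
    mem_bound = int(host_gb // gb_per_vm)
    return min(cpu_bound, mem_bound)

# A 4 pCPU/30GB XenServer or Hyper-V host hits the same ~28 VM ceiling
# as the overcommitted ESX host, without overcommitting memory.
print(max_vms_per_host(4, 30))  # 28 (CPU-bound, not memory-bound)
```

Once the host is CPU-bound, extra memory buys you nothing, which is exactly why sizing for 30GB of physical RAM can match the overcommitted configuration.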
Of course, memory overcommitment is not the only reason you choose a hypervisor. There are many other factors that still have to be considered. But given this discussion, is memory overcommitment one of these considerations? I think once you realize what memory overcommitment is really good for, it becomes less of a factor in your overall decision making process. Does this realization change the hypervisor landscape at all? As I mentioned earlier, memory overcommitment is a major sell for ESX. If you can attain the financial benefit of memory overcommitment without overcommitting memory then I think this does take a bite out of the VMware marketing machine. That said, ESX is still the #1 hypervisor out there in my opinion and I would recommend it for the vast majority of workloads but not necessarily because of memory overcommitment. There is room for other hypervisors in the datacenter and once people realize what memory overcommitment “is” and what it “isn’t” and really start analyzing the financial impact of their hypervisor choices I think you’ll see some of these other hypervisors grabbing more market share.
Thoughts? Rebuttals? Counterpoints?
With the release of XenServer 5.6, Citrix included memory overcommitment as a feature of its hypervisor. Memory overcommitment has long been the staple of the ESX hypervisor so I thought I’d do an overview of the memory reclamation techniques employed by each hypervisor to determine if the field of play has been leveled when it comes to this feature.
Transparent Page Sharing (TPS) is basically virtual machine memory de-duplication. A service running on each ESX host scans the contents of guest physical memory and identifies potential candidates for de-duplication. After potential candidates are identified, a bit-by-bit comparison is done to ensure that the pages are identical. If there has been a successful match, the guest-physical to host-physical mapping is changed to the shared host-physical page. The redundant pages are then reclaimed by the host. If a VM attempts to write to a shared host-physical page a “Copy on Write” procedure is performed. As the name implies, when a write attempt is made the hypervisor makes a copy of that page and re-maps the guest-physical memory to this new page before the write actually happens.
One interesting characteristic of this feature is that, unlike all the other memory reclamation techniques, it’s always on. This means that even well before you start running into any low memory conditions the hypervisor is de-duplicating memory for you. This is the hallmark memory overcommitment technique for VMware. With this feature, you can turn on more VMs before contention starts to occur. All the other techniques only kick in when available memory is running low or after there is actual contention.
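For the curious, the matching process can be illustrated with a toy model: hash each page as a cheap candidate filter, then confirm bit-for-bit before mapping duplicates to one shared copy. This is a simplification for illustration only; real ESX works on host-physical pages with copy-on-write remapping, all of which this sketch ignores:

```python
# Toy model of TPS-style page dedup: hash as a candidate filter, full
# comparison to confirm, redundant copies counted as reclaimed.
import hashlib

def shared_page_savings(pages):
    """Return (unique_pages, pages_reclaimed) for a list of page contents."""
    seen = {}      # digest -> canonical page content
    reclaimed = 0
    for page in pages:
        digest = hashlib.sha256(page).hexdigest()
        if digest in seen and seen[digest] == page:  # bit-for-bit confirm
            reclaimed += 1                           # map to the shared copy
        else:
            seen[digest] = page
    return len(seen), reclaimed

# Three VMs booted from the same template share identical kernel pages.
print(shared_page_savings([b"kernel", b"kernel", b"kernel", b"app-data"]))
# (2, 2): two unique pages kept, two redundant copies reclaimed
```

The bit-for-bit confirmation step matters: a hash match alone is only a hint, and sharing on a false match would corrupt guest memory.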
Ballooning is achieved via a device driver (balloon driver) included by default with every installation of VMware Tools. When the hypervisor is running low on memory it sets a target page size and the balloon driver will then “inflate”, creating artificial memory pressure within the VM, causing the operating system to either pin memory pages or push them to the page file. Pinned pages will be those pages identified as “free” or no longer in use by the operating system that are “pinned” to prevent them from being paged out but whose memory can be reclaimed in host-physical memory. If any of these pages are accessed again the host will simply treat it like any other VM memory allocation and allocate a new page for the VM. If the host is running particularly low on memory the balloon driver may need to be inflated even more, causing the guest OS to start paging memory.
If TPS and Ballooning can’t free up enough memory, memory compression kicks in. Basically, any page that can be compressed by at least 50% will be compressed and put into a “compression cache” located within VM memory. The next time the page is accessed a decompression occurs. Memory compression was new to ESX 4.1 and without this feature in place these pages would have been swapped out to disk. Of course, decompressing memory still in RAM is much faster than accessing that page from disk!
Memory swapping – the last resort! When memory contention is really bad and TPS, Ballooning and Compression haven’t freed up enough memory, the hypervisor starts actively swapping out memory pages to disk. With Ballooning, it was the guest OS that decided what was pinned and what was swapped out to disk; what makes hypervisor swapping so potentially devastating to performance is that the hypervisor has no insight into which pages are the “best” ones to swap to disk. At this point memory needs to be reclaimed fast, so the hypervisor could very well be swapping out active pages, and accessing those pages from disk is going to cause a noticeable performance hit. Needless to say, you want to size your environment such that swapping is rare.
As of version 5.6, XenServer now has a mechanism that allows for memory overcommitment on XenServer hosts. XenServer DMC (Dynamic Memory Control) works by proportionally adjusting the memory available to running virtual machines based on pre-defined minimum and maximum memory values. The amount of memory between the dynamic minimum and dynamic maximum value is known as the Dynamic Memory Range (DMR). The Dynamic maximum value represents the maximum amount of memory available to your VM and the dynamic minimum value represents the lowest amount of memory that could be available to your VM when there is memory contention on the host. Running VMs will always run at the Dynamic maximum value until there is contention on the host. When this happens the hypervisor will proportionally inflate a balloon driver on each VM where you’ve configured DMC until it has reclaimed enough memory for the hypervisor to run effectively. DMC could be thought of then, as a configurable balloon driver.
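As a sketch, the proportional behavior described above might look like the following. To be clear, this formula is my reading of “proportionally inflate”, not Citrix’s published algorithm, and the VM sizes are hypothetical:

```python
# Sketch of proportional ballooning across DMC-enabled VMs: each VM gives
# up the same fraction of its dynamic range (dynamic max - dynamic min).

def dmc_targets(vms, shortfall_mb):
    """vms: list of (dyn_min_mb, dyn_max_mb) tuples.
    Returns the target memory for each VM after reclaiming shortfall_mb."""
    total_range = sum(mx - mn for mn, mx in vms)
    fraction = min(1.0, shortfall_mb / total_range) if total_range else 0.0
    return [mx - fraction * (mx - mn) for mn, mx in vms]

# Two VMs with 1-2GB and 2-4GB dynamic ranges; the host needs 1.5GB back.
# Each VM surrenders the same 50% of its own range.
print(dmc_targets([(1024, 2048), (2048, 4096)], 1536))  # [1536.0, 3072.0]
```

Note that a VM can never be pushed below its dynamic minimum; the `min(1.0, ...)` clamp models that floor.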
As the diagram clearly shows, ESX employs a much broader and more diverse set of mechanisms to achieve memory overcommitment than XenServer. So while it’s technically possible to overcommit memory on both ESX and XenServer, I think it’s clear that ESX is the hypervisor of choice where memory overcommitment is concerned. The “Virtual Reality” blog over at VMware recently posted an interesting article about this as well that also compared Hyper-V; you can read it here. For further reading I’d recommend Xen.org’s article on DMC, the XenServer admin guide or TheGenerationV blog for XenServer DMC. For ESX, Hypervizor.com has many excellent VMware related diagrams and their diagram on VMware memory management is no exception. In addition, the link I directed you to earlier is also very informative and contains other links for further reading.
When performing VDI rollouts, many people focused on and planned for the hosting infrastructure (CPU, memory, storage capacity, etc.), but most overlooked the importance of properly calculating the amount of IOPS their VDI environment would generate. As a result, their environments suffered from poor user experience due to slow response times and even completely “frozen” virtual desktops waiting to read/write. In an effort to educate people on this issue, there have been several excellent articles written on this topic.
In addition to these I thought I’d write about two common nuances that I’ve seen people overlook when planning for VDI IOPS.
When planning for peak VDI IO you need to know your “aggregate peak IO” in addition to your “individual peak IO”. I’ve oftentimes heard of people “planning for peak” in their VDI environment by determining what a sampling of individual virtual desktops “peak” at from an IO perspective. I’ve seen them do this by running perfmon or some other monitoring tool on individual virtual desktops and then multiplying this number by their total number of virtual desktops to determine the amount of IO their storage device will need to handle. Let’s use this chart as an example:
If I have 10 Windows 7 virtual desktops and I’ve determined that they individually peak at 30 IOPS, using individual peak IO, I would purchase a storage system capable of handling 300 IOPS. But as you can see from the chart above, at no point in the day do I reach 300 IOPS. The 12pm timeframe is my aggregate peak IO and as you can see, I reach 220 IOPS at my busiest point. That’s about a 27% difference from using the individual peak IO numbers. Remember that this is just a hypothetical example and that the real world differences between individual peak IOPS and aggregate IOPS could be greater or smaller than my example above depending on the workload and user activity. Failing to use aggregate IO numbers could lead you to believe that you’ll need much more IO capacity than you’ll really use and might deflate interest in any VDI rollout project you’re involved with.
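The difference between the two methods is easy to demonstrate in code. The per-interval samples below are made-up illustration data (two desktops, three sampling intervals), but the pattern is the same as in the chart: desktops don’t all peak at the same time, so summing first and then taking the max gives a smaller, truer number:

```python
# "Individual peak x desktop count" vs. the true aggregate peak IOPS.

def individual_peak_estimate(samples_per_desktop, desktops):
    """Worst single sample from any desktop, multiplied out."""
    return max(max(s) for s in samples_per_desktop) * desktops

def aggregate_peak(samples_per_desktop):
    """Sum across desktops at each interval, then take the busiest interval."""
    return max(map(sum, zip(*samples_per_desktop)))

d1 = [10, 30, 5]   # desktop 1 peaks at 30 IOPS in interval 2
d2 = [25, 10, 30]  # desktop 2 peaks at 30 IOPS in interval 3
print(individual_peak_estimate([d1, d2], 2))  # 60
print(aggregate_peak([d1, d2]))               # 40: the peaks never coincide
```

The gap between the two numbers is exactly the over-provisioning you would have paid for by sampling individual desktops alone.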
So if you’re planning a VDI rollout make sure to get aggregate IOPS numbers from your backend storage system and not just from a sampling of individual virtual desktops. Knowing individual peak IO is still important because it’s always useful to know as much about your environment as possible. Knowing both of these numbers will help you gain a better understanding of user activity in your environment.
If you’re doing a POC (which you absolutely should), I’d get a good sampling of a broad range of users, determine their aggregate peak IO and use this number to determine the amount of IO per virtual desktop you should be calculating for future growth. Using the example above you’d get 22 IOPS per virtual desktop (220/10 = 22). Depending on how confident you are in your sampling size, it might be a good idea to round up a bit as well.
It’s rare to hear of people over provisioning their storage for VDI, so perhaps this isn’t very widespread. Not knowing your read/write ratios, however, could lead to the much more common problem of under provisioned storage…
Before going to a storage vendor and telling them the peak IOPS per virtual desktop, there is still one thing left to figure out: the read/write ratio characteristics of your VDI environment. Why is the read/write ratio so important? Because all IOs are not created equal! Read IO is significantly less taxing on your storage device than write IO. On a RAID 5 set you will typically get around 160 read IOPS and 45 write IOPS per spindle. So whether those 22 IOPS we calculated earlier are predominantly read, predominantly write or somewhere in the middle could have a significant impact on how many spindles your storage device will need to have.
An interesting workload characteristic that most people still don’t realize about VDI is that virtual desktops typically run at a 20% read/80% write ratio during normal working operations! At boot that ratio is flipped to 80% read and 20% write. An excellent article on this topic can be found here.
So let’s take those 22 IOPS we calculated earlier and figure out how many spindles we would need based on different read/write ratios. If we’re at 22 IOPS per virtual desktop at a 20/80 read/write ratio and we’ll need 100 virtual desktops, this means we’ll have a total of 2200 IOPS. 440 will be reads and 1760 will be writes. Assuming we’re on RAID 5 we would make the following calculation: 1760/45 + 440/160 ≈ 42 spindles. Now what if the read/write ratio were flipped? What if we had an 80% read/20% write ratio? 1760/160 + 440/45 ≈ 21 spindles! That’s half of what we needed when the IOs were predominantly writes.
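The spindle math generalizes easily. Here’s a small sketch using the per-spindle figures from the text (about 160 read and 45 write IOPS per RAID 5 spindle); real per-spindle numbers vary by drive speed and array, so treat the defaults as placeholders:

```python
import math

# RAID 5 spindle estimate for a given total IOPS load and write ratio,
# using the per-spindle throughput figures quoted in the text.
def spindles_needed(total_iops, write_ratio,
                    read_per_spindle=160, write_per_spindle=45):
    reads = total_iops * (1 - write_ratio)
    writes = total_iops * write_ratio
    return math.ceil(reads / read_per_spindle + writes / write_per_spindle)

# 100 desktops at 22 IOPS each = 2200 total IOPS.
print(spindles_needed(2200, 0.80))  # 20/80 read/write: 42 spindles
print(spindles_needed(2200, 0.20))  # 80/20 read/write: 21 spindles
```

Swap in your array’s actual per-spindle numbers (and RAID level write penalty) and the same two-line calculation sizes any workload mix.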
As you can clearly see, figuring out the read/write ratio can have a large impact on the type and size of storage device you will use to host your VDI environment. The larger your VDI deployment is, the more important these numbers become (imagine using a storage device equipped with half the spindles you need to host your 20,000 virtual desktop users!).
I strongly recommend all the articles I’ve linked to above. While they all have great baseline numbers with which to size your environment, I highly recommend going through and figuring out what these numbers look like in your own environment. What are your IO characteristics during normal working hours? What does the read/write ratio look like during your aggregate peak IO time? What is your aggregate peak IO? All of these and more are extremely important questions to answer before you deploy a VDI solution. And be mindful but wary of “industry standard” IO numbers. These don’t take into account the specialized applications you might be running in your environment or the types of users you have running on a VDI solution. Nothing can take the place of a well-planned POC!
With the release of SP1 for XenClient a few weeks ago, I thought this would be a good topic for my first post. XenClient is a Type 1 client hypervisor developed by Citrix. It allows you to simultaneously run multiple virtual machines (VMs) from a single laptop. Along with XenClient there is also a “Synchronizer” appliance that allows you to upload, download, backup and even erase lost client devices. There are a slew of other features and components to this system so I thought I’d give a brief architectural overview of XenClient and its various features.
Control Domain: The control domain virtualizes hardware for XenClient VMs. All disk, network, audio and USB traffic goes through the control domain to and from each VM. If you’re familiar with XenServer then you’ll know this as Dom 0.
Service VM: The service VM provides you with the capability of managing XenClient VMs. Using the service VM you can create, modify, delete and even upload VMs to the Synchronizer. The current service VM for XenClient is running “Citrix Receiver for XenClient” and gives you the capability of viewing and operating each VM you have running on XenClient. More service VMs are being planned to add additional functionality to XenClient in the future.
GPU Passthrough: As the name implies, GPU passthrough provides direct access to the GPU to a specified VM without the hypervisor or control domain acting as a go-between. This feature allows your VM to experience the full graphical capabilities of your hardware just as if it were installed on bare metal. See here for a demo of this feature. Currently this feature is “experimental” and you can only enable it on one VM on your XenClient device. Citrix has stated that you will be able to do this on multiple VMs in the future.
AMT: Intel Active Management Technology is a hardware based remote administration tool that provides you with the capability to track assets, power on/off client devices and troubleshoot issues with XenClient VMs or XenClient itself. See here for a good demo on this.
Secure Application Sharing: The best way to describe this feature is to say that it’s basically XenApp for your local XenClient VMs. It allows you to work in one VM while using applications installed on another VM. You can publish applications from one or more VMs (known as “application publishing VMs”) and “subscribe” to them via Citrix Dazzle on any VMs you’ve configured application subscription on (known as the “application subscribing VM”). Just like XenApp, any application you subscribe to and then launch is actually running and executing on the publishing VM and merely being displayed on the subscribing VM (Citrix TV has a good demo of this feature here). A configurable application spidering process runs on any application publishing VM to discover all the applications that will be viewable in Citrix Dazzle on the application subscribing VM. To configure this process you’ll have to edit an XML file located at the following path:
C:\Documents and Settings\All Users\Application Data\Citrix\Xci\Applications\XciDiscoveryConfig.xml
Below is a short section of what’s included in this file:
<DiscoveryPath Enabled="true" Recurse="true" Wildcard="*.lnk">C:\ProgramData\Microsoft\Windows\Start Menu</DiscoveryPath>
<DiscoveryPath Enabled="true" Recurse="true" Wildcard="*.lnk">C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Start Menu</DiscoveryPath>
<DiscoveryPath Enabled="true" Recurse="true" Wildcard="*.msc">C:\Windows\system32</DiscoveryPath>
<Whitelist IgnoreCase="true">^.:\\Program Files\\Internet Explorer\\iexplore.exe</Whitelist>
With the exception of the references to “perfmon”, these are the default configurations for this section of the file. Perfmon for Windows 7 is located at “C:\Windows\system32\perfmon.msc”. As you can see, I’ve added its root directory in the “DiscoveryPaths” section of the file and also specified the .msc file in the “Whitelists” section. Now perfmon is ready to use in the application subscribing VM. You can follow a similar procedure to add any application or group of applications as usable in your application subscribing VMs. This feature is currently “experimental”.
Intel TXT: The role Trusted Execution Technology plays in the XenClient architecture is to cryptographically checksum the XenClient installation at every boot. In more basic terms, its function is to ensure that the hypervisor hasn’t been tampered with while offline. This feature is currently unsupported with XenClient.
TPM: TXT checksums are stored in the Trusted Platform Module. The encryption key is sealed by the TPM and only released if the checksums match. Like TXT, this feature is currently unsupported.
Synchronizer: Synchronizer is an appliance that allows you to centrally manage and deploy virtual machines in your XenClient environment. In its current release, Synchronizer runs on XenServer exclusively and all of the management and configuration of the appliance is done through a web front-end. VMs deployed by Synchronizer will, depending on how you’ve configured them, be in periodic communication with the appliance over HTTPS. Some examples of this communication include checking for new images issued to the user, checking for updates to existing images or verifying that a “kill pill” hasn’t been issued for any VM. Synchronizer will even synchronize your XenClient password to match your Active Directory password even though the XenClient device itself isn’t part of Active Directory.
Through the use of snapshots Synchronizer can, in conjunction with XenClient, provide you with the capability of downloading, uploading or even backing up your XenClient VMs. To learn more about this process I highly recommend this article.
Dynamic Image Mode: If you’ve worked with desktop virtualization before then the concept of “layering” desktop images is already familiar to you. You have an operating system with applications being streamed/virtualized on top of that and user settings being redirected/streamed on top of that. What’s interesting about Dynamic Image Mode VMs is that each of these layers is now a separate .vhd file, the three together comprising one operating system.
Unlike static mode VMs, the dynamic image mode OS layer is not persistent across reboots. Those familiar with VDI should recognize this behavior as well. Any changes made to the base OS will be wiped clean upon each reboot. The “Application” and “Documents and Settings” layers are redirected to the appropriate .vhd file through the use of junction points. If you are using Windows 7, then “C:\Program Files\Citrix\AppCache” is redirected to the “Application” .vhd and “C:\Users” is redirected to the “Documents and Settings” .vhd. Unlike the OS .vhd file, the Application and Documents and Settings .vhd are persistent across reboots. When backing up a Dynamic Image Mode VM to Synchronizer only the “Documents and Settings” .vhd is backed up. Updates to a Dynamic Image Mode VM will update the OS .vhd only. For more on this, I refer you to this article and once again to the article I mentioned before. Dynamic Image VMs are currently “experimental” for XenClient.
So there you have it, that’s XenClient in a nutshell. While there are currently several key features of XenClient that are “experimental”, I’m personally very excited about the future of this technology. Already the SP1 release has a number of improvements, including support for .vhd images created with XenConvert, support for images streamed from Citrix Provisioning Server, faster boot times, the ability to boot directly to a VM and more! A good rundown of some of the other important improvements in SP1 can be found here. If you’d like to learn more, I’d suggest reading the XenClient and Synchronizer user/admin guides. Lastly, I’ll refer you to this video from Synergy 2010 that goes into a good amount of technical detail regarding the different features of XenClient.