With the release of XenServer 5.6, Citrix included memory overcommitment as a feature of its hypervisor. Memory overcommitment has long been a staple of the ESX hypervisor, so I thought I’d do an overview of the memory reclamation techniques employed by each hypervisor to determine whether the playing field has been leveled when it comes to this feature.
Transparent Page Sharing (TPS) is basically virtual machine memory de-duplication. A service running on each ESX host scans the contents of guest physical memory and identifies potential candidates for de-duplication. After potential candidates are identified, a bit-by-bit comparison is done to ensure that the pages are identical. If there is a successful match, the guest-physical to host-physical mapping is changed to point at the shared host-physical page. The redundant pages are then reclaimed by the host. If a VM attempts to write to a shared host-physical page, a “Copy on Write” procedure is performed. As the name implies, when a write attempt is made the hypervisor makes a private copy of that page and re-maps the guest-physical memory to this new page before the write actually happens.
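To make the scan, compare, re-map and copy-on-write steps concrete, here is a minimal toy sketch in Python. It is not how ESX is implemented: the `Host` class, its method names, and the use of a SHA-1 digest to pick candidates are all assumptions made for illustration. Only the overall flow (find candidates, confirm with a full comparison, share, copy on write) comes from the description above.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this toy model


class Host:
    """Toy model of host-physical memory with page sharing (hypothetical)."""

    def __init__(self):
        self.frames = {}    # host frame number -> page contents (bytes)
        self.refcount = {}  # host frame number -> number of guest pages mapped
        self.mapping = {}   # (vm, guest page number) -> host frame number
        self._next = 0

    def _alloc(self, data):
        hfn = self._next
        self._next += 1
        self.frames[hfn] = data
        self.refcount[hfn] = 0
        return hfn

    def _map(self, key, hfn):
        self.mapping[key] = hfn
        self.refcount[hfn] += 1

    def _unmap(self, key):
        hfn = self.mapping.pop(key)
        self.refcount[hfn] -= 1
        if self.refcount[hfn] == 0:
            # Nobody references this frame any more: the host reclaims it.
            del self.frames[hfn], self.refcount[hfn]

    def load(self, vm, gpn, data):
        """Back a guest page with a fresh private host frame."""
        self._map((vm, gpn), self._alloc(data))

    def read(self, vm, gpn):
        return self.frames[self.mapping[(vm, gpn)]]

    def scan(self):
        """Find identical pages and point them at one shared frame."""
        seen = {}  # digest -> first frame seen with that digest
        for key, hfn in list(self.mapping.items()):
            digest = hashlib.sha1(self.frames[hfn]).digest()
            candidate = seen.setdefault(digest, hfn)
            # The digest only nominates a candidate; a full comparison
            # confirms the pages really are identical before sharing.
            if candidate != hfn and self.frames[candidate] == self.frames[hfn]:
                self._unmap(key)
                self._map(key, candidate)

    def write(self, vm, gpn, offset, payload):
        key = (vm, gpn)
        hfn = self.mapping[key]
        if self.refcount[hfn] > 1:
            # Copy on write: take a private copy before modifying.
            private = self._alloc(self.frames[hfn])
            self._unmap(key)
            self._map(key, private)
            hfn = private
        page = bytearray(self.frames[hfn])
        page[offset:offset + len(payload)] = payload
        self.frames[hfn] = bytes(page)
```

Loading the same zero-filled page into three VMs and calling `scan()` leaves a single host frame behind; a later `write()` from one VM gives that VM its own copy while the other two keep sharing.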
One interesting characteristic of this feature is that, unlike all the other memory reclamation techniques, it’s always on. This means that even well before you start running into any low memory conditions the hypervisor is de-duplicating memory for you. This is the hallmark memory overcommitment technique for VMware. With this feature, you can turn on more VMs before contention starts to occur. All the other techniques only kick in when available memory is running low or after there is actual contention.
Ballooning is achieved via a device driver (the balloon driver) included by default with every installation of VMware Tools. When the hypervisor is running low on memory, it sets a target balloon size and the balloon driver then “inflates” by allocating memory inside the guest, creating artificial memory pressure within the VM. The guest operating system responds by handing over free pages or, if it has none to spare, by pushing pages to its page file. The pages held by the balloon driver are “pinned” so the guest cannot page them out, and because nothing in the guest is using them, the host can reclaim the host-physical memory behind them. If any of these pages are accessed again, the host simply treats it like any other VM memory allocation and allocates a new page for the VM. If the host is running particularly low on memory, the balloon may need to be inflated even more, causing the guest OS to start paging memory.
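The inflate behavior can be sketched with a small bookkeeping model. This is only an illustration under simplifying assumptions: the `Guest` class and its fields are hypothetical, memory is counted in whole pages, and the guest is assumed to give up free pages first and page out in-use pages only once the free ones are exhausted.

```python
class Guest:
    """Toy accounting of guest memory under balloon pressure (hypothetical)."""

    def __init__(self, total_pages, free_pages):
        self.total = total_pages
        self.free = free_pages                  # pages the guest OS is not using
        self.in_use = total_pages - free_pages  # resident pages the guest is using
        self.balloon = 0                        # pages pinned by the balloon driver
        self.paged_out = 0                      # pages pushed to the guest page file

    def set_balloon_target(self, target):
        """Inflate or deflate the balloon to `target` pages.

        Returns the number of pages the host can currently reclaim.
        """
        if not 0 <= target <= self.total:
            raise ValueError("target must be between 0 and total guest pages")
        if target > self.balloon:
            need = target - self.balloon
            from_free = min(need, self.free)
            self.free -= from_free
            # Free pages ran out: the guest OS chooses in-use pages to page out.
            forced = need - from_free
            self.in_use -= forced
            self.paged_out += forced
        else:
            # Deflating hands pages back to the guest as free memory.
            self.free += self.balloon - target
        self.balloon = target
        return self.balloon
```

With a 1000-page guest that has 300 free pages, a 200-page target is satisfied entirely from free memory, while raising the target to 400 forces the guest to page out 100 pages of its own choosing.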
If TPS and Ballooning can’t free up enough memory, memory compression kicks in. Basically, any page that can be compressed by at least 50% will be compressed and put into a “compression cache” located within VM memory. The next time the page is accessed, it is decompressed. Memory compression was new in ESX 4.1, and without this feature in place these pages would have been swapped out to disk. Of course, decompressing a page that is still in RAM is much faster than reading that page back from disk!
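The 50% rule is easy to illustrate. The sketch below assumes 4 KB pages and uses Python's `zlib` as a stand-in compressor; the compressor ESX actually uses is not stated here, and the function names are hypothetical.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size


def reclaim_page(page: bytes):
    """Decide whether a page goes to the compression cache or to swap.

    A page is kept compressed only if it shrinks to half its size or less;
    otherwise it would be swapped out to disk instead.
    """
    compressed = zlib.compress(page)
    if len(compressed) <= PAGE_SIZE // 2:
        return ("compressed", compressed)
    return ("swapped", page)


def restore_page(kind: str, blob: bytes) -> bytes:
    """Return the original page contents on the next access."""
    return zlib.decompress(blob) if kind == "compressed" else blob


if __name__ == "__main__":
    for label, page in [("zero page", bytes(PAGE_SIZE)),
                        ("random page", os.urandom(PAGE_SIZE))]:
        kind, blob = reclaim_page(page)
        assert restore_page(kind, blob) == page
        print(label, "->", kind)
```

A zero-filled page compresses far below the threshold, whereas a page of random bytes barely compresses at all and falls through to swap.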
Memory swapping is the last resort! When memory contention is really bad and TPS, Ballooning and Compression haven’t freed up enough memory, the hypervisor starts actively swapping out memory pages to disk. With Ballooning, it was the guest OS that decided what was pinned and what was swapped out to disk. What makes hypervisor swapping so potentially devastating to performance is that the hypervisor has no insight into which pages would be the “best” ones to swap to disk. At this point memory needs to be reclaimed fast, so the hypervisor could very well be swapping out active pages, and accessing these pages from disk is going to cause a noticeable performance hit. Needless to say, you want to size your environment such that swapping is rare.
As of version 5.6, XenServer has a mechanism that allows for memory overcommitment on XenServer hosts. XenServer DMC (Dynamic Memory Control) works by proportionally adjusting the memory available to running virtual machines based on pre-defined minimum and maximum memory values. The amount of memory between the dynamic minimum and dynamic maximum values is known as the Dynamic Memory Range (DMR). The dynamic maximum value represents the maximum amount of memory available to your VM, and the dynamic minimum value represents the lowest amount of memory that could be available to your VM when there is memory contention on the host. Running VMs will always run at the dynamic maximum value until there is contention on the host. When this happens, the hypervisor will proportionally inflate a balloon driver on each VM where you’ve configured DMC until it has reclaimed enough memory for the hypervisor to run effectively. DMC could then be thought of as a configurable balloon driver.
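One way to read “proportionally” is that every VM is squeezed by the same fraction of its own Dynamic Memory Range. The sketch below works through that interpretation; it is an assumption for illustration, not XenServer's actual algorithm, and the function name and the megabyte figures are made up.

```python
def dmc_targets(vms, available_mb):
    """Compute a memory target for each VM.

    vms: dict mapping VM name -> (dynamic_min_mb, dynamic_max_mb)
    available_mb: host memory available to these VMs

    Every VM gets its dynamic minimum plus the same fraction of its
    Dynamic Memory Range (assumed interpretation of "proportional").
    """
    total_min = sum(lo for lo, _ in vms.values())
    total_range = sum(hi - lo for lo, hi in vms.values())
    if available_mb < total_min:
        raise ValueError("host cannot satisfy the dynamic minimums")
    if total_range == 0:
        ratio = 1.0
    else:
        # Without contention the ratio is capped at 1.0 (dynamic maximum).
        ratio = min(1.0, (available_mb - total_min) / total_range)
    return {name: lo + ratio * (hi - lo) for name, (lo, hi) in vms.items()}


if __name__ == "__main__":
    vms = {"web": (1024, 2048), "db": (2048, 4096)}
    print(dmc_targets(vms, 8192))  # no contention
    print(dmc_targets(vms, 4608))  # contention
```

With 8192 MB available both VMs sit at their dynamic maximum. With only 4608 MB, each VM lands halfway through its range: 1536 MB for `web` and 3072 MB for `db`.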
As the diagram clearly shows, ESX employs a much broader and more diverse set of mechanisms to achieve memory overcommitment than XenServer. So while it’s technically possible to overcommit memory on both ESX and XenServer, I think it’s clear that ESX is the hypervisor of choice where memory overcommitment is concerned. The “Virtual Reality” blog over at VMware recently posted an interesting article about this that also compared Hyper-V; you can read it here. For further reading on XenServer DMC, I’d recommend Xen.org’s article on DMC, the XenServer admin guide or TheGenerationV blog. For ESX, Hypervizor.com has many excellent VMware-related diagrams, and their diagram on VMware memory management is no exception. In addition, the link I directed you to earlier is also very informative and contains other links for further reading.