In VDI rollouts, many people have focused on and planned for the hosting infrastructure (CPU, memory, storage capacity, etc.), but most have overlooked the importance of properly calculating the IOPS their VDI environment would generate. As a result, their environments suffered from poor user experience due to slow response times and even completely “frozen” virtual desktops waiting to read or write. In an effort to educate people on this issue, several excellent articles have been written on this topic.
In addition to these, I thought I’d write about two common nuances that I’ve seen people overlook when planning for VDI IOPS.
When planning for peak VDI IO, you need to know your “aggregate peak IO” in addition to your “individual peak IO”. I’ve often heard of people “planning for peak” in their VDI environment by determining what a sampling of individual virtual desktops “peak” at from an IO perspective. They run perfmon or some other monitoring tool on individual virtual desktops, then multiply the peak number by their total number of virtual desktops to determine the amount of IO their storage device will need to handle. Let’s use this chart as an example:
If I have 10 Windows 7 virtual desktops and I’ve determined that they individually peak at 30 IOPS, then using individual peak IO I would purchase a storage system capable of handling 300 IOPS. But as you can see from the chart above, at no point in the day do I reach 300 IOPS. The 12pm timeframe is my aggregate peak IO, and I reach only 220 IOPS at that busiest point. That’s about 27% less than the figure based on individual peak IO. Remember that this is just a hypothetical example and that the real-world difference between individual peak IOPS and aggregate peak IOPS could be greater or smaller than in my example, depending on the workload and user activity. Failing to use aggregate IO numbers could lead you to believe that you’ll need much more IO capacity than you’ll really use, and might deflate interest in any VDI rollout project you’re involved with.
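To make the distinction concrete, here is a minimal Python sketch of the two calculations. The desktop names and IOPS samples are made up purely for illustration (they are not the values behind the chart), and the function names are my own. It assumes every desktop was sampled at the same intervals.

```python
# Hypothetical per-desktop IOPS samples: one list per desktop,
# one value per sampling interval. Made-up numbers for illustration only.
samples = {
    "desktop-a": [5, 30, 10, 8],
    "desktop-b": [12, 6, 30, 9],
    "desktop-c": [7, 9, 11, 30],
}


def individual_peak_sum(samples):
    """Add up each desktop's own peak, regardless of when it occurred."""
    return sum(max(series) for series in samples.values())


def aggregate_peak(samples):
    """Highest combined IOPS observed in any single sampling interval."""
    return max(sum(interval) for interval in zip(*samples.values()))


print(individual_peak_sum(samples))  # 90
print(aggregate_peak(samples))       # 51
```

Each of the three desktops peaks at 30 IOPS, but because the peaks fall in different intervals, the storage never sees more than 51 IOPS at once.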
So if you’re planning a VDI rollout make sure to get aggregate IOPS numbers from your backend storage system and not just from a sampling of individual virtual desktops. Knowing individual peak IO is still important because it’s always useful to know as much about your environment as possible. Knowing both of these numbers will help you gain a better understanding of user activity in your environment.
If you’re doing a POC (which you must), I’d get a good sampling of a broad range of users, determine their aggregate peak IO, and use this number to determine the amount of IO per virtual desktop you should plan for when sizing for future growth. Using the example above you’d get 22 IOPS per virtual desktop (220/10 = 22). Depending on how confident you are in your sample size, it might be a good idea to round up a bit as well.
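As a rough sketch, the per-desktop planning figure could be computed like this. The `headroom` parameter is my own addition to represent “rounding up a bit”, and the 10% value in the usage lines is an arbitrary assumption, not a recommendation.

```python
import math


def iops_per_desktop(aggregate_peak_iops, desktop_count, headroom=0.0):
    """Planning IOPS per desktop, rounded up to a whole number.

    headroom is a hypothetical safety margin (0.10 means 10% extra).
    """
    if desktop_count <= 0:
        raise ValueError("desktop_count must be positive")
    return math.ceil(aggregate_peak_iops / desktop_count * (1 + headroom))


print(iops_per_desktop(220, 10))                 # 22
print(iops_per_desktop(220, 10, headroom=0.10))  # 25
```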
It’s rare to hear of people overprovisioning their storage for VDI, so perhaps this mistake isn’t very widespread. Not knowing your read/write ratios, however, could lead to the much more common problem of underprovisioned storage…
Before going to a storage vendor and telling them the peak IOPS per virtual desktop, there is still one thing left to figure out: the read/write ratio of your VDI environment. Why is the read/write ratio so important? Because not all IOs are created equal! Read IO is significantly less taxing on your storage device than write IO. On a RAID 5 set you will typically get around 160 read IOPS and 45 write IOPS per spindle. So whether those 22 IOPS we calculated earlier are predominantly read, predominantly write, or somewhere in the middle could have a significant impact on how many spindles your storage device will need to have.
An interesting workload characteristic that most people still don’t realize about VDI is that virtual desktops typically run at a 20% read/80% write ratio during normal working operations! At boot that ratio is flipped to 80% read and 20% write. An excellent article on this topic can be found here.
So let’s take those 22 IOPS we calculated earlier and figure out how many spindles we would need based on different read/write ratios. If we’re at 22 IOPS per virtual desktop at a 20/80 read/write ratio and we’ll need 100 virtual desktops, this means we’ll have a total of 2200 IOPS. 440 will be reads and 1760 will be writes. Assuming we’re on RAID 5 (45 write IOPS and 160 read IOPS per spindle), we would make the following calculation: 1760/45 + 440/160 ≈ 41.9, which rounds up to 42 spindles. Now what if the read/write ratio were flipped? What if we had an 80% read/20% write ratio? 440/45 + 1760/160 ≈ 20.8, which rounds up to 21 spindles! That’s about half of what we needed when the IO was predominantly writes.
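The same arithmetic can be sketched in a few lines of Python. The function name and parameters are my own; the per-spindle defaults are the rough RAID 5 figures quoted earlier (about 160 read IOPS and 45 write IOPS per spindle). The sketch sums the fractional terms and rounds up once at the end, so it can differ by a spindle from figures rounded term by term.

```python
import math


def spindles_needed(total_iops, read_percent,
                    read_iops_per_spindle=160, write_iops_per_spindle=45):
    """Estimate spindle count for a given IOPS load and read percentage.

    Defaults are the rough RAID 5 per-spindle figures from the text;
    substitute measured values for your own storage.
    """
    if not 0 <= read_percent <= 100:
        raise ValueError("read_percent must be between 0 and 100")
    reads = total_iops * read_percent / 100
    writes = total_iops - reads
    return math.ceil(reads / read_iops_per_spindle
                     + writes / write_iops_per_spindle)


total = 22 * 100  # 22 IOPS per desktop, 100 desktops

print(spindles_needed(total, read_percent=20))  # 42 (write-heavy, normal operation)
print(spindles_needed(total, read_percent=80))  # 21 (read-heavy, e.g. at boot)
```

Running it with your own measured ratio and per-spindle figures gives a quick sanity check before talking to a storage vendor.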
As you can clearly see, figuring out the read/write ratio can have a large impact on the type and size of storage device you will use to host your VDI environment. The larger your VDI deployment is, the more important these numbers become (imagine using a storage device equipped with half the spindles you need to host your 20,000 virtual desktop users!).
I strongly recommend all the articles I’ve linked to above. While they all have great baseline numbers with which to size your environment, I highly recommend figuring out what these numbers look like in your own environment. What are your IO characteristics during normal working hours? What does the read/write ratio look like during your aggregate peak IO time? What is your aggregate peak IO? All of these and more are extremely important questions for you to answer before you deploy a VDI solution. And be mindful of, but wary of, “industry standard” IO numbers. These don’t take into account the specialized applications you might be running in your environment or the types of users you have running on a VDI solution. Nothing can take the place of a well-planned POC!