One of the biggest trends in IT infrastructure today is dedicated “storage systems” for VDI. I put “storage systems” in scare quotes because many of the vendors making these systems would object to their products being called storage systems. Regardless, the primary use case driving the sales of many of these systems is as a storage location for VDI. The reason for this is that traditional arrays have proven woefully inadequate to handle the amount and type of IO VDI can generate.
The architecture backing these systems varies greatly but when looking for a dedicated storage solution for your VDI environment, here are the top features I look for:
Speed. This one should be obvious, but any storage system dedicated to VDI needs to be fast. Anyone who’s ever designed storage for a VDI environment can tell you that VDI workloads can generate tremendous amounts of mostly write IO with very ‘bursty’ workload patterns. Traditional storage arrays with active-passive controllers, ALUA architecture and tiered HDD storage weren’t created with this workload in mind. Trying to design a VDI environment on this architecture can become (in some cases) cost and performance prohibitive. Indeed, many businesses are spending 40%-60% of their VDI budget on storage alone.
Today, the speed problem is being solved with a variety of methods. RAM is being used as a read/write location (e.g. Atlantis ILIO) for microsecond access times. “All flash” arrays are being purpose built to hold 100% SSD drives (Invicta, XtremIO, Pure, etc.). Adding to this, a whole host of “converged” compute/storage appliances are popping up utilizing local disk/flash for increased speed and simplicity (Nutanix, Simplivity, VSAN, ScaleIO, etc.). To reiterate, each of the systems I’ve mentioned can do more than just VDI; VDI just happens to be a good use case for them. If you’re looking for a place to put your VDI environment, the ability to rapidly process lots of random write IO should be of paramount concern, and you should know that there are currently many ways to address it.
Data reduction. This one will be more controversial, particularly for non-persistent fanboys. Nevertheless, persistent VDI is a fact of life for many VDI environments. As such, large amounts of duplicate data will be written to storage and, as a result, data reduction mechanisms become very important. De-duplication and compression will be the most effective methods, preferably performed in-line. Again, various solutions from Atlantis to Invicta to XtremIO to Pure all offer these features but with very different architectures. If you have no persistent desktops then this feature becomes less important. However, data reduction can still be quite valuable in many non-persistent VDI architectures as well. As an example, XenDesktop MCS could greatly benefit from storage with de-duplication. I also find that many of my customers who start out thinking they’ll have only non-persistent desktops quickly discover, during the course of their migration, users who need persistence. Don’t be surprised by the need for this feature at a later point. Plan for it at the beginning and make sure your storage platform has the appropriate data reduction features.
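To make the duplicate-data point concrete, here is a toy Python sketch of in-line de-duplication followed by compression. It is only an illustration of the general idea and not how Atlantis, Invicta, XtremIO, Pure or any other product implements it. The block size, the number of desktops and the synthetic data are all assumptions I made up for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def reduce_blocks(blocks):
    """Return (logical, deduped, stored) byte counts for a stream of blocks."""
    unique = {}  # content hash -> compressed block
    logical = 0
    deduped = 0
    for block in blocks:
        logical += len(block)
        digest = hashlib.sha256(block).digest()
        if digest not in unique:
            # First time this content is seen: keep one compressed copy.
            deduped += len(block)
            unique[digest] = zlib.compress(block)
    stored = sum(len(data) for data in unique.values())
    return logical, deduped, stored


# 50 made-up "OS image" blocks shared by every desktop.
os_blocks = [bytes([i]) * BLOCK_SIZE for i in range(50)]

# 100 made-up persistent desktops: the shared blocks plus one block unique to each user.
writes = []
for desktop in range(100):
    writes.extend(os_blocks)
    writes.append((b"user" + desktop.to_bytes(4, "big")) * (BLOCK_SIZE // 8))

logical, deduped, stored = reduce_blocks(writes)
print(f"logical bytes written: {logical}")
print(f"after de-duplication:  {deduped}")
print(f"after compression:     {stored}")
```

The synthetic blocks here are far more repetitive than real desktop data, so the compression step looks much better than it would in production. The point is only that 100 persistent desktops built from the same image write the same blocks over and over, and a content hash lets the storage keep one copy.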
Scale. I don’t know how many VDI projects I’ve heard of where storage was purchased to support X amount of users only for the VDI project to take off faster and at a larger scale than expected. The project then gets stalled because the storage system can’t handle more than the X amount of users it was designed for and the business doesn’t have enough budget to purchase another storage system. For this reason, any storage dedicated to VDI should be able to scale both “up” and “out”. “Up” to support more capacity and “out” to support more IO. The scaling of the system should be such that it is one unified system…not multiple systems with a unified control plane. The converged solutions (VSAN, Nutanix, et al.) are great at this. All flash arrays typically have this as well, e.g. Invicta, XtremIO.
Ease of Management. This sounds basic and very obvious, but make sure you evaluate “ease of management” when purchasing any VDI-specific storage solution. The reason for this is simple: any VDI-specific storage system is bound to have a much different architecture than any arrays you currently have in your environment. The harder it is to manage, the steeper the learning curve will be for existing admins. My criterion for determining if a VDI storage system is “easy” to manage is this – “can my VDI admins manage this?” (and that’s no slight to VDI admins!). The management of the system shouldn’t require a lot of legacy SAN knowledge or skillsets. This makes the environment more agile by not having to rely on multiple teams for basic functions and doesn’t burden SAN teams with a disparate island of storage they must learn and manage. Again, many of the converged solutions are great at this, as are some of the newer AFAs.
There are many other important factors in deciding what to look for in a storage solution for your VDI environment. Whatever the architecture, if it doesn’t include the above four features, I’d look elsewhere.
Note: Vijay Swami wrote an excellent article entitled “A buyer’s guide for the All Flash Array Market”. I found it interesting after I wrote this to read his thoughts and note how many of the things he looks for in an AFA are similar to my top features for VDI storage. Regardless, it’s good reading and if you haven’t already, check it out.
I’m not big on “end of year” posts or predictions but, lacking any other ideas, I thought I’d write down some random thoughts about technology going through my head as this year draws to an end.
All Flash Array Dominance
I’m not buying the hype surrounding all flash arrays (AFAs). Certainly there are legitimate use cases and they’ll be deployed more in the near future than they have in the past, but the coming dominance of all flash arrays, I think, has been greatly exaggerated. It’s clear that the main problem these arrays are trying to solve is the extreme performance demands of some applications, and I just think there are much better ways to solve this problem (e.g. local disk, convergence, local flash, RAM caching, etc.) in most scenarios than purchasing disparate islands of SAN. And many of the things that make an AFA so “cool” (e.g. in-line dedupe, compression, no RAID, etc.) would be even cooler if the technology could be incorporated into a hybrid array. The AFA craze feels very much like the VDI craze to me: lots of hype about how “cool” the technology is but in reality a niche use case. Ironically, VDI is the main AFA use case.
The Emergence of Convergence
This year has seen a real spike in interest and deployment of converged storage/compute software and hardware and I’m extremely excited for this technology going into 2014. With VMware VSAN being GA in 2014, I expect that interest and deployment to rise to even greater heights. VSAN has some distinct strategic advantages over other converged models that should really make the competition for this space interesting. Name recognition alone is getting them a ton of interest. Being integrated with ESXi gives them an existing install base that already dominates the data center. In addition, its sheer simplicity and availability make it easy for anyone to try out. Pricing still hasn’t been announced, so that will be the big thing to watch for in 2014 with this offering, along with any new enhancements that come with general availability. In addition to VSAN, EMC’s ScaleIO is another more ‘software-based’ rather than ‘appliance-based’ solution that is already GA and that I’m looking forward to seeing more of in 2014. Along with VMware and EMC, Nutanix, Simplivity, Dell, HP, VCE, et al. all have varying “converged” solutions as well, so this isn’t going away any time soon. With this new wave of convergence products and interest, expect all kinds of new tech buzzwords to develop! I fully expect and predict “Software Defined Convergence” will become mainstream by the end of the year!
Random convergence links:
Duncan Epping VSAN article collection – http://www.yellow-bricks.com/virtual-san/
Scott Lowe – http://wikibon.org/wiki/v/VMware_VSAN_vs_the_Simplicity_of_Hyperconvergence
Cormac Hogan looks at ScaleIO – http://cormachogan.com/2013/12/05/a-closer-look-at-emc-scaleio/
Good look at VSAN and All-Flash Array performance – http://blogs.vmware.com/performance/2013/11/vdi-benchmarking-using-view-planner-on-vmware-virtual-san-part-3.html
Chris Wahl musing over VSAN architecture – http://wahlnetwork.com/2013/10/31/muse-vmwares-virtual-san-architecture/?utm_source=buffer&utm_medium=twitter&utm_campaign=Buffer&utm_content=buffer59ec6
The Fall of XenServer
As any reader of this blog knows, I used to be a huge proponent of XenServer. However, things have really gone downhill after 5.6 in terms of product reliability. So much so that I really have a hard time recommending it at all anymore. ESXi was always at the top of my list but XenServer remained a solid #2. Now it’s a distant 3rd in my mind behind Hyper-V. I’ll grant that there are many environments successfully and reliably running XenServer, I have built quite a few myself, but far too many suffer from bluescreen server crashes and general unreliability to be acceptable in many enterprises. The product has even had to be pulled from the site to prevent people from downloading it while bugs were fixed. I’ve never seen so many others express like sentiments about this product as I have seen this past year.
Random CTP frustration with XenServer:
Random stuff I’m reading
Colin Lynch has always had a great UCS blog and his two latest posts are great examples. Best UCS blog out there, in my opinion:
“UCS Manager 2.2 (El Capitan) Released”
“Under the Cisco UCS Kimono”
I definitely agree with Andre here! Too many customers don’t take advantage of CBRC and it’s so easy to enable:
“Here is why your Horizon View deployment is not performing to it’s max!”
Great collection of links and information on using HP’s Moonshot ConvergedSystem 100 with XenDesktop by Dane Young:
“Citrix XenDesktop 7.1 HDX 3D Pro on HP’s Moonshot ConvergedSystem 100 for Hosted Desktop Infrastructure (HDI)”
In the end, this post ends up being an “end of year” post with a few predictions. Alas, at least I got the “random” part right…
Continuing with the Cisco UCS 101 series, I thought I’d post on MAC, WWPN, WWNN and even UUID pool naming conventions. There are a number of ways this can be done, but as a general rule of thumb my pools will ensure a few things:
- Uniqueness of MACs/WWNs/etc. across blades, UCS Domains (aka “Pods”) and sites.
- The MACs/WWNs/etc. that are created from your pools should give you some level of description as to the location, fabric and OS that is assigned to that particular address.
- Lastly, the naming convention should be as simple and un-cryptic as possible. Naming conventions are useless if they aren’t easily discernible to those tasked with reading them.
With that out of the way, let’s look at a common naming scheme:
This is fairly straightforward. The first three bytes are a Cisco prefix that UCS Manager encourages you not to modify. This can actually be modified but I always keep it the same. The next digit in this naming convention represents a site ID. This can be any physical location where UCS may reside, so a production site might be “1” and a DR site might be “2”. Then we come to “Pod”. In UCS nomenclature, a “Domain” or “Pod” is simply a pair of fabric interconnects and any attached chassis. For OS, I usually use “1” to denote VMware, “2” for Windows and “3” for a Linux host. Fabric just denotes whether the MAC should be destined for Fabric A or B. The last byte will just be an incremental number assigned by UCS. Let’s look at an example:
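The layout just described can be sketched in a few lines of Python. This is only an illustration, assuming the site and pod IDs share the fourth byte and the OS and fabric share the fifth, which is how the 00:25:B5:23:1B:XX address discussed in this post reads. The function names are hypothetical helpers of my own and are not part of UCS Manager.

```python
CISCO_PREFIX = "00:25:B5"  # the Cisco prefix used in this post's examples

# OS digits as used in this post: 1 = VMware, 2 = Windows, 3 = Linux.
OS_CODES = {"vmware": 1, "windows": 2, "linux": 3}
OS_NAMES = {code: name for name, code in OS_CODES.items()}


def mac_pool_prefix(site, pod, os_name, fabric):
    """Build the first five bytes of a MAC pool: prefix, site+pod byte, OS+fabric byte."""
    if not (0 <= site <= 0xF and 0 <= pod <= 0xF):
        raise ValueError("site and pod must each fit in one hex digit")
    fabric = fabric.upper()
    if fabric not in ("A", "B"):
        raise ValueError("fabric must be 'A' or 'B'")
    os_code = OS_CODES[os_name.lower()]
    return f"{CISCO_PREFIX}:{site:X}{pod:X}:{os_code:X}{fabric}"


def decode_mac(mac):
    """Recover site, pod, OS and fabric from a MAC (or pool prefix) built with this scheme."""
    parts = mac.upper().split(":")
    site_pod, os_fabric = parts[3], parts[4]
    return {
        "site": int(site_pod[0], 16),
        "pod": int(site_pod[1], 16),
        "os": OS_NAMES[int(os_fabric[0], 16)],
        "fabric": os_fabric[1],
    }


print(mac_pool_prefix(1, 1, "vmware", "A"))  # 00:25:B5:11:1A
print(decode_mac("00:25:B5:23:1B:XX"))
```

The last byte is left out of the prefix on purpose, since UCS assigns it incrementally from the pool.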
In this example pool, the MAC address would belong to a server at site “1” that resides in UCS Pod “1” that is running VMware and should be communicating out of Fabric A. A MAC address of 00:25:B5:23:1B:XX would denote a server at site “2” in the third UCS pod at that site running VMware and communicating out of Fabric B. Another commonly used naming convention would look like this:
The only difference here is that the site/pod distinction has been done away with in favor of just UCS Pod ID. So while this example won’t allow you to easily distinguish a particular site, it will give you much larger Pod ID possibilities. There’s no right answer as to which is best. It really just depends on the environment and personal preference. For WWPN pools, I follow an almost identical naming scheme:
Again, the Cisco prefix can be modified but I just prefer to leave it as it is. For WWNN, I follow a very similar convention except that I exclude Fabric ID:
As you can see, whether I’m looking at the MAC address, WWPN or WWNN I can easily discern from which site and pod the address originates, what OS the address belongs to and what fabric it is communicating out of. UUID pools can be named similarly:
This doesn’t have to be, and shouldn’t be, complicated. These simple, common naming schemes will not only ensure unique, informative and easily discernible addresses but can make common management tasks such as network traces or zoning that much easier. Use the above examples as a guideline, but feel free to customize if there’s a scheme that fits your environment better. For more on this topic, I recommend the following resources:
In my last post I touched briefly on a claim I’m hearing a lot in IT circles these days. This claim is often heard in discussions surrounding multi-hypervisor environments and most recently in VDI discussions. The claim in question, at its core, says this – “If you have two procedures to perform the same task you double your operational expense in performing that task”. Given the prevalence of this argument, I wanted to focus on it in one post even though I’ve touched on it elsewhere.
As mentioned in my last post, Shawn Bass recently displayed this logic in a debate at VMworld. The example given is a company with a mixture of physical and virtual desktops. In this scenario they manage their physical desktops with Altiris/SCCM and use image-based management techniques for their non-persistent virtual desktops. Since you are using two different procedures to accomplish the same task (update desktops), it is claimed that you then “double” your operational expense.
As I’ve said, in many scenarios this is clearly false. The only way having two procedures “doubles” your operational cost is if both procedures require an equal amount of time/effort/training/etc. to implement and maintain. And the odd thing about this example is that it actually proves the opposite of what it claims. It’s very common for organizations to have physical desktops that they manage differently than their non-persistent virtual desktops. Are these organizations just not privy to the nuances of operational expenditures? I don’t think so. In many cases these organizations chose VDI at least in part for easier desktop management. For many, it’s just easier and much faster to maintain a small group of “golden images” rather than hundreds or thousands of individual images. So in this example, adding the second procedure of image-based management can actually reduce the overall operational expense. Now a large portion of my desktops can be managed much more efficiently than they were before. This reduces the overall time and energy I spend managing my total desktops and thus reduces my operational expense.
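A quick back-of-the-envelope calculation shows why “two procedures” does not have to mean “double the cost”. Every number in this Python sketch is hypothetical and chosen only to illustrate the argument: 1000 desktops, 30 minutes of monthly effort per agent-managed physical desktop, 6 minutes per image-managed non-persistent desktop, and a fixed monthly overhead for keeping each procedure running.

```python
def monthly_opex_minutes(procedures):
    """Sum per-desktop effort plus fixed overhead for each management procedure.

    Each procedure is a (desktops, minutes_per_desktop, fixed_minutes) tuple.
    """
    return sum(count * per_desktop + fixed for count, per_desktop, fixed in procedures)


# All figures are hypothetical and exist only to illustrate the argument.
AGENT_BASED = (30, 600)   # (minutes per desktop, fixed monthly overhead)
IMAGE_BASED = (6, 900)    # fewer minutes per desktop, more fixed overhead

# One procedure: all 1000 desktops managed with agent-based tools.
one_procedure = monthly_opex_minutes([(1000, *AGENT_BASED)])

# Two procedures: 600 physical desktops stay agent-based, 400 move to image-based management.
two_procedures = monthly_opex_minutes([
    (600, *AGENT_BASED),
    (400, *IMAGE_BASED),
])

print(f"one procedure:  {one_procedure} minutes/month")
print(f"two procedures: {two_procedures} minutes/month")
```

With these made-up inputs the two-procedure environment costs less in total, even after paying the fixed overhead twice. Cost only doubles if the second procedure is as expensive per desktop as the first and is applied on top of it rather than in place of it.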
We see this same logic in a lot of multi-hypervisor discussions as well. “Two hypervisors, two ways of managing things, double the operational expense”. When done wrong, a multi-hypervisor environment can fall into this trap. However, before treating this logic as universally true you have to evaluate your own IT staff and workload requirements. Some workloads will be managed/backed up/recovered in a disaster/etc. differently than the rest of your infrastructure anyway, so putting these workloads on a separate hypervisor isn’t going to add to that expense. The management of the second hypervisor itself doesn’t necessarily “double” your cost as in many cases the knowledge your staff already possesses on how a hypervisor works in general can translate well into managing an alternate hypervisor. A lot more could be said here but in the end, CAPEX savings should override any nominal added OPEX expense or you’re doing it wrong.
In general, standardization and common management platforms are things every IT department should strive for. Like “best practice” recommendations from vendors, however, we don’t apply them universally. The main problem with this line of thinking is that it states a generalization as a universal truth and applies it to all situations while ignoring the subtle complexities of individual environments. In IT, it’s just not that easy.
There was a good discussion at VMworld this year between persistent and non-persistent VDI proponents. The debate spawned from discussions on Twitter surrounding a blog post by Andre Leibovici entitled “Open letter to non-persistent VDI fanboys…”. Representing the persistent side of the debate were Andre Leibovici and Shawn Bass. Non-persistent fanboys were represented by Jason Langone and Jason Mattox. Overall, this is a good discussion with both sides pointing out some strengths and weaknesses of each position:
So which is the better VDI management model, persistent or non-persistent? Personally I think Andre nailed it near the end of the debate: it’s all about use case! I know that’s the typical IT answer to most questions but it really is the best answer in many of these “best tech” debates. What matters to most customers is not which is the “best” but which is the “right fit”. A Ferrari may be the best car in the world but it’s clearly not the right fit for a family of four on a budget. So while it may be fun and entertaining to discuss which is the best, in the real world the most relevant question is ‘which is the right fit given a particular use case?’. If you have a call center with a small application portfolio, then this is an obvious use case for non-persistent desktops (though certainly not the only use case). I agree with the persistence crowd with regard to larger environments that have extensive application portfolios. The time it takes to virtualize and package all these applications and the impossibly large amount of software required to go non-persistent for all desktops in such an environment (UEM, app publishing, app streaming, etc.) make persistence a much more viable option. This is why many VDI environments will usually have a mixture of persistent and non-persistent desktops. These are extreme examples but it’s clear that no one model is perfect for every situation.
Other random thoughts from this discussion:
- Throughout the debate and in most discussions surrounding persistent desktops, the persistent desktop crowd often points to new technology advances that make persistent desktops a viable option. Flash-based arrays, inline de-duplication, etc. are all cited as examples. The only problem with this is that while this technology exists today, many customers still don’t have it and aren’t willing to make the additional investment in a new array or other technology on top of the VDI software investment. So the technology exists and we can have very high-level, academic discussions on running persistent desktops with this technology, but for many customers it’s still not a reality.
- Here again, like most times this discussion crops up, the non-persistent crowd makes a point of trumpeting the ease of managing non-persistent desktops while glossing over how difficult it can be to actually deploy this desktop type when organizations are seeking a high percentage of VDI users. Even if we ignore the technical challenges around application delivery, users still have to like the desktop…and most companies will have more users than they realize who will require/demand persistent desktops.
- About midway through the debate there is talk about how non-persistence limits the user and how installing apps is what users want, but earlier in the debate the panel all agreed that just allowing users to install whatever app they want is a security and support nightmare. I found this dichotomy interesting in that it illuminates this truth – whichever desktop model you choose, the user is limited in some way. Whatever marketing you may hear to the contrary, remember that.
And last but certainly not least…
- In this debate Shawn delivers an argument I hear a lot in IT that I disagree with, and maybe this deserves a separate post. He talks about the “duality” of operational expense when you are managing non-persistent desktops using image-based management in an environment where you still have physical endpoints being managed by Altiris/SCCM. He says you actually “double” your operational expense managing these desktops in different ways. The logic undergirding this argument is the assumption that ‘double the procedure equals double the operational cost’. To me this is not necessarily true and, for many environments, definitely false. The only way having two procedures “doubles” your operational cost is if both procedures require an equal amount of time/effort/training/etc. to implement and maintain. And for many customers (who implement VDI at least partly for easier desktop management) it’s clear that image-based management is viewed as the easier and faster solution to maintain desktops. I see this same logic applied to multi-hypervisor environments as well and simply disagree that having multiple procedures is always going to mean you double or even increase your operational cost.
Any other thoughts, comments or disagreements are welcome in the comment section!
A couple months ago F5 came out with a very intriguing announcement when they released full proxy support for PCoIP on the latest Access Policy Manager code version, 11.4. Traditional Horizon View environments use “Security Servers” to proxy PCoIP connections from external users to desktops residing in the datacenter. Horizon View Security Servers will reside in the DMZ and the software is installed on Windows hosts. This new capability from F5 completely eliminates the need for Security Servers in a Horizon View architecture and greatly simplifies the solution in the process.
In addition to eliminating Security Servers and getting Windows hosts out of your DMZ, this feature simplifies Horizon View in other ways that aren’t being talked about as much. One caveat to using Security Servers is that they must be paired with Connection Servers in a 1:1 relationship. Any sessions brokered through these Connection Servers will then be proxied through the Security Servers they are paired with. Because Security Servers are located in the DMZ, this setup works fine for your external users. For internal users, a separate pair of Connection Servers is usually needed so users can connect directly to their virtual desktop after the brokering process without having to go through the DMZ. To learn more about this behavior see here and here.
Pictured below is a traditional Horizon View deployment with redundancy and load balancing for all the necessary components:
What does this architecture look like when eliminating the Security Servers altogether in favor of using F5’s ability to proxy PCoIP?
As you can see, this is a much simpler architecture. Note also that each Connection Server supports up to 2000 connections per server. I wouldn’t recommend pushing that limit but the above servers could easily support around 1500 total users (accounting for the failure of one Connection Server). If you wanted full redundancy and automatic failover with Security Servers in the architecture, whether it was for 10 or 1500 external users, you would still need at least 2 Security and 2 Connection servers. A lot of times they are not there so much for increased capacity but just for redundancy for external users, so eliminating them from the architecture can easily simplify your deployment.
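The sizing reasoning above can be written down as a small Python sketch. The 2000-connection limit comes from the text. The number of Connection Servers (two) and the 75% headroom factor are my own assumptions, picked because together they reproduce the “around 1500” figure, and the function name is a hypothetical helper.

```python
def usable_sessions(servers, per_server_limit=2000, tolerated_failures=1, utilization=0.75):
    """Sessions the brokers can carry after losing `tolerated_failures` servers,
    while staying below `utilization` of the per-server connection limit."""
    surviving = servers - tolerated_failures
    if surviving < 1:
        raise ValueError("no Connection Server left after the tolerated failures")
    return int(surviving * per_server_limit * utilization)


# Two Connection Servers, one allowed to fail, run at 75% of the stated limit (assumed values).
print(usable_sessions(2))  # 1500

# Adding a third server under the same assumptions.
print(usable_sessions(3))  # 3000
```

Under these assumptions the Connection Server count is driven by redundancy first and capacity second, which is the same point made about Security Servers above.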
But could this be simplified even further?
In this scenario the internal load balancers were removed in favor of the load balancers in the DMZ having an internal interface configured with an internal VIP for load balancing. Many organizations will not like this solution because it will be considered a security risk for the device in the DMZ to have interfaces physically outside the DMZ. ADC vendors and partners will claim their device is secure but most customers still aren’t comfortable with this solution. Another solution for small deployments with limited budget would be to just place that VIP in the above picture in the DMZ. Internal users will still connect directly to their virtual desktops on the internal network and the DMZ VIP is only accessed during the initial load balancing process for the Connection Servers. Regardless of whether you use an internal VIP or another set of load balancers, this solution greatly simplifies and secures a Horizon View architecture.
Overall, I’m really excited by this development and am interested in seeing if other ADC vendors offer this functionality for PCoIP in the near future or not. To learn more, see the following links:
There’s been a lot of industry news lately regarding Software-Defined Storage, Software-Defined Data Centers and hyper-convergence. After numerous conversations with various colleagues and friends about these concepts, I wanted to post my own thoughts on them and how I believe they are related.
First off, hyper-convergence has usually been used to denote the “next stage” in modern converged infrastructure. With many of the popular reference architectures or pre-built systems representing some level of “convergence”, hyper-convergence has come to refer to those systems that combine multiple data center tiers into a single appliance. However, as a term, I’ve come to view “hyper-convergence” as a misnomer. When looking at the modern landscape of integrated infrastructure platforms, there is only “convergence” and “simulated convergence”. Examples of converged infrastructure include Nutanix, Simplivity, et al., while simulated convergence examples can be found in vBlock, VSPEX and FlexPod. And while there is differentiation within the simulated convergence platforms (e.g. pre-built vBlock vs. reference architectures VSPEX/FlexPod), they are only “converged” insofar as their disparate components are cabled and racked together in a branded rack and sometimes managed with common software (e.g. Cloupia). With simulated convergence, each “tier” of the data center is still represented by separate hardware components and an attempt at unity is made through the use of “single-pane” management software. Convergence differs from this in that data center tiers are consolidated into common hardware components, which naturally increases management software simplicity as well.
Another interesting difference is that while simulated convergence offers simplified management and automation, convergence gives you these same things plus performance, cost and reduced complexity benefits as well. Because convergence moves data center tiers into a common platform, this naturally puts the network/compute/storage into closer proximity to each other, enabling greater performance and reduced complexity. Cost savings are achieved not only through hardware consolidation but operational expenditures can be lessened in a converged model as well.
None of this is to say that simulated convergence is worthless. On the contrary, simulated convergence via management software and reference architecture/pre-built configurations can greatly increase the consume-ability and ease of management of these separate components. Simulated convergence gives you increased efficiency on legacy platforms that organizations already have in place and already have knowledge on how to manage. It’s an improvement over traditional processes but it is not actual convergence, which is the next logical progression.
Indeed, say what you will about specific converged offerings but it’s hard to see why convergence as a model wouldn’t be the clear path to simplified software-defined data centers. No matter how much management software and automation you put in front of it, simulated convergence will always require specialized knowledge of various levels of divergent hardware components in order to properly maintain and run that model. You would never deploy a vBlock and only train your support staff on just Cloupia or vCenter with VSI plugins. No, for advanced troubleshooting and configuration an in-depth knowledge of all the network, hypervisor, compute, storage network and array components is necessary as well. Management software can mask the complexity, but it’s still there. It doesn’t move the control plane, it just creates another one.
Converged infrastructure that relies on commodity hardware and is software/virtualization-based shifts the focus from tier-based component management and support to a more holistic data center view. Under the converged model, the deployment and ongoing maintenance of the underlying infrastructure is greatly simplified, allowing for faster application deployment, monitoring and troubleshooting. In short, you spend much less time on your physical infrastructure and more time focusing on the business. Of course, hardware is still necessary on such a system but that’s not where the intelligence lies and as we’ve seen, there’s much less of it!
Going forward, I’m convinced that the popularity of convergence will only increase. What will be interesting to see is how the major compute/storage vendors handle this shift. As convergence increases, will a storage and compute vendor team up to sell their own converged solution? Will one of the startup convergence companies be acquired? Whatever happens, this will be one of the more exciting areas of IT to be involved with for many years to come. I can’t wait!