Virtualization: Commercial Xen (Part 2)

10Jul08

Enter Commercial Xen (VirtualIron):

Last year, I was planning a migration from a mail system I wasn’t happy with to the Zimbra groupware system (which looks and works great). I wanted near-unstoppable uptime, even in the event of hardware failure, and I knew that Xen’s live migration capability could offer that. Using the freely distributable open source software, I ran into several issues during implementation. VirtualIron was the product that finally solved all of those problems.

When I first set out to virtualize Zimbra, I tried installing it on a Red Hat Enterprise Linux (RHEL) paravirtualized machine running on top of Gentoo Linux with a Xen kernel. As soon as I tried to install it, Zimbra complained that I needed NPTL (Native POSIX Thread Library) support in the kernel, which was not possible with Xen in paravirtualized mode. The only options I had were to run RHEL on bare metal, which would not give me the unstoppable uptime, or to run it in a Xen HVM (full virtualization) environment. For my second attempt, I chose the HVM route, unaware of the issues it would bring. As I mentioned in part 1, HVM has network and disk I/O problems, but they are resolved by VirtualIron’s VSTools software.
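If you want to check a system for this before attempting a Zimbra install, one quick way is to ask glibc which thread library it was built with. Below is a minimal sketch, assuming a glibc-based distro where the getconf utility is available:

    #!/usr/bin/env python
    # Quick pre-install check for NPTL support: ask glibc which pthread
    # implementation it provides (NPTL vs. the older LinuxThreads).
    import subprocess

    def has_nptl():
        """Return True if glibc reports NPTL as its pthread implementation."""
        try:
            out = subprocess.check_output(
                ["getconf", "GNU_LIBPTHREAD_VERSION"]).decode().strip()
        except (OSError, subprocess.CalledProcessError):
            return False
        return out.startswith("NPTL")

    if __name__ == "__main__":
        print("NPTL available" if has_nptl() else "NPTL not available (LinuxThreads?)")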

False Starts:

I got a system I could test with and set up a TEST Zimbra environment with CentOS 5 as Domain0 and RHEL 5 as an HVM domain. Fortunately, I ran into problems pretty quickly while testing. The first big one was that fully virtualized Xen HVM domains CANNOT be live migrated or paused. The second was, of course, the RAM bottleneck on Domain0: if an HVM domain’s disk and network I/O is very high, you’ll likely exhaust the RAM in Domain0, and performance will suffer as disk and network I/O end up swapping or sitting in long wait states. The third point, which was less an issue than a concern born of experience, was that this is when I found out that Xen’s fully virtualized environment is really a specialized QEMU process. That worried me, since I wasn’t too impressed with QEMU’s performance at the time, specifically because of the disk and network I/O issues mentioned above, though I didn’t yet know the details of the cause.
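Since HVM was the route I took, it is worth noting that fully virtualized domains also require hardware virtualization extensions (Intel VT-x or AMD-V). Here is a rough sketch of a check, best run on bare metal before a hypervisor is installed; a running Xen dom0 may hide these CPU flags, in which case the xen_caps line of “xm info” is the place to look instead:

    #!/usr/bin/env python
    # Check whether the CPU advertises hardware virtualization extensions.
    # Intel VT-x shows up as "vmx" and AMD-V as "svm" in /proc/cpuinfo.

    def hvm_capable(cpuinfo="/proc/cpuinfo"):
        with open(cpuinfo) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = line.split(":", 1)[1].split()
                    return "vmx" in flags or "svm" in flags
        return False

    if __name__ == "__main__":
        print("HVM capable" if hvm_capable() else "No VT-x/AMD-V support detected")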

So after seeing poor disk and network performance, I did more research and dug around for other possible approaches. I briefly considered the OpenVZ project, which doesn’t really virtualize but is more akin to chroot. While it’s quite useful and can do many of the same things Xen can, it’s a completely different approach and one I wasn’t fully comfortable with, specifically because all virtual environments run under one Linux kernel. Then I found a blog entry comparing virtualization techniques and noticed a reference to VirtualIron’s Xen-based product that explained the limitations of Xen 3.x’s HVM domains and how VirtualIron worked around them. Armed with that knowledge, I recommended that we purchase VirtualIron for production. For my third attempt, we bought VirtualIron’s version of Xen, which turned out to be very nice. I was expecting “your grandfather’s virtualization techniques”, but as I would find out later, I was completely mistaken.

Learning VirtualIron:

One of VirtualIron’s big points is that they don’t use paravirtualization at all. This isn’t really a good or bad thing; it’s just their approach to virtualization. They have also been contributing back to the Xen project, so good on them! Instead of paravirtualization, they focused on the special version of QEMU included with Xen and brought it up to speed for their product, making sure it could do live migration. They also worked around the disk and network I/O issues by creating custom drivers and management software (the aforementioned VSTools) that you install in the guest once the OS is running. This limits your choice of OS for HVM domains unless you’re willing to build your own VSTools from their recently opened source code. They currently support Windows guests up to Windows Server 2003 and many of the most common “big name” Linux distros. Previously, VirtualIron used a different, proprietary virtualization technology, which gave them a chance to develop their management tools into the robust system they are today. They moved to Xen later but kept their management methodology, which works quite nicely with Xen’s best features.
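Once VSTools (or any set of paravirtual guest drivers) is installed, it’s worth confirming inside the guest that the drivers actually loaded, rather than the guest silently falling back to QEMU’s emulated IDE and NIC devices. A minimal sketch follows; the module names are placeholders, not VirtualIron’s real driver names, so adjust them to whatever your VSTools install provides:

    #!/usr/bin/env python
    # Inside a Linux HVM guest, look for paravirtual I/O drivers in the
    # loaded module list. NOTE: the names below are placeholders -- check
    # the VSTools documentation for the real driver names on your guest.
    SUSPECTED_PV_MODULES = ("vnic", "vdisk")   # hypothetical names

    def loaded_modules(path="/proc/modules"):
        with open(path) as f:
            return [line.split()[0] for line in f]

    if __name__ == "__main__":
        mods = loaded_modules()
        hits = [m for m in mods if any(s in m for s in SUSPECTED_PV_MODULES)]
        if hits:
            print("Paravirtual drivers loaded: " + ", ".join(hits))
        else:
            print("No paravirtual drivers found; guest may be using emulated devices")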

For our solution, we got two big servers to physically host our VMs: HP servers with 32 GB of RAM and two dual-core 64-bit Xeon CPUs each. They also have Fibre Channel interfaces connecting to an HP SAN back end where we store our HVM domain images. Originally I had assumed I would install VirtualIron on each of these boxes on top of a normal Linux distro, just as I did with Xen kernel installations or any other typical virtualization technology. I did just that and was lost for a bit, because it is completely the wrong approach. All the installer seemed to do was set up a DHCP server, a TFTP server, and a Java application server (Jetty, if you’re curious), with no Xen kernel. Where was the Xen kernel for the systems to boot into? Digging into their online documentation turned up a diagram of VirtualIron’s architecture, which clarified things considerably. The Java-based management interface also contains a “walk-through” setup document in a pane on the right-hand side; THAT is where I finally understood the actual architecture and layout. My original assumptions were completely wrong. I should have read the manual first!
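For anyone making the same wrong assumption I did, a quick sanity check on the manager is simply to confirm those three services are up. The sketch below assumes typical daemon names and a guessed web UI port; your VirtualIron install may use different ones:

    #!/usr/bin/env python
    # Rough sanity check that the manager's supporting services are up:
    # the Java (Jetty) web interface plus the DHCP and TFTP daemons used
    # for PXE booting the managed nodes. Port and daemon names are
    # assumptions -- adjust for your installation.
    import os
    import socket
    import subprocess

    MANAGER_HOST = "127.0.0.1"   # run this on the manager itself
    WEB_UI_PORT  = 80            # hypothetical; use whatever port Jetty listens on

    def tcp_port_open(host, port, timeout=3):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return True
        except socket.error:
            return False
        finally:
            s.close()

    def process_running(name):
        # pgrep exits 0 if at least one process with this exact name exists
        with open(os.devnull, "w") as devnull:
            return subprocess.call(["pgrep", "-x", name], stdout=devnull) == 0

    if __name__ == "__main__":
        print("Web UI reachable: %s" % tcp_port_open(MANAGER_HOST, WEB_UI_PORT))
        for daemon in ("dhcpd", "in.tftpd"):   # typical names; may differ
            print("%s running: %s" % (daemon, process_running(daemon)))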

To use VirtualIron Enterprise (we didn’t go with their Single Server product, which DOES work like VMware and others), you need at least one “management server” and one “managed node”. The management server can run one of a few supported Linux distros, or Windows. The fact that the manager could be a Windows box really confused me at first, because I couldn’t understand how they would get a Xen kernel installed under an existing Windows installation. (Yes, Virtual Iron’s manager can easily be installed on a Windows server to manage the raw hardware nodes.) I was still doubtful about their approach, and wrong in that line of thinking. But once I understood the architecture, I was both in awe and very eager to see this thing work. So I proceeded…

The VirtualIron Way:

In my case, I have two managed nodes (those monster servers with 32 GB each) and one manager (a dual-CPU 32-bit Xeon system with 2 GB of RAM and dual NICs). The manager runs CentOS 4.5, which is supported by VirtualIron as a host for the Enterprise manager. Once I had that installed and the management network up (you need a separate LAN, dedicated to the manager and each node, that you can consider “out of band”), I set one of my managed nodes to PXE network boot off the manager. That’s right: you DON’T need to install a single thing on the managed nodes; they boot over the network from Xen images stored on the manager. It’s all diskless booting via the NICs. The TFTP and DHCP servers give the box an IP address and point it to a pre-configured Xen microkernel boot image. Their boot image is a Xen hypervisor with a very stripped-down SUSE Linux Enterprise Server 10 (SLES 10) on top, so stripped down that the managed nodes can run headless; there is ZERO interaction with those boxes other than the power button.
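A handy side effect of this design is that you can tell which managed nodes have PXE-booted just by looking at the DHCP leases handed out on the management LAN. A small sketch, assuming the manager uses ISC dhcpd with its default leases file location (adjust the path if yours differs):

    #!/usr/bin/env python
    # List the hosts holding DHCP leases on the management LAN -- a quick
    # way to see which managed nodes have PXE-booted off the manager.
    # Assumes ISC dhcpd with its stock leases file path.
    import re

    LEASES_FILE = "/var/lib/dhcpd/dhcpd.leases"   # typical CentOS path; adjust as needed

    def active_leases(path=LEASES_FILE):
        leases = {}
        current_ip = None
        with open(path) as f:
            for line in f:
                m = re.match(r"lease (\S+) {", line)
                if m:
                    current_ip = m.group(1)
                m = re.search(r"hardware ethernet (\S+);", line)
                if m and current_ip:
                    leases[current_ip] = m.group(1)
        return leases

    if __name__ == "__main__":
        for ip, mac in sorted(active_leases().items()):
            print("%-15s  %s" % (ip, mac))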

Once the managed node loads its boot image from the network, it shows up in the Java management interface and you’re ready to create VMs and assign them RAM, CPU, network, and storage. (NOTE: check the hardware compatibility list to see whether your intended server hardware is supported.) In our case, the SLES 10 image has drivers for our Emulex LightPulse Fibre Channel HBAs, so LUNs presented by the SAN are fully accessible from within the VirtualIron manager (storage connects to the virtualization nodes, not the manager). Once VirtualIron was up, I was off and running, installing RHEL 4.5 for my Zimbra installation along with the special drivers and software that improve disk and network performance and enable live migration for HVM domains. Performance of the virtual machine was definitely very impressive. Also keep in mind that the managed nodes hosting your HVM domains don’t need anything installed on them at all: no OS, no kernel, no boot loader, absolutely nothing. This makes the nodes essentially hot-swappable, as long as you keep VM utilization low enough that one node can host all the VMs. Not only do they run headless, they don’t need ANY local storage either, if you have a SAN or other remote storage like iSCSI. All VM configuration resides on the managing server, so that’s the system you want backed up reliably. There is literally nothing on the virtualization nodes at all.
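Since the manager is the one machine that actually holds state, a simple nightly archive of its configuration directory is cheap insurance. Here is a sketch; the source path is purely a placeholder, so point it at wherever your manager actually keeps its configuration and database:

    #!/usr/bin/env python
    # Nightly tarball of the manager's configuration/data directory.
    # Both paths below are placeholders, not VirtualIron's real layout.
    import os
    import tarfile
    import time

    SOURCE_DIR = "/opt/virtualiron"      # hypothetical install location
    BACKUP_DIR = "/backup/vi-manager"    # hypothetical backup target

    def backup_manager(src=SOURCE_DIR, dst=BACKUP_DIR):
        if not os.path.isdir(dst):
            os.makedirs(dst)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        archive = os.path.join(dst, "vi-manager-%s.tar.gz" % stamp)
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(src, arcname=os.path.basename(src))
        return archive

    if __name__ == "__main__":
        print("Wrote %s" % backup_manager())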

Giving it a Go:

After I had everything up and running, with a Zimbra TEST instance and VSTools installed, it was time to try a live migration. I opened the web-based groupware client and started some work in it to mimic a typical user. Then I opened the VirtualIron manager, located the Zimbra TEST instance in the Virtual Machines tree, and dragged it onto the other physical host to initiate the migration. I went back to working in my Zimbra session while I watched the migration proceed. At this point, the manager tells Xen on the source node to dump the memory contents of the domain being moved. That memory state is copied to the destination node (the other physical host), a final sync is done, the copy is brought live on the destination, and the original domain on the source is destroyed.
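That sequence is the classic pre-copy live migration that Xen itself implements. VirtualIron drives it through its manager, but on open-source Xen you can trigger the same thing from the command line; here is a thin wrapper sketch around “xm migrate --live”, assuming xend relocation is enabled on the target host and the domain’s disks live on shared storage visible to both hosts:

    #!/usr/bin/env python
    # Thin wrapper around Xen's command-line live migration. Requires the
    # xend relocation server to be enabled on the target and shared storage
    # (SAN or iSCSI) for the domain's disks.
    import subprocess
    import sys

    def live_migrate(domain, target_host):
        """Live-migrate a running Xen domain to another host."""
        return subprocess.call(["xm", "migrate", "--live", domain, target_host])

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            sys.exit("usage: %s <domain> <target-host>" % sys.argv[0])
        sys.exit(live_migrate(sys.argv[1], sys.argv[2]))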

As this was happening, my Zimbra client continued to function completely normally; an end user wouldn’t notice a thing. For extra points, I shut down the physical host that had originally contained the migrated domain. Absolutely no effect; it was as if the HVM domain had never been shut down or moved. I’ve since used live migration in a variety of situations where I needed to change hardware but didn’t want downtime for services. After almost a year of using VirtualIron, I’m very satisfied with it from a performance perspective. VirtualIron is quite powerful and relatively inexpensive compared to other high-end solutions, and it can bring the power of Xen virtualization to anyone who wants it, even someone who has never touched Linux. I find that quite amazing, considering Xen’s complexity. So if you need to run an unmodified OS (Windows or a supported Linux distribution) on Xen, I highly recommend the VirtualIron product. However, if you are like I am at home, running Linux nearly exclusively on the server side, consider the open source version of Xen itself using paravirtualization. Paravirtualization does not have the disk and network performance issues that HVM does.
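For that open source, paravirtualized route, a guest is defined by a small config file that the classic xm toolstack reads as plain Python-syntax assignments. A minimal example follows; the kernel paths, LVM volume, and names are illustrative, not from a real setup:

    # /etc/xen/zimbra-test.cfg -- minimal paravirtualized domU config for
    # the classic xm toolstack (these files are Python-syntax assignments).
    # Kernel paths, the LVM volume, and names are examples only.
    name    = "zimbra-test"
    kernel  = "/boot/vmlinuz-2.6.18-xen"
    ramdisk = "/boot/initrd-2.6.18-xen.img"
    memory  = 2048
    vcpus   = 2
    vif     = ["bridge=xenbr0"]
    disk    = ["phy:/dev/vg0/zimbra-test,xvda,w"]
    root    = "/dev/xvda ro"
    # Start the guest with: xm create zimbra-test.cfg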



One Response to “Virtualization: Commercial Xen (Part 2)”

  1. I have been working with Virtual Iron for about a year and really like it. I am also trying to get a Zimbra server working with Virtual Iron, but I don’t like the problems I have seen with the VS Tools: when I upgraded my CentOS server, I had to recompile and recreate the VS Tools. I have been a Windows admin for many years, so I guess I will have to figure this out.

