My First Round with KVM Nested Virtualization

Over the last couple of weeks I took some time to try out the next step in virtualization tech: nested virtualization. My target was to add to our research infrastructure a server capable of managing multiple layers of virtualization. I wanted to learn something about the current status of KVM development regarding this particular feature, check the relative performance levels, and find out whether managing such a host and its virtual siblings through our own oVirt implementation was a viable option.

What are we talking about? 
Nested virtualization is, as the name implies, a feature that gives a virtual host the ability to run a hypervisor, rendering it capable of hosting one or more virtual machines on its own. To put it in a simple, straightforward way: this is achieved by giving the virtual host direct access to the CPU virtualization extensions (svm on AMD, vmx on Intel) of its parent physical host.

In the future this will be especially interesting to those running IaaS and public virtualization infrastructures. It brings another abstraction layer that improves V.M. isolation and management and, most of all, gives the ability to run two different hypervisors, one on top of the other. This implies that nested virtualization will make it easier to migrate virtual assets, which are mostly hypervisor dependent, while making it easier to manage large-scale virtualization infrastructures. Just imagine hosting hundreds of virtual machines for a few customers. While zoning is possible, the number of V.M.s will still be overwhelming. What if we had a single V.M. hosting a customer's whole virtual infrastructure? What if we could port all their V.M. assets, which could be based on a different hypervisor, onto a single, manageable virtual entity? This could potentially improve legacy support too.

It's not all sunshine and roses of course, as more abstraction usually brings more complex, convoluted technology - yet the potential is surely appealing. V.M. nesting is still in its teen years: while it can be found in most commercial (e.g. VMware) and open source solutions (e.g. KVM, Xen), it's still widely considered an unstable feature.
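A quick way to see whether the extensions mentioned above are visible from inside a given machine (physical host or guest) is to look at the CPU flags Linux reports. The snippet below is only a minimal sketch: the function name is mine, and it assumes a Linux system that exposes /proc/cpuinfo with a "flags" line.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags (vmx, svm) found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "flags" line; collect from all of them.
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = virtualization_flags(handle.read())
    except OSError:
        # Not on Linux, or /proc is not available.
        flags = set()
    print("virtualization extensions visible:", ", ".join(sorted(flags)) or "none")
```

Run on the physical host, this should list vmx or svm if the CPU supports hardware virtualization. Run inside an L1 guest, it tells you whether the extension has actually been passed through.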

The trials

CentOS 6.3
Given that the standard distro in our infrastructure is CentOS, I decided to take a first dive into CentOS 6.3. Little did I know that bad news was in store. I found out that the kernel’s latest version was 2.6.32 and that the available KVM module was behind in development. As it turned out, the KVM module was barely capable of enabling nested virtualization. The vmx extension was accessible on the level one virtual guest, which means that I was able to successfully install and start libvirtd, but the whole "virtual stack" got stuck as soon as I tried to run the level two guest. I tried to disable virtIO and all the non-critical features (USB, sound devices, etc.), but that was useless. Since CentOS is supposed to be the tried and true distribution, and solutions outside the vanilla distro were not in my book (that is, no custom kernel), I decided to leave this path and move one step forward anyway.

oVirt node 2.5.5
Our virtualization infrastructure is oVirt based. Since oVirt-node, the oVirt distribution for host nodes, is Fedora based, its kernel is pretty much up to the latest versions. I went into this with some optimism, knowing that the latest kernel releases could give me a decent chance at running a nested KVM stack. Then I learned, the hard way, that things weren't so easy. The issue lies in the way oVirt creates its virtual guests. I couldn't find a way to tell oVirt that it should instruct qemu to enable the vmx extension in the emulated CPU features. Moreover, the largest slice of an oVirt-node filesystem is non-persistent, which means that extra effort is required to maintain changes applied to the system. oVirt apparently does not store a template of the virtual guest, either in its database or in the filesystem (libvirt, by contrast, stores an XML template for its virtual guests), so I couldn't find a way to manually enforce such a feature. I began asking around the community for some advice and, courtesy of a couple of kind oVirt devs, discovered that there's a patch for VDSM (Virtual Desktop and Server Manager) that implements a hook: it lets virtual guests running on a physical node with KVM nesting enabled expose Intel's vmx feature. Once again, time was running short and the effort required crippled my chances.
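For comparison, on a plain libvirt host the guest definition is an XML document, and exposing vmx comes down to a feature element with policy "require" inside the guest's cpu element. The sketch below shows that edit on an XML string; it is an illustration only, not the VDSM hook mentioned above. The function name and the sample domain (name, CPU model) are made up for the example.

```python
import xml.etree.ElementTree as ET


def require_cpu_feature(domain_xml, feature="vmx"):
    """Return domain XML with <feature policy='require' name=FEATURE/> in <cpu>."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        # Kept deliberately strict: this sketch does not guess a CPU mode or model.
        raise ValueError("domain XML has no <cpu> element to extend")
    for node in cpu.findall("feature"):
        if node.get("name") == feature:
            node.set("policy", "require")  # already listed, just enforce it
            break
    else:
        ET.SubElement(cpu, "feature", {"policy": "require", "name": feature})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily trimmed guest definition used only for illustration.
    sample = (
        "<domain type='kvm'><name>l1-guest</name>"
        "<cpu mode='custom' match='exact'><model>Nehalem</model></cpu>"
        "</domain>"
    )
    print(require_cpu_feature(sample))
```

With libvirt the resulting XML would typically be applied through virsh edit or virsh define. oVirt gives no such direct access to the guest definition, which is why a VDSM hook is needed there.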

Fedora 16
With just a couple of days left I decided to go for a less fiddly approach and installed Fedora 16 on my physical host, since it was already waiting on our PXE server. Everything went pretty smoothly this time. Once the host was installed and up to date and the KVM module was told to enable nesting, I created a CentOS L1 guest in a breeze. I proceeded to set up the nested L2 guest, paying attention to setting the disk bus to IDE, given that I knew that VirtIO was not supposed to work in a nested configuration. The second level guest came to life easily, but I noticed a substantial overhead on the L1 guest during disk operations. I shut down the L2 V.M. and changed the disk bus to VirtIO and, much to my amazement, the V.M. happily booted while the overhead was reduced. At this stage, I was pretty curious about nesting different hypervisors, so I set up a CentOS 5.8 Xen guest, but it refused to boot into Xen, freezing with a kernel panic message. Next item on the list was Hyper-V: the Windows Server 2012 machine installed correctly but, just like its open source counterpart, it froze during the boot sequence.
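Whether the KVM module actually has nesting switched on can be read back from sysfs. The option is usually set with a line such as "options kvm_intel nested=1" in a file under /etc/modprobe.d, followed by a module reload. The sketch below only reads the state. The helper names are mine, and it assumes the parameter lives at the usual /sys/module/.../parameters/nested path and reads as Y/N or 1/0 depending on the kernel.

```python
from pathlib import Path

# Usual sysfs locations of the "nested" parameter for Intel and AMD hosts.
NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def nesting_enabled(raw_value):
    """Interpret the parameter file content ('Y'/'N' or '1'/'0')."""
    return raw_value.strip().lower() in ("y", "1")


def report():
    """Print the nesting state for every KVM vendor module that is loaded."""
    seen = False
    for param in NESTED_PARAMS:
        if param.exists():
            seen = True
            state = "enabled" if nesting_enabled(param.read_text()) else "disabled"
            print(f"{param.parent.parent.name}: nesting {state}")
    if not seen:
        print("no kvm_intel/kvm_amd module loaded")


if __name__ == "__main__":
    report()
```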

KVM nested virtualization is basically working, yet not stable enough to be brought under the spotlight. I encountered even more issues while trying to run more than one vmx-enabled L1 guest. Still, KVM on KVM nesting worked properly and, given the constant development and the minimal overhead, it really is promising. Concerning my original target, I'll recompile VDSM and try to convert my Fedora server into an oVirt node, which I'll attempt as soon as I find more spare time. In the meantime, thanks for reading this and feel free to share your experiences here.

Quick update:
I've successfully deployed a nested configuration using Fedora 18 Beta. I have three L2 guests running inside an L1 guest and everything is looking pretty good at the moment. I'm still fighting with VDSM to convert my host into an oVirt node, and I hope to have more good news soon.