My First Round with KVM Nested Virtualization

Over the last couple of weeks I took some time to try out the next step in virtualization tech: nested virtualization. My target was to add a server capable of running multiple layers of virtualization to our research infrastructure. The aims were to learn about the current state of KVM development for this particular feature, to check relative performance levels, and to find out whether managing such a host and its virtual siblings through our own oVirt implementation was a viable option.

What are we talking about? 
Nested virtualization is, as the name implies, a feature that gives a virtual host the ability to run a hypervisor, making it capable of hosting one or more virtual machines of its own. To put it in a simple, straightforward way: this is achieved by exposing the CPU virtualization extensions (svm on AMD, vmx on Intel) of the parent physical host to the virtual host.

In the future this will be especially interesting to those running IaaS and public virtualization infrastructures. It brings another abstraction layer that improves VM isolation and management and, most of all, makes it possible to run two different hypervisors, one on top of the other. This implies that nested virtualization will make it easier to migrate virtual assets, which are mostly hypervisor dependent, and to manage large-scale virtualization infrastructures. Just imagine hosting hundreds of virtual machines for a few customers. While zoning is possible, the number of VMs will still be overwhelming. What if we had a single VM hosting a customer's whole virtual infrastructure? What if we could port all their VM assets, which could be based on a different hypervisor, onto a single, manageable virtual entity? This could potentially improve legacy support too.

It's not all sunshine and roses of course, as more abstraction usually brings more complex, convoluted technology, yet the potential is surely appealing. VM nesting is still in its teen years: while it can be found in most commercial (e.g. VMware) and open source (e.g. KVM, Xen) solutions, it's still widely considered an unstable feature.
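The vmx and svm extensions mentioned above show up as CPU flags on Linux, so a quick way to see whether a given machine (physical host or L1 guest) has them is to scan /proc/cpuinfo. What follows is only a minimal sketch, not something taken from my actual setup: the function name is made up for illustration, and it assumes an x86 Linux system where the flags are listed on lines starting with "flags".

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags (vmx, svm) found in cpuinfo text.

    Hypothetical helper: it only inspects lines whose key is "flags",
    which is how x86 Linux kernels list CPU features in /proc/cpuinfo.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = virtualization_flags(handle.read())
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        flags = set()
    print("virtualization extensions visible here:", sorted(flags) or "none")
```

Run inside an L1 guest, an empty result would suggest that the extension is not being exposed to that guest, which is the precondition for starting any L2 guest with hardware acceleration.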

The trials

CentOS 6.3
Given that the standard distro in our infrastructure is CentOS, I decided to take a first dive with CentOS 6.3. Little did I know that bad news was in store. I found out that the latest available kernel was 2.6.32 and that the bundled KVM module lags behind upstream development. As it turned out, this KVM module was barely capable of nested virtualization. The vmx extension was accessible on the level one (L1) guest, which means I was able to install and start libvirtd there, but the whole "virtual stack" got stuck as soon as I tried to run the level two (L2) guest. I tried disabling VirtIO and every non-critical feature (USB, sound devices, etc.), to no avail. Since CentOS is supposed to be the tried and true distribution, and solutions outside the vanilla distro (a custom kernel, say) were off the table, I decided to abandon this path and move on to the next step.

oVirt node 2.5.5
Our virtualization infrastructure is oVirt based. Since oVirt-node, the oVirt distribution for host nodes, is Fedora based, its kernel is pretty much up to the latest versions. I went into this with some optimism, knowing that a recent kernel could give me a decent chance of running a nested KVM stack. I then learned the hard way that things weren't so easy. The issue lies in the way oVirt creates its virtual guests: I couldn't find a way to tell oVirt to instruct qemu to enable the vmx extension among the emulated CPU features. Moreover, the largest slice of an oVirt-node filesystem is non-persistent, which means that extra effort is required to preserve changes applied to the system. oVirt apparently does not store a template of the virtual guest, either in its database or in the filesystem (libvirt, by comparison, stores an XML template for its virtual guests), so I couldn't find a way to enforce such a feature manually. I began asking around the community for advice and, courtesy of a couple of kind oVirt devs, discovered that there's a patch for VDSM (Virtual Desktop and Server Manager) that implements a hook: it lets virtual guests running on a physical node with KVM nesting enabled expose Intel's vmx feature. Once again, time was running short and the effort required crippled my chances.
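For comparison, on a host managed with plain libvirt the guest definition is an XML document, and asking for vmx comes down to one feature element inside the guest's cpu element. The sketch below shows that edit with Python's standard library. It is not the VDSM hook mentioned above and not how oVirt does it; the function name, the guest name and the CPU model in the sample are placeholders, and it assumes the domain XML already contains a cpu element with a model.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition used only for illustration.
SAMPLE_DOMAIN = """
<domain type='kvm'>
  <name>l1-guest</name>
  <cpu match='exact'>
    <model>Westmere</model>
  </cpu>
</domain>
"""


def require_cpu_feature(domain_xml, feature="vmx"):
    """Return the domain XML with <feature policy='require' name=...> under <cpu>.

    Assumes a <cpu> element (with a model) is already present; raises
    ValueError otherwise instead of guessing a CPU model.
    """
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        raise ValueError("domain XML has no <cpu> element to extend")
    for node in cpu.findall("feature"):
        if node.get("name") == feature:
            # Feature already listed: just make sure it is required.
            node.set("policy", "require")
            break
    else:
        ET.SubElement(cpu, "feature", {"policy": "require", "name": feature})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(require_cpu_feature(SAMPLE_DOMAIN))
```

The function is idempotent on purpose, so running it twice over the same definition does not produce duplicate feature entries.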

Fedora 16
With just a couple of days left I decided to go for a less fiddly approach and installed Fedora 16 on my physical host, since it was readily available on our PXE server. Everything went pretty smoothly this time. Once the host was installed and up to date and the KVM module was told to enable nesting, I created a CentOS L1 guest in a breeze. I proceeded to set up the nested L2 guest, taking care to set the disk bus to IDE, since I understood that VirtIO was not supposed to work in a nested configuration. The second level guest came to life easily, but I noticed a substantial overhead on the L1 guest during disk operations. I shut down the L2 VM and changed the disk bus to VirtIO and, much to my amazement, the VM happily booted and the overhead was reduced. At this stage I was pretty curious about nesting different hypervisors, so I set up a CentOS 5.8 Xen guest, but it refused to boot into Xen, freezing with a kernel panic message. Next on the list was Hyper-V: the Windows Server 2012 machine installed correctly but, just like its open source counterpart, it froze during the boot sequence.
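On the "KVM module was told to enable nesting" step: on an Intel host this is typically done through the nested parameter of the kvm_intel module, which can be read back from sysfs. The following is a rough sketch under that assumption, not a record of the exact commands I ran. The helper names are invented, and the parameter reads back as Y/N on some kernels and 1/0 on others, so both spellings are accepted.

```python
from pathlib import Path

# sysfs location of the kvm_intel "nested" parameter (Intel hosts only).
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def nesting_enabled(raw_value):
    """Interpret the content of the nested parameter file ("Y"/"N" or "1"/"0")."""
    return raw_value.strip().lower() in ("y", "1")


def modprobe_option(module="kvm_intel"):
    """Build the line one would place in a file under /etc/modprobe.d/."""
    return f"options {module} nested=1"


if __name__ == "__main__":
    try:
        state = nesting_enabled(NESTED_PARAM.read_text())
    except OSError:
        # Module not loaded, not an Intel host, or not Linux at all.
        state = None
    print("nested virtualization enabled:", state)
    print("to enable it at module load time:", modprobe_option())
```

The option only takes effect when the module is loaded, so after changing it the module has to be reloaded (with no guests running) or the host rebooted.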

KVM nested virtualization is basically working, yet not stable enough to be brought under the spotlight. I ran into even more issues while trying to run more than one vmx-enabled L1 guest. Still, KVM on KVM nesting worked properly and, given the constant development and the minimal overhead, it really is promising. As for my original target, I'll recompile VDSM and try to convert my Fedora server into an oVirt node as soon as I find more spare time. In the meantime, thanks for reading this and feel free to share your experiences here.

Quick update:
I've successfully deployed a nested configuration using Fedora 18 Beta. I have three L2 guests running inside an L1 guest and everything is looking good at the moment. I'm still fighting with VDSM to convert my host into an oVirt node, and hope to have more good news soon.


Linus against Nvidia

A few days ago Linus Torvalds called Nvidia the "worst company".

I asked Nvidia what they plan to do to support open source developers.

This is the official reply I received from Nvidia:

Supporting Linux is important to NVIDIA, and we understand that there are people who are as passionate about Linux as an open source platform as we are passionate about delivering an awesome GPU experience.

Recently, there have been some questions raised about our lack of support for our Optimus notebook technology. When we launched our Optimus notebook technology, it was with support for Windows 7 only.  The open source community rallied to work around this with support from the Bumblebee Open Source Project http://bumblebee-project.org/. And as a result, we've recently made Installer and readme changes in our R295 drivers that were designed to make interaction with Bumblebee easier.

While we understand that some people would prefer us to provide detailed documentation on all of our GPU internals, or be more active in Linux kernel community development discussions, we have made a decision to support Linux on our GPUs by leveraging NVIDIA common code, rather than the Linux common infrastructure.  While this may not please everyone, it does allow us to provide the most consistent GPU experience to our customers, regardless of platform or operating system.

As a result:

1) Linux end users benefit from same-day support for new GPUs, OpenGL version and extension parity between NVIDIA Windows and NVIDIA Linux support, and OpenGL performance parity between NVIDIA Windows and NVIDIA Linux.

2) We support a wide variety of GPUs on Linux, including our latest GeForce, Quadro, and Tesla-class GPUs, for both desktop and notebook platforms. Our drivers for these platforms are updated regularly, with seven updates released so far this year for Linux alone. The latest Linux drivers can be downloaded from www.nvidia.com/object/unix.html.

3) We are a very active participant in the ARM Linux kernel.  For the latest 3.4 ARM kernel – the next-gen kernel to be used on future Linux, Android, and Chrome distributions – NVIDIA ranks second in terms of total lines changed and fourth in terms of number of changesets for all employers or organizations.

At the end of the day, providing a consistent GPU experience across multiple platforms for all of our customers continues to be one of our key goals.