Intro
Over the last couple of weeks I took some time to try out the next step in virtualization tech: nested virtualization. My goal was to add to our research infrastructure a server capable of managing multiple layers of virtualization, so that I could learn something about the current state of KVM development around this particular feature, check the relative performance levels, and find out whether managing such a host and its virtual siblings through our own oVirt installation was a viable option.
What are we talking about?
Nested virtualization is, as the name implies, a feature that gives a virtual host the ability to run a hypervisor of its own, making it capable of hosting one or more virtual machines itself. Put simply, this is achieved by giving the virtual host direct access to the CPU virtualization extensions (svm, vmx) of its parent physical host. In the future this will be especially interesting to those running IaaS and public virtualization infrastructures: it adds another abstraction layer that improves VM isolation and management and, most of all, allows running two different hypervisors, one on top of the other. Nested virtualization should therefore make it easier to migrate virtual assets, which are largely hypervisor dependent, and to manage large-scale virtualization infrastructures.
Just imagine hosting hundreds of virtual machines for a few customers: while zoning is possible, the number of VMs is still overwhelming. What if we had a single VM hosting a customer's whole virtual infrastructure? What if we could port all their VM assets, possibly based on a different hypervisor, onto a single, manageable virtual entity? This could potentially improve legacy support too. It's not all sunshine and roses, of course, as more abstraction usually brings more complex, convoluted technology, yet the potential is surely appealing. VM nesting is still in its teens: while it can be found in most commercial (e.g. VMware) and open source solutions (e.g. KVM, Xen), it's still widely considered an unstable feature.
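A quick sanity check, by the way, is to look for those very CPU flags; a rough sketch, assuming an x86 Linux system on both the physical host and the L1 guest:

    # On the physical host: check that the CPU exposes its hardware
    # virtualization extensions (vmx on Intel, svm on AMD)
    egrep -o 'vmx|svm' /proc/cpuinfo | sort -u

    # Run the same check inside the L1 guest: once nesting is set up
    # correctly, the flag should appear in its virtual CPU as well
    egrep -o 'vmx|svm' /proc/cpuinfo | sort -u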
The trials
CentOS 6.3
Given that the standard distro in our infrastructure is CentOS, I decided to take a first dive with CentOS 6.3. Little did I know that bad news was in store: the latest available kernel was 2.6.32, and its KVM module lags behind upstream development. As it turned out, that module was barely capable of enabling nested virtualization. The vmx extension was accessible in the level-one (L1) guest, meaning I could successfully install and start libvirtd, but the whole "virtual stack" hung as soon as I tried to run the level-two (L2) guest. I tried disabling virtio and every non-critical feature (USB, sound devices, etc.), but it was useless. Since CentOS is supposed to be the tried and true distribution, and solutions outside the vanilla distro were not in my book (no custom kernels, say), I decided to leave this path and move one step forward anyway.
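For the record, this is roughly how nesting gets enabled on the KVM module itself; a minimal sketch, assuming an Intel CPU (the parameter may behave differently, or not be honored at all, on the older CentOS kernel):

    # Reload the Intel KVM module with nested virtualization enabled
    modprobe -r kvm_intel
    modprobe kvm_intel nested=1

    # Verify that the module accepted the option (expect "Y" or "1")
    cat /sys/module/kvm_intel/parameters/nested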
oVirt Node 2.5.5
Our virtualization infrastructure is oVirt based. Since oVirt Node, the oVirt distribution for host nodes, is Fedora based, its kernel is pretty close to the latest versions. I went into this with some optimism, knowing that recent kernels would give me a decent chance of running a nested KVM stack. Then I learned, the hard way, that things weren't so easy. The issue lies in the way oVirt creates its virtual guests: I couldn't find a way to tell oVirt to instruct qemu to enable the vmx extension among the emulated CPU features. Moreover, the largest slice of an oVirt Node filesystem is non-persistent, which means extra effort is required to keep any changes applied to the system. oVirt apparently does not store a template of the virtual guest, neither in its database nor on the filesystem (the way libvirt, for example, keeps an XML definition for each of its guests), so I couldn't find a place to enforce such a feature manually. I began asking the community for advice and, courtesy of a couple of kind oVirt devs, discovered that there's a patch for VDSM (the Virtual Desktop and Server Manager) implementing a hook that exposes Intel's vmx feature to virtual guests running on a physical node with KVM nesting enabled. Once again, time was running short and the effort required crippled my chances.
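To give an idea of what that hook boils down to: on a plain libvirt host you can require the flag directly in the guest definition with virsh edit. This is a sketch, not the actual XML the patched VDSM emits, and the guest name and CPU model below are placeholders:

    virsh edit l1-guest

    <!-- inside the domain XML: require vmx in the guest's CPU definition -->
    <cpu match='exact'>
      <model>SandyBridge</model>    <!-- placeholder: keep the model the guest already uses -->
      <feature policy='require' name='vmx'/>
    </cpu>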
Fedora 16
With just a couple of days left I decided to go for a less fiddly approach and installed Fedora 16 on my physical host, since it was sitting ready on our PXE server. Everything went pretty smoothly this time. Once the host was installed and up to date and the KVM module was told to enable nesting, I created a CentOS L1 guest in a breeze. I then proceeded to set up the nested L2 guest, taking care to set the disk bus to IDE, since I knew virtio was not supposed to work in a nested configuration. The second-level guest came to life easily, but I noticed a substantial overhead on the L1 guest during disk operations. I shut down the L2 VM, changed the disk bus to virtio and, much to my amazement, the VM happily booted and the overhead dropped. At this stage I was pretty curious about nesting different hypervisors, so I set up a CentOS 5.8 Xen guest, but it refused to boot into Xen, freezing with a kernel panic. Next on the list was Hyper-V: the Windows Server 2012 machine installed correctly but, just like its open source counterpart, froze during the boot sequence.
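For anyone retracing these steps, the two tweaks above amount to very little; a sketch assuming an Intel host, with file names and device names as placeholders:

    # Make nesting persistent across reboots on the Fedora host
    echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf

    # Disk bus of the L2 guest, switched with "virsh edit l2-guest" on the L1 host;
    # only the <target> line inside the <disk> element changes:
    <target dev='hda' bus='ide'/>       <!-- conservative choice, but slow -->
    <target dev='vda' bus='virtio'/>    <!-- what ended up working here, with far less overhead -->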
Conclusions
KVM nested virtualization basically works, yet it's not stable enough to be brought into the spotlight: I ran into even more issues while trying to run more than one vmx-enabled L1 guest. Still, KVM-on-KVM nesting worked properly and, given the constant development and the minimal overhead, it really is promising. As for my original target, I'll rebuild VDSM with that hook and try to convert my Fedora server into an oVirt node, something I'll attempt as soon as I find more spare time. In the meantime, thanks for reading and feel free to share your experiences here.
Quick update:
I've successfully deployed a nested configuration using Fedora 18 Beta. I have three L2 guests running inside an L1 guest and everything is looking good so far. I'm still fighting with VDSM to convert my host into an oVirt node; I hope to have more good news soon.