February 7, 2018
Some notes about my explorations of the World of KVM virtualization.
In my never-ending pursuit of abstraction and encapsulation I recently started integrating all of my services into docker containers and deploying them inside virtual KVM guests. This article presents my continuing effort to summarize the findings of this ongoing journey, which implies that the following material might be subject to change anytime without notice. This collection of notes and sentimental thoughts comes without any warranty or implication of fitness for any purpose. You have been warned! Now feel free to make use of it.
These are the most common virsh commands I use to manage the KVM guests, where domain is simply the name of the targeted guest and FILE the name of an XML file. Remember that libvirt supports other virtualization infrastructures as well (Xen, VMware, QEMU). Most of the options are self-explanatory. Note that ‘virsh create’ starts a transient domain that will disappear after shutdown, while the define/start combo results in a persistent domain that will even survive host restarts (see the example after the list).
virsh create _FILE_ # create domain from xml file
virsh destroy _domain_ # forcefully stop domain (hard power-off)
virsh define _FILE_ # define domain from xml file
virsh undefine _domain_ # undefine domain
virsh suspend _domain_ # stop all scheduling
virsh resume _domain_ # start scheduling
virsh start _domain_ # power on domain
virsh shutdown _domain_ # send corresponding ACPI signal to guest
virsh edit _domain_ # edit xml config in place
virsh autostart _domain_ # set autostart flag
virsh autostart _domain_ --disable # unset autostart flag
virsh list [--all] [--autostart] # list defined/active/autostart domains
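For example, the persistent workflow for the domain ‘milkman’ defined further below would look like this (the file name milkman.xml is just an assumption):
virsh define milkman.xml # register the persistent domain
virsh start milkman # power it on
virsh autostart milkman # start it automatically when the host boots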
Most of the guest systems will require some sort of storage. Creating a fresh qcow2 image to back our virtual disk is as simple as running:
qemu-img create -f qcow2 milky.img 200G
Do not worry, the image will only take a fraction of the declared space and will not grow larger than necessary, due to trimming, which will be explained later.
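You can verify the actual allocation at any time; qemu-img reports the declared virtual size next to the real size on disk:
qemu-img info milky.img # virtual size vs. disk size
du -h milky.img # blocks actually allocated on the host filesystem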
Domains, which is what libvirt calls our ‘guests’, can be defined in XML formatted files (see [1] for the full format documentation). This is my minimalistic definition of the domain ‘milkman’ carrying 8GB RAM and 4 CPUs:
<domain type='kvm'>
<name>milkman</name>
<uuid>504d80ee-1427-11e8-9861-0708f4830f96</uuid>
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<vcpu>4</vcpu>
<os>
<type>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
</features>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' discard='unmap' />
<source file='/home/miguel/KVM/images/milky.img'/>
<target dev='sda' bus='scsi'/>
</disk>
<interface type='bridge'>
<source bridge='virbr1'/>
<model type='virtio'/>
</interface>
<controller type='scsi' index='0' model='virtio-scsi' />
<graphics type='vnc' port='55555' autoport='no' listen='::1' />
</devices>
</domain>
Besides the obvious RAM size and CPU count we specify the underlying qcow2 image to be used for our emulated hard disk. We also want to specify discard='unmap' and make use of a virtio-scsi controller, both required to allow trimming. Trimming will be covered in more detail later.
Our virtual machine relies on a virtual bridge virbr1. It is very important to use type='virtio' here: the defaults resulted in extremely poor network performance, at least in some of my particular use cases (see [4] for further tuning hints). The setup of the bridge with accompanying parameters is described in the next section about networking.
At the very last we tell the VNC server to listen on ::1 at port 55555. These values can also be adjusted at run-time, as explained later on.
In order to install an operating system we can add a virtual CD-ROM along with an ISO image by augmenting the devices section in our XML definition with the following lines:
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/home/miguel/KVM/isos/debian-9.3.0-amd64-netinst.iso'/>
<target dev='hdc' bus='ide'/>
<readonly/>
</disk>
Make sure to adapt the boot order in the os section by adding an appropriate line, so you end up with something like this:
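<os>
<type>hvm</type>
<boot dev='cdrom'/>
<boot dev='hd'/>
</os>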
Since my primary interface to the virtual machines is SSH, reliable network connectivity is a primary focus. IPv4 addresses have become scarce, so we will not waste any on the host's virbr1 or eth0. My IPv4 setup is a simple ARP proxy utilizing IPv4 forwarding: the guests use their public IPv4 addresses directly and point their default route at the host's gateway.
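A minimal host-side sketch of this setup, assuming eth0 as the public interface and a hypothetical guest address 192.0.2.88:
sysctl -w net.ipv4.ip_forward=1 # forward packets between eth0 and virbr1
sysctl -w net.ipv4.conf.eth0.proxy_arp=1 # answer ARP requests for the guest's address
ip route add 192.0.2.88/32 dev virbr1 # host route pointing towards the guest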
There is no need to save address space in the case of IPv6, since we have a complete /64 IPv6 subnet at our disposal. While only a few guests are accessible by their public IPv4 addresses directly, we have a virtually infinite number of IPv6 addresses. Sidenote: one single /64 IPv6 subnet consists of 2^64 different addresses, which is over four billion times more than there are IPv4 addresses in the whole world! I use just the lower /65 half of our /64 subnet for the guests, while the IPv6 address of the host's NIC lies in the upper half.
My IPv6 setup in /etc/network/interfaces goes along these lines:
#/etc/network/interfaces
iface eth0 inet6 static
address 2a01:6a8:122:5622:8000::88/128
gateway fe80::1
iface virbr1 inet6 static
pre-up brctl addbr virbr1
address 2a01:6a8:122:5622::3/65
All we need to do is activate IPv6 forwarding on the host to let our guests communicate with the world outside.
sysctl -w net.ipv6.conf.all.forwarding=1
And this is what the IPv6 config of a particular guest might look like (the guest address and interface name below are merely illustrative):
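#/etc/network/interfaces (guest side)
iface ens3 inet6 static
address 2a01:6a8:122:5622::88/65
gateway 2a01:6a8:122:5622::3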
Things could be improved further by running a DHCP server, like dnsmasq, to assign the guest addresses, but for now I want to keep it simple.
While SSH is perfectly sufficient most of the time, you might sometimes need to have a look at the framebuffer console. You can start/stop listening on a specific port or interface with:
sudo virsh qemu-monitor-command <guest_name> --hmp change vnc <listen_address>:<display>
sudo virsh qemu-monitor-command <guest_name> --hmp change vnc none
Interestingly, the number after the colon is a display number offset by 5900, meaning that e.g. :87 will let the VNC server listen on port 5987! Check it with netstat -tulpn to be sure.
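For example, moving the VNC server of our domain ‘milkman’ to localhost display :87, i.e. port 5987:
sudo virsh qemu-monitor-command milkman --hmp change vnc 127.0.0.1:87
netstat -tulpn | grep 5987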
One of the beautiful things about using virtual machines is the level of control we have over them. We can, for instance, back up our running machines with almost no downtime using the following approach:
1. Dump the config to an XML file.
2. Save the KVM state (RAM etc.) and stop the guest.
3. Create an overlay on the underlying qcow2 disk image.
4. Restore the KVM on the overlay.
5. Back up the original disk image.
6. Commit the deltas from the overlay to the image.
7. Switch to the image with the merged changes and delete the deltas.
A downtime will be experienced only between the save and restore steps, while the most time-consuming part of the process, backing up the disk, can be deferred. The XML, RAM state and HDD snapshot contain all the data required to re-spawn an identical, consistent copy of our virtual machine as of the time of the backup. NOTE: the clock might cause problems for applications that rely on it unless it is adjusted; NTP can take care of that. A quick and dirty implementation of this technique, for my particular setup, can be found on our gitweb [2]. A more complete but complex solution is Daniel Berteaud's perl script [3], which I frankly did not test myself.
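For orientation, a rough shell sketch of the steps above, assuming the domain, image and disk target (sda) from our earlier example; the overlay and backup file names are placeholders:
virsh dumpxml milkman > milkman.xml # 1. dump config
virsh save milkman milkman.state # 2. save RAM state, stops the guest
qemu-img create -f qcow2 -b milky.img -F qcow2 overlay.img # 3. overlay on the disk image
# point the disk source in milkman.xml at overlay.img, then:
virsh restore milkman.state --xml milkman.xml # 4. resume the guest on the overlay
cp milky.img milky-backup.img # 5. back up the original image
virsh blockcommit milkman sda --active --pivot # 6. merge deltas back, switch to milky.img
rm overlay.img # 7. delete the deltas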
A common use case is to run docker inside the virtual guests, which makes it an integral part of my ‘KVM Adventures’. I prefer to remap docker's root user to a non-privileged user of my host and to use syslog instead of the default json-file log driver. This is reflected by a config along the following lines (remapping to the ‘default’ dockremap user is just one option):
/etc/docker/daemon.json:
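{
"userns-remap": "default",
"log-driver": "syslog"
}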
Optionally you can tell rsyslog to log daemon.* entries into a separate file and adjust log rotation as outlined here [5].
virt-host-validate # validate host virtualization setup
Use these Red Hat virtio drivers when installing Windows 10 [6].
Forward IPv4 traffic to an IPv6-only guest:
socat TCP4-LISTEN:51247,fork,su=nobody TCP6:[2a01:4f8:192:5112::6]:51247
[1] https://libvirt.org/formatdomain.html
[2] https://gitweb.softwarefools.com/?p=miguel/kvm_tools.git
[3] http://repo.firewall-services.com/misc/virt/virt-backup.pl
[4] https://www.linux-kvm.org/page/Tuning_KVM
[5] https://www.wolfe.id.au/2015/05/03/syslog-logging-driver-for-docker/
[6] https://www.funtoo.org/Windows_10_Virtualization_with_KVM