
Miguel’s KVM Adventures

February 7, 2018

Some notes about my explorations of the World of KVM virtualization.

Abstract

In my never-ending pursuit of abstraction and encapsulation I recently started integrating all of my services into Docker containers and deploying them inside virtual KVM guests. This article presents my continued effort to summarize the findings of this ongoing journey, which implies that the following material might be subject to change anytime without notice. This collection of notes and sentimental thoughts comes without any warranty or implication of fitness for any purpose. You have been warned! Now feel free to make use of it.

Libvirt

These are the most common virsh commands I use to manage the KVM guests, where domain is simply the name of the targeted guest and FILE the name of an XML file. Remember that libvirt supports other virtualization infrastructure as well (Xen, VMware, QEMU). Most of the options are self-explanatory. Note that 'virsh create' starts a transient domain, which will disappear after shutdown, while the define/start combo results in a persistent domain that will even survive host restarts.

    virsh create _FILE_                 # create domain from xml file
    virsh destroy _domain_              # forcefully stop domain (pull the plug)
    
    virsh define _FILE_                 # define domain from xml file
    virsh undefine _domain_             # remove persistent domain definition
    
    virsh suspend _domain_              # stop all scheduling
    virsh resume _domain_               # start scheduling
    
    virsh start _domain_                # power on domain
    virsh shutdown _domain_             # send corresponding ACPI signal to guest
    
    virsh edit _domain_                 # edit xml config in place
    
    virsh autostart _domain_            # set autostart flag
    virsh autostart _domain_ --disable  # unset autostart flag
    virsh list [--all] [--autostart]    # list defined/active/autostart domains
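
For example, the persistent define/start combo looks like this (assuming the domain XML from the 'Domain Definition' section below is saved as milkman.xml):

    virsh define milkman.xml            # register persistent domain
    virsh start milkman                 # power it on
    virsh autostart milkman             # have it come back up after host reboots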

Disk Image

Most guest systems will require some sort of storage. Creating a fresh qcow2 image to back our virtual disk is as simple as running:

    qemu-img create -f qcow2 milky.img 200G

Do not worry: the image will only take a fraction of the declared space and will not grow larger than necessary, thanks to trimming, which will be explained later.
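You can check this at any time by comparing the declared virtual size against the space actually consumed on disk:

    qemu-img info milky.img             # reports 'virtual size' vs. 'disk size'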

Domain Definition

Domains, which is what libvirt calls our 'guests', can be defined in XML-formatted files [1]. This is my minimalistic definition of the domain 'milkman' carrying 8GB RAM and 4 CPUs:

    <domain type='kvm'>
    
    <name>milkman</name>
    <uuid>504d80ee-1427-11e8-9861-0708f4830f96</uuid>
    
    <memory unit='KiB'>8388608</memory>
    <currentMemory unit='KiB'>8388608</currentMemory>
    <vcpu>4</vcpu>
    
    <os>
        <type>hvm</type>
        <boot dev='hd'/>
    </os>
    
    <features>
        <acpi/>
    </features>
    
    <clock offset='utc'/>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    
    <devices>
    
        <emulator>/usr/bin/kvm</emulator>
    
        <disk type='file' device='disk'>
            <driver name='qemu' type='qcow2' discard='unmap'/>
            <source file='/home/miguel/KVM/images/milky.img'/>
            <target dev='sda' bus='scsi'/>
        </disk>
    
        <interface type='bridge'>
            <source bridge='virbr1'/>
            <model type='virtio'/>
        </interface>
    
        <controller type='scsi' index='0' model='virtio-scsi' />
    
        <graphics type='vnc' port='55555' autoport='no' listen='::1' />
    
    </devices>
    
    </domain>

Besides the obvious RAM size and CPU count we specify the underlying qcow2 image to be used for our emulated hard disk. We also specify discard='unmap' and make use of a virtio-scsi controller, both of which are needed to allow trimming. Trimming will be covered in more detail later.

Our virtual machine relies on a virtual bridge virbr1. It is very important to use type='virtio' here: the defaults resulted in extremely poor network performance, at least in some of my particular use cases. The setup of the bridge with its accompanying parameters is described in the next section about networking.

At the very last we tell the VNC server to listen on ::1 at port 55555. These values can also be adjusted at runtime, as explained later on.

In order to install an operating system we can add a virtual CD-ROM along with an ISO image by augmenting the devices section in our XML definition with the following lines:

    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/miguel/KVM/isos/debian-9.3.0-amd64-netinst.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>

Make sure to adapt the boot order in the os section by adding an appropriate line, so you end up with this:

    <os>
      <type>hvm</type>
      <boot dev='cdrom'/>
      <boot dev='hd'/>
    </os>

Networking

Since my primary interface to the virtual machines is SSH, reliable network connectivity is one of the primary foci. IPv4 addresses have become scarce, so we will not waste any on the host's virbr1 or eth0 interfaces. My IPv4 setup is a simple ARP proxy utilizing IPv4 forwarding: the guests use their public IPv4 addresses and the IP of the host's gateway.
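
The gist of this proxy-ARP setup, as a minimal sketch (the guest address 203.0.113.45 and the interface names are illustrative):

    sysctl -w net.ipv4.ip_forward=1               # let the host forward between interfaces
    sysctl -w net.ipv4.conf.eth0.proxy_arp=1      # answer ARP queries on the guest's behalf
    sysctl -w net.ipv4.conf.virbr1.proxy_arp=1
    ip route add 203.0.113.45/32 dev virbr1       # deliver the guest's traffic via the bridge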

There is no need to save address space in the case of IPv6, since we have a complete /64 IPv6 subnet at our disposal. While only a few guests are accessible by their public IPv4 addresses directly, we have a virtually infinite number of IPv6 addresses. Sidenote: a single /64 IPv6 subnet consists of 2^64 different addresses, which is over four billion times more than there are IPv4 addresses in the whole world! I use just the lower /65 half of our /64 subnet for the guests, while the IPv6 address of the host's NIC lies in the upper half.

My IPv6 setup in /etc/network/interfaces goes along these lines:

    #/etc/network/interfaces
     
    iface eth0 inet6 static
        address 2a01:6a8:122:5622:8000::88/128
        gateway fe80::1
     
    iface virbr1 inet6 static
        pre-up brctl addbr virbr1
        address 2a01:6a8:122:5622::3/65

All we need to do is activate IPv6 forwarding on the host to let our guests communicate with the world outside.

    sysctl -w net.ipv6.conf.all.forwarding=1
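
To make the setting survive host reboots, persist it as well (assuming the usual Debian layout):

    # /etc/sysctl.conf
    net.ipv6.conf.all.forwarding=1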

And this is what the IPv6 config of a particular guest looks like:

    iface ens3 inet6 static
      address  2a01:6a8:122:5622::13/65
      gateway  2a01:6a8:122:5622::3

Things could be improved further by running a DHCP server like dnsmasq to assign the guest addresses, but for now I want to keep it simple.

VNC

While SSH is perfectly sufficient most of the time, you sometimes might need to have a look at the framebuffer console. You can start/stop listening on a specific port or interface with:

    sudo virsh qemu-monitor-command <guest_name> --hmp change vnc <listen_ip>:<port>
    sudo virsh qemu-monitor-command <guest_name> --hmp change vnc none

Interestingly, the port is offset by 5900, meaning that e.g. :87 will let the VNC server listen on port 5987! Check it with netstat -tulpn to be sure.
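
For example (the domain name and listen address are placeholders):

    sudo virsh qemu-monitor-command milkman --hmp change vnc 127.0.0.1:87
    netstat -tulpn | grep 5987          # display :87 ends up on TCP port 5987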

Backup Running KVM

One of the beautiful things about using virtual machines is the level of control we have over them. We can for instance backup our running machines with almost no downtime using the following approach:

1. dump config to XML file
2. save KVM state (RAM etc.) and stop the guest
3. create an overlay on the underlying qcow2 disk image
4. restore the KVM on the overlay
5. backup the original disk image
6. commit deltas from overlay to the image
7. switch to the image with merged changes and delete deltas

Downtime will be experienced only between the save and restore steps, while the most time-consuming part of the process, backing up the disk, can be delayed. The XML, RAM state and HDD snapshot contain all the data required to re-spawn an identical, consistent copy of our virtual machine as at the time of the backup. NOTE: the clock might cause problems for applications that rely on it if it is not adjusted; ntp can take care of that. A quick and dirty implementation of this technique, for my particular setup, can be found on our gitweb [2]. A more complete but complex solution is Daniel Berteaud's perl script [3], which I frankly did not test myself.
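
For orientation, here is a minimal sketch of these steps using virsh; the domain name 'milkman', all file paths, and the sed-based re-pointing of the restored domain at the overlay are illustrative:

    virsh dumpxml milkman > milkman.xml                   # 1. dump config
    virsh save milkman milkman.state                      # 2. save RAM state, stops the guest
    qemu-img create -f qcow2 -b milky.img -F qcow2 overlay.img  # 3. overlay on the disk image
    sed 's/milky.img/overlay.img/' milkman.xml > overlay.xml
    virsh restore milkman.state --xml overlay.xml         # 4. resume the guest on the overlay
    cp milky.img milky.img.bak                            # 5. backup the original image
    virsh blockcommit milkman sda --active --pivot        # 6./7. merge deltas back, switch over
    rm overlay.img milkman.state overlay.xml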

Docker Containers

A common use case is to run Docker inside the virtual guests, which makes it an integral part of my 'KVM Adventures'. I prefer to remap Docker's root user to a non-privileged user of my host, as well as to utilize syslog instead of the default json-file logging driver. This is reflected by the following config:

/etc/docker/daemon.json:

    {
        "userns-remap": "miguel",
        "log-driver": "syslog"
    }
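
Note that userns-remap additionally requires subordinate UID/GID ranges for the remap user on the host; a typical entry (the range values are illustrative) looks like:

    # /etc/subuid and /etc/subgid
    miguel:100000:65536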

Optionally you can tell rsyslog to log daemon.* entries into a separate file and adjust log rotation as outlined here [5].
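
A minimal sketch of such an rsyslog rule, following the daemon.* suggestion above (the drop-in filename is arbitrary):

    # /etc/rsyslog.d/30-docker.conf
    daemon.*    -/var/log/docker.log
    & stop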

Miscellaneous

    virt-host-validate # validate host virtualization setup

Use these Red Hat virtio drivers when you install Windows 10 [6].

Forward IPv4 to an IPv6-only host:

    socat TCP4-LISTEN:51247,fork,su=nobody TCP6:[2a01:4f8:192:5112::6]:51247

References

[1] https://libvirt.org/formatdomain.html
[2] https://gitweb.softwarefools.com/?p=miguel/kvm_tools.git
[3] http://repo.firewall-services.com/misc/virt/virt-backup.pl
[4] https://www.linux-kvm.org/page/Tuning_KVM
[5] https://www.wolfe.id.au/2015/05/03/syslog-logging-driver-for-docker/
[6] https://www.funtoo.org/Windows_10_Virtualization_with_KVM