Puppetising KVM on CentOS 7

Yak shaving

I found out about a bunch of stuff today, including the now 19-year-old term yak shaving. There are clearer examples out there, such as here, here and here.

I was trying to roll creation of KVM virtual machines into Puppet (that is, bring my existing use of KVM at home under management) and in unpicking my existing setup and trying to puppetise the key bits, I quickly found all sorts of cul-de-sacs and alternative roads opening up.

Seems to me that there are two flavours of yak shaving:

  1. When you find that you can’t get a thing done because of all the dependencies that cascade off it.
  2. When you lose focus on getting a thing done because you’re trying to do it ‘properly.’

It takes discipline to sort the diversions into ‘must have’ and ‘nice to have.’

So, for example, I’m currently using raw disk files created with ‘dd’.  There are other ways to do storage, not least storage pools. The goal, though, is minimum viable product. So, ‘dd’ it is. Put other disk formats and storage methodologies in the backlog!

Reference

Network XML definition.

Virtual Machine XML definition.

Error: could not find capabilities

error: invalid argument: could not find capabilities for arch=x86_64 domaintype=kvm

Why does our world have to be filled with errors that define problems but not solutions? Or, worse, define problems in a manner that doesn’t even point towards a solution? Just go ‘Error 0xde67af. Google for it.’ and be done with it.

This one means KVM is turned off in the BIOS. This is a much better error:

$ dmesg | grep kvm
kernel: kvm: disabled by bios

If I was going to build a fact to detect this problem, I’d consume the output of virt-host-validate (shown here after I’d fixed the BIOS setting):

# virt-host-validate qemu
QEMU: Checking for hardware virtualization : PASS
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'memory' controller mount-point : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpu' controller mount-point : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller mount-point : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller mount-point : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'devices' controller mount-point : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller mount-point : PASS
QEMU: Checking for device assignment IOMMU support : WARN (No ACPI IVRS table found, IOMMU either disabled in BIOS or not supported by this hardware platform)
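A rough sketch of how that output could be turned into facts. Facter accepts executable "external facts" that print key=value lines, so a shell function is enough here. The fact names (kvm_hw_virt, kvm_dev_kvm) are made up for illustration, and the parsing assumes the "QEMU: check : RESULT" layout shown above.

```shell
# Turn `virt-host-validate qemu` output (on stdin) into key=value lines.
# The fact names are hypothetical; pick whatever suits your own manifests.
validate_to_facts() {
  awk -F' *: *' '
    # $3 is the result column; keep only its first word (PASS, FAIL, WARN)
    # so that any trailing explanation in brackets is dropped.
    /Checking for hardware virtualization/ {
      split($3, w, " "); print "kvm_hw_virt=" tolower(w[1])
    }
    /Checking if device \/dev\/kvm exists/ {
      split($3, w, " "); print "kvm_dev_kvm=" tolower(w[1])
    }
  '
}

# Intended use, from an executable dropped into Facter's facts.d directory:
#   virt-host-validate qemu | validate_to_facts
```

A manifest could then refuse to define guests unless the first fact reads "pass", rather than waiting for the "could not find capabilities" error.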

 

Networking

Problem

My existing hand-rolled KVM setup is on RHEL6 and the networking comprises:

  • eth0 with no ip address, ifcfg-eth0 specifying BRIDGE=br0
  • br0 (a linux bridge interface) with the IPv4 definition for the physical host
  • I’d bound a virtual IP to br0; the server used to do more than host VMs.
  • virbr0 which comes with libvirtd and isn’t in use.  There’s a virbr0-nic as well, which I can’t find config for, but I assume is related.

I don’t want to replicate this setup because

  • It’s hard to puppetise.  I’ve not yet got NetworkManager under management (and don’t want to shave that particular yak yet) and it would involve moving the external address for the running system between interfaces.
  • Puppet/libvirt can manage bridge interfaces, so it’d be good to do it all there.

Options

A fillet of the documentation.

  • You have to build VM networking on the back of a bridge device.
  • This will act like a switch or network inside the physical host; the VMs can talk to each other over it.
  • The device can be configured with a number of forward modes
    • isolated: the default if <forward> is omitted
    • nat:
      • default if <forward> is included but mode is omitted.
      • A separate IP range will be used on the internal network and (IPv4) traffic to the wider network will go via the physical host’s IP address using network address translation.
      • IPv6 traffic is routed.
      • Connectivity between guests, and between the host and the guests, is not restricted.
      • Firewall rules can/will interfere with connections to guests from outside. Doc (link above) not clear on behaviour if firewall is turned off.
      • specifying the device (dev attribute) will result in firewall only allowing outgoing traffic via that interface.
    • route
      • guest network traffic is routed to the physical network without NATing.
      • dev attribute will restrict outgoing traffic to named device.
      • Sessions in and out to guests are unrestricted.
    • open
      • like route, but with no ability to firewall traffic. ‘dev’ attribute cannot be set.
    • bridge
      • An existing host bridge or Open vSwitch bridge has been defined outside libvirt, or direct connection via macvtap bridge mode via existing interface(s).
      • Guest appears to be directly connected to the interface.
    • private
      • A macvtap ‘direct’ connection is used to connect each guest to the network
      • A physical interface is taken by the guest if using 802.1Qbh in conjunction with a switch that supports this.
    • vepa
      • Requires a vepa capable switch.
      • Guests are connected direct to interfaces using macvtap ‘direct’.
    • passthrough
      • macvtap ‘direct’ connection in passthrough mode; guests will monopolise interfaces.
    • hostdev
      • PCI passthrough of a physical device.

Documented recommendations

  • nat, route, open, and isolated network types: name the bridge device with prefix virbr

When libvirtd is fired up, you typically get virbr0 pop up with a definition of:

<ip address="192.168.122.1" netmask="255.255.255.0">
  <dhcp>
    <range start="192.168.122.2" end="192.168.122.254" />
  </dhcp>
</ip>

So, that’s consistent; ‘forward’ specified but no mode, so you get NAT.
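The defaulting rules summarised above (no <forward> means isolated, <forward> without a mode means nat) can be sketched as a small shell function. This is crude text matching rather than real XML parsing, and the function name is made up, so treat it as an illustration of the rules and nothing more.

```shell
# Print the effective forward mode of a libvirt network definition on stdin.
# Rules, as summarised from the libvirt docs above:
#   no <forward> element      -> isolated
#   <forward> without a mode  -> nat
#   otherwise                 -> whatever mode='...' says
forward_mode() {
  xml="$(cat)"
  case "$xml" in
    *"<forward"*) ;;                  # element present, look for a mode below
    *) echo isolated; return 0 ;;
  esac
  mode="$(printf '%s\n' "$xml" \
    | sed -n "s/.*<forward[^>]*mode=['\"]\([a-z]*\)['\"].*/\1/p" \
    | head -n 1)"
  echo "${mode:-nat}"
}

# Intended use on a host running libvirtd:
#   virsh net-dumpxml default | forward_mode
```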

Puppet

I’m using the cirrax/libvirt module, which supports Ubuntu/Debian and flavours of RHEL7.  I started by running the example, and then gradually abstracted it to Hiera.

Their simple example:

libvirt::network { 'net-simple':
    forward_mode   => 'bridge',
    bridge         => 'br-simple',
}

I read that as bridge mode consuming a device defined outside libvirt. Puppet/libvirt processed this definition, but it didn’t start a network and guests couldn’t use it, presumably because nothing had created br-simple on the host:

error: Cannot get interface MTU on 'br-simple': No such device
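A quick way to check for that up front, as a sketch: a Linux bridge shows up under /sys/class/net with a bridge subdirectory. The function name and the SYSFS_NET override are made up for illustration, and this only covers Linux bridges, not Open vSwitch ones.

```shell
# SYSFS_NET is overridable only so the check can be tried against a fake tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds if the named interface exists and is a Linux bridge.
is_bridge() {
  [ -d "$SYSFS_NET/$1/bridge" ]
}

# Intended use:
#   is_bridge br-simple || echo "br-simple is not a bridge on this host" 1>&2
```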

I think I want to use route or open. I want the machines on the network.

So, something like this:

libvirt::network { 'lan':
  autostart    => true,
  forward_dev  => 'enp9s0',
  forward_mode => 'route',
  bridge       => 'virbr0',
}

Idempotency

The module is only very coarsely idempotent. If, for example, you’ve defined a guest with the wrong network, it won’t fix it. The workaround is to remove the guest and let Puppet recreate it:

virsh destroy testguest
virsh undefine testguest
# run puppet

 

Dealing with virbr0

# virsh net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes

/etc/libvirt is the place to find the persistent config.

Two ways to deal with this. I recommend disabling it, as it potentially sets up firewalld rules as well, and these might get left behind to meddle with a future virbr0.

Delete it:

# virsh net-destroy default
# virsh net-undefine default

Disable it:

# virsh net-destroy default
# virsh net-autostart default --disable

I’ve done that with execs in Puppet (using unless/onlyif to make them idempotent).
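The guards boil down to asking virsh net-info whether there is anything left to do. A minimal sketch of the same logic in shell, assuming the usual "Active:" and "Autostart:" lines in the net-info output; the function names and the VIRSH override are made up for illustration.

```shell
# VIRSH is overridable so the logic can be exercised without libvirt installed.
VIRSH="${VIRSH:-virsh}"

# net_field <network> <field>: print one value from `virsh net-info`.
net_field() {
  "$VIRSH" net-info "$1" 2>/dev/null \
    | awk -F': *' -v f="$2" '$1 == f { print $2 }'
}

# disable_network <network>: stop it and turn off autostart, but only
# when each step is actually needed, so repeat runs change nothing.
disable_network() {
  net="$1"
  if [ "$(net_field "$net" Active)" = "yes" ]; then
    "$VIRSH" net-destroy "$net" || return 1
  fi
  if [ "$(net_field "$net" Autostart)" = "yes" ]; then
    "$VIRSH" net-autostart "$net" --disable || return 1
  fi
  return 0
}

# Intended use:
#   disable_network default
```

In Puppet terms, each "if" test is what would go in the exec’s onlyif.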

The one where we find it’s harderer than it looks

‘route’ doesn’t work as configured using the puppet code above.

The module generates XML on the fly, and passes that to the relevant virsh subcommands. When it goes wrong, the XML is dropped into the puppet run output.  This is useful, because it ties into the libvirt documentation (linked and ‘summarised’ above.)

<network>
  <name>lan</name>
  <forward dev='enp9s0' mode='route'/>
  <bridge name='virbr0'/>
</network>

route forwarding requested, but no IP address provided for network 'lan'

Take #2

libvirt::network { 'lan':
  autostart    => true,
  forward_dev  => 'enp9s0',
  forward_mode => 'route',
  bridge       => 'virbr0',
  ip_address   => '192.168.1.13',
  ip_netmask   => '255.255.255.0',
}

<network>
  <name>lan</name>
  <forward dev='enp9s0' mode='route'/>
  <bridge name='virbr0'/>
  <ip address='192.168.1.13' netmask='255.255.255.0'>
  </ip>
</network>

error: internal error: Network is already in use by interface enp9s0
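My reading of that error is that the address I gave the network sits in a subnet the host already uses on enp9s0. A small sketch of that overlap test, using plain shell arithmetic on dotted-quad IPv4 addresses; the function names are made up, and it assumes octets without leading zeros.

```shell
# ip_to_int <a.b.c.d>: print the address as a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the dotted quad into four positional params
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_network <ip1> <ip2> <netmask>: succeed if both addresses fall in
# the same network once the netmask is applied.
same_network() {
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  m=$(ip_to_int "$3")
  [ $(( a & m )) -eq $(( b & m )) ]
}

# Intended use (the second address stands in for the host's existing IP):
#   same_network 192.168.1.13 192.168.1.10 255.255.255.0 && echo "overlap"
```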

Take #3

libvirt::network { 'lan':
  autostart    => true,
  forward_dev  => 'enp9s0',
  forward_mode => 'open',
  bridge       => 'virbr0',
}

<network>
  <name>lan</name>
  <forward mode='open'/>
  <bridge name='virbr0'/>
</network>

error: XML error: open forwarding requested, but no IP address provided for network 'lan'

Take #4: ‘open’ doesn’t work with an IP on the same network either.

 

So, I can’t create a bridge without an IP address, and can’t create a bridge on the same network as an existing interface.  Which means that to puppetise this, I have to do one of:

  • Build the machine with the ethernet device unconfigured, and the system IP bound to a bridge device, or
  • Use multiple networks; VLANs and Open vSwitch perhaps, or
  • Switch them round somehow.

Bleugh.  Yak hair.

Ugly ..

Conclusion: got to shift host networking onto a bridge device. Which is exactly how I’ve found my existing server set up.

I currently configure networking as part of kickstart, so doing the bridge at build time might be a nicer way to do it, but the following will probably be my permanent fix.

Hat tips:  @lzap, from libvirt.org, Red Hat, and here 3.3.8.

This is the ugly way I did it in puppet:

  # Deploy the script, then fail on purpose until br0 exists, as a nag
  # to run it by hand.
  file { '/usr/local/bin/kvm_net_swap.sh':
    ensure  => file,
    mode    => '0744',
    owner   => 'root',
    group   => 'root',
    content => template('profile/kvm/kvm_net_swap.sh.erb'),
  } ->
  exec { 'change to bridged networking':
    command   => "/bin/echo -e 'please run /usr/local/bin/kvm_net_swap.sh' ; /bin/false",
    provider  => shell,  # the ';' in the command needs a shell
    creates   => '/sys/devices/virtual/net/br0',
    logoutput => true,
  }

And here’s the script that gets deployed.

Limitations

  • If you have multiple IPs on the existing interface, it’ll only take the first.
  • If you have multiple DNS servers, it’ll only take the first.
  • NetworkManager manages the domain/search entry in /etc/resolv.conf; I had an entry bound to the physical interface to make-this-so, and that gets stripped out by this script.  My puppet code puts it back, but you may lose stuff you wanted.
  • IPv6 isn’t supported.

#!/bin/bash

usage () {
  echo "usage: $0 eth0" 1>&2
  echo "run as root; supply existing (non wifi) interface " 1>&2
  echo "to get wrapped with a bridge" 1>&2
  exit 1
}
  # validation of param, run as root
[ "x" = "x$1" ] && usage
[ "root" != "$(whoami)" ] && usage

BASEIF=$1
BRIDIF="br0"

if /bin/nmcli -t dev show "$BASEIF" 2>&1 | grep -q 'Error: Device .* not found' ; then
  echo "nmcli cannot find device $BASEIF" 1>&2
  usage
else
  CURRIP="$(/bin/nmcli -t dev show "$BASEIF" | grep '^IP4.ADDRESS\[1\]:' | awk -F: '{ print $2 }')"
  CURRGW="$(/bin/nmcli -t dev show "$BASEIF" | grep '^IP4.GATEWAY:' | awk -F: '{ print $2 }')"
  CURRNS="$(/bin/nmcli -t dev show "$BASEIF" | grep '^IP4.DNS\[1\]:' | awk -F: '{ print $2 }')"
    # expect this to have spaces in it
  CURRCN="$(/bin/nmcli -t dev show "$BASEIF" | grep '^GENERAL.CONNECTION:' | awk -F: '{ print $2 }')"

  [ "x" = "x$CURRIP" ] && echo "failed to get current IP" 1>&2 && exit 1
  [ "x" = "x$CURRGW" ] && echo "failed to get current gateway" 1>&2 && exit 1
  [ "x" = "x$CURRNS" ] && echo "failed to get current DNS" 1>&2 && exit 1
  [ "x" = "x$CURRCN" ] && echo "failed to get current connection name" 1>&2 && exit 1

  set -x
  /bin/nmcli con delete "$CURRCN" || exit 1
  /bin/nmcli con add type bridge stp off autoconnect yes \
      ifname "$BRIDIF" con-name "$BRIDIF" ip4 "$CURRIP" gw4 "$CURRGW" || exit 1
  /bin/nmcli con mod "$BRIDIF" ipv4.dns "$CURRNS" || exit 1
  /bin/nmcli con add type bridge-slave autoconnect yes \
      ifname "$BASEIF" con-name "$BASEIF" master "$BRIDIF" || exit 1

  /bin/systemctl restart NetworkManager
fi
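The script runs nmcli once per value and picks each one out with grep and awk. One possible tidy-up, sketched below, is to capture the output once and pull fields from the saved text. The nm_field name is made up, and this does not undo the backslash escaping nmcli’s terse mode applies to colons inside values.

```shell
# nm_field <field-name>: read saved `nmcli -t dev show <if>` output on
# stdin and print the value of the first line whose key matches exactly.
nm_field() {
  # index() matches the key literally, so the [ and ] in names such as
  # IP4.ADDRESS[1] need no escaping; everything after the first colon
  # is kept as the value.
  awk -v key="$1" '
    index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }
  '
}

# Intended use inside the script:
#   SHOW="$(/bin/nmcli -t dev show "$BASEIF")"
#   CURRIP="$(printf '%s\n' "$SHOW" | nm_field 'IP4.ADDRESS[1]')"
#   CURRCN="$(printf '%s\n' "$SHOW" | nm_field 'GENERAL.CONNECTION')"
```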
