Virtual Machines Server

by Sebastien Mirolo on Sat, 8 Jan 2011

Ubuntu 10.10 comes with support for virtual machines. As I started to port the fortylines testing infrastructure to a new, quieter machine, it seemed like a good time to start playing around with virtual machine provisioning and decommissioning.

Ubuntu's default virtual machine infrastructure is built around KVM. The Ubuntu wiki has some information on booting a virtual machine from a UEC image, and a blog post titled "Setting up virtualization on Ubuntu with KVM" also contains a lot of useful information. After browsing around, the tools to get familiar with include vmbuilder, kvm, qemu, cloud-init and eucalyptus.

$ which kvm
/usr/bin/kvm
$ which qemu-system-x86_64
/usr/bin/qemu-system-x86_64

First, the easiest approach seems to be booting from a pre-built image, so I downloaded the current UEC image and set out to boot a virtual machine from it. I tried the amd64 image unsuccessfully, and the i386 image went through the same set of errors: "Could not initialize SDL" (add the -curses option), then "General error mounting filesystems" (replace if=virtio with if=scsi,bus=0,unit=6). I finally got a login prompt running the following commands.

# download UEC image
wget http://uec-images.ubuntu.com/server/maverick/20101110/maverick-server-uec-i386.tar.gz
mkdir maverick-server-uec-i386
cd maverick-server-uec-i386
tar zxvf ../maverick-server-uec-i386.tar.gz
chmod 444 maverick-server-uec-i386*

# booting the virtual machine from the UEC image
kvm -drive file=maverick-server-uec-i386.img,if=scsi,bus=0,unit=6,boot=on \
    -kernel "maverick-server-uec-i386-vmlinuz-virtual" \
    -append "root=/dev/sda ec2init=0 ro init=/usr/lib/cloud-init/uncloud-init \
    ds=nocloud ubuntu-pass=ubuntu" -net nic,model=virtio \
    -net "user,hostfwd=tcp::5555-:22" -snapshot -curses

The idea behind this investigation of Ubuntu's virtual machine support is to run nightly mechanical builds on a virtual machine. The virtual machine is provisioned from a standard UEC image, the build is performed, installing prerequisites as necessary, the generated log is communicated back to the forum server, and the virtual machine is decommissioned.

The two main issues to be solved are starting the automatic build in the virtual machine and communicating the log back to the forum server. A third issue, not directly related to the cloud infrastructure, is running a sudo command on the virtual instance from a batch script.

The documentation and the kernel command line hint at an "xupdate=" option to the /usr/lib/cloud-init/uncloud-init init process. I thus mounted the disk image and started digging through the uncloud-init script for clues on how it could be useful for my purpose.

mkdir image
losetup /dev/loop2 maverick-server-uec-i386.img
mount /dev/loop2 image
less image/usr/lib/cloud-init/uncloud-init
...
if [ -d "${mp}/updates" ]; then
        rsync -av "${mp}/updates/" "/" ||
                { log FAIL "failed rsync updates/ /"; return 1; }
fi
if [ -d "${mp}/updates.tar" ]; then
        tar -C / -xvf "${mp}/updates.tar" ||
                { log FAIL "failed tar -C / -xvf ${mp}/updates.tar"; return 1; }
fi
script="${mp}/updates.script"
if [ -f "${script}" -a -x "${script}" ]; then
        MP_DIR=${mp} "${mp}/updates.script" ||
                { log FAIL "failed to run updates.script"; return 1; }
fi
...
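
When done poking around, the loop device should be released again; a small cleanup sketch, run as root to match the mount commands above:

umount image
losetup -d /dev/loop2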

The uncloud-init script is designed to customize a virtual instance before the system fully boots and becomes operational, so it is no surprise that the xupdate mechanism cannot be used to start the build process. It seems we will have to log into the instance and run the build process from there.

For our purpose of a mechanical build system, it is possible to run virtual instances without bringing up an ssh server. Once the build is finished, we could mount the disk image through a loopback device on the host and retrieve the files from the mounted drive. That requires adding an entry like the following to /etc/fstab. Some blogs suggest using autofs instead, but I haven't been able to get it to work properly, nor do I understand how it gets rid of the "mount as root" requirement.

/var/images/build-uec-i386.img /mnt/images/build auto ro,user,noauto,loop 0 0
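
With that entry in place, a regular user on the host can mount the image and pull the build logs out once the instance has shut down; a minimal sketch, assuming the build writes its logs under /home/ubuntu/log in the guest:

mount /mnt/images/build
cp -r /mnt/images/build/home/ubuntu/log .
umount /mnt/images/build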

Once the virtual machines are no longer provisioned locally but rather spawned in the cloud, that approach does not work anymore. So we might look into using the virtual instance's ssh server to transfer logs around. All that is required is to copy the build master controller's ssh public key into the virtual instance's ubuntu account authorized_keys file, something that can be done by uncloud-init through the xupdate mechanism. So we create a custom update disk as follows.

mkdir -p overlay/updates
# ... set subdirectory structure to match the updated root ...
genisoimage -rock --output updates.iso overlay
qemu-img create -f qcow2 -b maverick-server-uec-i386.img disk.img
# This command works but still prompts for login.
kvm -drive file=disk.img,if=scsi,bus=0,unit=5,boot=on \
  -drive file=updates.iso,if=scsi,bus=1,unit=6 \
  -kernel "maverick-server-uec-i386-vmlinuz-virtual" \
  -append "root=/dev/sda ro init=/usr/lib/cloud-init/uncloud-init \
  ds=nocloud ubuntu-pass=ubuntu xupdate=sdb:mnt" \
  -net nic,model=virtio -net "user,hostfwd=tcp::5555-:22" -nographic
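
For reference, the overlay tree mirrors the guest root filesystem, so the authorized_keys trick amounts to something like the sketch below. The key path on the host is just an illustration, and file ownership inside the guest may still need fixing up through an updates.script.

mkdir -p overlay/updates/home/ubuntu/.ssh
cp ~/.ssh/id_rsa.pub overlay/updates/home/ubuntu/.ssh/authorized_keys
chmod 600 overlay/updates/home/ubuntu/.ssh/authorized_keys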

The dws script needs to communicate with the source control repository through the Internet. I found out that editing /etc/network/interfaces is unnecessary once libvirt is installed. Despite some posts around the web, it seems the virtual bridge is only necessary to access the virtual machine from outside the host.

sudo aptitude install libvirt0

Ubuntu 10.10 had already set this up as part of the installation for me, as shown by the ifconfig command.

...
virbr0    Link encap:Ethernet  HWaddr 9e:66:65:fc:97:5b  
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
...

Two commands require sudo access: apt-get and shutdown. We use apt-get to install system prerequisites and shutdown to cleanly stop the virtual machine. We thus add the following two lines to the /etc/sudoers file. The batch script can then execute both commands without being prompted for a password.

%admin ALL = NOPASSWD: /sbin/shutdown
%admin ALL = NOPASSWD: /usr/bin/apt-get
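
Inside the guest, the build script can then run both commands unattended, for example (the package name is only an illustration):

sudo /usr/bin/apt-get -y install build-essential
sudo /sbin/shutdown -P 0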

Once the virtual machine and its ssh server are started, it is then possible to execute the build script on the guest, copy the log file back, and shut down the virtual machine with three successive ssh commands.

ssh -p 5555 ubuntu@localhost /home/ubuntu/bin/dkicks
scp -P 5555 -r ubuntu@localhost:/home/ubuntu/log .
ssh -p 5555 ubuntu@localhost /sbin/shutdown -P 0

The following Python code can be used to wait until the ssh server responds.

import subprocess, sys, time

class Error(Exception):
    # stand-in for the build script's own error class
    pass

def waitUntilSSHUp(hostname, login=None, port=22, timeout=120):
    '''Wait until an ssh connection can be established to *hostname*
    or the attempt times out after *timeout* seconds,
    e.g. waitUntilSSHUp('localhost', login='ubuntu', port=5555).'''
    up = False
    waited = 0
    sshConnect = hostname
    if login:
        sshConnect = login + '@' + hostname
    while not up and (waited <= timeout):
        time.sleep(30)
        waited = waited + 30
        # BatchMode prevents ssh from hanging on a password prompt
        cmd = subprocess.Popen(['ssh',
                                '-o', 'BatchMode yes',
                                '-p', str(port),
                                sshConnect,
                                'echo'],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
        cmd.wait()
        if cmd.returncode == 0:
            up = True
        else:
            sys.stdout.write("waiting 30 more seconds ("
                             + str(waited) + " so far)...\n")
    if not up:
        raise Error("ssh connection attempt to " + hostname + " timed out.")

As it turns out, the build script runs out of space while installing all the prerequisites and compiling the repository. The original disk image (1.4GB) seems too small for that purpose.

There seem to be three solutions to this problem.

  • Find a base image with a bigger disk
  • Create a new image with a bigger disk
  • Increase the size of the disk on the original disk image (see the sketch after this list)
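
For the third option, the raw-file approach would look roughly like the sketch below; the sizes are illustrative and, on a partitioned image, the partition table would also have to be updated, which is the fiddly part.

# grow the raw file by appending zeros, then grow the ext filesystem in it
dd if=/dev/zero bs=1M count=2048 >> maverick-server-uec-i386.img
e2fsck -f maverick-server-uec-i386.img
resize2fs maverick-server-uec-i386.img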

As the next steps in our VM mechanical build project consist of running CentOS disk images, it is a good time to start investigating running EC2 images locally. Apparently there is a large library of those, and we should be able to find a public one that is sized correctly for our purpose. Looking around on the web, there is a lot of documentation on creating and uploading EC2 images, but I couldn't find relevant information on downloading a public image and running it locally in kvm. I was looking for something as simple as a URL to a disk image, but no luck so far.

To increase the size of a disk image, the most common solution consists of concatenating two raw files together and updating the partition table. The partition update part looks like a lot of complexity to code in a batch system. We are already using an update disk to customize the default disk image, and now we would also need to resize it, which seems tricky enough. So I looked into building an image with vmbuilder. Apparently that is how the UEC image I used earlier was put together.

$ aptitude search vm-builder
p   python-vm-builder                        - VM builder
p   python-vm-builder-ec2                    - EC2 Ubuntu VM builder
p   ubuntu-vm-builder                        - Ubuntu VM builder

I am not yet certain vmbuilder will also provide a means to create CentOS images or whether I will need a different tool for that purpose. Nonetheless, let's start there for now.

sudo vmbuilder kvm ubuntu --rootsize=8192

The ubuntu-kvm directory was created with two files in it: run.sh, a shell script with the kvm invocation command, and a 389MB tmp78iihO.qcow2 file, the system disk image. Let's launch the image and see what's in it.

cd ubuntu-kvm && ./run.sh

Using the "ubuntu" login and "ubuntu" password, I am able to get a shell prompt.

$ df -h
Filesystem  Size Used Avail Use% Mounted on
/dev/sda1   7.6G 482M  6.7G   7% /
...  
$ ps aux | grep sshd
$ find /etc -name 'sshd*'

So we have a bootable image with 6.7G of space available. The sshd daemon is neither running nor installed, and most likely the scripts necessary to make a copy of that image unique in the cloud are not there either. Let's add our modifications to run the build script first and see how far it goes.

$ sudo aptitude install openssh-server
$ mkdir -p /home/ubuntu/bin
$ mkdir -p /home/ubuntu/.ssh
$ sudo vi /etc/sudoers
  # Defaults
+ # Preserve environment variables so that we do not get the error message:
+ # "sorry, you are not allowed to set the following
+ #  environment variables: DEBIAN_FRONTEND"
+ Defaults        !env_reset

  # Members of the admin group may gain root privileges
  %admin ALL=(ALL) ALL
+ %admin ALL = NOPASSWD: /sbin/shutdown
+ %admin ALL = NOPASSWD: /usr/bin/apt-get
$ sudo shutdown -P 0
> kvm -drive file=tmp78iihO.qcow2 \
  -net nic,model=virtio -net "user,hostfwd=tcp::5555-:22"

The previous command hangs the virtual machine during start-up, while the following command does not permit sshing into the virtual machine.

> kvm -drive file=tmp78iihO.qcow2 -net "user,hostfwd=tcp::5555-:22" &
> ssh -v -p 5555 ubuntu@localhost
...
ssh_exchange_identification: Connection closed by remote host

There is apparently more to vmbuilder than the documentation suggests when it comes to building an image equivalent to the one I originally used...
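
One avenue I have not explored: vmbuilder appears to accept options such as --addpkg and --ssh-user-key that could bake openssh-server and a public key straight into the image. An untested sketch, where the key path is just an illustration:

sudo vmbuilder kvm ubuntu --rootsize=8192 \
    --addpkg=openssh-server \
    --ssh-user-key=/home/build/.ssh/id_rsa.pub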

Looking through the vmbuilder source repository I found a README.files in automated-ec2-builds that mentioned a uec-resize-image script.

sudo aptitude install bzr
bzr branch lp:~ubuntu-on-ec2/vmbuilder/automated-ec2-builds

I might actually be able to resize my original image with a single command after all.

aptitude search *uec*
bzr branch lp:~ubuntu-on-ec2/ubuntu-on-ec2/uec-tools
ls uec-tools/resize-uec-image
sudo install -m 755 uec-tools/resize-uec-image /usr/local/bin

Let's use the resize script and check the free space on our new image.

> resize-uec-image maverick-server-uec-i386.img 5G
> ls -la maverick-server-uec-i386.img
-rw-r--r--  5368709120 2011-01-03 07:28 maverick-server-uec-i386.img
> kvm -drive file=maverick-server-uec-i386.img,if=scsi,bus=0,unit=6,boot=on \
    -kernel "maverick-server-uec-i386-vmlinuz-virtual" \
    -append "root=/dev/sda ec2init=0 ro \
    init=/usr/lib/cloud-init/uncloud-init ds=nocloud \
    ubuntu-pass=ubuntu" -net nic,model=virtio \
    -net "user,hostfwd=tcp::5555-:22" -snapshot -curses
$ df -h
Filesystem  Size Used Avail Use% Mounted on
/dev/sda1   5.0G 516M  4.2G  11% /
...

Finally, I managed to get a script that starts a virtual machine, runs the mechanical build end-to-end and copies the build logs back out. It is a good start, but there remain a few issues with the current approach. The cloud-init script re-enables ssh password authentication after it updates sshd_config with our version.

# /usr/lib/cloud-init/uncloud-init
...
pa=PasswordAuthentication
sed -i "s,${pa} no,${pa} yes," /etc/ssh/sshd_config 2>/dev/null &&
        log "enabled passwd auth in ssh" ||
        log "failed to enable passwd ssh"
...

The IP address of our virtual machine is always the same, but uncloud-init will generate different ssh server keys every time our script runs the build on a fresh virtual machine. That creates identification issues for the ssh client that the "StrictHostKeyChecking no" parameter does not always solve.
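
A common workaround, which I still need to fold into the build script, is to also point the client at a throwaway known_hosts file so stale keys never accumulate:

ssh -p 5555 -o "StrictHostKeyChecking no" -o "UserKnownHostsFile /dev/null" \
    ubuntu@localhost /home/ubuntu/bin/dkicks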

I looked quickly through virsh and eucalyptus. It seems each running instance needs to be registered with a global store, i.e. it requires sudo access on the host. Those tools do not seem suited to the kind of thirty-minute-lifespan virtual machines (start, build, throw away) I need.

It took way longer than I anticipated to figure the pieces out, and there is surely a long and winding road ahead before we have a simple-to-use cloud-based build infrastructure.
