Stefano Brivio 20d271b226 contrib: Introduce PoC for Kata Containers with user-mode networking
passt can be used to implement user-mode networking for the Kata
Containers runtime, so that networking setup doesn't need elevated
privileges or capabilities.

This commit adds the patch for Kata Containers runtime and agent
to support passt as networking model and endpoint, and some basic
documentation.

See contrib/kata-containers/README.md for more details and setup
steps.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-28 18:51:50 +01:00

This document shows how to set up a Kata Containers environment using passt to implement user-mode networking. Unlike the other networking models currently implemented, this kind of setup requires no elevated privileges or capabilities as far as networking is concerned.

This proof-of-concept uses CRI-O as the container runtime implementation. CRI-O is controlled directly, without resorting to a full Kubernetes environment.

Prerequisites

  • Go and Rust toolchains, typically provided by distribution packages
  • the usual tools, such as git, make, etc.
  • qemu 4.x or later, with a working virtiofsd executable (provided at least by the Debian, Ubuntu and Fedora packages)
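To check these prerequisites before starting, something along these lines could be used. This is only a sketch, not part of the original instructions, and the function names are hypothetical. The tool names it looks for are assumptions: rustc and cargo for the Rust toolchain, and qemu-system-x86_64 for qemu on x86_64 hosts. The version check parses the first line of qemu's --version output, which starts with "QEMU emulator version".

```shell
#!/bin/sh
# Sketch: report prerequisites missing from PATH. Tool names are assumptions
# (rustc/cargo for Rust, qemu-system-x86_64 for qemu on x86_64 hosts).
check_prereqs() {
	rc=0
	for tool in go rustc cargo git make qemu-system-x86_64; do
		if ! command -v "$tool" >/dev/null 2>&1; then
			echo "missing: $tool" >&2
			rc=1
		fi
	done
	return "$rc"
}

# Print the major qemu version, reading "qemu-system-x86_64 --version" output
# from standard input: "QEMU emulator version 5.2.0 (...)" gives 5.
qemu_major() {
	sed -n '1s/^QEMU emulator version \([0-9][0-9]*\)\..*/\1/p'
}
```

For example:

check_prereqs
qemu-system-x86_64 --version | qemu_major

The second command should print 4 or a higher number.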

Fetch and prepare components

CRI-O

CRI-O is the container runtime. On one side, it implements the Kubernetes CRI (Container Runtime Interface), which we'll drive manually with crictl here. On the other side, it supports OCI (Open Container Initiative) runtimes, and Kata Containers is one of them.

Fetch

git clone https://github.com/cri-o/cri-o.git

Build

cd cri-o
make

Install

As root:

make install

Configure

Configuration is now at /etc/crio/crio.conf. This would also be the case for distribution packages. Some specific configuration items for Kata Containers are:

# Cgroup management implementation used for the runtime.
cgroup_manager = "cgroupfs"

# manage_ns_lifecycle determines whether we pin and remove namespaces
# and manage their lifecycle
manage_ns_lifecycle = true

The following section, which can be appended at the end of the file, defines a runtime of a special type, vm. This is needed to run the Kata Containers runtime instead of the default choice, crun:

[crio.runtime.runtimes.kata]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"

Note that we don't have a containerd-shim-kata-v2 binary yet. We'll build and install it in the next steps.
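To script this step instead, a function like the following could be used. It is a sketch with a hypothetical name, not part of the original instructions. It appends the kata runtime table shown above to crio.conf, unless a table with the same name is already there. Following the suggestion above, it appends at the end of the file.

```shell
#!/bin/sh
# Sketch: append the [crio.runtime.runtimes.kata] table to a CRI-O
# configuration file, unless a table with that name is already present.
# Usage: add_kata_runtime [config_file]   (default: /etc/crio/crio.conf)
add_kata_runtime() {
	conf="${1:-/etc/crio/crio.conf}"

	# -x: match whole lines, -F: fixed string (brackets aren't special)
	if grep -qxF '[crio.runtime.runtimes.kata]' "$conf" 2>/dev/null; then
		echo "kata runtime already defined in $conf"
		return 0
	fi

	# Append at the end of the file, as suggested above
	cat >> "$conf" <<EOF

[crio.runtime.runtimes.kata]
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
EOF
}
```

As root:

add_kata_runtime /etc/crio/crio.conf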

CNI plugins

CNI plugins are binaries, run by CRI-O, that configure networking on both the host and the pod side. A few network topologies are offered, with very limited capabilities.

Fetch

git clone https://github.com/containernetworking/plugins

Build

cd plugins
./build_linux.sh

Install

As root:

mkdir -p /opt/cni/bin
cp bin/* /opt/cni/bin/

Configure

The path where CNI configurations are located is configurable in /etc/crio/crio.conf: see the network_dir parameter there. Assuming the default value, we need to provide at least one configuration under /etc/cni/net.d/. For example:

# cat /etc/cni/net.d/50-kata-sandbox.conf 
{
    "cniVersion": "0.3.0",
    "name": "crio-bridge",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.88.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
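A sketch for writing the configuration above from a script follows. The function name is hypothetical. The file name and the default directory match the example above, and the subnet can optionally be changed. If python3 happens to be available, the function also checks that the result is valid JSON.

```shell
#!/bin/sh
# Sketch: write the bridge CNI configuration shown above.
# Usage: write_cni_conf [dir] [subnet]
# Defaults: /etc/cni/net.d (default network_dir) and 10.88.0.0/16.
write_cni_conf() {
	dir="${1:-/etc/cni/net.d}"
	subnet="${2:-10.88.0.0/16}"
	out="$dir/50-kata-sandbox.conf"

	mkdir -p "$dir" || return 1
	cat > "$out" <<EOF || return 1
{
    "cniVersion": "0.3.0",
    "name": "crio-bridge",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "$subnet",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
EOF

	# Optional sanity check: is the result valid JSON?
	if command -v python3 >/dev/null 2>&1; then
		python3 -m json.tool "$out" >/dev/null || return 1
	fi
}
```

As root, write_cni_conf with no arguments writes /etc/cni/net.d/50-kata-sandbox.conf.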

crictl

crictl is needed to control CRI-O in lieu of Kubernetes.

Fetch

git clone https://github.com/kubernetes-sigs/cri-tools.git

Build

cd cri-tools
make

Install

As root:

make install

mbuto

We'll use mbuto to build a minimal virtual machine image for usage with the Kata Containers runtime.

Fetch

git clone https://mbuto.lameexcu.se/mbuto

Kata Containers

Fetch

git clone https://github.com/kata-containers/kata-containers

Patch

The current upstream version doesn't support the passt networking model yet. To add it, apply the patch from this directory at the top of the kata-containers source tree, adjusting the path to the patch file as needed:

patch -p1 < 0001-virtcontainers-agent-Add-passt-networking-model-and-.patch

Build

make -C src/runtime
make -C src/agent LIBC=gnu

Install

As root:

make -C src/runtime install
cp src/agent/target/x86_64-unknown-linux-gnu/release/kata-agent /usr/libexec/
chmod 755 /usr/libexec/kata-agent
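A quick check that both files are in place can save some debugging later. The sketch below uses a hypothetical function name. Its default paths are the kata-agent destination of the cp command above and the runtime_path configured for CRI-O earlier. The second path assumes that make -C src/runtime install puts containerd-shim-kata-v2 in /usr/local/bin.

```shell
#!/bin/sh
# Sketch: check that each given file exists and is executable. Defaults to
# the shim path configured for CRI-O and the kata-agent copied above.
check_kata_install() {
	if [ $# -eq 0 ]; then
		set -- /usr/local/bin/containerd-shim-kata-v2 /usr/libexec/kata-agent
	fi

	rc=0
	for f in "$@"; do
		if [ -f "$f" ] && [ -x "$f" ]; then
			echo "ok: $f"
		else
			echo "missing or not executable: $f" >&2
			rc=1
		fi
	done
	return "$rc"
}
```

Running check_kata_install with no arguments should print one "ok:" line per file.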

Build the Virtual Machine image

cd mbuto
./mbuto -f /tmp/kata.img

See mbuto -h for additional parameters, such as the choice of kernel version, kernel modules, program add-ons, etc. mbuto prints some parameters to be used in the configuration of the Kata Containers runtime below. For example:

$ ./mbuto -c lz4 -f /tmp/kata.img
Not running as root, won't keep cpio mounted
Size: bin   12M lib   59M kmod  1.4M total   70M compressed   33M
Kata Containers [hypervisor.qemu] configuration:

	kernel = "/boot/vmlinuz-5.10.0-6-amd64"
	initrd = "/tmp/kata.img"
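mbuto's output can also be filtered to keep just those two assignments. The sketch below assumes the output format shown above and uses a hypothetical function name. The original doesn't say whether these lines go to standard output or standard error, so the usage example captures both.

```shell
#!/bin/sh
# Sketch: extract the kernel and initrd assignments from mbuto's output
# (format as shown above) and print them unindented.
mbuto_kata_params() {
	sed -n \
		-e 's/^[[:space:]]*\(kernel = ".*"\)[[:space:]]*$/\1/p' \
		-e 's/^[[:space:]]*\(initrd = ".*"\)[[:space:]]*$/\1/p'
}
```

For example:

./mbuto -c lz4 -f /tmp/kata.img 2>&1 | mbuto_kata_params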

Configure

The configuration file at this point is located at /usr/share/defaults/kata-containers/configuration-qemu.toml. Some parameters of general interest are:

[hypervisor.qemu]
kernel = "/boot/vmlinuz-5.10.0-6-amd64"
initrd = "/tmp/kata.img"

Here we can use the values indicated earlier by mbuto. Currently, the default path for the virtiofsd daemon doesn't work on all distributions, so make sure it matches the actual location. For example, on Debian:

virtio_fs_daemon = "/usr/lib/qemu/virtiofsd"

We then need to enable the passt networking model for the runtime, in the [runtime] section:

internetworking_model="passt"
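These edits to configuration-qemu.toml can also be scripted. The following sketch uses a hypothetical helper, set_toml_key. It writes KEY = VALUE right after the [SECTION] header and drops any other uncommented assignment of KEY in that table, so it doesn't need to know where the existing line is. It is not a TOML parser. It assumes that table headers sit on lines of their own and that the value being replaced fits on a single line.

```shell
#!/bin/sh
# Sketch: set_toml_key FILE SECTION KEY VALUE
# Write "KEY = VALUE" right after the [SECTION] header and drop other
# uncommented "KEY = ..." lines in that table. VALUE is copied verbatim
# (strings need their double quotes; awk -v expands backslash escapes).
# If the table is missing, it's appended at the end of the file.
set_toml_key() {
	file="$1" section="[$2]" key="$3" value="$4"
	tmp=$(mktemp) || return 1

	awk -v section="$section" -v key="$key" -v value="$value" '
	/^[ \t]*\[/ {
		# Table header: compare it with [SECTION], ignoring blanks
		hdr = $0
		gsub(/[ \t]/, "", hdr)
		in_sec = (hdr == section)
		print
		if (in_sec) {
			print key " = " value
			done = 1
		}
		next
	}
	in_sec {
		line = $0
		sub(/^[ \t]+/, "", line)
		# Drop "KEY = ..." lines, but not "KEY_suffix = ..." ones
		if (index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[ \t]*=/)
			next
	}
	{ print }
	END {
		if (!done)
			printf "\n%s\n%s = %s\n", section, key, value
	}
	' "$file" > "$tmp" && cat "$tmp" > "$file"
	rc=$?
	rm -f "$tmp"
	return "$rc"
}
```

Writing back with cat keeps the ownership and permissions of the original file. For example, as root:

conf=/usr/share/defaults/kata-containers/configuration-qemu.toml
set_toml_key "$conf" hypervisor.qemu kernel '"/boot/vmlinuz-5.10.0-6-amd64"'
set_toml_key "$conf" hypervisor.qemu initrd '"/tmp/kata.img"'
set_toml_key "$conf" hypervisor.qemu virtio_fs_daemon '"/usr/lib/qemu/virtiofsd"'
set_toml_key "$conf" runtime internetworking_model '"passt"'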

Run an example container

Fetch

We now need a container image to run as an example. With podman installed from distribution packages, we can pull one:

podman pull docker.io/i386/busybox

Configure

Now we can define the configuration files for the pod and the container we want to create and start:

$ cat pod-config.json
{
    "metadata": {
        "name": "kata-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "logDirectory": "/tmp",
    "linux": {
    }
}

$ cat container-busybox.json
{
  "metadata": {
      "name": "kata-busybox"
  },
  "image": {
      "image": "docker.io/i386/busybox"
  },
  "command": [
      "sleep", "6000"
  ],
  "log_path":"kata-busybox.log",
  "linux": {
  }
}

Run the container workload

Assuming we have pod-config.json and container-busybox.json defined above, we can now:

start CRI-O

crio -l debug

create the pod and run a container inside it

c=$(crictl start $(crictl create $(crictl runp --runtime=kata pod-config.json) container-busybox.json pod-config.json))

verify that addresses are properly configured

crictl exec $c ip ad sh
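The one-liner used above to create the pod and start the container nests three crictl calls, so a failure in one of them is easy to miss. The sketch below splits the same sequence into steps that stop at the first failure. The function name is hypothetical. Like the one-liner, it relies on crictl runp and crictl create printing the ID of the new pod and of the new container, respectively.

```shell
#!/bin/sh
# Sketch: same steps as the one-liner, stopping at the first failure.
# Usage: start_busybox_pod [pod_config] [container_config]
# Prints the container ID on success.
start_busybox_pod() {
	pod_conf="${1:-pod-config.json}"
	ctr_conf="${2:-container-busybox.json}"

	# Run the pod sandbox with the kata runtime: prints the pod ID
	pod=$(crictl runp --runtime=kata "$pod_conf") || return 1
	# Create the container in that pod: prints the container ID
	ctr=$(crictl create "$pod" "$ctr_conf" "$pod_conf") || return 1
	# Start it, then print its ID for later use with crictl exec
	crictl start "$ctr" >/dev/null || return 1
	echo "$ctr"
}
```

For example:

c=$(start_busybox_pod pod-config.json container-busybox.json)
crictl exec $c ip ad sh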

Enable support for ICMP/ICMPv6 Echo Request

passt can replicate ICMP Echo Requests sent by the workload, and propagate the replies back. However, as it's not running as root, we need to enable so-called ping sockets for unprivileged users. From the namespace created by CRI-O for this container:

sysctl -w net.ipv4.ping_group_range="0 2147483647"
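The sysctl value is a "min max" range of group IDs. The kernel lets a process create a ping socket if its effective group ID, or any of its supplementary group IDs, falls within that range. The kernel default, "1 0", denies everyone. To check whether ping sockets are already allowed, the sketch below compares the current range with the group IDs of the process. The function name is hypothetical. Run it as the user passt runs as, in the same namespace.

```shell
#!/bin/sh
# Sketch: succeed if any of the given group IDs (default: effective and
# supplementary groups of this process, from "id -G") falls within the
# "min max" range in the ping_group_range file.
# Usage: can_use_ping_sockets [range_file] [gid...]
can_use_ping_sockets() {
	range_file="${1:-/proc/sys/net/ipv4/ping_group_range}"
	if [ $# -gt 0 ]; then
		shift
	fi
	if [ $# -eq 0 ]; then
		set -- $(id -G)
	fi

	read -r min max < "$range_file" || return 1
	for g in "$@"; do
		if [ "$g" -ge "$min" ] && [ "$g" -le "$max" ]; then
			return 0
		fi
	done
	return 1
}
```

For example, can_use_ping_sockets && echo allowed.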

Troubleshooting

Redirect qemu's console output to file

Agent errors and kernel messages should be accessible via the named UNIX domain socket at /run/vc/vm/*/console.sock, provided that agent.debug_console is enabled in kernel_params in configuration.toml. However, this won't work if the agent doesn't start. To get those messages in that case, and additionally have all the output written to files, we can wrap qemu:

$ cat /usr/local/bin/qemu.sh
#!/bin/sh

# exec: replace this shell with qemu; serial console and stderr go to files
exec /usr/bin/qemu-system-x86_64 "$@" -serial file:/tmp/qemu.log 2>/tmp/qemu_err.log

Now make the script executable (chmod 755 /usr/local/bin/qemu.sh), and use it as the qemu path in configuration.toml:

[hypervisor.qemu]
path = "/usr/local/bin/qemu.sh"

Also add console=ttyS0 to the kernel parameters, so that kernel messages are included too:

kernel_params = "... console=ttyS0"
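Where kernel_params already has other values, console=ttyS0 can be appended with a small script. The sketch below uses a hypothetical function name. It assumes the value is a double-quoted string on a single line and leaves commented-out assignments alone. The file path is passed explicitly: configuration.toml here, or configuration-qemu.toml as mentioned in the configuration step above.

```shell
#!/bin/sh
# Sketch: append console=ttyS0 to the double-quoted kernel_params string
# in a Kata Containers configuration file, unless it's already present.
# Usage: add_console_param FILE
add_console_param() {
	conf="$1"
	[ -f "$conf" ] || return 1

	# Already there: nothing to do
	if grep -q '^[[:space:]]*kernel_params[[:space:]]*=.*console=ttyS0' "$conf"; then
		return 0
	fi

	tmp=$(mktemp) || return 1
	# Insert " console=ttyS0" before the closing quote of the value; writing
	# back with cat keeps the original ownership and permissions
	sed 's/^\([[:space:]]*kernel_params[[:space:]]*=[[:space:]]*"[^"]*\)"/\1 console=ttyS0"/' \
		"$conf" > "$tmp" && cat "$tmp" > "$conf"
	rc=$?
	rm -f "$tmp"
	return "$rc"
}
```

For example, as root:

add_console_param /usr/share/defaults/kata-containers/configuration-qemu.toml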

Debug console

See the kata-console script in the kata-vfio-tools repository for a convenient helper to access the debug console provided by the agent.