20d271b226
passt can be used to implement user-mode networking for the Kata Containers runtime, so that networking setup doesn't need elevated privileges or capabilities. This commit adds the patch for Kata Containers runtime and agent to support passt as networking model and endpoint, and some basic documentation. See contrib/kata-containers/README.md for more details and setup steps. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
302 lines
7.8 KiB
Markdown
302 lines
7.8 KiB
Markdown
This document shows how to set up a Kata Containers environment using passt to
|
|
implement user-mode networking: contrary to other networking models currently
|
|
implemented, this kind of setup requires no elevated privileges or capabilities
|
|
as far as networking is concerned.
|
|
|
|
This proof-of-concept uses CRI-O as implementation container runtime, which is
|
|
controlled directly without resorting to a full Kubernetes environment.
|
|
|
|
# Pre-requisites
|
|
|
|
* Go and rust toolchains, typically provided by distribution packages
|
|
* the usual tools, such as git, make, etc.
|
|
* a 4.x qemu version, or more recent, with a working virtiofsd executable
|
|
(provided at least by Debian, Ubuntu, Fedora packages)
|
|
|
|
# Fetch and prepare components
|
|
|
|
## CRI-O
|
|
|
|
CRI-O is the container runtime. It implements the Kubernetes CRI (Container
|
|
Runtime Interface) on one side -- and we'll handle that part manually with
|
|
`crictl` here, and on the other side it supports OCI (Open Container Initiative)
|
|
runtimes -- Kata Containers is one of them.
|
|
|
|
### Fetch
|
|
|
|
git clone https://github.com/cri-o/cri-o.git
|
|
|
|
### Build
|
|
|
|
cd cri-o
|
|
make
|
|
|
|
### Install
|
|
|
|
As root:
|
|
|
|
make install
|
|
|
|
### Configure
|
|
|
|
Configuration is now at `/etc/crio/crio.conf`. This would also be the case for
|
|
distribution packages. Some specific configuration items for Kata Containers
|
|
are:
|
|
|
|
# Cgroup management implementation used for the runtime.
|
|
cgroup_manager = "cgroupfs"
|
|
|
|
# manage_ns_lifecycle determines whether we pin and remove namespaces
|
|
# and manage their lifecycle
|
|
manage_ns_lifecycle = true
|
|
|
|
and the following section, that can be added at the end, defines a special type
|
|
of runtime, the `vm` type. This is needed to run the Kata Containers runtime
|
|
instead of the default `crun` choice:
|
|
|
|
[crio.runtime.runtimes.kata]
|
|
runtime_path = "/usr/local/bin/containerd-shim-kata-v2"
|
|
runtime_type = "vm"
|
|
runtime_root = "/run/vc"
|
|
|
|
Note that we don't have a containerd-shim-kata-v2 binary yet, we'll deal with
|
|
that in the next steps.
|
|
|
|
## CNI plugins
|
|
|
|
CNI plugins are actually binaries, run by CRI-O, used to configure networking on
|
|
the host as well as on the pod side. A few network topologies are offered, with
|
|
very limited capabilities.
|
|
|
|
### Fetch
|
|
|
|
git clone https://github.com/containernetworking/plugins
|
|
|
|
### Build
|
|
|
|
cd plugins
|
|
./build_linux.sh
|
|
|
|
### Install
|
|
|
|
As root:
|
|
|
|
mkdir -p /opt/cni/bin
|
|
cp bin/* /opt/cni/bin/
|
|
|
|
|
|
### Configure
|
|
|
|
The path where CNI configurations are located is configurable in
|
|
`/etc/crio/crio.conf`, see the `network_dir` parameter there. Assuming the
|
|
default value, we need to provide at least one configuration under
|
|
`/etc/cni/net.d/`. For example:
|
|
|
|
# cat /etc/cni/net.d/50-kata-sandbox.conf
|
|
{
|
|
"cniVersion": "0.3.0",
|
|
"name": "crio-bridge",
|
|
"type": "bridge",
|
|
"bridge": "cni0",
|
|
"isGateway": true,
|
|
"ipMasq": true,
|
|
"ipam": {
|
|
"type": "host-local",
|
|
"subnet": "10.88.0.0/16",
|
|
"routes": [
|
|
{ "dst": "0.0.0.0/0" }
|
|
]
|
|
}
|
|
}
|
|
|
|
## crictl
|
|
|
|
`crictl` is needed to control CRI-O in lieu of Kubernetes.
|
|
|
|
### Fetch
|
|
|
|
git clone https://github.com/kubernetes-sigs/cri-tools.git
|
|
|
|
### Build
|
|
|
|
cd cri-tools
|
|
make
|
|
|
|
### Install
|
|
|
|
As root:
|
|
|
|
make install
|
|
|
|
## mbuto
|
|
|
|
We'll use `mbuto` to build a minimal virtual machine image for usage with the
|
|
Kata Containers runtime.
|
|
|
|
### Fetch
|
|
|
|
git clone https://mbuto.lameexcu.se/mbuto
|
|
|
|
## Kata Containers
|
|
|
|
### Fetch
|
|
|
|
git clone https://github.com/kata-containers/kata-containers
|
|
|
|
### Patch
|
|
|
|
The current upstream version doesn't support the _passt_ networking model yet,
|
|
use the patch from this directory to add it:
|
|
|
|
patch -p1 < 0001-virtcontainers-agent-Add-passt-networking-model-and-.patch
|
|
|
|
### Build
|
|
|
|
make -C src/runtime
|
|
make -C src/agent LIBC=gnu
|
|
|
|
### Install
|
|
|
|
As root:
|
|
|
|
make -C src/runtime install
|
|
cp src/agent/target/x86_64-unknown-linux-gnu/release/kata-agent /usr/libexec/
|
|
chmod 755 /usr/libexec/kata-agent
|
|
|
|
### Build the Virtual Machine image
|
|
|
|
cd mbuto
|
|
./mbuto -f /tmp/kata.img
|
|
|
|
See `mbuto -h` for additional parameters, such as choice of kernel version,
|
|
kernel modules, program add-ons, etc. `mbuto` will print some configuration
|
|
parameters to be used in the configuration of the Kata Containers runtime below.
|
|
For example:
|
|
|
|
$ ./mbuto -c lz4 -f /tmp/kata.img
|
|
Not running as root, won't keep cpio mounted
|
|
Size: bin 12M lib 59M kmod 1.4M total 70M compressed 33M
|
|
Kata Containers [hypervisor.qemu] configuration:
|
|
|
|
kernel = "/boot/vmlinuz-5.10.0-6-amd64"
|
|
initrd = "/tmp/kata.img"
|
|
|
|
### Configure
|
|
|
|
The configuration file at this point is located at
|
|
`/usr/share/defaults/kata-containers/configuration-qemu.toml`. Some parameters of general interest are:
|
|
|
|
[hypervisor.qemu]
|
|
kernel = "/boot/vmlinuz-5.10.0-6-amd64"
|
|
initrd = "/tmp/kata.img"
|
|
|
|
where we can use the values indicated earlier by `mbuto`. Currently, the default
|
|
path for the `virtiofsd` daemon doesn't work for all distributions, ensure that
|
|
it matches. For example, on Debian:
|
|
|
|
virtio_fs_daemon = "/usr/lib/qemu/virtiofsd"
|
|
|
|
we'll then need to enable the `passt` networking model for the runtime. In the
|
|
`[runtime]` section:
|
|
|
|
internetworking_model=passt
|
|
|
|
# Run an example container
|
|
|
|
## Fetch
|
|
|
|
We'll now need an image of a container to run as example. With `podman`
|
|
installed via distribution package, we can import one:
|
|
|
|
podman pull docker.io/i386/busybox
|
|
|
|
## Configure
|
|
|
|
Now we can define configuration files for pod and container we want to create
|
|
and start:
|
|
|
|
$ cat pod-config.json
|
|
{
|
|
"metadata": {
|
|
"name": "kata-sandbox",
|
|
"namespace": "default",
|
|
"attempt": 1,
|
|
"uid": "hdishd83djaidwnduwk28bcsb"
|
|
},
|
|
"logDirectory": "/tmp",
|
|
"linux": {
|
|
}
|
|
}
|
|
|
|
$ cat container-busybox.json
|
|
{
|
|
"metadata": {
|
|
"name": "kata-busybox"
|
|
},
|
|
"image": {
|
|
"image": "docker.io/i386/busybox"
|
|
},
|
|
"command": [
|
|
"sleep", "6000"
|
|
],
|
|
"log_path":"kata-busybox.log",
|
|
"linux": {
|
|
}
|
|
}
|
|
|
|
## Run the container workload
|
|
|
|
Assuming we have `pod-config.json` and `container-busybox.json` defined above,
|
|
we can now:
|
|
|
|
### start CRI-O
|
|
|
|
crio -l debug
|
|
|
|
### create the pod and run a container inside it
|
|
|
|
c=$(crictl start $(crictl create $(crictl runp --runtime=kata pod-config.json) container-dpdk.json pod-config.json))
|
|
|
|
### verify that addresses are properly configured
|
|
|
|
crictl exec $c ip ad sh
|
|
|
|
## Enable support for ICMP/ICMPv6 Echo Request
|
|
|
|
_passt_ can replicate ICMP Echo Requests sent by the workload, and propagate the
|
|
replies back. However, as it's not running as root, we need to enable so-called
|
|
_ping_ sockets for unprivileged users. From the namespace created by CRI-O for
|
|
this container:
|
|
|
|
sysctl -w net.ipv4.ping_group_range=net.ipv4.ping_group_range = 0 2147483647
|
|
|
|
# Troubleshooting
|
|
|
|
## Redirect qemu's console output to file
|
|
|
|
Agent errors and kernel messages should be accessible via named UNIX domain
|
|
socket at `/run/vc/vm/*/console.sock`, provided `agent.debug_console` is enabled
|
|
in `kernel_params` of `configuration.toml` but this won't work if the agent
|
|
doesn't start. In order to get those, we can wrap `qemu` and get, additionally,
|
|
all the output piped to a file:
|
|
|
|
$ cat /usr/local/bin/qemu.sh
|
|
#!/bin/sh
|
|
|
|
/usr/bin/qemu-system-x86_64 "$@" -serial file:/tmp/qemu.log 2>/tmp/qemu_err.log
|
|
|
|
now, use this as path for `qemu` in `configuration.toml`:
|
|
|
|
[hypervisor.qemu]
|
|
path = "/usr/local/bin/qemu.sh"
|
|
|
|
and don't forget to add `console=ttyS0` to the kernel parameters, so that kernel
|
|
messages will also be included:
|
|
|
|
kernel_params = "... console=ttyS0"
|
|
|
|
## Debug console
|
|
|
|
See the `kata-console` script in the
|
|
[kata-vfio-tools repository](https://github.com/dgibson/kata-vfio-tools) for a
|
|
convenient helper to access the debug console provided by the agent.
|