passt: New design and implementation with native Layer 4 sockets
This is a reimplementation, partially building on the earlier draft,
that uses L4 sockets (SOCK_DGRAM, SOCK_STREAM) instead of SOCK_RAW,
providing L4-L2 translation functionality without requiring any
security capability.
Conceptually, this follows the design presented at:
https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md
The most significant novelty here comes from TCP and UDP translation
layers. In particular, the TCP state and translation logic follows
the intent of being minimalistic, without reimplementing a full TCP
stack in either direction, and synchronising as much as possible the
TCP dynamic and flows between guest and host kernel.
Another important introduction concerns addressing, port translation
and forwarding. The Layer 4 implementations now attempt to bind on
all unbound ports, in order to forward connections in a transparent
way.
While at it:
- the qemu 'tap' back-end can't be used as-is by qrap anymore,
because of explicit checks now introduced in qemu to ensure that
the corresponding file descriptor is actually a tap device. For
this reason, qrap now operates on a 'socket' back-end type,
accounting for and building the additional header reporting
frame length
- provide a demo script that sets up namespaces, addresses and
routes, and starts the daemon. A virtual machine started in the
network namespace, wrapped by qrap, will now directly interface
with passt and communicate using Layer 4 sockets provided by the
host kernel.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:25:09 +01:00
#!/bin/sh -e
#
# SPDX-License-Identifier: AGPL-3.0-or-later
#
# PASST - Plug A Simple Socket Transport
#
# demo.sh - Set up namespace with pasta, start qemu and passt, step by step
#
# Copyright (c) 2020-2022 Red Hat GmbH
# Author: Stefano Brivio <sbrivio@redhat.com>

# mbuto_profile() - Profile for https://mbuto.sh/, sourced, return after setting
mbuto_profile() {
	PROGS="${PROGS:-ash,dash,bash ip mount ls ln chmod insmod mkdir sleep
		lsmod modprobe find grep mknod mv rm umount iperf3 dhclient cat
		hostname chown socat dd strace ping killall sysctl wget,curl}"

	KMODS="${KMODS:- virtio_net virtio_pci}"

	LINKS="${LINKS:-
		 ash,dash,bash /init
		 ash,dash,bash /bin/sh}"

	DIRS="${DIRS} /tmp /sbin /var/log /var/run /var/lib"

	# shellcheck disable=SC2016
	FIXUP="${FIXUP}"'
	cat > /sbin/dhclient-script << EOF
#!/bin/sh

[ -n "\${new_interface_mtu}" ] && ip link set dev \${interface} mtu \${new_interface_mtu}

[ -n "\${new_ip_address}" ] && ip addr add \${new_ip_address}/\${new_subnet_mask} dev \${interface}
[ -n "\${new_routers}" ] && for r in \${new_routers}; do ip route add default via \${r} dev \${interface}; done
[ -n "\${new_domain_name_servers}" ] && for d in \${new_domain_name_servers}; do echo "nameserver \${d}" >> /etc/resolv.conf; done
[ -n "\${new_domain_name}" ] && echo "search \${new_domain_name}" >> /etc/resolv.conf
[ -n "\${new_domain_search}" ] && (printf "search"; for d in \${new_domain_search}; do printf " %s" "\${d}"; done; printf "\n") >> /etc/resolv.conf
[ -n "\${new_ip6_address}" ] && ip addr add \${new_ip6_address}/\${new_ip6_prefixlen} dev \${interface}
[ -n "\${new_dhcp6_name_servers}" ] && for d in \${new_dhcp6_name_servers}; do echo "nameserver \${d}%\${interface}" >> /etc/resolv.conf; done
[ -n "\${new_dhcp6_domain_search}" ] && (printf "search"; for d in \${new_dhcp6_domain_search}; do printf " %s" "\${d}"; done; printf "\n") >> /etc/resolv.conf
[ -n "\${new_host_name}" ] && hostname "\${new_host_name}"
exit 0
EOF

	chmod 755 /sbin/dhclient-script

	mkdir -p /etc/dhcp
	echo "timeout 3;" > /etc/dhcp/dhclient.conf

	ln -s /sbin /usr/sbin
	:> /etc/fstab

	echo
	echo "The guest is up and running. Networking is not configured yet:"
	echo
	echo "$ ip address show"
	echo
	ip address show
	echo
	echo "...the next step will take care of that."
	read x

	echo "$ ip link set dev eth0 up"
	ip link set dev eth0 up
	sleep 3
	echo "$ dhclient -4 -1 -sf /sbin/dhclient-script"
	dhclient -4 -1 -sf /sbin/dhclient-script
	sleep 2
	echo "$ dhclient -6 -1 -sf /sbin/dhclient-script"
	dhclient -6 -1 -sf /sbin/dhclient-script
	sleep 2
	echo
	echo "$ ip address show"
	ip address show
	echo
	echo "$ ip route show"
	ip route show
	echo
	echo "...done."
	read x

	echo "Checking connectivity..."
	echo
	echo "$ wget --no-check-certificate https://passt.top/ || curl -k https://passt.top/"
	wget --no-check-certificate https://passt.top/ || curl -k https://passt.top/
	echo "...done."
	read x

	echo "An interactive shell will start now. When you are done,"
	echo "use ^C to terminate the guest and exit the demo."
	echo

	sh +m
	'
}

[ "${0##*/}" = "mbuto" ] && mbuto_profile && return 0

# cmd() - Show command being executed, then run it
# $@:	Command and arguments
cmd() {
	echo "$" "$@"
	"$@"
}

# next() - Go to next step once a key is pressed, sets $KEY
next() {
	KEY="$(dd ibs=1 count=1 2>/dev/null)"
	echo
}

# cleanup() - Terminate pasta and passt, clean up, restore TTY settings
cleanup() {
	[ -f "${DEMO_DIR}/pasta.pid" ] && kill "$(cat "${DEMO_DIR}/pasta.pid")"
	[ -f "${DEMO_DIR}/passt.pid" ] && kill "$(cat "${DEMO_DIR}/passt.pid")"
	rm -rf "${DEMO_DIR}" 2>/dev/null
	[ -n "${STTY_BACKUP}" ] && stty "${STTY_BACKUP}"
}

# start_pasta_delayed() - Start pasta once $DEMO_DIR/pasta.wait is gone
start_pasta_delayed() {
	trap '' EXIT
	while [ -d "${DEMO_DIR}/pasta.wait" ]; do sleep 1; done
	cmd pasta --config-net -P "${DEMO_DIR}/pasta.pid" \
		"$(cat "${DEMO_DIR}/shell.pid")"
	echo
	echo "...pasta is running."
	exit 0
}

# start_mbuto_delayed() - Run mbuto once, and if, $DEMO_DIR/mbuto.wait is gone
start_mbuto_delayed() {
	trap '' EXIT
	while [ -d "${DEMO_DIR}/mbuto.wait" ]; do sleep 1; done
	cmd git -C "${DEMO_DIR}" clone git://mbuto.sh/mbuto
	echo
	cmd "${DEMO_DIR}/mbuto/mbuto" \
		-p "$(realpath "${0}")" -f "${DEMO_DIR}/demo.img"

	mkdir "${DEMO_DIR}/mbuto.done"
	exit 0
}

# into_ns() - Entry point and demo script to run inside new namespace
into_ns() {
	echo "We're in the new namespace now."
	next

	echo "Networking is not configured yet:"
	echo
	cmd ip link show
	echo
	cmd ip address show
	next

	echo "Let's run pasta(1) to configure networking and connect this"
	echo "namespace. Note that we'll run pasta(1) from outside this"
	echo "namespace, because it needs to implement the connection between"
	echo "this namespace and the initial (\"outer\") one."
	next

	echo "$$" > "${DEMO_DIR}/shell.pid"
	rmdir "${DEMO_DIR}/pasta.wait"
	next

	echo "Back to the new namespace, networking is configured:"
	echo
	cmd ip link show
	echo
	cmd ip address show
	next

	echo "and we can now start passt(1), to connect this namespace to a"
	echo "virtual machine. If you want to start a shell in this namespace,"
	echo "press 's' now. Exiting the shell will resume the script."
	next
	[ "${KEY}" = "s" ] && ${SHELL}

	cmd passt -P "${DEMO_DIR}/passt.pid"
	echo
	echo "...passt is running."
	next

	__arch="$(uname -m)"
	case ${__arch} in
	x86_64)
		__arch_supported=1
		__qemu_arch="qemu-system-x86_64 -M pc,accel=kvm:tcg"
		;;
	*)
		__arch_supported=0
		;;
	esac

	if [ "${__arch_supported}" -eq 1 ]; then
		echo "We're ready to start a virtual machine now. This script"
		echo "can download and use mbuto (https://mbuto.sh/) to build a"
		echo "basic initramfs image. Otherwise, press 's' to skip this"
		echo "step, and start an existing virtual machine yourself."
		echo "You'll need to use the qrap(1) wrapper, with qemu options"
		echo "as reported above."

		next
	else
		echo "This script doesn't know, yet, how to run a virtual"
		echo "machine on your architecture (${__arch}). Please start an"
		echo "existing virtual machine yourself, using the qrap(1)"
		echo "wrapper, with qemu options as reported above."
		echo
	fi

	if [ "${__arch_supported}" -eq 0 ] || [ "${KEY}" = "s" ]; then
		echo "Start a virtual machine now. Pressing any key here will"
		echo "terminate passt and pasta, and clean up."
		next

		exit 0
	fi

	rmdir "${DEMO_DIR}/mbuto.wait"
	while [ ! -d "${DEMO_DIR}/mbuto.done" ]; do sleep 1; done
	echo "The guest image is ready. The next step will start the guest."
	echo "Use ^C to terminate it."
	next

	cmd qrap 5 qemu-system-x86_64 -M pc,accel=kvm:tcg \
		-smp "$(nproc)" -m 1024 \
		-nographic -serial stdio -nodefaults -no-reboot -vga none \
		-initrd "${DEMO_DIR}/demo.img" \
		-kernel "/boot/vmlinuz-$(uname -r)" -append "console=ttyS0" \
		-net socket,fd=5 -net nic,model=virtio || :
}

STTY_BACKUP="$(stty -g)"
stty -icanon

trap cleanup EXIT INT
[ "${1}" = "into_ns" ] && into_ns && exit 0

DEMO_DIR="$(mktemp -d)"
mkdir "${DEMO_DIR}/pasta.wait"
mkdir "${DEMO_DIR}/mbuto.wait"

echo "This script sets up a network and user namespace using pasta(1), then"
echo "starts a virtual machine in it, connected via passt(1), pausing at every"
echo "step. Press any key to go to the next step."
next

echo "Let's create the network and user namespace, first. This could be done"
echo "with pasta(1) itself (just issue \`pasta\`), but for the sake of this"
echo "script we'll create it first with unshare(1), and run the next steps"
echo "of this script from there."
next

start_pasta_delayed &
start_mbuto_delayed &
DEMO_DIR="${DEMO_DIR}" cmd unshare -rUn "${0}" into_ns

exit 0