README: pasta mode, CI, performance, updated links, etc.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Stefano Brivio 2021-09-26 19:31:37 +02:00
parent b216df04a1
commit cc8db1c5bc

README.md

@ -1,11 +1,13 @@
<span style="font-weight: bold; color: red;">While functional and tested to some extent, this project is still in early development phase: don't use in production or critical environments yet.</span>
# passt: Plug A Simple Socket Transport # passt: Plug A Simple Socket Transport
_passt_ implements a translation layer between a Layer-2 network interface (tap) _passt_ implements a translation layer between a Layer-2 network interface and
and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't
require any capabilities or privileges, and it can be used as a simple require any capabilities or privileges, and it can be used as a simple
replacement for Slirp. replacement for Slirp.
<img src="/builds/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;"> <img src="/builds/latest/web/passt_overview.png" usemap="#image-map" class="bright" style="z-index: 20; position: relative;">
<map name="image-map" id="map_overview"> <map name="image-map" id="map_overview">
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="229,275,246,320,306,294,287,249" shape="poly"> <area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/tcp.7.html" coords="229,275,246,320,306,294,287,249" shape="poly">
<area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="230,201,243,246,297,232,289,186" shape="poly"> <area class="map_area" target="_blank" href="https://lwn.net/Articles/420799/" coords="230,201,243,246,297,232,289,186" shape="poly">
@ -35,7 +37,7 @@ replacement for Slirp.
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1044,471,1090,461,1126,462,1150,464,1176,479,1160,491,1121,500,1081,501,1044,491,1037,483" shape="poly"> <area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man4/veth.4.html" coords="1044,471,1090,461,1126,462,1150,464,1176,479,1160,491,1121,500,1081,501,1044,491,1037,483" shape="poly">
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html" coords="240,379,524,452" shape="rect"> <area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html" coords="240,379,524,452" shape="rect">
<area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/netlink.7.html" coords="1119,278,1117,293,1165,304,1169,288" shape="poly"> <area class="map_area" target="_blank" href="https://man7.org/linux/man-pages/man7/netlink.7.html" coords="1119,278,1117,293,1165,304,1169,288" shape="poly">
<area class="map_area" target="_blank" href="https://passt.top/passt/tree/passt.c#n195" coords="989,294,1040,264,1089,280,986,344" shape="poly"> <area class="map_area" target="_blank" href="https://passt.top/passt/tree/conf.c" coords="989,294,1040,264,1089,280,986,344" shape="poly">
</map> </map>
<canvas id="map_highlight" style="border: 0px; z-index: 10; position: fixed; pointer-events: none"></canvas> <canvas id="map_highlight" style="border: 0px; z-index: 10; position: fixed; pointer-events: none"></canvas>
<script> <script>
@ -92,17 +94,35 @@ for (var i = 0; i < map_areas.length; i++) {
} }
</script> </script>
- [General idea](#general-idea) # pasta: Pack A Subtle Tap Abstraction
_pasta_ (same binary as _passt_, different command) offers equivalent
functionality, for network namespaces: traffic is forwarded using a tap
interface inside the namespace, without the need to create further interfaces on
the host, hence not requiring any capabilities or privileges.
It also implements a tap bypass path for local connections: packets with a local
destination address are moved directly between Layer-4 sockets, avoiding Layer-2
translations, using the _splice_(2) and _recvmmsg_(2)/_sendmmsg_(2) system calls
for TCP and UDP, respectively.
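
The bypass idea described above can be illustrated with a minimal sketch. This is not how _pasta_ is implemented: _pasta_ is written in C and moves data in the kernel with _splice_(2), whereas this sketch copies the payload through user space with plain `recv()`/`sendall()`. The function name `relay_once` and the socket names are hypothetical. The only point it shows is that payload travels between two Layer-4 sockets without any Layer-2 frame being built.

```python
import socket


def relay_once(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> int:
    """Move one chunk of payload from src to dst; return the bytes forwarded."""
    data = src.recv(bufsize)
    if data:
        dst.sendall(data)
    return len(data)


# Two connected pairs stand in for "socket in the namespace" and
# "socket on the host" (assumption: purely illustrative endpoints).
ns_app, ns_side = socket.socketpair()
host_side, host_app = socket.socketpair()

ns_app.sendall(b"hello")
forwarded = relay_once(ns_side, host_side)
received = host_app.recv(65536)

for s in (ns_app, ns_side, host_side, host_app):
    s.close()
```

A real implementation would drive this from an event loop and handle partial writes and connection shutdown; those are left out here on purpose.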
<img src="/builds/latest/web/pasta_overview.png" class="bright" style="z-index: 20; position: relative;">
- [Motivation](#motivation)
- [Non-functional Targets](#non-functional-targets) - [Non-functional Targets](#non-functional-targets)
- [Interfaces and Environment](#interfaces-and-environment) - [Interfaces and Environment](#interfaces-and-environment)
- [Services](#services) - [Services](#services)
- [Addresses](#addresses) - [Addresses](#addresses)
- [Protocols](#protocols) - [Protocols](#protocols)
- [Ports](#ports) - [Ports](#ports)
- [Continuous Integration](#continuous-integration)
- [Performance](#performance)
- [Try it](#try-it) - [Try it](#try-it)
- [Contribute](#contribute) - [Contribute](#contribute)
## General idea ## Motivation
### passt
When container workloads are moved to virtual machines, the network traffic is When container workloads are moved to virtual machines, the network traffic is
typically forwarded by interfaces operating at data link level. Some components typically forwarded by interfaces operating at data link level. Some components
@ -110,19 +130,17 @@ in the containers ecosystem (such as _service meshes_), however, expect
applications to run locally, with visible sockets and processes, for the applications to run locally, with visible sockets and processes, for the
purposes of socket redirection, monitoring, port mapping. purposes of socket redirection, monitoring, port mapping.
To solve this issue, user mode networking as provided e.g. by _Slirp_, To solve this issue, user mode networking, as provided e.g. by _libslirp_,
_libslirp_, _slirp4netns_ can be used. However, these existing solutions can be used. Existing solutions implement a full TCP/IP stack, replaying traffic
implement a full TCP/IP stack, replaying traffic on sockets that are local to on sockets that are local to the pod of the service mesh. This creates the
the pod of the service mesh. This creates the illusion of application processes illusion of application processes running on the same host, possibly separated
running on the same host, possibly separated by user namespaces. by user namespaces.
While being almost transparent to the service mesh infrastructure, that kind of While being almost transparent to the service mesh infrastructure, that kind of
solution comes with a number of downsides: solution comes with a number of downsides:
* three different TCP/IP stacks (guest, adaptation and host) need to be * three different TCP/IP stacks (guest, adaptation and host) need to be
traversed for every service request. There are no chances to implement traversed for every service request
zero-copy mechanisms, and the amount of context switches increases
dramatically
* addressing needs to be coordinated to create the pretense of consistent * addressing needs to be coordinated to create the pretense of consistent
addresses and routes between guest and host environments. This typically needs addresses and routes between guest and host environments. This typically needs
a NAT with masquerading, or some form of packet bridging a NAT with masquerading, or some form of packet bridging
@ -135,21 +153,43 @@ solution comes with a number of downsides:
would if deployed with regular containers would if deployed with regular containers
_passt_ implements a thinner layer between guest and host, that only implements _passt_ implements a thinner layer between guest and host, that only implements
what's strictly needed to pretend processes are running locally. A further, full what's strictly needed to pretend processes are running locally. The TCP
TCP/IP stack is not necessarily needed. Some sort of TCP adaptation is needed, adaptation doesn't keep per-connection packet buffers, and reflects observed
however, as this layer runs without the `CAP_NET_RAW` capability: we can't sending windows and acknowledgements between the two sides. This TCP adaptation
create raw IP sockets on the pod, and therefore need to map packets at Layer-2 is needed as _passt_ runs without the `CAP_NET_RAW` capability: it can't create
to Layer-4 sockets offered by the host kernel. raw IP sockets on the pod, and therefore needs to map packets at Layer-2 to
Layer-4 sockets offered by the host kernel.
The problem and this approach are illustrated in more detail, with diagrams, The problem and this approach are illustrated in more detail, with diagrams,
[here](https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md). [here](https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md).
### pasta
On Linux, regular users can create network namespaces and run application
services inside them. However, connecting namespaces to other namespaces and to
external hosts requires the creation of network interfaces, such as `veth`
pairs, which in turn needs elevated privileges or the `CAP_NET_ADMIN`
capability. _pasta_, similarly to _slirp4netns_, solves this problem by creating
a tap interface available to processes in the namespace, and mapping network
traffic outside the namespace using native Layer-4 sockets.
Existing approaches typically implement a full, generic TCP/IP stack for this
translation between data and transport layers, without the possibility of
speeding up local connections, and usually requiring NAT. _pasta_:
* avoids the need for a generic, full-fledged TCP/IP stack by coordinating TCP
connection dynamics between sender and receiver
* offers a fast bypass path for local connections: if a process connects to
another process on the same host across namespaces, data is directly forwarded
using pairs of Layer-4 sockets
* with default options, maps routing and addressing information to the
namespace, avoiding any need for NAT
## Non-functional Targets ## Non-functional Targets
Security and maintainability goals: Security and maintainability goals:
* no dynamic memory allocation * no dynamic memory allocation
* ~2 000 LoC target * ~5 000 LoC target
* no external dependencies * no external dependencies
## Interfaces and Environment ## Interfaces and Environment
@ -166,83 +206,125 @@ TCP. Two temporary solutions are available:
This approach, compared to using a _tap_ device, doesn't require any security This approach, compared to using a _tap_ device, doesn't require any security
capabilities, as we don't need to create any interface. capabilities, as we don't need to create any interface.
_pasta_ runs out of the box with any recent (post-3.8) Linux kernel.
## Services ## Services
_passt_ provides some minimalistic implementations of networking services that _passt_ and _pasta_ provide some minimalistic implementations of networking
can't practically run on the host: services:
* [ARP proxy](https://passt.top/passt/tree/arp.c), that resolves the address of * [ARP proxy](https://passt.top/passt/tree/arp.c), that resolves the address of
the host (which is used as gateway) to the original MAC address of the host the host (which is used as gateway) to the original MAC address of the host
* [DHCP server](https://passt.top/passt/tree/dhcp.c), a simple implementation * [DHCP server](https://passt.top/passt/tree/dhcp.c), a simple implementation
handing out one single IPv4 address to the guest, namely, the same address as handing out one single IPv4 address to the guest or namespace, namely, the
the first one configured for the upstream host interface, and passing the same address as the first one configured for the upstream host interface, and
nameservers configured on the host passing the nameservers configured on the host
* [NDP proxy](https://passt.top/passt/tree/ndp.c), which can also assign prefix * [NDP proxy](https://passt.top/passt/tree/ndp.c), which can also assign prefix
and nameserver using SLAAC and nameserver using SLAAC
* [DHCPv6 server](https://passt.top/passt/tree/dhcpv6.c): a simple * [DHCPv6 server](https://passt.top/passt/tree/dhcpv6.c): a simple
implementation handing out one single IPv6 address to the guest, namely, the implementation handing out one single IPv6 address to the guest or namespace,
same address as the first one configured for the upstream host interface, namely, the same address as the first one configured for the upstream host
and passing the first nameserver configured on the host interface, and passing the nameservers configured on the host
## Addresses ## Addresses
For IPv4, the guest is assigned, via DHCP, the same address as the upstream For IPv4, the guest or namespace is assigned, via DHCP, the same address as the
interface of the host, and the same default gateway as the default gateway of upstream interface of the host, and the same default gateway as the default
the host. Addresses are translated in case the guest is seen using a different gateway of the host. Addresses are translated in case the guest is seen using a
address from the assigned one. different address from the assigned one.
For IPv6, the guest is assigned, via SLAAC, the same prefix as the upstream For IPv6, the guest or namespace is assigned, via SLAAC, the same prefix as the
interface of the host, the same default route as the default route of the upstream interface of the host, the same default route as the default route of
host, and, if a DHCPv6 client is running on the guest, also the same address as the host, and, if a DHCPv6 client is running in the guest or namespace, also the
the upstream address of the host. This means that, with a DHCPv6 client on the same address as the upstream address of the host. This means that, with a DHCPv6
guest, addresses don't need to be translated. Should the client use a different client in the guest or namespace, addresses don't need to be translated. Should
address, the destination address is translated for packets going to the guest. the client use a different address, the destination address is translated for
packets going to the guest or to the namespace.
For UDP and TCP, for both IPv4 and IPv6, packets addressed to a loopback address ### Local connections with _passt_
are forwarded to the guest with their source address changed to the address of
the gateway or first hop of the default route. This mapping is reversed as the For UDP and TCP, for both IPv4 and IPv6, packets from the host addressed to a
guest replies to those packets (on the same TCP connection, or using destination loopback address are forwarded to the guest with their source address changed to
port and address that were used as source for UDP). the address of the gateway or first hop of the default route. This mapping is
reversed in the other direction.
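
The loopback mapping for _passt_ can be sketched as a pair of pure functions. This is a simplified illustration under stated assumptions, not the project's code: the function names are hypothetical, the gateway address `192.0.2.1` is a documentation address picked for the example, and the real reverse mapping is tied to the TCP connection or UDP port pair rather than applied to every packet sent to the gateway.

```python
import ipaddress


def map_source_to_guest(src: str, gateway: str) -> str:
    """Host to guest: a loopback source is presented as the gateway address."""
    return gateway if ipaddress.ip_address(src).is_loopback else src


def map_destination_from_guest(dst: str, gateway: str, loopback: str) -> str:
    """Guest to host: traffic sent to the gateway goes back to the loopback address."""
    same = ipaddress.ip_address(dst) == ipaddress.ip_address(gateway)
    return loopback if same else dst


GATEWAY = "192.0.2.1"  # assumption: example gateway, not taken from the README

to_guest = map_source_to_guest("127.0.0.1", GATEWAY)
untouched = map_source_to_guest("198.51.100.7", GATEWAY)
back_to_host = map_destination_from_guest(GATEWAY, GATEWAY, "127.0.0.1")
```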
### Local connections with _pasta_
Packets addressed to a loopback address in either namespace are directly
forwarded to the corresponding (or configured) port in the other namespace.
Similarly to _passt_, packets from the non-init namespace addressed to the
default gateway, which are therefore sent via the tap device, will have their
destination address translated to the loopback address.
## Protocols ## Protocols
_passt_ supports TCP, UDP and ICMP/ICMPv6 echo (requests and replies). More _passt_ and _pasta_ support TCP, UDP and ICMP/ICMPv6 echo (requests and
details about the TCP implementation are available replies). More details about the TCP implementation are available
[here](https://passt.top/passt/tree/tcp.c), and for the UDP [here](https://passt.top/passt/tree/tcp.c), and for the UDP
implementation [here](https://passt.top/passt/tree/udp.c). implementation [here](https://passt.top/passt/tree/udp.c).
An IGMP proxy is currently work in progress. An IGMP/MLD proxy is currently work in progress.
## Ports ## Ports
To avoid the need for explicit port mapping configuration, _passt_ binds to all ### passt
unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports (0-1023)
will fail without additional capabilities, and ports already bound (service To avoid the need for explicit port mapping configuration, _passt_ can bind to
proxies, etc.) will also not be used. all unbound non-ephemeral (0-49152) TCP and UDP ports. Binding to low ports
(0-1023) will fail without additional capabilities, and ports already bound
(service proxies, etc.) will also not be used. Smaller subsets of ports, with
port translations, are also configurable.
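
The "bind whatever is free" behaviour can be sketched as follows. This is an illustration only, under stated assumptions: _passt_ itself is written in C, the helper name `bind_available` is hypothetical, and the sketch binds on the loopback address for TCP only. A port that cannot be bound, either because it is already in use or because binding it needs additional capabilities, is simply skipped.

```python
import socket


def bind_available(ports, host="127.0.0.1"):
    """Try to bind a TCP socket to each port; return {port: socket} for successes."""
    bound = {}
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
        except OSError:
            # Typically EACCES (low port, no privileges) or EADDRINUSE (taken).
            s.close()
            continue
        bound[port] = s
    return bound


# Occupy one port to play the role of a service that started earlier.
busy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
busy.bind(("127.0.0.1", 0))
busy.listen()
busy_port = busy.getsockname()[1]

result = bind_available([busy_port])

busy.close()
for s in result.values():
    s.close()
```

The sketch also shows why start order matters when all ports are forwarded: whichever process binds a port first keeps it.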
UDP ephemeral ports are bound dynamically, as the guest uses them. UDP ephemeral ports are bound dynamically, as the guest uses them.
Service proxies and other services running in the container need to be started If all ports are forwarded, service proxies and other services running in the
before _passt_ starts. container need to be started before _passt_ starts.
### pasta
With default options, _pasta_ scans for bound ports on init and non-init
namespaces, and automatically forwards them from the other side. Port forwarding
is fully configurable with command line options.
## Continuous Integration
<p><video id="ci_video" style="width: 90%; height: auto; max-height: 90%" controls>
<source src="/builds/latest/web/ci.webm" type="video/webm">
</video></p>
<script src="/builds/latest/web/ci.js"></script>
Test logs [here](https://passt.top/builds/latest/test/).
## Performance
<script src="/builds/latest/web/perf.js"></script>
## Try it ## Try it
### passt
* build from source: * build from source:
git clone https://passt.top/passt git clone https://passt.top/passt
cd passt cd passt
make make
* to make _passt_ not fork into background when it starts, and to get verbose * alternatively, static builds for x86_64, with or without AVX2 instructions,
debug information, build with: as of the latest commit are also available for convenience
[here](https://passt.top/builds/latest/x86_64/avx2/) and
[here](https://passt.top/builds/latest/x86_64/). Convenience, non-official
packages for Debian (and derivatives) and RPM-based distributions are also
available there. These binaries and packages are simply built with:
CFLAGS="-DDEBUG" make CFLAGS="-static" make avx2
make pkgs
make static
make pkgs
* a static build for x86_64 as of the latest commit is also available for * have a look at the _man_ page for synopsis and options:
convenience [here](https://passt.top/builds/static/). These binaries are
simply built with:
CFLAGS="-static" make man ./passt.1
* run the demo script, that creates a network namespace called `passt`, sets up * run the demo script, that creates a network namespace called `passt`, sets up
a _veth_ pair and addresses, together with NAT for IPv4 and NDP a _veth_ pair and addresses, together with NAT for IPv4 and NDP
@ -283,14 +365,51 @@ before _passt_ starts.
ssh 192.0.2.2 ssh 192.0.2.2
### pasta
* build from source:
git clone https://passt.top/passt
cd passt
make
* alternatively, static builds for x86_64, with or without AVX2 instructions,
as of the latest commit are also available for convenience
[here](https://passt.top/builds/latest/x86_64/avx2/) and
[here](https://passt.top/builds/latest/x86_64/). Convenience, non-official
packages for Debian (and derivatives) and RPM-based distributions are also
available there. These binaries and packages are simply built with:
CFLAGS="-static" make avx2
make pkgs
make static
make pkgs
* have a look at the _man_ page for synopsis and options:
man ./pasta.1
* start pasta with:
./pasta
* you're now inside a new user and network namespace. For IPv6, SLAAC happens
right away as _pasta_ sets up the interface, but DHCPv6 support is available
as well. For IPv4, configure the interface with a DHCP client:
dhclient
and, optionally:
dhclient -6
* and that's it, you should now have TCP connections, UDP, and ICMP/ICMPv6
echo working from/to the guest for IPv4 and IPv6
* to connect to a service inside the namespace, just connect to the same port
using the loopback address.
## Contribute ## Contribute
Send patches and issue reports to [sbrivio@redhat.com](mailto:sbrivio@redhat.com). A public bug tracker and mailing lists are coming soon. For the time being, send
patches and issue reports to [sbrivio@redhat.com](mailto:sbrivio@redhat.com).
<p><video id="ci_video" style="width: 90%; height: auto; max-height: 90%" controls>
<source src="/builds/ci.mp4" type="video/mp4">
</video></p>
<script src="/builds/perf.js"></script>
<script src="/builds/video_links.js"></script>