Commit graph

47 commits

Author SHA1 Message Date
Stefano Brivio
be41639c20 README: Point openSUSE links to Dario's OBS repository
...instead of my Copr. It's also not official yet, but surely more
appropriate now.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-24 00:18:40 +02:00
Stefano Brivio
8b3443c561 README: Fix misspellings of openSUSE
For some reason, I used a capital O everywhere.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-24 00:14:47 +02:00
Stefano Brivio
47d424d083 README: Update Availability and Try It sections with new packages
We now have official packages for Fedora, unofficial ones (Fedora Copr)
for other common RPM-based distributions, and the existing packages
with static builds for Debian and for other RPM-based distributions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-22 16:53:35 +02:00
Stefano Brivio
f3aaced135 README: Add link to Copr repositories
These have packages covering all recent versions of CentOS Stream,
EPEL, Fedora, Mageia and openSUSE Tumbleweed.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-18 21:17:39 +02:00
Stefano Brivio
bda79ba401 doc: Rewrite demo script
The original demo script was written when pasta wasn't a thing yet,
so it needed to run as root, set up a veth pair, and configure
addresses and routes by itself.

Now pasta can do all that for us, and it can become part of the demo
as well.

Further, extend it to start qemu, optionally preparing a basic demo
image with mbuto (https://mbuto.sh), and execute one logical step at
a time, for clarity.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-18 21:17:29 +02:00
Stefano Brivio
1d223e4b4c passt: Allow exit_group() system call in seccomp profiles
We handle SIGQUIT and SIGTERM by calling exit(), which is usually
implemented with the exit_group() system call.

If we don't allow exit_group(), we'll get a SIGSYS while handling
SIGQUIT and SIGTERM, which means a misleading non-zero exit code.

Reported-by: Wenli Quan <wquan@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2101990
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-14 01:36:05 +02:00
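The exit_group() commit above can be illustrated with a minimal sketch, assuming a plain signal handler that just terminates the process. It does not install a seccomp filter. It only shows the handler pattern and how a clean exit status is observed from a parent. The helper names exit_on_signal() and child_status_after() are hypothetical. The handler calls _exit() rather than exit(), because only _exit() is async-signal-safe. With glibc, both end up in the exit_group() system call, which is why a seccomp profile has to allow it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate on SIGTERM or SIGQUIT. _exit() is async-signal-safe, unlike
 * exit(); with glibc, both are implemented on top of exit_group(). */
static void exit_on_signal(int sig)
{
	(void)sig;
	_exit(EXIT_SUCCESS);
}

/* Fork a child that installs the handler and waits, send it @sig, then
 * return the child's exit status, or -1 if it didn't exit normally. */
static int child_status_after(int sig)
{
	int pfd[2], status;
	char c = 0;
	pid_t pid;

	if (pipe(pfd))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct sigaction sa;

		memset(&sa, 0, sizeof(sa));
		sa.sa_handler = exit_on_signal;
		sigaction(SIGTERM, &sa, NULL);
		sigaction(SIGQUIT, &sa, NULL);

		/* Tell the parent that handlers are in place, then wait */
		if (write(pfd[1], &c, 1) != 1)
			_exit(EXIT_FAILURE);
		for (;;)
			pause();
	}

	close(pfd[1]);
	if (read(pfd[0], &c, 1) != 1) {
		close(pfd[0]);
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}
	close(pfd[0]);

	kill(pid, sig);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

Without the handler, SIGQUIT would terminate the child abnormally and child_status_after() would return -1. Under a seccomp profile that lacks exit_group(), the child would instead be killed by SIGSYS.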
Stefano Brivio
d7d467f60c README: Fix links to static builds
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-06-08 11:17:59 +02:00
Stefano Brivio
8cc6c9b490 README: Fix link to contrib/debian
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-30 14:34:42 +02:00
Stefano Brivio
baf79c033e README: Drop red notice about early development phase
Famous last words: it should be tested enough by now.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-30 05:52:39 +02:00
Stefano Brivio
bc925b1da4 contrib: Add example of Debian package files
...using dh_apparmor to ship and apply AppArmor profiles. Tried on
current Debian testing (Bookworm, 12).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-30 05:52:39 +02:00
Stefano Brivio
8d85b6a99e tap: Allow ioctl() and openat() for tap_ns_tun() re-initialisation
If the tun interface disappears, we'll call tap_ns_tun() after the
seccomp profile is applied: add ioctl() and openat() to it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-30 05:49:46 +02:00
Stefano Brivio
1f4b7fa0d7 passt, pasta: Add examples of SELinux policy modules
These should cover any reasonably common use case in distributions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
bb70811183 treewide: Packet abstraction with mandatory boundary checks
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure that data is not read outside the boundaries of buffers
and descriptors, and that packets added to a pool are within the
buffer range, with valid offsets and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
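Here is a minimal sketch of the packet abstraction idea, assuming a fixed-size descriptor array. The names struct pool, pool_add() and pool_get() are hypothetical and are not passt's actual interface. The pool holds only offsets and lengths into a caller-provided buffer, and every add or get is checked against the buffer and descriptor bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX	16	/* fixed descriptor count, for the sketch only */

/* Where a packet lives inside the backing buffer */
struct desc {
	size_t offset;
	size_t len;
};

/* A pool has no packet storage of its own: it only references @buf */
struct pool {
	const char *buf;		/* backing buffer */
	size_t buf_size;		/* size of backing buffer */
	size_t size;			/* maximum number of descriptors */
	size_t count;			/* descriptors currently queued */
	struct desc pkt[POOL_MAX];
};

/* Queue @len bytes starting at @start: reject the packet unless it lies
 * entirely within the backing buffer and the pool has room left. */
static int pool_add(struct pool *p, const char *start, size_t len)
{
	uintptr_t base = (uintptr_t)p->buf, addr = (uintptr_t)start;
	size_t off;

	if (p->count >= p->size || p->count >= POOL_MAX)
		return -1;	/* pool full */
	if (addr < base || addr - base > p->buf_size)
		return -1;	/* starts outside the buffer */
	off = addr - base;
	if (len > p->buf_size - off)
		return -1;	/* would extend past the end of the buffer */

	p->pkt[p->count].offset = off;
	p->pkt[p->count].len = len;
	p->count++;
	return 0;
}

/* Return a pointer to @len bytes at @offset within packet @idx, or NULL
 * if the index or the requested range falls outside that packet. */
static const char *pool_get(const struct pool *p, size_t idx,
			    size_t offset, size_t len)
{
	const struct desc *d;

	if (idx >= p->count)
		return NULL;
	d = &p->pkt[idx];
	if (offset > d->len || len > d->len - offset)
		return NULL;

	return p->buf + d->offset + offset;
}
```

Because pool_get() checks both the packet index and the offset/length pair, a handler parsing headers can ask for exactly the bytes it needs and treat NULL as a malformed packet.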
Stefano Brivio
bc4ec1a8e9 README: Update Interfaces and Availability sections
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
e80f608710 README: Avoid "here" links
They look a bit lame: rephrase sentences to avoid them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
be5bbb9b06 tcp: Rework timers to use timerfd instead of periodic bitmap scan
With a lot of concurrent connections, the bitmap scan approach is
not really sustainable.

Switch to per-connection timerfd timers, set based on events and on
two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added
to the common epoll list, and implement the existing timeouts.

While at it, drop the CONN_ prefix from flag names, otherwise they
get quite long, and fix the logic to decide if a connection has a
local, possibly unreachable endpoint: we shouldn't go through the
rest of tcp_conn_from_tap() if we reset the connection due to a
successful bind(2), and we'll get EACCES if the port number is low.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
14c4c0253c README: Make it somewhat readable on mobile devices
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-04 19:23:45 +01:00
Stefano Brivio
216a266a75 hooks, README: gzipped js snippets, webp alternatives for png
Upload gzipped js snippets for usage with gzip_static in nginx or
equivalent. Convert png drawings to webp for smaller size, use them
as alternatives in README.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-02 14:02:03 +01:00
Stefano Brivio
71ab6d9972 README: Don't preload CI recording, show poster from end of run
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 22:31:42 +01:00
Stefano Brivio
628c4f0cae README: s/guest/namespace/ in pasta "Try it" section
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 21:43:41 +01:00
Stefano Brivio
06f8e4f960 Makefile, hooks: Static target precondition for pkgs, copy .avx2 builds
Convenience packages are built from static builds anyway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 21:41:22 +01:00
Stefano Brivio
213c397492 passt, pasta: Run-time selection of AVX2 build
Build-time selection of AVX2 flags and routines is not practical for
distributions, but limiting AVX2 usage to checksum routines with
specific run-time detection doesn't allow for easy performance gains
from auto-vectorisation of batched packet handling routines.

For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple
wrapper replacing the current executable with the AVX2 build if it's
available, and if AVX2 is supported by the current CPU.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 16:46:28 +01:00
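The run-time AVX2 selection commit describes a wrapper that replaces the current executable with the AVX2 build, if that build exists and the CPU supports AVX2. This minimal sketch assumes the AVX2 binary sits next to the running one with an .avx2 suffix, as in the .avx2 builds mentioned in the Makefile, hooks commit. It relies on GCC/Clang's __builtin_cpu_supports() and on /proc/self/exe. avx2_path() and maybe_exec_avx2() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the path of the AVX2 variant of @exe by appending ".avx2".
 * Returns 0 on success, -1 if @buf is too small or if @exe already is
 * the AVX2 build, which avoids re-executing ourselves forever. */
static int avx2_path(const char *exe, char *buf, size_t size)
{
	size_t len = strlen(exe);
	int n;

	if (len >= 5 && !strcmp(exe + len - 5, ".avx2"))
		return -1;

	n = snprintf(buf, size, "%s.avx2", exe);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}

/* If the CPU supports AVX2 and an AVX2 build exists next to the running
 * binary, replace the current process with it. Returns only if it didn't. */
static void maybe_exec_avx2(char **argv)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	char exe[4096], avx2[4096 + 8];
	ssize_t len;

	__builtin_cpu_init();
	if (!__builtin_cpu_supports("avx2"))
		return;

	len = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
	if (len < 0)
		return;
	exe[len] = '\0';

	if (avx2_path(exe, avx2, sizeof(avx2)) || access(avx2, X_OK))
		return;

	execv(avx2, argv);	/* returns only on failure */
#else
	(void)argv;
#endif
}
```

A wrapper would call maybe_exec_avx2(argv) at the very start of main(), before any other initialisation. If execv() fails, execution simply continues with the baseline build.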
Stefano Brivio
c47d9f7ee0 README: Fix demo div grid layout
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 11:49:15 +01:00
Stefano Brivio
337f55166f demo, ci: Switch to asciinema(1) for terminal recordings
For demos, cool-retro-term(1) looked fancier, but running several
threads of it alongside ffmpeg(1) interferes with performance testing.

The CI videos started getting really big as well, and they were
difficult to read.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-22 18:36:24 +01:00
Stefano Brivio
be2a7898e9 test: Add demo for Podman with pasta
...showing setup steps, some peculiarities such as the --net option,
and a general side-by-side comparison with slirp4netns(1), including
"quick" TCP and UDP throughput and latency benchmarks.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-22 18:34:44 +01:00
Stefano Brivio
39a3531270 README, hooks: Build HTML man page on push, add a link
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
0515adceaa passt, pasta: Namespace-based sandboxing, defer seccomp policy application
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies until after
the initialisation phase, right before the main loop: that's where we
expect bad things to happen, potentially. This way, we get back to 22
allowed syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once: drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connections if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
which took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
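The sandboxing commit binds the passt UNIX domain socket only once. Instead of closing the socket after the first client, passt keeps it open and accepts and discards any further connections. Here is a minimal sketch of that policy, assuming blocking sockets and, for the test, a path under /tmp. All function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill @addr for @path; returns -1 if the path doesn't fit in sun_path */
static int unix_addr(struct sockaddr_un *addr, const char *path)
{
	if (strlen(path) >= sizeof(addr->sun_path))
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strcpy(addr->sun_path, path);
	return 0;
}

/* Bind and listen once, before sandboxing (a stale file is removed) */
static int listen_unix(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	if (unix_addr(&addr, path))
		return -1;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	unlink(path);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) || listen(fd, 4)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Client side, used here only to exercise the listening socket */
static int connect_unix(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	if (unix_addr(&addr, path))
		return -1;
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr))) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Accept a pending connection: if @conn_fd is already a valid client,
 * close the new connection immediately; otherwise it becomes the client.
 * The listening socket is never closed, so no new bind() is needed.
 * Returns the descriptor of the current client, or -1 if there is none. */
static int accept_or_discard(int listen_fd, int conn_fd)
{
	int fd = accept(listen_fd, NULL, NULL);

	if (fd < 0)
		return conn_fd;

	if (conn_fd >= 0) {
		close(fd);	/* already serving a client: reject */
		return conn_fd;
	}

	return fd;
}
```

A rejected client sees end-of-file on its socket right away, while the first client keeps its connection.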
Stefano Brivio
6e61b4040a test: Add distribution tests for several architectures and kernel versions
The new tests check the build and a simple case with pasta sending a
short message in both directions (namespace to init, init to
namespace).

Tests cover a mix of Debian, Fedora, openSUSE and Ubuntu combinations
on aarch64, i386, ppc64, ppc64le, s390x, x86_64.

Builds tested starting from approximately glibc 2.19, gcc 4.7, and
actual functionality approximately from 4.4 kernels, glibc 2.25,
gcc 4.8, all the way up to current glibc/gcc/kernel versions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-28 18:51:50 +01:00
Stefano Brivio
21b1a8445b README: Fix link to IGMP/MLD proxy ticket
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-28 02:05:19 +01:00
Stefano Brivio
2fbec4d300 README: Fix anchor for Performance section
It shouldn't refer to the subsection under "Features".

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-27 16:44:05 +01:00
Stefano Brivio
33b1bdd079 seccomp: Add a number of alternate and per-arch syscalls
Depending on the C library, but not necessarily in all the
functions we use, statx() might be used instead of stat(),
getdents() instead of getdents64(), readlinkat() instead of
readlink(), openat() instead of open().

On aarch64, it's clone() and not fork(), and dup3() instead of
dup2() -- just allow the existing alternative instead of dealing
with per-arch selections.

Since glibc commit 9a7565403758 ("posix: Consolidate fork
implementation"), we need to allow set_robust_list() for
fork()/clone(), even in a single-threaded context.

On some architectures, epoll_pwait() is provided instead of
epoll_wait(), but never both. Same with newfstat() and
fstat(), sigreturn() and rt_sigreturn(), getdents64() and
getdents(), readlink() and readlinkat(), unlink() and
unlinkat(), whereas pipe() might not be available, but
pipe2() always is, exclusively or not.

Seen on Fedora 34: newfstatat() is used on top of fstat().

syslog() is an actual system call on some glibc/arch combinations,
instead of a connect()/send() implementation.

On ppc64 and ppc64le, _llseek(), recv(), send() and getuid()
are used. For ppc64 only: ugetrlimit() for the getrlimit()
implementation, plus sigreturn() and fcntl64().

On s390x, additionally, we need to allow socketcall() (on top
of socket()), and sigreturn() also for passt (not just for
pasta).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 16:30:59 +01:00
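The seccomp commit above deals with syscall alternatives that differ by C library and architecture. One way to express a per-architecture allow-list is to let the preprocessor drop any syscall whose __NR_ number isn't defined by <sys/syscall.h> on the build target. This is sketched here under that assumption and is not necessarily how passt builds its filters. The table and syscall_nr() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>

struct sc {
	const char *name;
	long nr;
};

/* Only syscalls that exist on the build architecture end up here */
static const struct sc allowed[] = {
#ifdef __NR_epoll_wait
	{ "epoll_wait",		__NR_epoll_wait },
#endif
#ifdef __NR_epoll_pwait
	{ "epoll_pwait",	__NR_epoll_pwait },
#endif
#ifdef __NR_pipe
	{ "pipe",		__NR_pipe },
#endif
#ifdef __NR_pipe2
	{ "pipe2",		__NR_pipe2 },
#endif
#ifdef __NR_readlink
	{ "readlink",		__NR_readlink },
#endif
#ifdef __NR_readlinkat
	{ "readlinkat",		__NR_readlinkat },
#endif
#ifdef __NR_newfstatat
	{ "newfstatat",		__NR_newfstatat },
#endif
};

/* Return the number of syscall @name on this architecture, or -1 if it
 * isn't available here (or isn't in the table at all) */
static long syscall_nr(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
		if (!strcmp(allowed[i].name, name))
			return allowed[i].nr;

	return -1;
}
```

A filter generator could then emit an allow rule for each entry in the table. For example, epoll_wait() simply drops out of the list on aarch64, where only epoll_pwait() exists.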
Stefano Brivio
2c7431ffcf README: Feature list, links to lists, bugs, chat
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-23 12:28:50 +02:00
Stefano Brivio
a77c5ef93a README, perf_report: Markdown and CSS fixes
Updating md2html on the server needs a few adjustments.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-22 14:52:47 +02:00
Stefano Brivio
4f69efcfba README: .. doesn't actually work for comments in Markdown
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:34:02 +02:00
Stefano Brivio
087b5f4dbb LICENSES: Add license text files, add missing notices, fix SPDX tags
SPDX tags don't replace license files. Some notices were missing, and
some tags didn't conform to the SPDX specification either.

Now reuse lint, from the REUSE tool (https://reuse.software/), passes.

Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
e871fa9f22 README: Drop domain part in absolute links
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-07 15:14:22 +02:00
Stefano Brivio
a8b767b06d README: Fix pasta anchor in Try it section
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-28 14:45:07 +02:00
Stefano Brivio
ca325e7583 README: Add demo section
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 13:45:17 +02:00
Stefano Brivio
cc8db1c5bc README: pasta mode, CI, performance, updated links, etc.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
964b7e12da README: Source js
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-18 13:26:48 +02:00
Stefano Brivio
9d063569ff README: Mention the -DDEBUG flag
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-10 07:34:24 +02:00
Stefano Brivio
605af213c5 udp: Connection tracking for ephemeral, local ports, and related fixes
As we support UDP forwarding for packets that are sent to local
ports, we actually need some kind of connection tracking for UDP.
While at it, this commit introduces a number of vaguely related fixes
for issues observed while trying this out. In detail:

- implement an explicit, albeit minimalistic, connection tracking
  for UDP, to allow usage of ephemeral ports by the guest and by
  the host at the same time, by binding them dynamically as needed,
  and to allow mapping address changes for packets with a loopback
  address as destination

- set the guest MAC address whenever we receive a packet from tap
  instead of waiting for an ARP request, and set it to broadcast on
  start, otherwise DHCPv6 might not work if all DHCPv6 requests time
  out before the guest starts talking IPv4

- split context IPv6 address into address we assign, global or site
  address seen on tap, and link-local address seen on tap, and make
  sure we use the addresses we've seen as destination (link-local
  choice depends on source address). Similarly, for IPv4, split into
  address we assign and address we observe, and use the address we
  observe as destination

- introduce a clock_gettime() syscall right after epoll_wait() wakes
  up, so that we can remove all the other ones and pass the current
  timestamp to tap and socket handlers -- this is additionally needed
  by UDP to time out bindings to ephemeral ports and mappings between
  loopback address and a local address

- rename sock_l4_add() to sock_l4(), no semantic changes intended

- include <arpa/inet.h> in passt.c before kernel headers so that we
  can use <netinet/in.h> macros to check IPv6 address types, and
  remove a duplicate <linux/ip.h> inclusion

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-04-29 17:15:26 +02:00
Stefano Brivio
61fa05c7c0 README: Don't let <canvas> steal pointer events
...otherwise some links on the bottom won't be clickable.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-04-13 22:54:08 +02:00
Stefano Brivio
4aa8e54a30 passt: Introduce a DHCPv6 server
This implementation, similarly to the IPv4 DHCP one, hands out a
single address, which is the same as the upstream address for the
host.

This avoids the need for address translation as long as the guest
runs a DHCPv6 client. The NDP "Managed" flag is now set in Router
Advertisements.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-04-13 22:37:40 +02:00
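The DHCPv6 commit sets the NDP "Managed" flag in Router Advertisements. This sketch shows what that flag looks like on the wire, assuming glibc's <netinet/icmp6.h> definitions, by filling in the fixed Router Advertisement header. ra_init_managed() is a hypothetical helper. Options and the checksum are deliberately left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/icmp6.h>
#include <stdint.h>
#include <string.h>

/* Fill in the fixed part of an ICMPv6 Router Advertisement (RFC 4861,
 * section 4.2) with the "Managed address configuration" (M) flag set,
 * telling hosts that addresses are available via DHCPv6. Hop limit,
 * reachable time and retransmission timer stay 0, that is, unspecified. */
static void ra_init_managed(struct nd_router_advert *ra, uint16_t lifetime_s)
{
	memset(ra, 0, sizeof(*ra));
	ra->nd_ra_type = ND_ROUTER_ADVERT;		/* ICMPv6 type 134 */
	ra->nd_ra_code = 0;
	ra->nd_ra_flags_reserved = ND_RA_FLAG_MANAGED;	/* M bit, 0x80 */
	ra->nd_ra_router_lifetime = htons(lifetime_s);
}
```

Only the M flag is set here. The "Other configuration" (O) flag would additionally point hosts to DHCPv6 for other parameters, but the commit does not mention it.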
Stefano Brivio
a673fdba13 README: Add image map for overview
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-03-25 09:03:17 +01:00
Stefano Brivio
e653f9b3ed passt: Add libvirt patch for qemu UNIX socket domain back-end
...and mention it in the README.

While at it, remove useless escaping in the README, and fix
indentation in the syslog message with the qemu command line
example.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-03-21 00:08:42 +01:00
Stefano Brivio
00f3bcea05 passt: Add the README
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-03-18 17:02:54 +01:00