passt

Author	SHA1	Message	Date
Stefano Brivio	bc4ec1a8e9	README: Update Interfaces and Availability sections Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	e80f608710	README: Avoid "here" links They look a bit lame: rephrase sentences to avoid them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	be5bbb9b06	tcp: Rework timers to use timerfd instead of periodic bitmap scan With a lot of concurrent connections, the bitmap scan approach is not really sustainable. Switch to per-connection timerfd timers, set based on events and on two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added to the common epoll list, and implement the existing timeouts. While at it, drop the CONN_ prefix from flag names, otherwise they get quite long, and fix the logic to decide if a connection has a local, possibly unreachable endpoint: we shouldn't go through the rest of tcp_conn_from_tap() if we reset the connection due to a successful bind(2), and we'll get EACCES if the port number is low. Suggested by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	14c4c0253c	README: Make it somewhat readable on mobile devices Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-04 19:23:45 +01:00
Stefano Brivio	216a266a75	hooks, README: gzipped js snippets, webp alternatives for png Upload gzipped js snippets for usage with gzip_static in nginx or equivalent. Convert png drawings to webp for smaller size, use them as alternatives in README. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-02 14:02:03 +01:00
Stefano Brivio	71ab6d9972	README: Don't preload CI recording, show poster from end of run Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-01 22:31:42 +01:00
Stefano Brivio	628c4f0cae	README: s/guest/namespace/ in pasta "Try it" section Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-01 21:43:41 +01:00
Stefano Brivio	06f8e4f960	Makefile, hooks: Static target precondition for pkgs, copy .avx2 builds Convenience packages are anyway built from static builds. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-01 21:41:22 +01:00
Stefano Brivio	213c397492	passt, pasta: Run-time selection of AVX2 build Build-time selection of AVX2 flags and routines is not practical for distributions, but limiting AVX2 usage to checksum routines with specific run-time detection doesn't allow for easy performance gains from auto-vectorisation of batched packet handling routines. For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple wrapper replacing the current executable with the AVX2 build if it's available, and if AVX2 is supported by the current CPU. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-28 16:46:28 +01:00
Stefano Brivio	c47d9f7ee0	README: Fix demo div grid layout Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-23 11:49:15 +01:00
Stefano Brivio	337f55166f	demo, ci: Switch to asciinema(1) for terminal recordings For demos, cool-retro-term(1) looked fancier, but several threads of that and ffmpeg(1) are just messing up with performance testing. The CI videos started getting really big as well, and they were difficult to read. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-22 18:36:24 +01:00
Stefano Brivio	be2a7898e9	test: Add demo for Podman with pasta ...showing setup steps, some peculiarities as --net option, and a general side-to-side comparison with slirp4netns(1), including "quick" TCP and UDP throughput and latency benchmarks. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-22 18:34:44 +01:00
Stefano Brivio	39a3531270	README, hooks: Build HTML man page on push, add a link Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	0515adceaa	passt, pasta: Namespace-based sandboxing, defer seccomp policy application To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	6e61b4040a	test: Add distribution tests for several architectures and kernel versions The new tests check build and a simple case with pasta sending a short message in both directions (namespace to init, init to namespace). Tests cover a mix of Debian, Fedora, OpenSUSE and Ubuntu combinations on aarch64, i386, ppc64, ppc64le, s390x, x86_64. Builds tested starting from approximately glibc 2.19, gcc 4.7, and actual functionality approximately from 4.4 kernels, glibc 2.25, gcc 4.8, all the way up to current glibc/gcc/kernel versions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-28 18:51:50 +01:00
Stefano Brivio	21b1a8445b	README: Fix link to IGMP/MLD proxy ticket Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-28 02:05:19 +01:00
Stefano Brivio	2fbec4d300	README: Fix anchor for Performance section It shouldn't refer to the subsection under "Features". Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-27 16:44:05 +01:00
Stefano Brivio	33b1bdd079	seccomp: Add a number of alternate and per-arch syscalls Depending on the C library, but not necessarily in all the functions we use, statx() might be used instead of stat(), getdents() instead of getdents64(), readlinkat() instead of readlink(), openat() instead of open(). On aarch64, it's clone() and not fork(), and dup3() instead of dup2() -- just allow the existing alternative instead of dealing with per-arch selections. Since glibc commit 9a7565403758 ("posix: Consolidate fork implementation"), we need to allow set_robust_list() for fork()/clone(), even in a single-threaded context. On some architectures, epoll_pwait() is provided instead of epoll_wait(), but never both. Same with newfstat() and fstat(), sigreturn() and rt_sigreturn(), getdents64() and getdents(), readlink() and readlinkat(), unlink() and unlinkat(), whereas pipe() might not be available, but pipe2() always is, exclusively or not. Seen on Fedora 34: newfstatat() is used on top of fstat(). syslog() is an actual system call on some glibc/arch combinations, instead of a connect()/send() implementation. On ppc64 and ppc64le, _llseek(), recv(), send() and getuid() are used. For ppc64 only: ugetrlimit() for the getrlimit() implementation, plus sigreturn() and fcntl64(). On s390x, additionally, we need to allow socketcall() (on top of socket()), and sigreturn() also for passt (not just for pasta). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-26 16:30:59 +01:00
Stefano Brivio	2c7431ffcf	README: Feature list, links to lists, bugs, chat Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-23 12:28:50 +02:00
Stefano Brivio	a77c5ef93a	README, perf_report: Markdown and CSS fixes Updating md2html on the server needs a few adjustments. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-22 14:52:47 +02:00
Stefano Brivio	4f69efcfba	README: .. doesn't actually work for comments in Markdown Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-20 08:34:02 +02:00
Stefano Brivio	087b5f4dbb	LICENSES: Add license text files, add missing notices, fix SPDX tags SPDX tags don't replace license files. Some notices were missing and some tags were not according to the SPDX specification, too. Now reuse --lint from the REUSE tool (https://reuse.software/) passes. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-20 08:29:30 +02:00
Stefano Brivio	e871fa9f22	README: Drop domain part in absolute links Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-07 15:14:22 +02:00
Stefano Brivio	a8b767b06d	README: Fix pasta anchor in Try it section Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-28 14:45:07 +02:00
Stefano Brivio	ca325e7583	README: Add demo section Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-27 13:45:17 +02:00
Stefano Brivio	cc8db1c5bc	README: pasta mode, CI, performance, updated links, etc. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-27 01:28:02 +02:00
Stefano Brivio	964b7e12da	README: Source js Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-18 13:26:48 +02:00
Stefano Brivio	9d063569ff	README: Mention the -DDEBUG flag Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-05-10 07:34:24 +02:00
Stefano Brivio	605af213c5	udp: Connection tracking for ephemeral, local ports, and related fixes As we support UDP forwarding for packets that are sent to local ports, we actually need some kind of connection tracking for UDP. While at it, this commit introduces a number of vaguely related fixes for issues observed while trying this out. In detail: - implement an explicit, albeit minimalistic, connection tracking for UDP, to allow usage of ephemeral ports by the guest and by the host at the same time, by binding them dynamically as needed, and to allow mapping address changes for packets with a loopback address as destination - set the guest MAC address whenever we receive a packet from tap instead of waiting for an ARP request, and set it to broadcast on start, otherwise DHCPv6 might not work if all DHCPv6 requests time out before the guest starts talking IPv4 - split context IPv6 address into address we assign, global or site address seen on tap, and link-local address seen on tap, and make sure we use the addresses we've seen as destination (link-local choice depends on source address). Similarly, for IPv4, split into address we assign and address we observe, and use the address we observe as destination - introduce a clock_gettime() syscall right after epoll_wait() wakes up, so that we can remove all the other ones and pass the current timestamp to tap and socket handlers -- this is additionally needed by UDP to time out bindings to ephemeral ports and mappings between loopback address and a local address - rename sock_l4_add() to sock_l4(), no semantic changes intended - include <arpa/inet.h> in passt.c before kernel headers so that we can use <netinet/in.h> macros to check IPv6 address types, and remove a duplicate <linux/ip.h> inclusion Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-04-29 17:15:26 +02:00
Stefano Brivio	61fa05c7c0	README: Don't let <canvas> steal pointer events ...otherwise some links on the bottom won't be clickable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-04-13 22:54:08 +02:00
Stefano Brivio	4aa8e54a30	passt: Introduce a DHCPv6 server This implementation, similarly to the IPv4 DHCP one, hands out a single address, which is the same as the upstream address for the host. This avoids the need for address translation as long as the client runs a DHCPv6 client. The NDP "Managed" flag is now set in Router Advertisements. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-04-13 22:37:40 +02:00
Stefano Brivio	a673fdba13	README: Add image map for overview Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-25 09:03:17 +01:00
Stefano Brivio	e653f9b3ed	passt: Add libvirt patch for qemu UNIX socket domain back-end ...and mention it in the README. While at it, remove useless escaping in the README, and fix indentation in the syslog message with the qemu command line example. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-21 00:08:42 +01:00
Stefano Brivio	00f3bcea05	passt: Add the README Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-18 17:02:54 +01:00

34 commits