passt

Author	SHA1	Message	Date
Stefano Brivio	099ace64ce	treewide: Address cert-err33-c clang-tidy warnings for clock and timer functions For clock_gettime(), we shouldn't ignore errors if they happen at initialisation phase, because something is seriously wrong and it's not helpful if we proceed as if nothing happened. As we're up and running, though, it's probably better to report the error and use a stale value than to terminate altogether. Make sure we use a zero value if we don't have a stale one somewhere. For timerfd_gettime() and timerfd_settime() failures, just report an error, there isn't much else we can do. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-30 12:37:31 +01:00
Stefano Brivio	59fe34ee36	treewide: Suppress clang-tidy warning if we already use O_CLOEXEC In pcap_init(), we should always open the packet capture file with O_CLOEXEC, even if we're not running in foreground: O_CLOEXEC means close-on-exec, not close-on-fork. In logfile_init() and pidfile_open(), the fact that we pass a third 'mode' argument to open() seems to confuse the android-cloexec-open checker in LLVM versions from 16 to 19 (at least). The checker is suggesting to add O_CLOEXEC to 'mode', and not in 'flags', where we already have it. Add a suppression for clang-tidy and a comment, and avoid repeating those three times by adding a new helper, output_file_open(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-30 12:37:31 +01:00
Laurent Vivier	fd8334b25d	pcap: Add an offset argument in pcap_iov() The offset is passed directly to pcap_frame() and allows any headers that are not part of the frame to capture to be skipped. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-04 14:51:02 +02:00
David Gibson	bfc294b90d	util: Add helper to write() all of a buffer write(2) might not write all the data it is given. Add a write_all_buf() helper to keep calling it until all the given data is written, or we get an error. Currently we use write_remainder() to do this operation in pcap_frame(). That's a little awkward since it requires constructing an iovec, and future changes we want to make to write_remainder() will be easier in terms of this single buffer helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-09-18 17:14:59 +02:00
Stefano Brivio	dba7f0f5ce	treewide: Replace strerror() calls Now that we have logging functions embedding perror() functionality, we can make _some_ calls more terse by using them. In many places, the strerror() calls are still more convenient because, for example, they are used in flow debugging functions, or because the return code variable of interest is not 'errno'. While at it, convert a few error messages from a scant perror style to proper failure descriptions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-06-21 15:32:44 +02:00
David Gibson	5566386f5f	treewide: Standardise variable names for various packet lengths At various points we need to track the lengths of a packet including or excluding various different sets of headers. We don't always use the same variable names for doing so. Worse in some places we use the same name for different things: e.g. tcp_fill_headers[46]() use ip_len for the length including the IP headers, but then tcp_send_flag() which calls it uses it to mean the IP payload length only. To improve clarity, standardise on these names: dlen: L4 protocol payload length ("data length") l4len: plen + length of L4 protocol header l3len: l4len + length of IPv4/IPv6 header l2len: l3len + length of L2 (ethernet) header Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:23 +02:00
Stefano Brivio	3fe9878db7	pcap: Use clock_gettime() instead of gettimeofday() POSIX.1-2008 declared gettimeofday() as obsolete, but I'm a dinosaur. Usually, C libraries translate that to the clock_gettime() system call anyway, but this doesn't happen in Jon's environment, and, there, seccomp happily kills pasta(1) when started with --pcap, because we didn't add gettimeofday() to our seccomp profiles. Use clock_gettime() instead. Reported-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-14 08:18:36 +01:00
Laurent Vivier	94502fa15e	pcap: add pcap_iov() Introduce a new function pcap_iov() to capture packet desribed by an IO vector. Update pcap_frame() to manage iovcnt > 1. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-2-lvivier@redhat.com> [dwg: Fixed trivial cppcheck regressions] Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-06 08:03:30 +01:00
David Gibson	dda7945ca9	pcap: Handle short writes in pcap_frame() Currently pcap_frame() assumes that if write() doesn't return an error, it has written everything we want. That's not necessarily true, because it could return a short write. That's not likely to happen on a regular file, but there's not a lot of reason not to be robust here; it's conceivable we might want to direct the pcap fd at a named pipe or similar. So, make pcap_frame() handle short frames by using the write_remainder() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Formatting fix, and avoid gcc warning in pcap_frame()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 06:35:01 +01:00
David Gibson	24410b37a4	pcap: Update pcap_frame() to take an iovec and offset Update the low-level helper pcap_frame() to take a struct iovec and offset within it, rather than an explicit pointer and length for the frame. This moves the handling of an offset (to skip vnet_len) from pcap_multiple() to pcap_frame(). This doesn't accomplish a great deal immediately, but will make subsequent changes easier. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 06:24:10 +01:00
Stefano Brivio	06559048e7	treewide: Use 'z' length modifier for size_t/ssize_t conversions Types size_t and ssize_t are not necessarily long, it depends on the architecture. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-12-02 03:54:42 +01:00
Stefano Brivio	ca2749e1bd	passt: Relicense to GPL 2.0, or any later version In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-04-06 18:00:33 +02:00
David Gibson	54502cca7f	udp: Use tap_send_frames() To send frames on the tap interface, the UDP uses a fairly complicated two level batching. First multiple frames are gathered into a single "message" for the qemu stream socket, then multiple messages are send with sendmmsg(). We now have tap_send_frames() which already deals with sending a number of frames, including batching and handling partial sends. Use that to considerably simplify things. This does make a couple of behavioural changes: * We used to split messages to keep them under 32kiB (except when a single frame was longer than that). The comments claim this is needed to stop qemu from closing the connection, but we don't have any equivalent logic for TCP. I wasn't able to reproduce the problem with this series, although it was apparently easy to reproduce earlier. My suspicion is that there was never an inherent need to keep messages small, however with larger messages (and default kernel buffer sizes) the chances of needing more than one resend for partial send()s is greatly increased. We used not to correctly handle that case of multiple resends, but now we do. * Previously when we got a partial send on UDP, we would resend the remainder of the entire "message", including multiple frames. The common code now only resends the remainder of a single frame, simply dropping any frames which weren't even partially sent. This is what TCP always did and is probably a better idea for UDP too. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-01-23 18:55:04 +01:00
David Gibson	d3089eb0ea	pcap: Replace pcapm() with pcap_multiple() pcapm() captures multiple frames from a msghdr, however the only thing it cares about in the msghdr is the list of buffers, where it assumes there is one frame to capture per buffer. That's what we want for its single caller but it's not the only obvious choice here (one frame per msghdr would arguably make more sense in isolation). In addition pcapm() has logic that only makes sense in the context of the passt specific path its called from: it skips the first 4 bytes of each buffer, because those have the qemu vnet_len rather than the frame proper. Make this clearer by replacing pcapm() with pcap_multiple() which more explicitly takes one struct iovec per frame, and parameterizes how much of each buffer to skip (i.e. the offset of the frame within the buffer). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-01-23 18:54:27 +01:00
David Gibson	cb3c1ce307	pcap: Introduce pcap_frame() helper pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing to the capture file. The purpose of pcapm() and pcapmm() not calling pcap() seems to be to avoid repeatedly calling gettimeofday() and to avoid printing errors for every packet in a batch if there's a problem. We can accomplish that while still sharing code by adding a new helper which takes the packet timestamp as a parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-01-23 18:48:51 +01:00
Stefano Brivio	da152331cf	Move logging functions to a new file, log.c Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:25 +02:00
David Gibson	bf95322fc1	conf: Make the argument to --pcap option mandatory The --pcap or -p option can be used with or without an argument. If given, the argument gives the name of the file to save a packet trace to. If omitted, we generate a default name in /tmp. Generating the default name isn't particularly useful though, since making a suitable name can easily be done by the caller. Remove this feature. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:17:57 +02:00
Stefano Brivio	dbd0a7035c	treewide: Invalid type in argument to printf format specifier, CWE-686 Harmless except for two bad debugging prints. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-04-05 18:47:04 +02:00
Stefano Brivio	62c3edd957	treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	48582bf47f	treewide: Mark constant references as const Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	5c0426981e	pcap: Fix mistake in printed string Packets are saved to a file, not at it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-25 13:21:13 +01:00
Stefano Brivio	0515adceaa	passt, pasta: Namespace-based sandboxing, defer seccomp policy application To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	b93c2c1713	passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-26 07:57:09 +01:00
Stefano Brivio	000ae818d4	pcap: Fix failure check on write() in pcapm() Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 17:36:05 +02:00
Stefano Brivio	627e18fa8a	passt: Add cppcheck target, test, and address resulting warnings ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 09:41:13 +02:00
Stefano Brivio	dd942eaa48	passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 04:26:08 +02:00
Stefano Brivio	1a563a0cbd	passt: Address gcc 11 warnings A mix of unchecked return values, a missing permission mask for open(2) with O_CREAT, and some false positives from -Wstringop-overflow and -Wmaybe-uninitialized. Reported-by: Martin Hauke <mardnh@gmx.de> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-20 08:29:30 +02:00
Giuseppe Scrivano	9a175cc2ce	pasta: Allow specifying paths and names of namespaces Based on a patch from Giuseppe Scrivano, this adds the ability to: - specify paths and names of target namespaces to join, instead of a PID, also for user namespaces, with --userns - request to join or create a network namespace only, without entering or creating a user namespace, with --netns-only - specify the base directory for netns mountpoints, with --nsrun-dir Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [sbrivio: reworked logic to actually join the given namespaces when they're not created, implemented --netns-only and --nsrun-dir, updated pasta demo script and man page] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-07 04:05:15 +02:00
Stefano Brivio	a491c8ee3c	pcap: Drop O_DSYNC from pcap file descriptor passt is stable enough, and dropping O_DSYNC makes reduces the impact of capturing packets on timing, while running tests. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-27 01:28:02 +02:00
Stefano Brivio	11cc43feec	pcap: Don't make pcap files world-readable Even if it's just a debugging feature, it's not nice to leak packets to everybody around. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-27 01:28:02 +02:00
Stefano Brivio	e4b94da0af	pcap: Don't reinitialise packet capture if we already have one If the guest disconnects, and a given name (without timestamp) for the pcap file is passed, we would otherwise lose the packets captured until that point. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-27 01:28:02 +02:00
Stefano Brivio	1e49d194d0	passt, pasta: Introduce command-line options and port re-mapping Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-09-01 17:00:27 +02:00
Stefano Brivio	64a0ba3b27	udp: Introduce recvmmsg()/sendmmsg(), zero-copy path from socket Packets are received directly onto pre-cooked, static buffers for IPv4 (with partial checksum pre-calculation) and IPv6 frames, with pre-filled Ethernet addresses and, partially, IP headers, and sent out from the same buffers with sendmmsg(), for both passt and pasta (non-local traffic only) modes. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-07-21 12:01:04 +02:00
Stefano Brivio	33482d5bf2	passt: Add PASTA mode, major rework PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host connectivity to an otherwise disconnected, unprivileged network and user namespace, similarly to slirp4netns. Given that the implementation is largely overlapping with PASST, no separate binary is built: 'pasta' (and 'passt4netns' for clarity) both link to 'passt', and the mode of operation is selected depending on how the binary is invoked. Usage example: $ unshare -rUn # echo $$ 1871759 $ ./pasta 1871759 # From another terminal # udhcpc -i pasta0 2>/dev/null # ping -c1 pasta.pizza PING pasta.pizza (64.190.62.111) 56(84) bytes of data. 64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms --- pasta.pizza ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms # ping -c1 spaghetti.pizza PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes 64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms --- spaghetti.pizza ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms This entails a major rework, especially with regard to the storage of tracked connections and to the semantics of epoll(7) references. Indexing TCP and UDP bindings merely by socket proved to be inflexible and unsuitable to handle different connection flows: pasta also provides Layer-2 to Layer-2 socket mapping between init and a separate namespace for local connections, using a pair of splice() system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local bindings. For instance, building on the previous example: # ip link set dev lo up # iperf3 -s $ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 \| tail -n4 [SUM] 0.00-10.00 sec 52.3 GBytes 44.9 Gbits/sec 283 sender [SUM] 0.00-10.43 sec 52.3 GBytes 43.1 Gbits/sec receiver iperf Done. epoll(7) references now include a generic part in order to demultiplex data to the relevant protocol handler, using 24 bits for the socket number, and an opaque portion reserved for usage by the single protocol handlers, in order to track sockets back to corresponding connections and bindings. A number of fixes pertaining to TCP state machine and congestion window handling are also included here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-07-17 11:04:22 +02:00
Stefano Brivio	19d254bbbb	passt: Add support for multiple instances in different network namespaces ...sharing the same filesystem. Instead of a fixed path for the UNIX domain socket, passt now uses a path with a counter, probing for existing instances, and picking the first free one. The demo script is updated accordingly -- it can now be started several times to create multiple namespaces with an instance of passt each, with addressing reflecting separate subnets, and NDP proxying between them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-05-21 11:14:51 +02:00
Stefano Brivio	17337a736f	passt: Introduce packet capture implementation With -DDEBUG, passt now saves guest-side traffic captures in pcap format at /tmp/passt_<ISO8601 timestamp>.pcap. The timestamp refers to time and date of start-up. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-05-21 11:14:48 +02:00

36 commits