passt

Author	SHA1	Message	Date
Laurent Vivier	0938100596	udp: rename udp_sock_handler() to udp_buf_sock_handler() We are going to introduce a variant of the function to use vhost-user buffers rather than passt internal buffers. Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-03-12 10:54:19 +01:00
Laurent Vivier	c38f260820	tcp: rename functions that manage buffers To separate these functions from the ones specific to TCP management, we are going to move it to a new file, but before that update their names to reflect their role. Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-03-11 19:47:54 +01:00
Paul Holzinger	15001b39ef	conf: set the log level much earlier --quiet is supposed to silence the "No routable interface" message but it does not work because the log level was set long after conf_ip4/6() was called which means it uses the default level which logs everything. To address this move the log level logic directly after the option parsing in conf(). Signed-off-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-27 14:08:33 +01:00
Paul Holzinger	b08716551a	passt: make --quiet set the log level to warning Based on the man page and help output --quiet hides informational messages. This means that warnings should still be logged. This was discussed in[1]. [1] https://archives.passt.top/passt-dev/20240216114304.7234a83f@elisabeth/T/#m42652824644973674e84baf9e0bf1d0e88104450 Signed-off-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-27 14:08:00 +01:00
Stefano Brivio	8f3f8e190c	pasta: Add fallback timer mechanism to check if namespace is gone We don't know how frequently this happens, but hitting fs.inotify.max_user_watches or similar sysctl limits is definitely not out of question, and Paul mentioned that, for example, Podman's CI environments hit similar issues in the past. Introduce a fallback mechanism based on a timer file descriptor: we grab the directory handle at startup, and we can then use openat(), triggered periodically, to check if the (network) namespace directory still exists. If openat() fails at some point, exit. Link: https://github.com/containers/podman/pull/21563#issuecomment-1943505707 Reported-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-16 08:47:14 +01:00
Stefano Brivio	7ee4e17267	log: setlogmask(0) can actually result in a system call, don't use it Before commit `32d07f5e59` ("passt, pasta: Completely avoid dynamic memory allocation"), we didn't store the current log mask in a variable, and we fetched it using setlogmask(0) wherever needed. But after that commit, we can use our log_mask copy instead. And we should: with recent glibc versions, setlogmask(0) actually results in a system call, which causes a substantial overhead with high transfer rates: we use setlogmask(0) even to decide we don't want to print debug messages. Now that we rely on log_mask in early stages, before setlogmask() is called, we need to initialise that variable to the special LOG_EMERG mask value right away: define LOG_EARLY to make this clearer, and, while at it, group conditions in vlogmsg() into something more terse. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-02-14 01:10:11 +01:00
David Gibson	a325121759	icmp: Consolidate icmp_sock_handler() with icmpv6_sock_handler() Currently we have separate handlers for ICMP and ICMPv6 ping replies. Although there are a number of points of difference, with some creative refactoring we can combine these together sensibly. Although it doesn't save a vast amount of code, it does make it clearer that we're performing basically the same steps for each case. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-01-22 23:36:51 +01:00
David Gibson	8981a720aa	flow: Avoid moving flow entries to compact table Currently we always keep the flow table maximally compact: that is all the active entries are contiguous at the start of the table. Doing this sometimes requires moving an entry when one is freed. That's kind of fiddly, and potentially expensive: it requires updating the hash table for the new location, and depending on flow type, it may require EPOLL_CTL_MOD, system calls to update epoll tags with the new location too. Implement a new way of managing the flow table that doesn't ever move entries. It attempts to maintain some compactness by always using the first free slot for a new connection, and mitigates the effect of non compactness by cheaply skipping over contiguous blocks of free entries. See the "theory of operation" comment in flow.c for details. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>b [sbrivio: additional ASSERT(flow_first_free <= FLOW_MAX - 2) to avoid Coverity Scan false positive] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-01-22 23:35:37 +01:00
David Gibson	02e092b4fe	tcp, tcp_splice: Avoid double layered dispatch for connected TCP sockets Currently connected TCP sockets have the same epoll type, whether they're for a "tap" connection or a spliced connection. This means that tcp_sock_handler() has to do a secondary check on the type of the connection to call the right function. We can avoid this by adding a new epoll type and dispatching directly to the right thing. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-01-22 23:35:25 +01:00
David Gibson	70121ca1ec	epoll: Better handling of number of epoll types As we already did for flow types, use an "EPOLL_NUM_TYPES" isntead of EPOLL_TYPE_MAX, which is a little bit safer and clearer. Add a static assert on the size of the matching names array. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-01-22 23:35:22 +01:00
David Gibson	36dfa8b8fb	flow, tcp: Add handling for per-flow timers tcp_timer() scans the flow table so that it can run tcp_splice_timer() on each spliced connection. More generally, other flow types might want to run similar timers in future. We could add a flow_timer() analagous to tcp_timer(), udp_timer() etc. However, this would need to scan the flow table, which we would have just done in flow_defer_handler(). We'd prefer to just scan the flow table once, dispatching both per-flow deferred events and per-flow timed events if necessary. So, extend flow_defer_handler() to do this. For now we use the same timer interval for all flow types (1s). We can make that more flexible in future if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-01-22 23:35:19 +01:00
David Gibson	b43e4483ed	flow, tcp: Add flow-centric dispatch for deferred flow handling tcp_defer_handler(), amongst other things, scans the flow table and does some processing for each TCP connection. When we add other protocols to the flow table, they're likely to want some similar scanning. It makes more sense for cache friendliness to perform a single scan of the flow table and dispatch to the protocol specific handlers, rather than having each protocol separately scan the table. To that end, add a new flow_defer_handler() handling all flow-linked deferred operations. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-01-22 23:35:17 +01:00
David Gibson	e21b6d69b1	tcp: "TCP" hash secret doesn't need to be TCP specific The TCP state structure includes a 128-bit hash_secret which we use for SipHash calculations to mitigate attacks on the TCP hash table and initial sequence number. We have plans to use SipHash in places that aren't TCP related, and there's no particular reason they'd need their own secret. So move the hash_secret to the general context structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-12-04 09:51:32 +01:00
David Gibson	955dd3251c	tcp, udp: Don't pre-fill IPv4 destination address in headers Because packets sent on the tap interface will always be going to the guest/namespace, we more-or-less know what address they'll be going to. So we pre-fill this destination address in our header buffers for IPv4. We can't do the same for IPv6 because we could need either the global or link-local address for the guest. In future we're going to want more flexibility for the destination address, so this pre-filling will get in the way. Change the flow so we always fill in the IPv4 destination address for each packet, rather than prefilling it from proto_update_l2_buf(). In fact for TCP we already redundantly filled the destination for each packet anyway. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-22 12:15:33 +02:00
David Gibson	ae5f6c8e1b	epoll: Use different epoll types for passt and pasta tap fds Currently we have a single epoll event type for the "tap" fd, which could be either a handle on a /dev/net/tun device (pasta) or a connected Unix socket (passt). However for the two modes we call different handler functions. Simplify this a little by using different epoll types and dispatching directly to the correct handler function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:20 +02:00
David Gibson	eda4f1997e	epoll: Split listening Unix domain socket into its own type tap_handler() actually handles events on three different types of object: the /dev/tap character device (pasta), a connected Unix domain socket (passt) or a listening Unix domain socket (passt). The last, in particular, really has no handling in common with the others, so split it into its own epoll type and directly dispatch to the relevant handler from the top level. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:17 +02:00
David Gibson	485b5fb8f9	epoll: Split handling of listening TCP sockets into their own handler tcp_sock_handler() handles both listening TCP sockets, and connected TCP sockets, but what it needs to do in those cases has essentially nothing in common. Therefore, give listening sockets their own epoll_type value and dispatch directly to their own handler from the top level. Furthermore, the two handlers need essentially entirely different information from the reference: we re-(ab)used the index field in the tcp_epoll_ref to indicate the port for the listening socket, but that's not the same meaning. So, switch listening sockets to their own reference type which we can lay out as we please. That lets us remove the listen and outbound fields from the normal (connected) tcp_epoll_ref, reducing it to just the connection table index. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:15 +02:00
David Gibson	e6f81e5578	epoll: Split handling of TCP timerfds into its own handler function tcp_sock_handler() actually handles several different types of fd events. This includes timerfds that aren't sockets at all. The handling of these has essentially nothing in common with the other cases. So, give the TCP timers there own epoll_type value and dispatch directly to their handler. This also means we can remove the timer field from tcp_epoll_ref, the information it encoded is now implicit in the epoll_type value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:13 +02:00
David Gibson	8271a2ed57	epoll: Tiny cleanup to udp_sock_handler() Move the test for c->no_udp into the function itself, rather than in the dispatching switch statement to better localize the UDP specific logic, and make for greated consistency with other handler functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:10 +02:00
David Gibson	05f606ab0b	epoll: Split handling of ICMP and ICMPv6 sockets We have different epoll type values for ICMP and ICMPv6 sockets, but they both call the same handler function, icmp_sock_handler(). However that function does essentially nothing in common for the two cases. So, split it into icmp_sock_handler() and icmpv6_sock_handler() and dispatch them separately from the top level. While we're there remove some parameters that the function was never using anyway. Also move the test for c->no_icmp into the functions, so that all the logic specific to ICMP is within the handler, rather than in the top level dispatch code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:07 +02:00
David Gibson	d850caab66	epoll: Fold sock_handler into general switch on epoll event fd When we process events from epoll_wait(), we check for a number of special cases before calling sock_handler() which then dispatches based on the protocol type of the socket in the event. Fold these cases together into a single switch on the fd type recorded in the epoll data field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:29:56 +02:00
David Gibson	6a6735ece4	epoll: Always use epoll_ref for the epoll data variable epoll_ref contains a variety of information useful when handling epoll events on our sockets, and we place it in the epoll_event data field returned by epoll. However, for a few other things we use the 'fd' field in the standard union of types for that data field. This actually introduces a bug which is vanishingly unlikely to hit in practice, but very nasty if it ever did: theoretically if we had a very large file descriptor number for fd_tap or fd_tap_listen it could overflow into bits that overlap with the 'proto' field in epoll_ref. With some very bad luck this could mean that we mistakenly think an event on a regular socket is an event on fd_tap or fd_tap_listen. More practically, using different (but overlapping) fields of the epoll_data means we can't unify dispatch for the various different objects in the epoll. Therefore use the same epoll_ref as the data for the tap fds and the netns quit fd, adding new fd type values to describe them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:29:53 +02:00
David Gibson	3401644453	epoll: Generalize epoll_ref to cover things other than sockets The epoll_ref type includes fields for the IP protocol of a socket, and the socket fd. However, we already have a few things in the epoll which aren't protocol sockets, and we may have more in future. Rename these fields to an abstract "fd type" and file descriptor for more generality. Similarly, rather than using existing IP protocol numbers for the type, introduce our own number space. For now these just correspond to the supported protocols, but we'll expand on that in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:29:51 +02:00
David Gibson	8218d99013	Use C11 anonymous members to make poll refs less verbose to use union epoll_ref has a deeply nested set of structs and unions to let us subdivide it into the various different fields we want. This means that referencing elements can involve an awkward long string of intermediate fields. Using C11 anonymous structs and unions lets us do this less clumsily. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:17:57 +02:00
Stefano Brivio	940bd3eff9	passt: Fix error check for signal(), improve error messages Valtteri reports that if SIGPIPE already has a disposition set by the parent process, such as systemd with the default setting of IgnoreSIGPIPE=yes, signal() will return the previous value, not zero, and this is not an error: check for SIG_ERR instead. While at it, split messages for failures of sigaction() and signal(), and report the actual error. Reported-by: Valtteri Vuorikoski <vuori@notcom.org> Fixes: `8534be076c` ("Catch failures when installing signal handlers") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-04-13 19:32:13 +02:00
Stefano Brivio	ca2749e1bd	passt: Relicense to GPL 2.0, or any later version In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-04-06 18:00:33 +02:00
Chris Kuhn	89e38f5540	treewide: Fix header includes to build with musl Roughly inspired from a patch by Chris Kuhn: fix up includes so that we can build against musl: glibc is more lenient as headers generally include a larger amount of other headers. Compared to the original patch, I only included what was needed directly in C files, instead of adding blanket includes in local header files. It's a bit more involved, but more consistent with the current (not ideal) situation. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-03-09 03:44:21 +01:00
Chris Kuhn	5c58feab7b	conf, passt: Rename stderr to force_stderr While building against musl, gcc informs us that 'stderr' is a protected keyword. This probably comes from a #define stderr (stderr) in musl's stdio.h, to avoid a clash with extern FILE *const stderr, but I didn't really track it down. Just rename it to force_stderr, it makes more sense. [sbrivio: Added commit message] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-03-09 03:44:21 +01:00
Laine Stump	c9af6f92db	convert all remaining err() followed by exit() to die() This actually leaves us with 0 uses of err(), but someone could want to use it in the future, so we may as well leave it around. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:27 +01:00
Laine Stump	193385bd2f	log to stderr until process is daemonized, even if a log file is set Once a log file (specified on the commandline) is opened, the logging functions will stop sending error logs to stderr, which is annoying if passt has been started by another process that wants to collect error messages from stderr so it can report why passt failed to start. It would be much nicer if passt continued sending all log messages to stderr until it daemonizes itself (at which point the process that started passt can assume that it was started successfully). The system log mask is set to LOG_EMERG when the process starts, and we're already using that to do "special" logging during the period from process start until the log level requested on the commandline is processed (setting the log mask to something else). This period almost matches with "the time before the process is daemonized"; if we just delay setting the log mask a tiny bit, then it will match exactly, and we can use it to determine if we need to send log messages to stderr even when a log file has been specified and opened. This patch delays the setting of the log mask until immediately before the call to __daemon(). It also modifies logfn() slightly, so that it will log to stderr any time log mask is LOG_EMERG, even if a log file has been opened. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:31:42 +01:00
Stefano Brivio	d8921dafe5	pasta: Wait for tap to be set up before spawning command Adapted from a patch by Paul Holzinger: when pasta spawns a command, operating without a pre-existing user and network namespace, it needs to wait for the tap device to be configured and its handler ready, before the command is actually executed. Otherwise, something like: pasta --config-net nslookup passt.top usually fails as the nslookup command is issued before the network interface is ready. We can't adopt a simpler approach based on SIGSTOP and SIGCONT here: the child runs in a separate PID namespace, so it can't send SIGSTOP to itself as the kernel sees the child as init process and blocks the delivery of the signal. We could send SIGSTOP from the parent, but this wouldn't avoid the possible condition where the child isn't ready to wait for it when the parent sends it, also raised by Paul -- and SIGSTOP can't be blocked, so it can never be pending. Use SIGUSR1 instead: mask it before clone(), so that the child starts with it blocked, and can safely wait for it. Once the parent is ready, it sends SIGUSR1 to the child. If SIGUSR1 is sent before the child is waiting for it, the kernel will queue it for us, because it's blocked. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `1392bc5ca0` ("Allow pasta to take a command to execute") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-12 12:22:59 +01:00
Richard W.M. Jones	6b4e68383c	passt, tap: Add --fd option This passes a fully connected stream socket to passt. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> [sbrivio: reuse fd_tap instead of adding a new descriptor, imply --one-off on --fd, add to optstring and usage()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-25 01:40:47 +01:00
Stefano Brivio	6533a4a07b	passt: Move __setlogmask() calls before output unrelated to configuration ...so that we avoid printing some lines twice because log-level is still set to LOG_EMERG, as if logging configuration didn't happen yet. While at it, note that logging to stderr doesn't really depend on whether debug mode is enabled or not. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-10 11:17:50 +01:00
David Gibson	7c7b68dbe0	Use typing to reduce chances of IPv4 endianness errors We recently corrected some errors handling the endianness of IPv4 addresses. These are very easy errors to make since although we mostly store them in network endianness, we sometimes need to manipulate them in host endianness. To reduce the chances of making such mistakes again, change to always using a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network endian addresses. This makes it harder to accidentally do arithmetic or comparisons on such addresses as if they were host endian. We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to directly work with struct in_addr values. This has the additional benefit of making the IPv4 and IPv6 paths more visually similar. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:24 +01:00
Stefano Brivio	f212044940	icmp: Don't discard first reply sequence for a given echo ID In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any reply we send: we're on the same host as the target, after all. We discard them by comparing the last sequence we sent with the sequence we receive. However, on the first reply for a given identifier, the sequence might be zero, depending on the implementation of ping(8): we need another value to indicate we haven't sent any sequence number, yet. Use -1 as initialiser in the echo identifier map. This is visible with Busybox's ping, and was reported by Paul on the integration at https://github.com/containers/podman/pull/16141, with: $ podman run --net=pasta alpine ping -c 2 192.168.188.1 ...where only the second reply would be routed back. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `33482d5bf2` ("passt: Add PASTA mode, major rework") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-27 00:18:21 +02:00
David Gibson	096e48669b	isolation: Clarify various self-isolation steps We have a number of steps of self-isolation scattered across our code. Improve function names and add comments to make it clearer what the self isolation model is, what the steps do, and why they happen at the points they happen. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
Stefano Brivio	01efc71ddd	log, conf: Add support for logging to file In some environments, such as KubeVirt pods, we might not have a system logger available. We could choose to run in foreground, but this takes away the convenient synchronisation mechanism derived from forking to background when interfaces are ready. Add optional logging to file with -l/--log-file and --log-size. Unfortunately, this means we need to duplicate features that are more appropriately implemented by a system logger, such as rotation. Keep that reasonably simple, by using fallocate() with range collapsing where supported (Linux kernel >= 3.15, extent-based ext4 and XFS) and falling back to an unsophisticated block-by-block moving of entries toward the beginning of the file once we reach the (mandatory) size limit. While at it, clarify the role of LOG_EMERG in passt.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:28 +02:00
Stefano Brivio	da152331cf	Move logging functions to a new file, log.c Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:25 +02:00
David Gibson	8f6be016ae	cppcheck: Suppress same-value-in-ternary branches warning TIMER_INTERVAL is the minimum of two separately defined intervals which happen to have the same value at present. This results in an expression which has the same value in both branches of a ternary operator, which cppcheck warngs about. This is logically sound in this case, so suppress the error (we appear to already have a similar suppression for clang-tidy). Also add an unmatchedSuppression suppression, since only some cppcheck versions complain about this instance. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:22:30 +02:00
David Gibson	8534be076c	Catch failures when installing signal handlers Stop ignoring the return codes from sigaction() and signal(). Unlikely to happen in practice, but if it ever did it could lead to really hard to debug problems. So, take clang-tidy's advice and check for errors here. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:21:53 +02:00
David Gibson	eed17a47fe	Handle userns isolation and dropping root at the same time passt/pasta can interact with user namespaces in a number of ways: 1) With --netns-only we'll remain in our original user namespace 2) With --userns or a PID option to pasta we'll join either the given user namespace or that of the PID 3) When pasta spawns a shell or command we'll start a new user namespace for the command and then join it 4) With passt we'll create a new user namespace when we sandbox() ourself However (3) and (4) turn out to have essentially the same effect. In both cases we create one new user namespace. The spawned command starts there, and passt/pasta itself will live there from sandbox() onwards. Because of this, we can simplify user namespace handling by moving the userns handling earlier, to the same point we drop root in the original namespace. Extend the drop_user() function to isolate_user() which does both. After switching UID and GID in the original userns, isolate_user() will either join or create the userns we require. When we spawn a command with pasta_start_ns()/pasta_setup_ns() we no longer need to create a userns, because we're already made one. sandbox() likewise no longer needs to create (or join) an userns because we're already in the one we need. We no longer need c->pasta_userns_fd, since the fd is only used locally in isolate_user(). Likewise we can replace c->netns_only with a local in conf(), since it's not used outside there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	d72a1e7bb9	Move self-isolation code into a separate file passt/pasta contains a number of routines designed to isolate passt from the rest of the system for security. These are spread through util.c and passt.c. Move them together into a new isolation.c file. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	60ffc5b6cb	Don't unnecessarily avoid CLOEXEC flags There are several places in the passt code where we have lint overrides because we're not adding CLOEXEC flags to open or other operations. Comments suggest this is because it's before we fork() into the background but we'll need those file descriptors after we're in the background. However, as the name suggests CLOEXEC closes on exec(), not on fork(). The only place we exec() is either super early invoke the avx2 version of the binary, or when we start a shell in pasta mode, which certainly doesn't require the fds in question. Add the CLOEXEC flag in those places, and remove the lint overrides. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-24 18:01:48 +02:00
David Gibson	16f5586bb8	Make substructures for IPv4 and IPv6 specific context information The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-07-30 22:14:07 +02:00
David Gibson	5e12d23acb	Separate IPv4 and IPv6 configuration After recent changes, conf_ip() now has essentially entirely disjoint paths for IPv4 and IPv6 configuration. So, it's cleaner to split them out into different functions conf_ip4() and conf_ip6(). Splitting these out also lets us make the interface a bit nicer, having them return success or failure directly, rather than manipulating c->v4 and c->v6 to indicate success/failure of the two versions. Since these functions may also initialize the interface index for each protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since a 0 interface index is never valid). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Whitespace fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-30 22:12:50 +02:00
Stefano Brivio	3ec02c0975	passt: Truncate PID file on open() Otherwise, if the current PID has fewer digits than a previously written one, the content will be wrong. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-22 19:42:18 +02:00
Stefano Brivio	1d223e4b4c	passt: Allow exit_group() system call in seccomp profiles We handle SIGQUIT and SIGTERM calling exit(), which is usually implemented with the exit_group() system call. If we don't allow exit_group(), we'll get a SIGSYS while handling SIGQUIT and SIGTERM, which means a misleading non-zero exit code. Reported-by: Wenli Quan <wquan@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2101990 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-14 01:36:05 +02:00
Stefano Brivio	17689cc9bf	arch, passt: Use executable link to form AVX2 binary path ...instead of argv[0], which might or might not contain a valid path to the executable itself. Instead of mangling argv[0], use the same link to find out if we're already running the AVX2 build where supported. Alternatively, we could use execvpe(), but that might result in running a different installed version, in case e.g. the set of binaries is present in both /usr/bin and /usr/local/bin, with both being in $PATH. Reported-by: Wenli Quan <wquan@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2101310 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-14 01:36:05 +02:00
Stefano Brivio	a951e0b9ef	conf: Add --runas option, changing to given UID and GID if started as root On some systems, user and group "nobody" might not be available. The new --runas option allows to override the default "nobody" choice if started as root. Now that we allow this, drop the initgroups() call that was used to add any additional groups for the given user, as that might now grant unnecessarily broad permissions. For instance, several distributions have a "kvm" group to allow regular user access to /dev/kvm, and we don't need that in passt or pasta. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-05-19 16:27:20 +02:00
Stefano Brivio	3c6ae62510	conf, tcp, udp: Allow address specification for forwarded ports This feature is available in slirp4netns but was missing in passt and pasta. Given that we don't do dynamic memory allocation, we need to bind sockets while parsing port configuration. This means we need to process all other options first, as they might affect addressing and IP version support. It also implies a minor rework of how TCP and UDP implementations bind sockets. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-05-01 07:19:05 +02:00

1 2 3

138 commits