passt

Author	SHA1	Message	Date
Stefano Brivio	92a22fef93	treewide: Replace perror() calls with calls to logging functions perror() prints directly to standard error, but in many cases standard error might be already closed, or we might want to skip logging, based on configuration. Our logging functions provide all that. While at it, make errors more descriptive, replacing some of the existing basic perror-style messages. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-06-21 15:32:43 +02:00
David Gibson	d949667436	cppcheck: Suppress constParameterCallback errors We have several functions which are used as callbacks for NS_CALL() which only read their void * parameter, they don't write it. The constParameterCallback warning in cppcheck 2.14.1 complains that this parameter could be const void , also pointing out that that would require casting the function pointer when used as a callback. Casting the function pointers seems substantially uglier than using a non-const void as the parameter, especially since in each case we cast the void * to a const pointer of specific type immediately. So, suppress these errors. I think it would make logical sense to suppress this globally, but that would cause unmatchedSuppression errors on earlier cppcheck versions. So, instead individually suppress it, along with unmatchedSuppression in the relevant places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-08 13:06:20 +02:00
Stefano Brivio	ee338a256e	pasta, util: Align stack area for clones to maximum natural alignment Given that we use this stack pointer as a location to store arbitrary data types from the cloned process, we need to guarantee that its alignment matches any of those possible data types. runsisi reports that pasta gets a SIGBUS in pasta_open_ns() on aarch64, where the alignment requirement for stack pointers is a 16 bytes (same as the size of a long double), and similar requirements actually apply to most architectures we run on. Reported-by: runsisi <runsisi@hust.edu.cn> Link: https://bugs.passt.top/show_bug.cgi?id=85 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-04-19 11:15:27 +02:00
Stefano Brivio	5d5208b67d	treewide: Compilers' name for armv6l and armv7l is "arm" When I switched from 'uname -m' to 'gcc -dumpmachine' to fetch the architecture name for, among others, seccomp.sh, I didn't realise that "armv6l" and "armv7l" are just Linux kernel names -- compilers just call that "arm". Fix the "syscalls" annotation we use to define seccomp profiles accordingly, otherwise pasta will be terminated on sigreturn() on armv6l and armv7l. Fixes: `213c397492` ("passt, pasta: Run-time selection of AVX2 build") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-11 17:34:04 +02:00
Stefano Brivio	ff22a78d7b	pasta: Don't try to watch namespaces in procfs with inotify, use timer instead We watch network namespace entries to detect when we should quit (unless --no-netns-quit is passed), and these might stored in a tmpfs typically mounted at /run/user/UID or /var/run/user/UID, or found in procfs at /proc/PID/ns/. Currently, we try to use inotify for any possible location of those entries, but inotify, of course, doesn't work on pseudo-filesystems (see inotify(7)). The man page reflects this: the description of --no-netns-quit implies that we won't quit anyway if the namespace is not "bound to the filesystem". Well, we won't quit, but, since commit `9e0dbc8948` ("More deterministic detection of whether argument is a PID, PATH or NAME"), we try. And, indeed, this is harmless, as the caveat from that commit message states. Now, it turns out that Buildah, a tool to create container images, sharing its codebase with Podman, passes a procfs entry to pasta, and expects pasta to exit once the network namespace is not needed anymore, that is, once the original container process, also spawned by Buildah, terminates. Get this to work by using the timer fallback mechanism if the namespace name is passed as a path belonging to a pseudo-filesystem. This is expected to be procfs, but I covered sysfs and devpts pseudo-filesystems as well, because nothing actually prevents creating this kind of directory structure and links there. Note that fstatfs(), according to some versions of man pages, was apparently "deprecated" by the LSB. My reasoning for using it is essentially this: https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u ...that is, there was no such thing as an LSB deprecation, and anyway there's no other way to get the filesystem type. Also note that, while it might sound more obvious to detect the filesystem type using fstatfs() on the file descriptor itself (c->pasta_netns_fd), the reported filesystem type for it is nsfs, no matter what path was given to pasta. If we use the parent directory, we'll typically have either tmpfs or procfs reported. If the target namespace is given as a PID, or as a PID-based procfs entry, we don't risk races if this PID is recycled: our handle on /proc/PID/ns will always refer to the original namespace associated with that PID, and we don't re-open this entry from procfs to check it. There's, however, a remaining race possibility if the parent process is not the one associated to the network namespace we operate on: in that case, the parent might pass a procfs entry associated to a PID that was recycled by the time we parse it. This can't happen if the namespace PID matches the one of the parent, because we detach from the controlling terminal after parsing the namespace reference. To avoid this type of race, if desired, we could add the option for the parent to pass a PID file descriptor, that the parent obtained via pidfd_open(). This is beyond the scope of this change. Update the man page to reflect that, even if the target network namespace is passed as a procfs path or a PID, we'll now quit when the procfs entry is gone. Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/pull/21563#issuecomment-1948200214 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-19 19:58:50 +01:00
Stefano Brivio	8f3f8e190c	pasta: Add fallback timer mechanism to check if namespace is gone We don't know how frequently this happens, but hitting fs.inotify.max_user_watches or similar sysctl limits is definitely not out of question, and Paul mentioned that, for example, Podman's CI environments hit similar issues in the past. Introduce a fallback mechanism based on a timer file descriptor: we grab the directory handle at startup, and we can then use openat(), triggered periodically, to check if the (network) namespace directory still exists. If openat() fails at some point, exit. Link: https://github.com/containers/podman/pull/21563#issuecomment-1943505707 Reported-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-16 08:47:14 +01:00
David Gibson	5972203174	log: Enable format warnings logmsg() takes printf like arguments, but because it's not a built in, the compiler won't generate warnings if the format string and parameters don't match. Enable those by using the format attribute. Strictly speaking this is a gcc extension, but I believe it is also supported by some other common compilers. We already use some other attributes in various places. For now, just use it and we can worry about compilers that don't support it if it comes up. This exposes some warnings from existing callers, both in gcc and in clang-tidy: - Some are straight out bugs, which we correct - It's occasionally useful to invoke the logging functions with an empty string, which gcc objects to, so disable that specific warning in the Makefile - Strictly speaking the C standard requires that the parameter for a %p be a (void *), not some other pointer type. That's only likely to cause problems in practice on weird architectures with different sized representations for pointers to different types. Nonetheless add the casts to make it happy. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-11-07 09:54:56 +01:00
David Gibson	6471c7d01b	cppcheck: Make many pointers const Newer versions of cppcheck (as of 2.12.0, at least) added a warning for pointers which could be declared to point at const data, but aren't. Based on that, make many pointers throughout the codebase const. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-10-04 23:23:35 +02:00
David Gibson	955dd3251c	tcp, udp: Don't pre-fill IPv4 destination address in headers Because packets sent on the tap interface will always be going to the guest/namespace, we more-or-less know what address they'll be going to. So we pre-fill this destination address in our header buffers for IPv4. We can't do the same for IPv6 because we could need either the global or link-local address for the guest. In future we're going to want more flexibility for the destination address, so this pre-filling will get in the way. Change the flow so we always fill in the IPv4 destination address for each packet, rather than prefilling it from proto_update_l2_buf(). In fact for TCP we already redundantly filled the destination for each packet anyway. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-22 12:15:33 +02:00
David Gibson	6a6735ece4	epoll: Always use epoll_ref for the epoll data variable epoll_ref contains a variety of information useful when handling epoll events on our sockets, and we place it in the epoll_event data field returned by epoll. However, for a few other things we use the 'fd' field in the standard union of types for that data field. This actually introduces a bug which is vanishingly unlikely to hit in practice, but very nasty if it ever did: theoretically if we had a very large file descriptor number for fd_tap or fd_tap_listen it could overflow into bits that overlap with the 'proto' field in epoll_ref. With some very bad luck this could mean that we mistakenly think an event on a regular socket is an event on fd_tap or fd_tap_listen. More practically, using different (but overlapping) fields of the epoll_data means we can't unify dispatch for the various different objects in the epoll. Therefore use the same epoll_ref as the data for the tap fds and the netns quit fd, adding new fd type values to describe them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:29:53 +02:00
David Gibson	02b30e7871	netlink: Propagate errors for "dup" operations We now detect errors on netlink "set" operations while configuring the pasta namespace with --config-net. However in many cases rather than a simple "set" we use a more complex "dup" function to copy configuration from the host to the namespace. We're not yet properly detecting and reporting netlink errors for that case. Change the "dup" operations to propagate netlink errors to their caller, pasta_ns_conf() and report them there. Link: https://bugs.passt.top/show_bug.cgi?id=60 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting changes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:32:32 +02:00
David Gibson	8de9805224	netlink: Propagate errors for "set" operations Currently if anything goes wrong while we're configuring the namespace network with --config-net, we'll just ignore it and carry on. This might lead to a silently unconfigured or misconfigured namespace environment. For simple "set" operations based on nl_do() we can now detect failures reported via netlink. Propagate those errors up to pasta_ns_conf() and report them usefully. Link: https://bugs.passt.top/show_bug.cgi?id=60 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting changes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:30:22 +02:00
David Gibson	576df71e8b	netlink: Explicitly pass netlink sockets to operations All the netlink operations currently implicitly use one of the two global netlink sockets, sometimes depending on an 'ns' parameter. Change them all to explicitly take the socket to use (or two sockets to use in the case of the *_dup() functions). As well as making these functions strictly more general, it makes the callers easier to follow because we're passing a socket variable with a name rather than an unexplained '0' or '1' for the ns parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting changes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:27:42 +02:00
David Gibson	257a6b0b7e	netlink: Split nl_route() into separate operation functions nl_route() can perform 3 quite different operations based on the 'op' parameter. Split this into separate functions for each one. This requires more lines of code, but makes the internal logic of each operation much easier to follow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:25:20 +02:00
David Gibson	eff3bcb245	netlink: Split nl_addr() into separate operation functions nl_addr() can perform three quite different operations based on the 'op' parameter, each of which uses a different subset of the parameters. Split them up into a function for each operation. This does use more lines of code, but the overlap wasn't that great, and the separated logic is much easier to follow. It's also clearer in the callers what we expect the netlink operations to do, and what information it uses. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting fixes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:24:52 +02:00
David Gibson	e96182e9c2	netlink: Split up functionality of nl_link() nl_link() performs a number of functions: it can bring links up, set MAC address and MTU and also retrieve the existing MAC. This makes for a small number of lines of code, but high conceptual complexity: it's quite hard to follow what's going on both in nl_link() itself and it's also not very obvious which function its callers are intending to use. Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(), and nl_link_get_mac(). The first brings up a link, optionally setting the MTU, the others get or set the MAC address. This fixes an arguable bug in pasta_ns_conf(): it looks as though that was intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set. However, it only actually does so in the !c->pasta_conf_ns case: the fact that we set up==1 means we would only ever set, never get, the MAC in the nl_link() call in the other path. We get away with this because the MAC will quickly be discovered once we receive packets on the tap interface. Still, it's neater to always get the MAC address here. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:18:14 +02:00
Paul Holzinger	32660cea04	pasta: include errno in error message When the open() or setns() calls fails pasta exits early and prints an error. However it did not include the errno so it was impossible to know why the syscall failed. Signed-off-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Split print to fit 80 columns in pasta_open_ns()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-06-25 23:50:42 +02:00
Stefano Brivio	cc9d16758b	conf, pasta: With --config-net, copy all addresses by default Use the newly-introduced NL_DUP mode for nl_addr() to copy all the addresses associated to the template interface in the outer namespace, unless --no-copy-addrs (also implied by -a) is given. This option is introduced as deprecated right away: it's not expected to be of any use, but it's helpful to keep it around for a while to debug any suspected issue with this change. This is done mostly for consistency with routes. It might partially cover the issue at: https://bugs.passt.top/show_bug.cgi?id=47 Support multiple addresses per address family for some use cases, but not the originally intended one: we'll still use a single outbound address (unless the routing table specifies different preferred source addresses depending on the destination), regardless of the address used in the target namespace. Link: https://bugs.passt.top/show_bug.cgi?id=47 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	e89da3cf03	netlink: Add functionality to copy addresses from outer namespace Similarly to what we've just done with routes, support NL_DUP for addresses (currently not exposed): nl_addr() can optionally copy mulitple addresses to the target namespace, by fixing up data from the dump with appropriate flags and interface index, and repeating it back to the kernel on the socket opened in the target namespace. Link-local addresses are not copied: the family is set to AF_UNSPEC, which means the kernel will ignore them. Same for addresses from a mismatching address (pre-4.19 kernels without support for NETLINK_GET_STRICT_CHK). Ignore IFA_LABEL attributes by changing their type to IFA_UNSPEC, because in general they will report mismatching names, and we don't really need to use labels as we already know the interface index. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	da54641f14	conf, pasta: With --config-net, copy all routes by default Use the newly-introduced NL_DUP mode for nl_route() to copy all the routes associated to the template interface in the outer namespace, unless --no-copy-routes (also implied by -g) is given. This option is introduced as deprecated right away: it's not expected to be of any use, but it's helpful to keep it around for a while to debug any suspected issue with this change. Otherwise, we can't use default gateways which are not, address-wise, on the same subnet as the container, as reported by Callum. Reported-by: Callum Parsey <callum@neoninteger.au> Link: https://github.com/containers/podman/issues/18539 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	2fe0461856	netlink: Add functionality to copy routes from outer namespace Instead of just fetching the default gateway and configuring a single equivalent route in the target namespace, on 'pasta --config-net', it might be desirable in some cases to copy the whole set of routes corresponding to a given output interface. For instance, in: https://github.com/containers/podman/issues/18539 IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes configuring the default gateway won't work without a gateway-less route (specifying the output interface only), because the default gateway is, somewhat dubiously, not on the same subnet as the container. This is a similar case to the one covered by commit `7656a6f888` ("conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway"), and I'm not exactly proud of that workaround. We also have: https://bugs.passt.top/show_bug.cgi?id=49 pasta does not work with tap-style interface for which, eventually, we should be able to configure a gateway-less route in the target namespace. Introduce different operation modes for nl_route(), including a new NL_DUP one, not exposed yet, which simply parrots back to the kernel the route dump for a given interface from the outer namespace, fixing up flags and interface indices on the way, and requesting to add the same routes in the target namespace, on the interface we manage. For n routes we want to duplicate, send n identical netlink requests including the full dump: routes might depend on each other and the kernel processes RTM_NEWROUTE messages sequentially, not atomically, and repeating the full dump naturally resolves dependencies without the need to actually calculate them. I'm not kidding, it actually works pretty well. Link: https://github.com/containers/podman/issues/18539 Link: https://bugs.passt.top/show_bug.cgi?id=49 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	f099afb1f2	pasta: Improve error handling on failure to join network namespace In pasta_wait_for_ns(), open() failing with ENOENT is expected: we're busy-looping until the network namespace appears. But any other failure is not something we're going to recover from: return right away if we don't get either success or ENOENT. Now that pasta_wait_for_ns() can actually fail, handle that in pasta_start_ns() by reporting the issue and exiting. Looping on EPERM, when pasta doesn't actually have the permissions to join a given namespace, isn't exactly a productive thing to do. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-05-23 16:13:28 +02:00
Stefano Brivio	b0e450aa85	pasta: Detach mount namespace, (re)mount procfs before spawning command If we want /proc contents to be consistent after pasta spawns a child process in a new PID namespace (only for operation without a pre-existing namespace), we need to mount /proc after the clone(2) call with CLONE_NEWPID, and we enable the child to do that by passing, in the same call, the CLONE_NEWNS flag, as described by pid_namespaces(7). This is not really a remount: in fact, passing MS_REMOUNT to mount(2) would make the call fail. We're in another mount namespace now, so it's a fresh mount that has the effect of hiding the existing one. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	ca2749e1bd	passt: Relicense to GPL 2.0, or any later version In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-04-06 18:00:33 +02:00
Laine Stump	c9af6f92db	convert all remaining err() followed by exit() to die() This actually leaves us with 0 uses of err(), but someone could want to use it in the future, so we may as well leave it around. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:27 +01:00
Paul Holzinger	a234407f5c	pasta: propagate exit code from child command Exits codes are very useful for scripts, when the pasta child execvp() call fails with ENOENT that parent should also exit with > 0. In short the parent should always exit with the code from the child to make it useful in scripts. It is easy to test with: `pasta -- bash -c "exit 3"; echo $?` Signed-off-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-12 23:42:50 +01:00
Paul Holzinger	04dfc5b81f	pasta: correctly exit when execvp() fails By default clone() will create a child that does not send SIGCHLD when the child exits. The caller has to specifiy the SIGNAL it should get in the flag bitmask. see clone(2) under "The child termination signal" This fixes the problem where pasta would not exit when the execvp() call failed, i.e. when the command does not exists. Signed-off-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-12 23:42:43 +01:00
Stefano Brivio	d8921dafe5	pasta: Wait for tap to be set up before spawning command Adapted from a patch by Paul Holzinger: when pasta spawns a command, operating without a pre-existing user and network namespace, it needs to wait for the tap device to be configured and its handler ready, before the command is actually executed. Otherwise, something like: pasta --config-net nslookup passt.top usually fails as the nslookup command is issued before the network interface is ready. We can't adopt a simpler approach based on SIGSTOP and SIGCONT here: the child runs in a separate PID namespace, so it can't send SIGSTOP to itself as the kernel sees the child as init process and blocks the delivery of the signal. We could send SIGSTOP from the parent, but this wouldn't avoid the possible condition where the child isn't ready to wait for it when the parent sends it, also raised by Paul -- and SIGSTOP can't be blocked, so it can never be pending. Use SIGUSR1 instead: mask it before clone(), so that the child starts with it blocked, and can safely wait for it. Once the parent is ready, it sends SIGUSR1 to the child. If SIGUSR1 is sent before the child is waiting for it, the kernel will queue it for us, because it's blocked. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `1392bc5ca0` ("Allow pasta to take a command to execute") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-12 12:22:59 +01:00
Stefano Brivio	ab6f825889	util, pasta: Add do_clone() wrapper around __clone2() and clone() Spotted in Debian's buildd logs: on ia64, clone(2) is not available: the glibc wrapper is named __clone2() and it takes, additionally, the size of the stack area passed by the caller. Add a do_clone() wrapper handling the different cases, and also taking care of pointing the child's stack in the middle of the allocated area: on PA-RISC (hppa), handled by clone(), the stack grows up, and on ia64 the stack grows down, but the register backing store grows up -- and I think it might be actually used here. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-16 17:28:53 +01:00
David Gibson	dd09cceaee	Minor improvements to IPv4 netmask handling There are several minor problems with our parsing of IPv4 netmasks (-n). First, we don't reject nonsensical netmasks like 0.255.0.255. Address this structurally by using prefix length instead of netmask as the primary variable, only converting (and validating) when we need to. This has the added benefit of making some things more uniform with the IPv6 path. Second, when the user specifies a prefix length, we truncate the output from strtol() to an integer, which means we would treat -n 4294967320 as valid (equivalent to 24). Fix types to check for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:19 +01:00
David Gibson	40abd447c8	Rename pasta_setup_ns() to pasta_spawn_cmd() pasta_setup_ns() no longer has much to do with setting up a namespace. Instead it's really about starting the shell or other command we want to run with pasta connectivity. Rename it and its argument structure to be less misleading. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	eb3d03a588	isolation: Only configure UID/GID mappings in userns when spawning shell When in passt mode, or pasta mode spawning a command, we create a userns for ourselves. This is used both to isolate the pasta/passt process itself and to run the spawned command, if any. Since `eed17a47` "Handle userns isolation and dropping root at the same time" we've handled both cases the same, configuring the UID and GID mappings in the new userns to map whichever UID we're running as to root within the userns. This mapping is desirable when spawning a shell or other command, so that the user gets a root shell with reasonably clear abilities within the userns and netns. It's not necessarily essential, though. When not spawning a shell, it doesn't really have any purpose: passt itself doesn't need to be root and can operate fine with an unmapped user (using some of the capabilities we get when entering the userns instead). Configuring the uid_map can cause problems if passt is running with any capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to allow it to forward low ports. In this case the kernel makes files in /proc/pid owned by root rather than the starting user to prevent the user from interfering with the operation of the capability-enhanced process. This includes uid_map meaning we are not able to write to it. Whether this behaviour is correct in the kernel is debatable, but in any case we might as well avoid problems by only initializing the user mappings when we really want them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	ea5936dd3f	Replace FWRITE with a function In a few places we use the FWRITE() macro to open a file, replace it's contents with a given string and close it again. There's no real reason this needs to be a macro rather than just a function though. Turn it into a function 'write_file()' and make some ancillary cleanups while we're there: - Add a return code so the caller can handle giving a useful error message - Handle the case of short write()s (unlikely, but possible) - Add O_TRUNC, to make sure we replace the existing contents entirely Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	6909a8e339	Remove unhelpful drop_caps() call in pasta_start_ns() drop_caps() has a number of bugs which mean it doesn't do what you'd expect. However, even if we fixed those, the call in pasta_start_ns() doesn't do anything useful: * In the common case, we're UID 0 at this point. In this case drop_caps() doesn't accomplish anything, because even with capabilities dropped, we are still privileged. * When attaching to an existing namespace with --userns or --netns-only we might not be UID 0. In this case it's too early to drop all capabilities: we need at least CAP_NET_ADMIN to configure the tap device in the namespace. Remove this call - we will still drop capabilities a little later in sandbox(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	01b4e71f7a	pasta_start_ns() always ends in parent context The end of pasta_start_ns() has a test against pasta_child_pid, testing if we're in the parent or the child. However we started the child running the pasta_setup_ns function which always exec()s or exit()s, so if we return from the clone() we are always in the parent, making that test unnecessary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	672a8cd80e	pasta: More general way of starting spawned shell as a login shell When invoked so as to spawn a shell, pasta checks explicitly for the shell being bash and if so, adds a "-l" option to make it a login shell. This is not ideal, since this is a bash specific option and requires pasta to know about specific shell variants. There's a general convention for starting a login shell, which is to prepend a "-" to argv[0]. Use this approach instead, so we don't need bash specific logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
Stefano Brivio	da152331cf	Move logging functions to a new file, log.c Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:25 +02:00
David Gibson	5823dc5c68	clang-tidy: Fix spurious null pointer warning in pasta_start_ns() clang-tidy isn't quite clever enough to figure out that getenv("SHELL") will return the same thing both times here, which makes it conclude that shell could be NULL, causing problems later. It's a bit ugly that we call getenv() twice in any case, so rework this in a way that clang-tidy can figure out shell won't be NULL. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:21:48 +02:00
David Gibson	eed17a47fe	Handle userns isolation and dropping root at the same time passt/pasta can interact with user namespaces in a number of ways: 1) With --netns-only we'll remain in our original user namespace 2) With --userns or a PID option to pasta we'll join either the given user namespace or that of the PID 3) When pasta spawns a shell or command we'll start a new user namespace for the command and then join it 4) With passt we'll create a new user namespace when we sandbox() ourself However (3) and (4) turn out to have essentially the same effect. In both cases we create one new user namespace. The spawned command starts there, and passt/pasta itself will live there from sandbox() onwards. Because of this, we can simplify user namespace handling by moving the userns handling earlier, to the same point we drop root in the original namespace. Extend the drop_user() function to isolate_user() which does both. After switching UID and GID in the original userns, isolate_user() will either join or create the userns we require. When we spawn a command with pasta_start_ns()/pasta_setup_ns() we no longer need to create a userns, because we're already made one. sandbox() likewise no longer needs to create (or join) an userns because we're already in the one we need. We no longer need c->pasta_userns_fd, since the fd is only used locally in isolate_user(). Likewise we can replace c->netns_only with a local in conf(), since it's not used outside there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	d9f889a55a	Correctly handle --netns-only in pasta_start_ns() --netns-only is supposed to make pasta use only a network namespace, not a user namespace. However, pasta_start_ns() has this backwards, and if --netns-only is specified it creates a user namespace but not a network namespace. Correct this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	fc1be3d5ab	Clean up and rename conf_ns_open() conf_ns_open() opens file descriptors for the namespaces pasta needs, but it doesnt really have anything to do with configuration any more. For better clarity, move it to pasta.c and rename it pasta_open_ns(). This makes the symmetry between it and pasta_start_ns() more clear, since these represent the two basic ways that pasta can operate, either attaching to an existing namespace/process or spawning a new one. Since its no longer validating options, the errors it could return shouldn't cause a usage message. Just exit directly with an error instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	d72a1e7bb9	Move self-isolation code into a separate file passt/pasta contains a number of routines designed to isolate passt from the rest of the system for security. These are spread through util.c and passt.c. Move them together into a new isolation.c file. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	1392bc5ca0	Allow pasta to take a command to execute When not given an existing PID or network namspace to attach to, pasta spawns a shell. Most commands which can spawn a shell in an altered environment can also run other commands in that same environment, which can be useful in automation. Allow pasta to do the same thing; it can be given an arbitrary command to run in the network and user namespace which pasta creates. If neither a command nor an existing PID or netns to attach to is given, continue to spawn a default shell, as before. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:43:31 +02:00
David Gibson	60ffc5b6cb	Don't unnecessarily avoid CLOEXEC flags There are several places in the passt code where we have lint overrides because we're not adding CLOEXEC flags to open or other operations. Comments suggest this is because it's before we fork() into the background but we'll need those file descriptors after we're in the background. However, as the name suggests CLOEXEC closes on exec(), not on fork(). The only place we exec() is either super early invoke the avx2 version of the binary, or when we start a shell in pasta mode, which certainly doesn't require the fds in question. Add the CLOEXEC flag in those places, and remove the lint overrides. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-24 18:01:48 +02:00
David Gibson	16f5586bb8	Make substructures for IPv4 and IPv6 specific context information The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-07-30 22:14:07 +02:00
David Gibson	5e12d23acb	Separate IPv4 and IPv6 configuration After recent changes, conf_ip() now has essentially entirely disjoint paths for IPv4 and IPv6 configuration. So, it's cleaner to split them out into different functions conf_ip4() and conf_ip6(). Splitting these out also lets us make the interface a bit nicer, having them return success or failure directly, rather than manipulating c->v4 and c->v6 to indicate success/failure of the two versions. Since these functions may also initialize the interface index for each protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since a 0 interface index is never valid). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Whitespace fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-30 22:12:50 +02:00
Stefano Brivio	eb3d3f367e	treewide: Argument cannot be negative, CWE-687 Actually harmless. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-04-07 11:44:35 +02:00
Stefano Brivio	62c3edd957	treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	6d661dc5b2	seccomp: Adjust list of allowed syscalls for armv6l, armv7l It looks like glibc commonly implements clock_gettime(2) with clock_gettime64(), and uses recv() instead of recvfrom(), send() instead of sendto(), and sigreturn() instead of rt_sigreturn() on armv6l and armv7l. Adjust the list of system calls for armv6l and armv7l accordingly. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-26 23:39:19 +01:00
Stefano Brivio	745a9ba428	pasta: By default, quit if filesystem-bound net namespace goes away This should be convenient for users managing filesystem-bound network namespaces: monitor the base directory of the namespace and exit if the namespace given as PATH or NAME target is deleted. We can't add an inotify watch directly on the namespace directory, that won't work with nsfs. Add an option to disable this behaviour, --no-netns-quit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00

1 2

63 commits