passt

Author	SHA1	Message	Date
David Gibson	5ed4e034b2	conf: Demote overlapping port ranges error to a warning We give a fatal error if the port ranges from any port forwarding specifiers overlap. This occurs even if those port ranges are specifically bound to different addresses, so there's not really any overlap. Right now, we can't 100% handle this case correctly, because our data structures don't have a way to represent per-address forwarding. However, there are a number of cases that will actually work just fine: e.g. mapping the same port to the same port on two different addresses (say :: and 127.0.0.1). We have long term plans to fix this properly, but that is still some time away. For the time being, demote this error to a warning so that the cases that already work will be allowed. Link: https://bugs.passt.top/show_bug.cgi?id=56 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-13 17:30:22 +02:00
David Gibson	5103811e2d	netlink: Propagate errors for "dump" operations Currently if we receive any netlink errors while discovering network configuration from the host, we'll just ignore it and carry on. This might lead to cryptic error messages later on, or even silent misconfiguration. We now have the mechanisms to detect errors from get/dump netlink operations. Propgate these errors up to the callers and report them usefully. Link: https://bugs.passt.top/show_bug.cgi?id=60 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:30:41 +02:00
David Gibson	576df71e8b	netlink: Explicitly pass netlink sockets to operations All the netlink operations currently implicitly use one of the two global netlink sockets, sometimes depending on an 'ns' parameter. Change them all to explicitly take the socket to use (or two sockets to use in the case of the *_dup() functions). As well as making these functions strictly more general, it makes the callers easier to follow because we're passing a socket variable with a name rather than an unexplained '0' or '1' for the ns parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting changes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:27:42 +02:00
David Gibson	257a6b0b7e	netlink: Split nl_route() into separate operation functions nl_route() can perform 3 quite different operations based on the 'op' parameter. Split this into separate functions for each one. This requires more lines of code, but makes the internal logic of each operation much easier to follow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:25:20 +02:00
David Gibson	eff3bcb245	netlink: Split nl_addr() into separate operation functions nl_addr() can perform three quite different operations based on the 'op' parameter, each of which uses a different subset of the parameters. Split them up into a function for each operation. This does use more lines of code, but the overlap wasn't that great, and the separated logic is much easier to follow. It's also clearer in the callers what we expect the netlink operations to do, and what information it uses. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor formatting fixes in pasta_ns_conf()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:24:52 +02:00
David Gibson	e96182e9c2	netlink: Split up functionality of nl_link() nl_link() performs a number of functions: it can bring links up, set MAC address and MTU and also retrieve the existing MAC. This makes for a small number of lines of code, but high conceptual complexity: it's quite hard to follow what's going on both in nl_link() itself and it's also not very obvious which function its callers are intending to use. Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(), and nl_link_get_mac(). The first brings up a link, optionally setting the MTU, the others get or set the MAC address. This fixes an arguable bug in pasta_ns_conf(): it looks as though that was intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set. However, it only actually does so in the !c->pasta_conf_ns case: the fact that we set up==1 means we would only ever set, never get, the MAC in the nl_link() call in the other path. We get away with this because the MAC will quickly be discovered once we receive packets on the tap interface. Still, it's neater to always get the MAC address here. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:18:14 +02:00
David Gibson	6920adda0d	util: Make ns_enter() a void function and report setns() errors ns_enter() returns an integer... but it's always zero. If we actually fail the function doesn't return. Therefore it makes more sense for this to be a function returning void, and we can remove the cases where we pointlessly checked its return value. In addition ns_enter() is usually called from an ephemeral thread created by NS_CALL(). That means that the exit(EXIT_FAILURE) there usually won't be reported (since NS_CALL() doesn't wait() for the thread). So, use die() instead to print out some information in the unlikely event that our setns() here does fail. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-08-04 01:18:02 +02:00
David Gibson	4c98d3be80	conf: Correct length checking of interface names in conf_ports() When interface names are specified in forwarding specs, we need to check the length of the given interface name against the limit of IFNAMSIZ - 1 (15) characters. However, we managed to have 3 separate off-by-one errors here meaning we only accepted interface names up to 12 characters. 1. At the point of the check 'ifname' was still on the '%' character, not the first character of the name, meaning we overestimated the length by one 2. At the point of the check 'spec' had been advanced one character past the '/' which terminates the interface name, meaning we overestimated the length by another one 3. We checked if the (miscalculated) length was >= IFNAMSIZ - 1, that is >= 15, whereas lengths equal to 15 should be accepted. Correct all 3 errors. Link: https://bugs.passt.top/show_bug.cgi?id=61 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-06-28 17:51:25 +02:00
David Gibson	c4017cc4a1	conf: Fix size checking of -I interface name Network interface names must fit in a buffer of IFNAMSIZ bytes, including the terminating \0. IFNAMSIZ is 16 on Linux, so interface names can be up to (and including) 15 characters long. We validate this for the -I option, but we have an off by one error. We pass (IFNAMSIZ - 1) as the buffer size to snprintf(), but that buffer size already includes the terminating \0, so this actually truncates the value to 14 characters. The return value returned from snprintf() however, is the number of characters that would have been printed excluding the terminating \0, so by comparing it >= IFNAMSIZ - 1 we are giving an error on names >= 15 characters rather than strictly > 15 characters. Link: https://bugs.passt.top/show_bug.cgi?id=61 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-06-28 17:51:11 +02:00
Stefano Brivio	5b646b9b10	conf: Accept -a and -g without --config-net in pasta mode While --no-copy-addrs and --no-copy-routes only make sense with --config-net, and they are implied on -g and -a, respectively, that doesn't mean we should refuse -a or -g without --config-net: they are still relevant for a number of things (including DHCP/DHCPv6/NDP configuration). Reported-by: Gianluca Stivan <me@yawnt.com> Fixes: `cc9d16758b` ("conf, pasta: With --config-net, copy all addresses by default") Fixes: `da54641f14` ("conf, pasta: With --config-net, copy all routes by default") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-06-25 23:49:25 +02:00
Stefano Brivio	d034fb698f	conf: Make -a/--address really imply --no-copy-addrs I wrote it in commit message and man page, but not in conf()... Note that -g/--gateway correctly implies --no-copy-routes already. This fixes Podman's tests: podman networking with pasta(1) - IPv4 address assignment podman networking with pasta(1) - IPv4 default route assignment where we pass -a and -g to assign an address and a default gateway that's compatible with it, but -a doesn't disable the copy of addresses, so we ignore -a, and the default gateway is incompatible with the addresses we copy -- hence no routes in the container. Link: https://github.com/containers/podman/pull/18612 Fixes: `cc9d16758b` ("conf, pasta: With --config-net, copy all addresses by default") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-06-25 23:49:25 +02:00
Stefano Brivio	3c6d1b9bb2	conf, log: On -h / --help, print usage to stdout, not stderr Erik suggests that this makes it easier to grep for options, and with --help we're anyway printing usage information as expected, not as part of an error report. While at it: on -h, we should exit with 0. Reported-by: Erik Sjölund <erik.sjolund@gmail.com> Link: https://bugs.passt.top/show_bug.cgi?id=52 Link: https://bugs.passt.top/show_bug.cgi?id=53 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-06-23 10:15:55 +02:00
David Gibson	429e1a7e71	conf: Fix erroneous check of ip6->gw `a7359f0948` ("conf: Don't exit if sourced default route has no gateway") was supposed to allow passt/pasta to run even if given a template interface which has no default gateway. However a mistake in the patch means it still requires the gateway, but doesn't require a global address for the guest which we really do need. This is one part (but not the only part) of the problem seen in https://bugs.passt.top/show_bug.cgi?id=50. Reported-by: Justin Jereza <justinjereza@gmail.com> Fixes: `a7359f0948` ("conf: Don't exit if sourced default route has no gateway") Link: https://bugs.passt.top/show_bug.cgi?id=50 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-06-03 07:49:17 +02:00
Stefano Brivio	cc9d16758b	conf, pasta: With --config-net, copy all addresses by default Use the newly-introduced NL_DUP mode for nl_addr() to copy all the addresses associated to the template interface in the outer namespace, unless --no-copy-addrs (also implied by -a) is given. This option is introduced as deprecated right away: it's not expected to be of any use, but it's helpful to keep it around for a while to debug any suspected issue with this change. This is done mostly for consistency with routes. It might partially cover the issue at: https://bugs.passt.top/show_bug.cgi?id=47 Support multiple addresses per address family for some use cases, but not the originally intended one: we'll still use a single outbound address (unless the routing table specifies different preferred source addresses depending on the destination), regardless of the address used in the target namespace. Link: https://bugs.passt.top/show_bug.cgi?id=47 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	e89da3cf03	netlink: Add functionality to copy addresses from outer namespace Similarly to what we've just done with routes, support NL_DUP for addresses (currently not exposed): nl_addr() can optionally copy mulitple addresses to the target namespace, by fixing up data from the dump with appropriate flags and interface index, and repeating it back to the kernel on the socket opened in the target namespace. Link-local addresses are not copied: the family is set to AF_UNSPEC, which means the kernel will ignore them. Same for addresses from a mismatching address (pre-4.19 kernels without support for NETLINK_GET_STRICT_CHK). Ignore IFA_LABEL attributes by changing their type to IFA_UNSPEC, because in general they will report mismatching names, and we don't really need to use labels as we already know the interface index. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	a7359f0948	conf: Don't exit if sourced default route has no gateway If we use a template interface without a gateway on the default route, we can still offer almost complete functionality, except that, of course, we can't map the gateway address to the outer namespace or host, and that we have no obvious server address or identifier for use in DHCP's siaddr and option 54 (Server identifier, mandatory). Continue, if we have a default route but no default gateway, and imply --no-map-gw and --no-dhcp in that case. NDP responder and DHCPv6 should be able to work as usual because we require a link-local address to be present, and we'll fall back to that. Together with the previous commits implementing an actual copy of routes from the outer namespace, this should finally fix the operation of 'pasta --config-net' for cases where we have a default route on the host, but no default gateway, as it's the case for tap-style routes, including typical Wireguard endpoints. Reported-by: me@yawnt.com Link: https://bugs.passt.top/show_bug.cgi?id=49 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	e8fef7525c	Revert "conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway" This reverts commit `7656a6f888`: now, by default, we copy all the routes associated to the outbound interface into the routing table of the container, so there's no need for this horrible workaround anymore. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	da54641f14	conf, pasta: With --config-net, copy all routes by default Use the newly-introduced NL_DUP mode for nl_route() to copy all the routes associated to the template interface in the outer namespace, unless --no-copy-routes (also implied by -g) is given. This option is introduced as deprecated right away: it's not expected to be of any use, but it's helpful to keep it around for a while to debug any suspected issue with this change. Otherwise, we can't use default gateways which are not, address-wise, on the same subnet as the container, as reported by Callum. Reported-by: Callum Parsey <callum@neoninteger.au> Link: https://github.com/containers/podman/issues/18539 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	468f19a852	conf: --config-net option is for pasta mode only Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	2fe0461856	netlink: Add functionality to copy routes from outer namespace Instead of just fetching the default gateway and configuring a single equivalent route in the target namespace, on 'pasta --config-net', it might be desirable in some cases to copy the whole set of routes corresponding to a given output interface. For instance, in: https://github.com/containers/podman/issues/18539 IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes configuring the default gateway won't work without a gateway-less route (specifying the output interface only), because the default gateway is, somewhat dubiously, not on the same subnet as the container. This is a similar case to the one covered by commit `7656a6f888` ("conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway"), and I'm not exactly proud of that workaround. We also have: https://bugs.passt.top/show_bug.cgi?id=49 pasta does not work with tap-style interface for which, eventually, we should be able to configure a gateway-less route in the target namespace. Introduce different operation modes for nl_route(), including a new NL_DUP one, not exposed yet, which simply parrots back to the kernel the route dump for a given interface from the outer namespace, fixing up flags and interface indices on the way, and requesting to add the same routes in the target namespace, on the interface we manage. For n routes we want to duplicate, send n identical netlink requests including the full dump: routes might depend on each other and the kernel processes RTM_NEWROUTE messages sequentially, not atomically, and repeating the full dump naturally resolves dependencies without the need to actually calculate them. I'm not kidding, it actually works pretty well. Link: https://github.com/containers/podman/issues/18539 Link: https://bugs.passt.top/show_bug.cgi?id=49 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-05-23 16:13:28 +02:00
Stefano Brivio	b0881aae6d	util, conf: Add and use ns_is_init() helper We'll need this in isolate_initial(). While at it, don't rely on BUFSIZ: the earlier issue we had with musl reminded me it's not a magic "everything will fit" value. Size the read buffer to what we actually need from uid_map, and check for the final newline too, because uid_map is organised in lines. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-05-23 16:13:18 +02:00
Stefano Brivio	ca2749e1bd	passt: Relicense to GPL 2.0, or any later version In practical terms, passt doesn't benefit from the additional protection offered by the AGPL over the GPL, because it's not suitable to be executed over a computer network. Further, restricting the distribution under the version 3 of the GPL wouldn't provide any practical advantage either, as long as the passt codebase is concerned, and might cause unnecessary compatibility dilemmas. Change licensing terms to the GNU General Public License Version 2, or any later version, with written permission from all current and past contributors, namely: myself, David Gibson, Laine Stump, Andrea Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-04-06 18:00:33 +02:00
Stefano Brivio	98a9a7d9e5	conf: Allow binding to ports on an interface without a specific address Somebody might want to bind listening sockets to a specific interface, but not a specific address, and there isn't really a reason to prevent that. For example: -t %eth0/2022 Alternatively, we support options such as -t 0.0.0.0%eth0/2022 and -t ::%eth0/2022, but not together, for the same port. Enable this kind of syntax and add examples to the man page. Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/14425#issuecomment-1485192195 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-03-29 13:48:12 +02:00
David Gibson	34ade90957	Work around weird false positives with cppcheck-2.9.1 Commit `89e38f55` "treewide: Fix header includes to build with musl" added extra #includes to work with musl. Unfortunately with the cppcheck version I'm using (cppcheck-2.9-1.fc37.x86_64 in Fedora 37) this causes weird false positives: specifically cppcheck seems to hit a #error in <bits/unistd.h> complaining about including it directly instead of via <unistd.h> (which is not something we're doing). I have no idea why that would be happening; but I'm guessing it has to be a bug in the cpp implementation in that cppcheck version. In any case, it's possible to work around this by moving the include of <unistd.h> before the include of <signal.h>. So, do that. Fixes: `89e38f5540` ("treewide: Fix header includes to build with musl") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-03-21 16:38:06 +01:00
Stefano Brivio	bb2b67cb35	conf: Terminate on EMFILE or ENFILE on sockets for port mapping In general, we don't terminate or report failures if we fail to bind to some ports out of a given port range specifier, to allow users to conveniently specify big port ranges (or "all") without having to care about ports that might already be in use. However, running out of the open file descriptors quota is a different story: we can't do what the user requested in a very substantial way. For example, if the user specifies '-t all' and we can only bind 1024 sockets, the behaviour is rather unexpected. Fail whenever socket creation returns -ENFILE or -EMFILE. Link: https://bugs.passt.top/show_bug.cgi?id=27 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-03-09 03:44:21 +01:00
Chris Kuhn	89e38f5540	treewide: Fix header includes to build with musl Roughly inspired from a patch by Chris Kuhn: fix up includes so that we can build against musl: glibc is more lenient as headers generally include a larger amount of other headers. Compared to the original patch, I only included what was needed directly in C files, instead of adding blanket includes in local header files. It's a bit more involved, but more consistent with the current (not ideal) situation. Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-03-09 03:44:21 +01:00
Chris Kuhn	5c58feab7b	conf, passt: Rename stderr to force_stderr While building against musl, gcc informs us that 'stderr' is a protected keyword. This probably comes from a #define stderr (stderr) in musl's stdio.h, to avoid a clash with extern FILE *const stderr, but I didn't really track it down. Just rename it to force_stderr, it makes more sense. [sbrivio: Added commit message] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-03-09 03:44:21 +01:00
Stefano Brivio	a9c59dd91b	conf, icmp, tcp, udp: Add options to bind to outbound address and interface I didn't notice earlier: libslirp (and slirp4netns) supports binding outbound sockets to specific IPv4 and IPv6 addresses, to force the source addresse selection. If we want to claim feature parity, we should implement that as well. Further, Podman supports specifying outbound interfaces as well, but this is simply done by resolving the primary address for an interface when the network back-end is started. However, since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can actually bind to a specific interface name, which doesn't need to be validated in advance. Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses, and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets to given interfaces. Given that it probably makes little sense to select addresses and routes from interfaces different than the ones given for outbound sockets, also assign those as "template" interfaces, by default, unless explicitly overridden by '-i'. For ICMP and UDP, we call sock_l4() to open outbound sockets, as we already needed to bind to given ports or echo identifiers, and we can bind() a socket only once: there, pass address (if any) and interface (if any) for the existing bind() and setsockopt() calls. For TCP, in general, we wouldn't otherwise bind sockets. Add a specific helper to do that. For UDP outbound sockets, we need to know if the final destination of the socket is a loopback address, before we decide whether it makes sense to bind the socket at all: move the block mangling the address destination before the creation of the socket in the IPv4 path. This was already the case for the IPv6 path. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-03-09 03:43:59 +01:00
Stefano Brivio	70148ce5be	conf, passt.h: Rename "outbound" interface to "template" interface In preparation for the next patch, make it clear that the first routable interface fetched via netlink, or the one configured via -i/--interface, is simply used as template to copy addresses and routes, not an interface we actually use to derive the source address (which will be _bound to_) for outgoing packets. The man page and usage message appear to be already clear enough. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-03-09 00:36:08 +01:00
Stefano Brivio	d2df763232	log, conf, tap: Define die() as err() plus exit(), drop cppcheck workarounds If we define die() as a variadic macro, passing __VA_ARGS__ to err(), and calling exit() outside err() itself, we can drop the workarounds introduced in commit `36f0199f6e` ("conf, tap: Silence two false positive invalidFunctionArg from cppcheck"). Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-02-27 18:55:57 +01:00
Stefano Brivio	bad2526872	conf, udp: Allow any loopback address to be used as resolver Andrea reports that with a Fedora 37 guest running on a Fedora 37 host, both using systemd-resolved, with passt connecting them, running with default options, DNS queries don't work. systemd-resolved on the host is reachable only at the loopback address 127.0.0.53. We advertise the default gateway address to the guest as resolver, because our local address is of course unreachable from there, which means we see DNS queries directed to the default gateway, and we redirect them to 127.0.0.1. However, systemd-resolved doesn't answer on 127.0.0.1. To fix this, set @dns_match to the address of the default gateway, unless a different resolver address is explicitly configured, so that we know we explicitly have to map DNS queries, in this case, to the address of the local resolver. This means that in udp_tap_handler() we need to check, first, if the destination address of packets matches @dns_match: even if it's the address of the local gateway, we want to map that to a specific address, which isn't necessarily 127.0.0.1. Do the same for IPv6 for consistency, even though IPv6 defines a single loopback address. Reported-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-02-27 18:52:56 +01:00
Stefano Brivio	8ca907a3f0	conf: Split add_dns{4,6}() out of get_dns() The logic handling which resolvers we add, and whether to add them, is getting rather cramped in get_dns(): split it into separate functions. No functional changes intended. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-02-27 18:52:46 +01:00
Stefano Brivio	4663ccc89a	conf: Fix typo and logic in conf_ports() check for port binding Ouch, I accidentally pushed the previous change without running the tests: - we need to check, in conf_ports(), that udp_sock_init() managed to bind at least a port, not the opposite - for -T and -U, we have no way to know if we'll manage to bind the port later, so never report an error for those Fixes: `3d0de2c1d7` ("conf, tcp, udp: Exit if we fail to bind sockets for all given ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 19:59:07 +01:00
Stefano Brivio	36f0199f6e	conf, tap: Silence two false positive invalidFunctionArg from cppcheck The newly introduced die() calls exit(), but cppcheck doesn't see it and warns about possibly invalid arguments used after the check which triggers die(). Add return statements to silence the warnings. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 19:19:23 +01:00
Stefano Brivio	3d0de2c1d7	conf, tcp, udp: Exit if we fail to bind sockets for all given ports passt supports ranges of forwarded ports as well as 'all' for TCP and UDP, so it might be convenient to proceed if we fail to bind only some of the desired ports. But if we fail to bind even a single port for a given specification, we're clearly, unexpectedly, conflicting with another network service. In that case, report failure and exit. Reported-by: Yalan Zhang <yalzhang@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2023-02-16 17:33:49 +01:00
Laine Stump	a1ab1ca2ee	log a detailed error (not usage()) when there are extra non-option arguments Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:24 +01:00
Laine Stump	60bd93e91f	make conf_netns_opt() exit immediately after logging error ...and return void to simplify the caller. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:21 +01:00
Laine Stump	ead4a98111	make conf_ugid() exit immediately after logging error Again, it can then be made to return void, simplifying the caller. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:18 +01:00
Laine Stump	fe2a54e86e	make conf_pasta_ns() exit immediately after logging error As with conf_ports, this allows us to make the function return void. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:13 +01:00
Laine Stump	b20fe11b36	make conf_ports() exit immediately after logging error Rather than having conf_ports() (possibly) log an error, and then letting the caller log the entire usage() message and exit, just log the errors and exit immediately (using die()). For some errors, conf_ports would previously not log any specific message, leaving it up to the user to determine the problem by guessing. We replace all of those silent returns with die() (logging a specific error), thus permitting us to make conf_ports() return void, which simplifies the caller. While modifying the two callers to conf_ports() to not check for a return value, we can further simplify the code by removing the check for a non-null optarg, as that is guaranteed to never happen (due to prior calls to getopt_long() with "argument required" for all relevant options - getopt_long() would have already caught this error). Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:10 +01:00
Laine Stump	c864c60047	eliminate most calls to usage() in conf() Nearly all of the calls to usage() in conf() occur immediately after logging a more detailed error message, and the fact that these errors are occuring indicates that the user has already seen the passt usage message (or read the manpage). Spamming the logfile with the complete contents of the usage message serves only to obscure the more detailed error message. The only time when the full usage message should be output is if the user explicitly asks for it with -h (or its synonyms) As a start to eliminating the excessive calls to usage(), this patch replaces most calls to err() followed by usage() with a call to die() instead. A few other usage() calls remain, but their removal involves bit more nuance that should be properly explained in separate commit messages. Signed-off-by: Laine Stump <laine@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2023-02-16 17:32:07 +01:00
Richard W.M. Jones	6b4e68383c	passt, tap: Add --fd option This passes a fully connected stream socket to passt. Signed-off-by: Richard W.M. Jones <rjones@redhat.com> [sbrivio: reuse fd_tap instead of adding a new descriptor, imply --one-off on --fd, add to optstring and usage()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-25 01:40:47 +01:00
David Gibson	73d3a3e84e	tcp: Separate helpers to create ns listening sockets tcp_sock_init*() can create either sockets listening on the host, or in the pasta network namespace (with @ns==1). There are, however, a number of differences in how these two cases work in practice though. "ns" sockets are only used in pasta mode, and they always lead to spliced connections only. The functions are also only ever called in "ns" mode with a NULL address and interface name, and it doesn't really make sense for them to be called any other way. Later changes will introduce further differences in behaviour between these two cases, so it makes more sense to use separate functions for creating the ns listening sockets than the regular external/host listening sockets. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-25 01:35:02 +01:00
David Gibson	698e4fd761	style: Minor corrections to function comments Some style issues and a typo. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-25 01:34:26 +01:00
Stefano Brivio	3a2afde87d	conf, udp: Drop mostly duplicated dns_send arrays, rename related fields Given that we use just the first valid DNS resolver address configured, or read from resolv.conf(5) on the host, to forward DNS queries to, in case --dns-forward is used, we don't need to duplicate dns[] to dns_send[]: - rename dns_send[] back to dns[]: those are the resolvers we advertise to the guest/container - for forwarding purposes, instead of dns[], use a single field (for each protocol version): dns_host - and rename dns_fwd to dns_match, so that it's clear this is the address we are matching DNS queries against, to decide if they need to be forwarded Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-11-16 15:09:31 +01:00
Stefano Brivio	4129764eca	conf: Fix mask calculation from prefix_len in conf_print() Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `dd09cceaee` ("Minor improvements to IPv4 netmask handling") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-11-10 11:18:09 +01:00
Stefano Brivio	73f50a76aa	conf: Split the notions of read DNS addresses and offered ones With --dns-forward, if the host has a loopback address configured as DNS server, we should actually use it to forward queries, but, if --no-map-gw is passed, we shouldn't offer the same address via DHCP, NDP and DHCPv6, because it's not going to be reachable. Problematic configuration: * systemd-resolved configuring the usual 127.0.0.53 on the host: we read that from /etc/resolv.conf * --dns-forward specified with an unrelated address, for example 198.51.100.1 We still want to forward queries to 127.0.0.53, if we receive one directed to 198.51.100.1, so we can't drop 127.0.0.53 from our list: we want to use it for forwarding. At the same time, we shouldn't offer 127.0.0.53 to the guest or container either. With this change, I'm only covering the case of automatically configured DNS servers from /etc/resolv.conf. We could extend this to addresses configured with command-line options, but I don't really see a likely use case at this point. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:32 +01:00
Stefano Brivio	7656a6f888	conf: Adjust netmask on mismatch between IPv4 address/netmask and gateway Seen in a Google Compute Engine environment with a machine configured via cloud-init-dhcp, while testing Podman integration for pasta: the assigned address has a /32 netmask, and there's a default route, which can be added on the host because there's another route, also /32, pointing to the default gateway. For example, on the host: ip -4 address add 10.156.0.2/32 dev eth0 ip -4 route add 10.156.0.1/32 dev eth0 ip -4 route add default via 10.156.0.1 This is not a valid configuration as far as I can tell: if the address is configured as /32, it shouldn't be used to reach a gateway outside its derived netmask. However, Linux allows that, and everything works. The problem comes when pasta --config-net sources address and default route from the host, and it can't configure the route in the target namespace because the gateway is invalid. That is, we would skip configuring the first route in the example, which results in the equivalent of doing: ip -4 address add 10.156.0.2/32 dev eth0 ip -4 route add default via 10.156.0.1 where, at this point, 10.156.0.1 is unreachable, and hence invalid as a gateway. Sourcing more routes than just the default is doable, but probably undesirable: pasta users want to provide connectivity to a container, not reflect exactly whatever trickery is configured on the host. Add a consistency check and an adjustment: if the configured default gateway is not reachable, shrink the given netmask until we can reach it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:32 +01:00
David Gibson	7c7b68dbe0	Use typing to reduce chances of IPv4 endianness errors We recently corrected some errors handling the endianness of IPv4 addresses. These are very easy errors to make since although we mostly store them in network endianness, we sometimes need to manipulate them in host endianness. To reduce the chances of making such mistakes again, change to always using a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network endian addresses. This makes it harder to accidentally do arithmetic or comparisons on such addresses as if they were host endian. We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to directly work with struct in_addr values. This has the additional benefit of making the IPv4 and IPv6 paths more visually similar. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:24 +01:00
David Gibson	dd3470d9a9	Use IPV4_IS_LOOPBACK more widely This macro checks if an IPv4 address is in the loopback network (127.0.0.0/8). There are two places where we open code an identical check, use the macro instead. There are also a number of places we specifically exclude the loopback address (127.0.0.1), but we should actually be excluding anything in the loopback network. Change those sites to use the macro as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:21 +01:00
David Gibson	dd09cceaee	Minor improvements to IPv4 netmask handling There are several minor problems with our parsing of IPv4 netmasks (-n). First, we don't reject nonsensical netmasks like 0.255.0.255. Address this structurally by using prefix length instead of netmask as the primary variable, only converting (and validating) when we need to. This has the added benefit of making some things more uniform with the IPv6 path. Second, when the user specifies a prefix length, we truncate the output from strtol() to an integer, which means we would treat -n 4294967320 as valid (equivalent to 24). Fix types to check for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:19 +01:00
David Gibson	2b793d94ca	Correct some missing endian conversions of IPv4 addresses The INADDR_LOOPBACK constant is in host endianness, and similarly the IN_MULTICAST macro expects a host endian address. However, there are some places in passt where we use those with network endian values. This means that passt will incorrectly allow you to set 127.0.0.1 or a multicast address as the guest address or DNS forwarding address. Add the necessary conversions to correct this. INADDR_ANY and INADDR_BROADCAST logically behave the same way, although because they're palindromes it doesn't have an effect in practice. Change them to be logically correct while we're there, though. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:03:58 +01:00
Stefano Brivio	7402951658	conf, passt.1: Don't imply --foreground with --debug Having -f implied by -d (and --trace) usually saves some typing, but debug mode in background (with a log file) is quite useful if pasta is started by Podman, and is probably going to be handy for passt with libvirt later, too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-27 00:17:56 +02:00
Stefano Brivio	c11277b94f	conf: Don't pass leading ~ to parse_port_range() on exclusions Commit `84fec4e998` ("Clean up parsing of port ranges") drops the strspn() call before the parsing of excluded port ranges, because now we're checking against any stray characters at every step. However, that also has the effect of passing ~ as first character to the new parse_port_range(), which makes no sense: we already checked that ~ is the first character before the call, so skip it. Alona reported this output: Invalid port specifier ~15000,~15001,~15006,~15008,~15020,~15021,~15090 while the whole specifier is indeed valid. Reported-by: Alona Paz <alkaplan@redhat.com> Fixes: `84fec4e998` ("Clean up parsing of port ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-24 14:37:22 +02:00
Stefano Brivio	3e2eb4337b	conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user() Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in the target user namespace as we isolate the process, which means we're unable to bind to low ports at that point. Bind inbound ports, and only those, before isolate_user(). Keep the handling of outbound ports (for pasta mode only) after the setup of the namespace, because that's where we'll bind them. To this end, initialise the netlink socket for the init namespace before isolate_user() as well, as we actually need to know the addresses of the upstream interface before binding ports, in case they're not explicitly passed by the user. As we now call nl_sock_init() twice, checking its return code from conf() twice looks a bit heavy: make it exit(), instead, as we can't do much if we don't have netlink sockets. While at it: - move the v4_only && v6_only options check just after the first option processing loop, as this is more strictly related to option parsing proper - update the man page, explaining that CAP_NET_BIND_SERVICE is not the preferred way to bind ports, because passt and pasta can be abused to allow other processes to make effective usage of it. Add a note about the recommended sysctl instead - simplify nl_sock_init_do() now that it's called once for each case Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	eb3d03a588	isolation: Only configure UID/GID mappings in userns when spawning shell When in passt mode, or pasta mode spawning a command, we create a userns for ourselves. This is used both to isolate the pasta/passt process itself and to run the spawned command, if any. Since `eed17a47` "Handle userns isolation and dropping root at the same time" we've handled both cases the same, configuring the UID and GID mappings in the new userns to map whichever UID we're running as to root within the userns. This mapping is desirable when spawning a shell or other command, so that the user gets a root shell with reasonably clear abilities within the userns and netns. It's not necessarily essential, though. When not spawning a shell, it doesn't really have any purpose: passt itself doesn't need to be root and can operate fine with an unmapped user (using some of the capabilities we get when entering the userns instead). Configuring the uid_map can cause problems if passt is running with any capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to allow it to forward low ports. In this case the kernel makes files in /proc/pid owned by root rather than the starting user to prevent the user from interfering with the operation of the capability-enhanced process. This includes uid_map meaning we are not able to write to it. Whether this behaviour is correct in the kernel is debatable, but in any case we might as well avoid problems by only initializing the user mappings when we really want them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	c22ebccba8	isolation: Replace drop_caps() with a version that actually does something The current implementation of drop_caps() doesn't really work because it attempts to drop capabilities from the bounding set. That's not the set that really matters, it's about limiting the abilities of things we might later exec() rather than our own capabilities. It also requires CAP_SETPCAP which we won't usually have. Replace it with a new version which uses setcap(2) to drop capabilities from the effective and permitted sets. For now we leave the inheritable set as is, since we don't want to preclude the user from passing inheritable capabilities to the command spawed by pasta. Correctly dropping caps reveals that we were relying on some capabilities we'd supposedly dropped. Re-divide the dropping of capabilities between isolate_initial(), isolate_user() and isolate_prefork() to make this work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
Stefano Brivio	57e2c066e9	conf: Drop excess colons in usage for DHCP and DNS options Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	10236de486	conf: Report usage for --no-netns-quit Fixes: `745a9ba428` ("pasta: By default, quit if filesystem-bound net namespace goes away") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	c1eff9a3c6	conf, tcp, udp: Allow specification of interface to bind to Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can bind sockets to interfaces, if they haven't been bound yet (as in bind()). Introduce an optional interface specification for forwarded ports, prefixed by %, that can be passed together with an address. Reported use case: running local services that use ports we want to have externally forwarded: https://github.com/containers/podman/issues/14425 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	a62ed181db	conf, tap: Add option to quit once the client closes the connection This is practical to avoid explicit lifecycle management in users, e.g. libvirtd, and is trivial to implement. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	e23024ccff	conf, log, Makefile: Add versioning information Add a --version option displaying that, and also include this information in the log files. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:28 +02:00
Stefano Brivio	01efc71ddd	log, conf: Add support for logging to file In some environments, such as KubeVirt pods, we might not have a system logger available. We could choose to run in foreground, but this takes away the convenient synchronisation mechanism derived from forking to background when interfaces are ready. Add optional logging to file with -l/--log-file and --log-size. Unfortunately, this means we need to duplicate features that are more appropriately implemented by a system logger, such as rotation. Keep that reasonably simple, by using fallocate() with range collapsing where supported (Linux kernel >= 3.15, extent-based ext4 and XFS) and falling back to an unsophisticated block-by-block moving of entries toward the beginning of the file once we reach the (mandatory) size limit. While at it, clarify the role of LOG_EMERG in passt.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:28 +02:00
Stefano Brivio	51fa9bfd7b	conf: Drop duplicate, diverging optstring assignments This originated as a result of copy and paste to introduce a second stage for processing options related to port forwarding, has already bitten David in the past, and just gave me hours of fun. As a matter of fact, the second set of optstring assignments was already incorrect, but it didn't matter because the first one was more restrictive, not allowing optional arguments for -P, -D, -S. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:26 +02:00
Stefano Brivio	da152331cf	Move logging functions to a new file, log.c Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:25 +02:00
David Gibson	40901c5437	cppcheck: Use inline suppression for strtok() in conf.c strtok() is non-reentrant and old-fashioned, so cppcheck would complains about its use in conf.c if it weren't suppressed. We're single threaded and strtok() is convenient though, so it's not really worth reworking at this time. Convert this to an inline suppression so it's adjacent to the code its annotating. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:22:19 +02:00
David Gibson	ab96da98cd	Don't shadow 'i' in conf_ports() The counter 'i' is used in a number of places in conf_ports(), but in one of those we unnecessarily shadow it in an inner scope. We could re-use the same 'i' every time, but each use is logically separate, so instead remove the outer declaration and declare it locally in each of the clauses where we need it. While we're there change it from a signed to unsigned int, since it's used to iterate over port numbers which are generally treated as unsigned. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:22:05 +02:00
David Gibson	68ef4931cb	Clean up parsing in conf_runas() conf_runas() handles several of the different possible cases for the --runas argument in a slightly odd order. Although it can parse both numeric UIDs/GIDs and user/group names, it can't parse a numeric UID combined with a group name or vice versa. That's not obviously useful, but it's slightly surprising gap to have. Rework the parsing to be more systematic: first split the option into user and (optional) group parts, then separately parse each part as either numeric or a name. As a bonus this removes some clang-tidy warnings. While we're there also add cppcheck suppressions for getpwnam() and getgrnam(). It complains about those because they're not reentrant. passt is single threaded though, and is always likely to be during this initialization code, even if we multithread later. There were some existing suppressions for these in the cppcheck invocation but they're no longer up to date. Replace them with inline suppressions which, being next to the code, are more likely to stay correct. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:21:58 +02:00
David Gibson	84fec4e998	Clean up parsing of port ranges conf_ports() parses ranges of ports for the -t, -u, -T and -U options. The code is quite difficult to the follow, to the point that clang-tidy and cppcheck disagree on whether one of the pointers can be NULL at some points. Rework the code with the use of two new helper functions: * parse_port_range() operates a bit like strtoul(), but can parse a whole port range specification (e.g. '80' or '1000-1015') * next_chunk() does the necessary wrapping around strchr() to advance to just after the next given delimiter, while cleanly handling if there are no more delimiters The new version is easier to follow, and also removes some cppcheck warnings. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:21:41 +02:00
David Gibson	d5b80ccc72	Fix widespread off-by-one error dealing with port numbers Port numbers (for both TCP and UDP) are 16-bit, and so fit exactly into a 'short'. USHRT_MAX is therefore the maximum port number and this is widely used in the code. Unfortunately, a lot of those places don't actually want the maximum port number (USHRT_MAX == 65535), they want the total number of ports (65536). This leads to a number of potentially nasty consequences: * We have buffer overruns on the port_fwd::delta array if we try to use port 65535 * We have similar potential overruns for the tcp_sock_* arrays * Interestingly udp_act had the correct size, but we can calculate it in a more direct manner * We have a logical overrun of the ports bitmap as well, although it will just use an unused bit in the last byte so isnt harmful * Many loops don't consider port 65535 (which does mitigate some but not all of the buffer overruns above) * In udp_invert_portmap() we incorrectly compute the reverse port translation for return packets Correct all these by using a new NUM_PORTS defined explicitly for this purpose. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	3ede07aac9	Treat port numbers as unsigned Port numbers are unsigned values, but we're storing them in (signed) int variables in some places. This isn't actually harmful, because int is large enough to hold the entire range of ports. However in places we don't want to use an in_port_t (usually to avoid overflow on the last iteration of a loop) it makes more conceptual sense to use an unsigned int. This will also avoid some problems with later cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	0d1886dca0	Pass entire port forwarding configuration substructure to conf_ports() conf_ports() switches on the optname argument to select the target array for several updates. Now that all these maps are in a common structure, we can simplify by just passing in a pointer to the whole struct port_fwd to update. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	f5a31ee94c	Don't use indirect remap functions for conf_ports() Now that we've delayed initialization of the UDP specific "reverse" map until udp_init(), the only difference between the various 'remap' functions used in conf_ports() is which array they target. So, simplify by open coding the logic into conf_ports() with a pointer to the correct mapping array. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	163dc5f188	Consolidate port forwarding configuration into a common structure The configuration for how to forward ports in and out of the guest/ns is divided between several different variables. For each connect direction and protocol we have a mode in the udp/tcp context structure, a bitmap of which ports to forward also in the context structure and an array of deltas to apply if the outward facing and inward facing port numbers are different. This last is a separate global variable, rather than being in the context structure, for no particular reason. UDP also requires an additional array which has the reverse mapping used for return packets. Consolidate these into a re-used substructure in the context structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	1128fa03fe	Improve types and names for port forwarding configuration enum conf_port_type is local to conf.c and is used to track the port forwarding mode during configuration. We don't keep it around in the context structure, however the 'init_detect_ports' and 'ns_detect_ports' fields in the context are based solely on this. Rather than changing encoding, just include the forwarding mode into the context structure. Move the type definition to a new port_fwd.h, which is kind of trivial at the moment but will have more stuff later. While we're there, "conf_port_type" doesn't really convey that this enum is describing how port forwarding is configured. Rename it to port_fwd_mode. The variables (now fields) of this type also have mildly confusing names since it's not immediately obvious whether 'ns' and 'init' refer to the source or destination of the packets. Use "in" (host to guest / init to ns) and "out" (guest to host / ns to init) instead. This has the added bonus that we no longer have locals 'udp_init' and 'tcp_init' which shadow global functions. In addition, add a typedef 'port_fwd_map' for a bitmap of each port number, which is used in several places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
Stefano Brivio	4a1b675278	conf, tcp, udp: Arrays for ports need 2^16 values, not 2^16-8 Reported by David but also by Coverity (CWE-119): In conf_ports: Out-of-bounds access to a buffer ...not in practice, because the allocation size is rounded up anyway, but not nice either. Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-22 16:54:09 +02:00
David Gibson	ef6da15732	Allow --userns when pasta spawns a command Currently --userns is only allowed when pasta is attaching to an existing netns or PID, and is prohibited when creating a new netns by spawning a command or shell. With the new handling of userns, this check isn't neccessary. I'm not sure if there's any use case for --userns with a spawned command, but it's strictly more flexible and requires zero extra code, so we might as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	eed17a47fe	Handle userns isolation and dropping root at the same time passt/pasta can interact with user namespaces in a number of ways: 1) With --netns-only we'll remain in our original user namespace 2) With --userns or a PID option to pasta we'll join either the given user namespace or that of the PID 3) When pasta spawns a shell or command we'll start a new user namespace for the command and then join it 4) With passt we'll create a new user namespace when we sandbox() ourself However (3) and (4) turn out to have essentially the same effect. In both cases we create one new user namespace. The spawned command starts there, and passt/pasta itself will live there from sandbox() onwards. Because of this, we can simplify user namespace handling by moving the userns handling earlier, to the same point we drop root in the original namespace. Extend the drop_user() function to isolate_user() which does both. After switching UID and GID in the original userns, isolate_user() will either join or create the userns we require. When we spawn a command with pasta_start_ns()/pasta_setup_ns() we no longer need to create a userns, because we're already made one. sandbox() likewise no longer needs to create (or join) an userns because we're already in the one we need. We no longer need c->pasta_userns_fd, since the fd is only used locally in isolate_user(). Likewise we can replace c->netns_only with a local in conf(), since it's not used outside there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	fc1be3d5ab	Clean up and rename conf_ns_open() conf_ns_open() opens file descriptors for the namespaces pasta needs, but it doesnt really have anything to do with configuration any more. For better clarity, move it to pasta.c and rename it pasta_open_ns(). This makes the symmetry between it and pasta_start_ns() more clear, since these represent the two basic ways that pasta can operate, either attaching to an existing namespace/process or spawning a new one. Since its no longer validating options, the errors it could return shouldn't cause a usage message. Just exit directly with an error instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	e8b19a4bd2	Consolidate validation of pasta namespace options There are a number of different ways to specify namespaces for pasta to use. Some combinations are valid and some are not. Currently validation for these is spread across several places: conf_ns_pid() validates PID options specifically. Near its callsite in conf() several other checks are made. Some additional checks are made in conf_ns_open() and finally theres a check just before the call to pasta_start_ns(). This is quite hard to follow. Make it easier by putting all the validation logic together in a new conf_pasta_ns() function, which subsumes conf_ns_pid(). This reveals that some of the checks were redundant with each other, so remove those. For good measure, rename conf_netns() to conf_netns_opt() to make it clearer its handling just the --netns option specifically, not overall configuration of the netns. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	d72a1e7bb9	Move self-isolation code into a separate file passt/pasta contains a number of routines designed to isolate passt from the rest of the system for security. These are spread through util.c and passt.c. Move them together into a new isolation.c file. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	5d3b50c100	Safer handling if we can't open /proc/self/uid_map passt is allowed to run as "root" (UID 0) in a user namespace, but notas real root in the init namespace. We read /proc/self/uid_map to determine if we're in the init namespace or not. If we're unable to open /proc/self/uid_map we assume we're ok and continue running as UID 0. This seems unwise. The only instances I can think of where uid_map won't be available are if the host kernel doesn't support namespaces, or /proc is not mounted. In neither case is it safe to assume we're "not really" root and continue (although in practice we'd likely fail for other reasons pretty soon anyway). Therefore, fail with an error in this case, instead of carrying on. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	80d7012b09	Consolidate determination of UID/GID to run as Currently the logic to work out what UID and GID we will run as is spread across conf(). If --runas is specified it's handled in conf_runas(), otherwise it's handled by check_root(), which depends on initialization of the uid and gid variables by either conf() itself or conf_runas(). Make this clearer by putting all the UID and GID logic into a single conf_ugid() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	10c6347747	Split checking for root from dropping root privilege check_root() both checks to see if we are root (in the init namespace), and if we are drops to an unprivileged user. To make future cleanups simpler, split the checking for root (now in check_root()) from the actual dropping of privilege (now in drop_root()). Note that this does slightly alter semantics. Previously we would only setuid() if we were originally root (in the init namespace). Now we will always setuid() and setgid(), though it won't actually change anything if we weren't privileged to begin with. This also means that we will now always attempt to switch to the user specified with --runas, even if we aren't (init namespace) root to begin with. Obviously this will fail with an error if we weren't privileged to start with. --help and the man page are updated accordingly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
David Gibson	7330ae3abf	Don't store UID & GID persistently in the context structure c->uid and c->gid are first set in conf(), and last used in check_root() itself called from conf(). Therefore these don't need to be fields in the long lived context structure and can instead be locals in conf(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 05:31:51 +02:00
Stefano Brivio	bac7dfebe4	conf: Fix getopt_long() optstring for current semantics of -D, -S, -p Declaring them as required_argument in the longopts array specifies validation, but doesn't affect how optind is increased after parsing their values. Currently, passing one of these options as last option causes pasta to handle their own values as path to a binary to execute. Fixes: `aae2a9bbf7` ("conf: Use "-D none" and "-S none" instead of missing empty option arguments") Fixes: `bf95322fc1` ("conf: Make the argument to --pcap option mandatory") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-02 17:07:45 +02:00
David Gibson	1392bc5ca0	Allow pasta to take a command to execute When not given an existing PID or network namspace to attach to, pasta spawns a shell. Most commands which can spawn a shell in an altered environment can also run other commands in that same environment, which can be useful in automation. Allow pasta to do the same thing; it can be given an arbitrary command to run in the network and user namespace which pasta creates. If neither a command nor an existing PID or netns to attach to is given, continue to spawn a default shell, as before. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:43:31 +02:00
David Gibson	c188736cd8	Use explicit --netns option rather than multiplexing with PID When attaching to an existing namespace, pasta can take a PID or the name or path of a network namespace as a non-option parameter. We disambiguate based on what the parameter looks like. Make this more explicit by using a --netns option for explicitly giving the path or name, and treating a non-option argument always as a PID. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Fix typo in man page] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-08-30 19:43:31 +02:00
David Gibson	9e0dbc8948	More deterministic detection of whether argument is a PID, PATH or NAME pasta takes as its only non-option argument either a PID to attach to the namespaces of, a PATH to a network namespace or a NAME of a network namespace (relative to /run/netns). Currently to determine which it is we try all 3 in that order, and if anything goes wrong we move onto the next. This has the potential to cause very confusing failure modes. e.g. if the argument is intended to be a network namespace name, but a (non-namespace) file of the same name exists in the current directory. Make behaviour more predictable by choosing how to treat the argument based only on the argument's contents, not anything else on the system: - If it's a decimal integer treat it as a PID - Otherwise, if it has no '/' characters, treat it as a netns name (ip-netns doesn't allow '/' in netns names) - Otherwise, treat it as a netns path If you want to open a persistent netns in the current directory, you can use './netns'. This also allows us to split the parsing of the PID\|PATH\|NAME option from the actual opening of the namespaces. In turn that allows us to put the opening of existing namespaces next to the opening of new namespaces in pasta_start_ns. That makes the logical flow easier to follow and will enable later cleanups. Caveats: - The separation of functions mean we will always generate the basename and dirname for the netns_quit system, even when using PID namespaces. This is pointless, since the netns_quit system doesn't work for non persistent namespaces, but is harmless. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:43:31 +02:00
David Gibson	70389d3640	Move ENOENT error message into conf_ns_opt() After calling conf_ns_opt() we check for -ENOENT and print an error message, but conf_ns_opt() prints messages for other errors itself. For consistency move the ENOENT message into conf_ns_opt() as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:43:31 +02:00
David Gibson	8de488892f	Remove --nsrun-dir option pasta can identify a netns as a "name", which is to say a path relative to (usually) /run/netns, which is the place that ip(8) creates persistent network namespaces. Alternatively a full path to a netns can be given. The --nsrun-dir option allows the user to change the standard path where netns names are resolved. However, there's no real point to this, if the user wants to override the location of the netns, they can just as easily use the full path to specify the netns. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:43:31 +02:00
David Gibson	aae2a9bbf7	conf: Use "-D none" and "-S none" instead of missing empty option arguments Both the -D (--dns) and -S (--search) options take an optional argument. If the argument is omitted the option is disabled entirely. However, handling the optional argument requires some ugly special case handling if it's the last option on the command line, and has potential ambiguity with non-option arguments used with pasta. It can also make it more confusing to read command lines. Simplify the logic here by replacing the non-argument versions with an explicit "-D none" or "-S none". Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Reworked logic to exclude redundant/conflicting options] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-08-30 19:42:52 +02:00
David Gibson	bf95322fc1	conf: Make the argument to --pcap option mandatory The --pcap or -p option can be used with or without an argument. If given, the argument gives the name of the file to save a packet trace to. If omitted, we generate a default name in /tmp. Generating the default name isn't particularly useful though, since making a suitable name can easily be done by the caller. Remove this feature. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-30 19:17:57 +02:00
David Gibson	60ffc5b6cb	Don't unnecessarily avoid CLOEXEC flags There are several places in the passt code where we have lint overrides because we're not adding CLOEXEC flags to open or other operations. Comments suggest this is because it's before we fork() into the background but we'll need those file descriptors after we're in the background. However, as the name suggests CLOEXEC closes on exec(), not on fork(). The only place we exec() is either super early invoke the avx2 version of the binary, or when we start a shell in pasta mode, which certainly doesn't require the fds in question. Add the CLOEXEC flag in those places, and remove the lint overrides. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-24 18:01:48 +02:00
David Gibson	cc287af173	conf: Fix incorrect bounds checking for sock_path parameter Looks like a copy-paste error where we're checking against the size of the pcap field, rather than the sock_path field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-08-24 18:01:48 +02:00
David Gibson	16f5586bb8	Make substructures for IPv4 and IPv6 specific context information The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-07-30 22:14:07 +02:00
David Gibson	5e12d23acb	Separate IPv4 and IPv6 configuration After recent changes, conf_ip() now has essentially entirely disjoint paths for IPv4 and IPv6 configuration. So, it's cleaner to split them out into different functions conf_ip4() and conf_ip6(). Splitting these out also lets us make the interface a bit nicer, having them return success or failure directly, rather than manipulating c->v4 and c->v6 to indicate success/failure of the two versions. Since these functions may also initialize the interface index for each protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since a 0 interface index is never valid). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Whitespace fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-30 22:12:50 +02:00
David Gibson	c984ee5afd	Clarify semantics of c->v4 and c->v6 variables The v4 and v6 fields of the context structure can be confusing, because they change meaning part way through the code: Before conf_ip(), they are booleans which indicate whether the -4 or -6 options have been given. After conf_ip() they are DISABLED\|ENABLED\|PROBE enums which indicate whether the IP version is available (which means both that it was allowed on the command line and we were able to configure it). The PROBE variant of the enum is only used locally within conf_ip() and since recent changes there it no longer has a real purpose different from ENABLED. Simplify this all by making the context fields always just a boolean indicating the availability of the IP version. They both default to 1, but can be set to 0 by either command line options or configuration failures. We use some local variables in conf() for tracking the state of the command line options on their own. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor coding style fix in conf.c] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-30 22:09:38 +02:00
David Gibson	4bc883aeab	Move passt mac_guest init to be more symmetric with pasta In pasta mode, the guest's MAC address is set up in pasta_ns_cobf() called from tap_sock_tun_init(). If we have a guest MAC configured with --ns-mac-addr, this will set the given MAC on the kernel tuntap device, or if we haven't configured one it will update our record of the guest MAC to the kernel assigned one from the device. For passt, we don't initially know the guest's MAC until we receive packets from it, so we have to initially use a broadcast address. This is - oddly - set up in an entirely different place, in conf_ip() conditional on the mode. Move it to the logically matching place for passt - tap_sock_unix_init(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-07-30 21:57:50 +02:00
David Gibson	3f19072640	Initialize host side MAC when in IPv6 only mode When sending packets to the guest we need a source MAC address, which we currently take from the host side interface we're using (though it's basically arbitrary). However if not given on the command line this MAC is initialized in an IPv4 specific path, and will end up as 00:00:00:00:00:00 when running "passt 6". The MAC address is also used for IPv6 packets, though. Interestingly, we largely seem to get away with using an all-zero MAC, but it's probably not a good idea. Make the IPv6 path pick the MAC address from its interface if the IPv4 path hasn't already done so. While we're there, use the existing MAC_IS_ZERO macro to make the code a little clearer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-07-30 21:57:50 +02:00

1 2 3 4 5

210 commits