passt

mirror of https://passt.top/passt synced 2025-11-03 10:59:13 +01:00

Author	SHA1	Message	Date
Stefano Brivio	07c2d584b3	conf: Include libgen.h for basename(), fix build against musl Fixes: `4b17d042c7` ("conf: Move mode detection into helper function") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-20 05:50:49 +01:00
David Gibson	96fe5548cb	conf: Unify several paths in conf_ports() In conf_ports() we have three different paths which actually do the setup of an individual forwarded port: one for the "all" case, one for the exclusions only case and one for the range of ports with possible exclusions case. We can unify those cases using a new helper which handles a single range of ports, with a bitmap of exclusions. Although this is slightly longer (largely due to the new helpers function comment), it reduces duplicated logic. It will also make future improvements to the tracking of port forwards easier. The new conf_ports_range_except() function has a pretty prodigious parameter list, but I still think it's an overall improvement in conceptual complexity. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-14 23:40:23 +01:00
David Gibson	26df8a3608	conf: Limit maximum MTU based on backend frame size The -m option controls the MTU, that is the maximum transmissible L3 datagram, not including L2 headers. We currently limit it to ETH_MAX_MTU which sounds like it makes sense. But ETH_MAX_MTU is confusing: it's not consistently used as to whether it means the maximum L3 datagram size or the maximum L2 frame size. Even within conf() we explicitly account for the L2 header size when computing the default --mtu value, but not when we compute the maximum --mtu value. Clean this up by reworking the maximum MTU computation to be the minimum of IP_MAX_MTU (65535) and the maximum sized IP datagram which can fit into our L2 frames when we account for the L2 header. The latter can vary depending on our tap backend, although it doesn't right now. Link: https://bugs.passt.top/show_bug.cgi?id=66 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	74cd82adc8	conf: Detect vhost-user mode earlier We detect our operating mode in conf_mode(), unless we're using vhost-user mode, in which case we change it later when we parse the --vhost-user option. That means we need to delay parsing the --repair-path option (for vhost-user only) until still later. However, there are many other places in the main option parsing loop which also rely on mode. We get away with those, because they happen to be able to treat passt and vhost-user modes identically. This is potentially confusing, though. So, move setting of MODE_VU into conf_mode() so c->mode always has its final value from that point onwards. To match, we move the parsing of --repair-path back into the main option parsing loop. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	4b17d042c7	conf: Move mode detection into helper function One of the first things we need to do is determine if we're in passt mode or pasta mode. Currently this is open-coded in main(), by examining argv[0]. We want to complexify this a bit in future to cover vhost-user mode as well. Prepare for this, by moving the mode detection into a new conf_mode() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	bb00a0499f	conf: Use the same optstring for passt and pasta modes Currently we rely on detecting our mode first and use different sets of (single character) options for each. This means that if you use an option valid in only one mode in another you'll get the generic usage() message. We can give more helpful errors with little extra effort by combining all the options into a single value of the option string and giving bespoke messages if an option for the wrong mode is used; in fact we already did this for some single mode options like '-1'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	1924e25f07	conf: Be more precise about minimum MTUs Currently we reject the -m option if given a value less than ETH_MIN_MTU (68). That define is derived from the kernel, but its name is misleading: it doesn't really have anything to do with Ethernet per se, but is rather the minimum payload any L2 link must be able to handle in order to carry IPv4. For IPv6, it's not sufficient: that requires an MTU of at least 1280. Newer kernels have better named constants IPV4_MIN_MTU and IPv6_MIN_MTU. Copy and use those constants instead, along with some more specific error messages. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-06 20:17:23 +01:00
David Gibson	1cc5d4c9fe	conf: Use 0 instead of -1 as "unassigned" mtu value On the command line -m 0 means "don't assign an MTU" (letting the guest use its default. However, internally we use (c->mtu == -1) to represent that state. We use (c->mtu == 0) to represent "the user didn't specify on the command line, so use the default" - but this is only used during conf(), never afterwards. This is unnecessarily confusing. We can instead just initialise c->mtu to its default (65520) before parsing options and use 0 on both the command line and internally to represent the "don't assign" special case. This ensures that c->mtu is always 0..65535, so we can store it in a uint16_t which is more natural. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:41 +01:00
David Gibson	3dc7da68a2	conf: More thorough error checking when parsing --mtu option We're a bit sloppy with parsing MTU which can lead to some surprising, though fairly harmless, results: * Passing a non-number like '-m xyz' will not give an error and act like -m 0 * Junk after a number (e.g. '-m 1500pqr') will be ignored rather than giving an error * We parse the MTU as a long, then immediately assign to an int, so on some platforms certain ludicrously out of bounds values will be silently truncated, rather than giving an error Be a bit more thorough with the error checking to avoid that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:38 +01:00
Stefano Brivio	b899141ad5	Add interfaces and configuration bits for passt-repair In vhost-user mode, by default, create a second UNIX domain socket accepting connections from passt-repair, with the usual listener socket. When we need to set or clear TCP_REPAIR on sockets, we'll send them via SCM_RIGHTS to passt-repair, who sets the socket option values we ask for. To that end, introduce batched functions to request TCP_REPAIR settings on sockets, so that we don't have to send a single message for each socket, on migration. When needed, repair_flush() will send the message and check for the reply. Co-authored-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-12 19:47:28 +01:00
Enrique Llorente	31e8109a86	dhcp, dhcpv6: Add hostname and client fqdn ops Both DHCPv4 and DHCPv6 has the capability to pass the hostname to clients, the DHCPv4 uses option 12 (hostname) while the DHCPv6 uses option 39 (client fqdn), for some virt deployments like kubevirt is expected to have the VirtualMachine name as the guest hostname. This change add the following arguments: - -H --hostname NAME to configure the hostname DHCPv4 option(12) - --fqdn NAME to configure client fqdn option for both DHCPv4(81) and DHCPv6(39) Signed-off-by: Enrique Llorente <ellorent@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-10 18:30:24 +01:00
Stefano Brivio	a3d142a6f6	conf: Don't map DNS traffic to host, if host gateway is a resolver This should be a relatively common case and I'm a bit surprised it's been broken since I added the "gateway mapping" functionality, but it doesn't happen with Podman, and not with systemd-resolved or similar local proxies, and also not with servers where typically the gateway is just a router and not a DNS resolver. That could be the reason why nobody noticed until now. By default, we'll map the address of the default gateway, in containers and guests, to represent "the host", so that we have a well-defined way to reach the host. Say: 0.0029: NAT to host 127.0.0.1: 192.168.100.1 But if the host gateway is also a DNS resolver: 0.0029: DNS: 0.0029: 192.168.100.1 then we'll send DNS queries directed to it to the host instead: 0.0372: Flow 0 (INI): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => ? 0.0372: Flow 0 (TGT): INI -> TGT 0.0373: Flow 0 (TGT): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => HOST [0.0.0.0]:41892 -> [127.0.0.1]:53 0.0373: Flow 0 (UDP flow): TGT -> TYPED 0.0373: Flow 0 (UDP flow): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => HOST [0.0.0.0]:41892 -> [127.0.0.1]:53 0.0373: Flow 0 (UDP flow): Side 0 hash table insert: bucket: 31049 0.0374: Flow 0 (UDP flow): TYPED -> ACTIVE 0.0374: Flow 0 (UDP flow): TAP [192.168.100.157]:41892 -> [192.168.100.1]:53 => HOST [0.0.0.0]:41892 -> [127.0.0.1]:53 which doesn't quite work, of course: 0.0374: pasta: epoll event on UDP reply socket 95 (events: 0x00000008) 0.0374: ICMP error on UDP socket 95: Connection refused unless the host is a resolver itself... but then we wouldn't find the address of the gateway in its /etc/resolv.conf, presumably. Fix this by making an exception for DNS traffic: if the default gateway is a resolver, match on DNS traffic going to the default gateway, and explicitly forward it to the configured resolver. Reported-by: Prafulla Giri <prafulla.giri@protonmail.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-09 08:17:06 +01:00
Stefano Brivio	a5cca995de	conf, passt.1: Un-deprecate --host-lo-to-ns-lo It was established behaviour, and it's now the third report about it: users ask how to achieve the same functionality, and we don't have a better answer yet. The idea behind declaring it deprecated to start with, I guess, was that we would eventually replace it by more flexible and generic configuration options, which is still planned. But there's nothing preventing us to alias this in the future to a particular configuration. So, stop scaring users off, and un-deprecate this. Link: https://archives.passt.top/passt-dev/20240925102009.62b9a0ce@elisabeth/ Link: https://github.com/rootless-containers/rootlesskit/pull/482#issuecomment-2591855705 Link: https://github.com/moby/moby/issues/48838 Link: https://github.com/containers/podman/discussions/25243 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-02-06 11:14:30 +01:00
Paul Holzinger	d0006fa784	treewide: use _exit() over exit() In the podman CI I noticed many seccomp denials in our logs even though tests passed: comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 arch=c000003e syscall=202 compat=0 ip=0x7fb3d31f69db code=0x80000000 Which is futex being called and blocked by the pasta profile. After a few tries I managed to reproduce locally with this loop in ~20 min: while :; do podman run -d --network bridge quay.io/libpod/testimage:20241011 \ sleep 100 && \ sleep 10 && \ podman rm -fa -t0 done And using a pasta version with prctl(PR_SET_DUMPABLE, 1); set I got the following stack trace: Stack trace of thread 1: #0 0x00007fc95e6de91b __lll_lock_wait_private (libc.so.6 + 0x9491b) #1 0x00007fc95e68d6de __run_exit_handlers (libc.so.6 + 0x436de) #2 0x00007fc95e68d70e exit (libc.so.6 + 0x4370e) #3 0x000055f31b78c50b n/a (n/a + 0x0) #4 0x00007fc95e68d70e exit (libc.so.6 + 0x4370e) #5 0x000055f31b78d5a2 n/a (n/a + 0x0) Pasta got killed in exit(), it seems glibc is trying to use a lock when running exit handlers even though no exit handlers are defined. Given no exit handlers are needed we can call _exit() instead. This skips exit handlers and does not flush stdio streams compared to exit() which should be fine for the use here. Based on the input from Stefano I did not change the test/doc programs or qrap as they do not use seccomp filters. Signed-off-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-05 15:19:02 +01:00
Stefano Brivio	09478d55fe	treewide: Dodge dynamic memory allocation in strerror() from glibc > 2.40 With glibc commit 25a5eb4010df ("string: strerror, strsignal cannot use buffer after dlmopen (bug 32026)"), strerror() now needs, at least on x86, the getrandom() and brk() system calls, in order to fill in the locale-translated error message. But getrandom() and brk() are not allowed by our seccomp profiles. This became visible on Fedora Rawhide with the "podman login and logout" Podman tests, defined at test/e2e/login_logout_test.go in the Podman source tree, where pasta would terminate upon printing error descriptions (at least the ones related to the SO_ERROR queue for spliced connections). Avoid dynamic memory allocation by calling strerrordesc_np() instead, which is a GNU function returning a static, untranslated version of the error description. If it's not available, keep calling strerror(), which at that point should be simple enough as to be usable (at least, that's currently the case for musl). Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/24804 Analysed-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Paul Holzinger <pholzing@redhat.com>	2024-12-11 12:21:23 +01:00
Jon Maloy	e24f026222	pasta: make it possible to disable socket splicing During testing it is sometimes useful to force traffic which would normally be forwared by socket splicing through the tap interface. In this commit, we add a command switch enabling such funtionality for inbound local traffic. For outbound local traffic this is much trickier, if even possible, so leave that for a later commit. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-11 01:47:37 +01:00
Laurent Vivier	28997fcb29	vhost-user: add vhost-user add virtio and vhost-user functions to connect with QEMU. $ ./passt --vhost-user and # qemu-system-x86_64 ... -m 4G \ -object memory-backend-memfd,id=memfd0,share=on,size=4G \ -numa node,memdev=memfd0 \ -chardev socket,id=chr0,path=/tmp/passt_1.socket \ -netdev vhost-user,id=netdev0,chardev=chr0 \ -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \ ... Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: as suggested by lvivier, include <netinet/if_ether.h> before including <linux/if_ether.h> as C libraries such as musl __UAPI_DEF_ETHHDR in <netinet/if_ether.h> if they already have a definition of struct ethhdr] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-27 16:47:32 +01:00
Stefano Brivio	14b84a7f07	treewide: Introduce 'local mode' for disconnected setups There are setups where no host interface is available or configured at all, intentionally or not, temporarily or not, but users expect (Podman) containers to run in any case as they did with slirp4netns, and we're now getting reports that we broke such setups at a rather alarming rate. To this end, if we don't find any usable host interface, instead of exiting: - for IPv4, use 169.254.2.1 as guest/container address and 169.254.2.2 as default gateway - for IPv6, don't assign any address (forcibly disable DHCPv6), and use the first link-local address we observe to represent the guest/container. Advertise fe80::1 as default gateway - use 'tap0' as default interface name for pasta Change ifi4 and ifi6 in struct ctx to int and accept a special -1 value meaning that no host interface was selected, but the IP family is enabled. The fact that the kernel uses unsigned int values for those is not an issue as 1. one can't create so many interfaces anyway and 2. we otherwise handle those values transparently. Fix a botched conditional in conf_print() to actually skip printing DHCPv6 information if DHCPv6 is disabled (and skip printing NDP information if NDP is disabled). Link: https://github.com/containers/podman/issues/24614 Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-27 05:16:38 +01:00
Stefano Brivio	6819b2e102	conf, passt.1: Update --mac-addr default in usage() and man page Fixes: `90e83d50a9` ("Don't take "our" MAC address from the host") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-26 08:30:18 +01:00
David Gibson	93bce404c1	Makefile: Move NETNS_RUN_DIR definition to C code NETNS_RUN_DIR is set in the Makefile, then passed into the C code with -D. But NETNS_RUN_DIR is just a fixed string, it doesn't depend on any make probes or variables, so there's really no reason to handle it via the Makefile. Just move it to a plain #define in conf.c. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:52 +01:00
Stefano Brivio	59fe34ee36	treewide: Suppress clang-tidy warning if we already use O_CLOEXEC In pcap_init(), we should always open the packet capture file with O_CLOEXEC, even if we're not running in foreground: O_CLOEXEC means close-on-exec, not close-on-fork. In logfile_init() and pidfile_open(), the fact that we pass a third 'mode' argument to open() seems to confuse the android-cloexec-open checker in LLVM versions from 16 to 19 (at least). The checker is suggesting to add O_CLOEXEC to 'mode', and not in 'flags', where we already have it. Add a suppression for clang-tidy and a comment, and avoid repeating those three times by adding a new helper, output_file_open(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-30 12:37:31 +01:00
Stefano Brivio	744247856d	treewide: Silence cert-err33-c clang-tidy warnings for fprintf() We use fprintf() to print to standard output or standard error streams. If something gets truncated or there's an output error, we don't really want to try and report that, and at the same time it's not abnormal behaviour upon which we should terminate, either. Just silence the warning with an ugly FPRINTF() variadic macro casting the fprintf() expressions to void. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-10-30 12:37:31 +01:00
Stefano Brivio	98efe7c2fd	treewide: Comply with CERT C rule ERR33-C for snprintf() clang-tidy, starting from LLVM version 16, up to at least LLVM version 19, now checks that we detect and handle errors for snprintf() as requested by CERT C rule ERR33-C. These warnings were logged with LLVM version 19.1.2 (at least Debian and Fedora match): /home/sbrivio/passt/arch.c:43:3: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 43 \| snprintf(new_path, PATH_MAX + sizeof(".avx2"), "%s.avx2", exe); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/sbrivio/passt/arch.c:43:3: note: cast the expression to void to silence this warning /home/sbrivio/passt/conf.c:577:4: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 577 \| snprintf(netns, PATH_MAX, "/proc/%ld/ns/net", pidval); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/sbrivio/passt/conf.c:577:4: note: cast the expression to void to silence this warning /home/sbrivio/passt/conf.c:579:5: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 579 \| snprintf(userns, PATH_MAX, "/proc/%ld/ns/user", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 580 \| pidval); \| ~~~~~~~ /home/sbrivio/passt/conf.c:579:5: note: cast the expression to void to silence this warning /home/sbrivio/passt/pasta.c:105:2: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 105 \| snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/sbrivio/passt/pasta.c:105:2: note: cast the expression to void to silence this warning /home/sbrivio/passt/pasta.c:242:2: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 242 \| snprintf(uidmap, BUFSIZ, "0 %u 1", uid); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/sbrivio/passt/pasta.c:242:2: note: cast the expression to void to silence this warning /home/sbrivio/passt/pasta.c:243:2: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 243 \| snprintf(gidmap, BUFSIZ, "0 %u 1", gid); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/sbrivio/passt/pasta.c:243:2: note: cast the expression to void to silence this warning /home/sbrivio/passt/tap.c:1155:4: error: the value returned by this function should not be disregarded; neglecting it may lead to errors [cert-err33-c,-warnings-as-errors] 1155 \| snprintf(path, UNIX_PATH_MAX - 1, UNIX_SOCK_PATH, i); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/sbrivio/passt/tap.c:1155:4: note: cast the expression to void to silence this warning Don't silence the warnings as they might actually have some merit. Add an snprintf_check() function, instead, checking that we're not truncating messages while printing to buffers, and terminate if the check fails. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-10-30 12:37:25 +01:00
David Gibson	b4dace8f46	fwd: Direct inbound spliced forwards to the guest's external address In pasta mode, where addressing permits we "splice" connections, forwarding directly from host socket to guest/container socket without any L2 or L3 processing. This gives us a very large performance improvement when it's possible. Since the traffic is from a local socket within the guest, it will go over the guest's 'lo' interface, and accordingly we set the guest side address to be the loopback address. However this has a surprising side effect: sometimes guests will run services that are only supposed to be used within the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's forwarding exposes those services to the host, which isn't generally what we want. Correct this by instead forwarding inbound "splice" flows to the guest's external address. Link: https://github.com/containers/podman/issues/24045 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-18 20:28:03 +02:00
David Gibson	ff63ac922a	conf: Add --dns-host option to configure host side nameserver When redirecting DNS queries with the --dns-forward option, passt/pasta needs a host side nameserver to redirect the queries to. This is controlled by the c->ip[46].dns_host variables. This is set to the first first nameserver listed in the host's /etc/resolv.conf, and there isn't currently a way to override it from the command line. Prior to `0b25cac9` ("conf: Treat --dns addresses as guest visible addresses") it was possible to alter this with the -D/--dns option. However, doing so was confusing and had some nonsensical edge cases because -D generally takes guest side addresses, rather than host side addresses. Add a new --dns-host option to restore this functionality in a more sensible way. Link: https://bugs.passt.top/show_bug.cgi?id=102 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-04 19:04:29 +02:00
David Gibson	9d66df9a9a	conf: Add command line switch to enable IP_FREEBIND socket option In a couple of recent reports, we've seen that it can be useful for pasta to forward ports from addresses which are not currently configured on the host, but might be in future. That can be done with the sysctl net.ipv4.ip_nonlocal_bind, but that does require CAP_NET_ADMIN to set in the first place. We can allow the same thing on a per-socket basis with the IP_FREEBIND (or IPV6_FREEBIND) socket option. Add a --freebind command line argument to enable this socket option on all listening sockets. Link: https://bugs.passt.top/show_bug.cgi?id=101 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-04 19:04:29 +02:00
David Gibson	b55013b1a7	inany: Add inany_pton() helper We already have an inany_ntop() function to format inany addresses into text. Add inany_pton() to parse them from text, and use it in conf_ports(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2024-09-25 19:03:17 +02:00
David Gibson	cbde4192ee	tcp, udp: Make {tcp,udp}_sock_init() take an inany address tcp_sock_init() and udp_sock_init() take an address to bind to as an address family and void * pair. Use an inany instead. Formerly AF_UNSPEC was used to indicate that we want to listen on both 0.0.0.0 and ::, now use a NULL inany to indicate that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2024-09-25 19:03:16 +02:00
David Gibson	eedc81b6ef	fwd, conf: Probe host's ephemeral ports When we forward "all" ports (-t all or -u all), or use an exclude-only range, we don't actually forward all ports - that wouln't leave local ports to use for outgoing connections. Rather we forward all non-ephemeral ports - those that won't be used for outgoing connections or datagrams. Currently we assume the range of ephemeral ports is that recommended by RFC 6335, 49152-65535. However, that's not the range used by default on Linux, 32768-60999 but configurable with the net.ipv4.ip_local_port_range sysctl. We can't really know what range the guest will consider ephemeral, but if it differs too much from the host it's likely to cause problems we can't avoid anyway. So, using the host's ephemeral range is a better guess than using the RFC 6335 range. Therefore, add logic to probe the host's ephemeral range, falling back to the RFC 6335 range if that fails. This has the bonus advantage of reducing the number of ports bound by -t all -u all on most Linux machines thereby reducing kernel memory usage. Specifically this reduces kernel memory usage with -t all -u all from ~380MiB to ~289MiB. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-29 22:26:08 +02:00
David Gibson	4a41dc58d6	conf, fwd: Don't attempt to forward port 0 When using -t all, -u all or exclude-only ranges, we'll attempt to forward all non-ephemeral port numbers, including port 0. However, this won't work as intended: bind() treats a zero port not as literal port 0, but as "pick a port for me". Because of the special meaning of port 0, we mostly outright exclude it in our handling. Do the same for setting up forwards, not attempting to forward for port 0. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-29 22:26:05 +02:00
David Gibson	1daf6f4615	conf, fwd: Make ephemeral port logic more flexible "Ephemeral" ports are those which the kernel may allocate as local port numbers for outgoing connections or datagrams. Because of that, they're generally not good choices for listening servers to bind to. Thefore when using -t all, -u all or exclude-only ranges, we map only non-ephemeral ports. Our logic for this is a bit rigid though: we assume the ephemeral ports are always a fixed range at the top of the port number space. We also assume PORT_EPHEMERAL_MIN is a multiple of 8, or we won't set the forward bitmap correctly. Make the logic in conf.c more flexible, using a helper moved into fwd.[ch], although we don't change which ports we consider ephemeral (yet). The new handling is undoubtedly more computationally expensive, but since it's a once-off operation at start off, I don't think it really matters. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-29 22:25:51 +02:00
Stefano Brivio	f00ebda369	util: Don't stop on unrelated values when looking for --fd in close_open_files() Seen with krun: we get a file descriptor via --fd, but we close it and happily use the same number for TCP files. The issue is that if we also get other options before --fd, with arguments, getopt_long() stops parsing them because it sees them as non-option values. Use the - modifier at the beginning of optstring (before :, which is needed to avoid printing errors) instead of +, which means we'll continue parsing after finding unrelated option values, but getopt_long() won't reorder them anyway: they'll be passed with option value '1', which we can ignore. By the way, we also need to add : after F in the optstring, so that we're able to parse the option when given as short name as well. Now that we change the parsing mode between close_open_files() and conf(), we need to reset optind to 0, not to 1, whenever we call getopt_long() again in conf(), so that the internal initialisation of getopt_long() evaluating GNU extensions is re-triggered. Link: https://github.com/slp/krun/issues/17#issuecomment-2294943828 Fixes: `baccfb95ce` ("conf: Stop parsing options at first non-option argument") Fixes: `09603cab28` ("passt, util: Close any open file that the parent might have leaked") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-08-21 12:04:53 +02:00
David Gibson	57b7bd2a48	fwd, conf: Allow NAT of the guest's assigned address The guest is usually assigned one of the host's IP addresses. That means it can't access the host itself via its usual address. The --map-host-loopback option (enabled by default with the gateway address) allows the guest to contact the host. However, connections forwarded this way appear on the host to have originated from the loopback interface, which isn't always desirable. Add a new --map-guest-addr option, which acts similarly but forwarded connections will go to the host's external address, instead of loopback. If '-a' is used, so the guest's address is not the same as the host's, this will instead forward to whatever host-visible site is shadowed by the guest's assigned address. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:40 +02:00
David Gibson	e813a4df7d	conf: Allow address remapped to host to be configured Because the host and guest share the same IP address with passt/pasta, it's not possible for the guest to directly address the host. Therefore we allow packets from the guest going to a special "NAT to host" address to be redirected to the host, appearing there as though they have both source and destination address of loopback. Currently that special address is always the address of the default gateway (or none). That can be a problem if we want that gateway to be addressable by the guest. Therefore, allow the special "NAT to host" address to be overridden on the command line with a new --map-host-loopback option. In order to exercise and test it, update the passt_in_ns and perf tests to use this option and give different mapping addresses for the two layers of the environment. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:35 +02:00
David Gibson	935bd81936	conf, fwd: Split notion of gateway/router from guest-visible host address The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default gateway. We use this for two quite distinct things: advertising the gateway that the guest should use (via DHCP, NDP and/or --config-net) and for a limited form of NAT. So that the guest can access services on the host, we map the gateway address within the guest to the loopback address on the host. Using the gateway address for this isn't necessarily the best choice for this purpose, certainly not for all circumstances. So, start off by splitting the notion of these into two different values: @guest_gw which is the gateway address the guest should use and @nat_host_loopback, which is the guest visible address to remap to the host's loopback. Usually nat_host_loopback will have the same value as guest_gw. However when --no-map-gw is specified we leave them unspecified instead. This means when we use nat_host_loopback, we don't need to separately check c->no_map_gw to see if it's relevant. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:31 +02:00
David Gibson	90e83d50a9	Don't take "our" MAC address from the host When sending frames to the guest over the tap link, we need a source MAC address. Currently we take that from the MAC address of the main interface on the host, but that doesn't actually make much sense: * We can't preserve the real MAC address of packets from anywhere external so there's no transparency case here * In fact, it's confusingly different from how we handle IP addresses: whereas we give the guest the same IP as the host, we're making the host's MAC the one MAC that the guest can't use for itself. * We already need a fallback case if the host doesn't have an Ethernet like MAC (e.g. if it's connected via a point to point interface, such as a wireguard VPN). Change to just just use an arbitrary fixed MAC address - I've picked 9a:55:9a:55:9a:55. It's simpler and has the small advantage of making the fact that passt/pasta is in use typically obvious from guest side packet dumps. This can still, of course, be overridden with the -M option. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:28 +02:00
David Gibson	975cfa5f32	Initialise our_tap_ll to ip6.gw when suitable In every place we use our_tap_ll, we only use it as a fallback if the IPv6 gateway address is not link-local. We can avoid that conditional at use time by doing it at initialisation of our_tap_ll instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:22 +02:00
David Gibson	a42fb9c000	treewide: Change misleading 'addr_ll' name c->ip6.addr_ll is not like c->ip6.addr. The latter is an address for the guest, but the former is an address for our use on the tap link. Rename it accordingly, to 'our_tap_ll'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:16 +02:00
David Gibson	57532f1ded	conf: Remove incorrect initialisation of addr_ll_seen Despite the names, addr_ll_seen does not relate to addr_ll the same way addr_seen relates to addr. addr_ll_seen is an observed address from the guest, whereas addr_ll is our link-local address for use on the tap link when we can't use an external endpoint address. It's used both for passt provided services (DHCPv6, NDP) and in some cases for connections from addresses the guest can't access. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:10 +02:00
David Gibson	0b25cac94e	conf: Treat --dns addresses as guest visible addresses Although it's not 100% explicit in the man page, addresses given to the --dns option are intended to be addresses as seen by the guest. This differs from addresses taken from the host's /etc/resolv.conf, which must be translated to guest accessible versions in some cases. Our implementation is currently inconsistent on this: when using --dns-forward, you must usually also give --dns with the matching address, which is meaningful only in the guest's address view. However if you give --dns with a loopback addres, it will be translated like a host view address. Move the remapping logic for DNS addresses out of add_dns4() and add_dns6() into add_dns_resolv() so that it is only applied for host nameserver addresses, not for nameservers given explicitly with --dns. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:08 +02:00
David Gibson	a6066f4e27	conf: Correct setting of dns_match address in add_dns6() add_dns6() (but not add_dns4()) has a bug setting dns_match: it sets it to the given address, rather than the gateway address. This is doubly wrong: - We've just established the given address is a host loopback address the guest can't access - We've just set ip6.dns[] to tell the guest to use the gateway address, so it won't use the dns_match address we're setting Correct this to use the gateway address, like IPv4. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:06 +02:00
David Gibson	7c083ee41c	conf: Move adding of a nameserver from resolv.conf into subfunction get_dns() is already quite deeply nested, and future changes I have in mind will add more complexity. Prepare for this by splitting out the adding of a single nameserver to the configuration into its own function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:04 +02:00
David Gibson	1d10760c9f	conf: Move DNS array bounds checks into add_dns[46] Every time we call add_dns[46] we need to first check if there's space in the c->ip[46].dns array for the new entry. We might as well make that check in add_dns[46]() itself. In fact it looks like the calls in get_dns() had an off by one error, not allowing the last entry of the array to be filled. So, that bug is also fixed by the change. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:02 +02:00
David Gibson	6852bd07cc	conf: More accurately count entries added in get_dns() get_dns() counts the number of guest DNS servers it adds, and gives an error if it couldn't add any. However, this count ignores the fact that add_dns[46]() may in some cases not add an entry. Use the array indices we're already tracking to get an accurate count. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 12:00:00 +02:00
David Gibson	c679894668	conf: Use array indices rather than pointers for DNS array slots Currently add_dns[46]() take a somewhat awkward double pointer to the entry in the c->ip[46].dns array to update. It turns out to be easier to work with indices into that array instead. This diff does add some lines, but it's comments, and will allow some future code reductions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 11:59:58 +02:00
David Gibson	ceea52ca93	treewide: Use struct assignment instead of memcpy() for IP addresses We rely on C11 already, so we can use clearer and more type-checkable struct assignment instead of mempcy() for copying IP addresses around. This exposes some "pointer could be const" warnings from cppcheck, so address those too. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 11:59:56 +02:00
David Gibson	905ecd2b0b	treewide: Rename MAC address fields for clarity c->mac isn't a great name, because it doesn't say whose mac address it is and it's not necessarily obvious in all the contexts we use it. Since this is specifically the address that we (passt/pasta) use on the tap interface, rename it to "our_tap_mac". Rename the "mac_guest" field to "guest_mac" to be grammatically consistent. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 11:59:54 +02:00
David Gibson	066e69986b	util: Helper for formatting MAC addresses There are a couple of places where we somewhat messily open code formatting an Ethernet like MAC address for display. Add an eth_ntop() helper for this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 11:59:51 +02:00
David Gibson	baba284912	conf: Don't ignore -t and -u options after -D `f6d5a52392` moved handling of -D into a later loop. However as a side effect it moved this from a switch block to an if block. I left a couple of 'break' statements that don't make sense in the new context. They should be 'continue' so that we go onto the next option, rather than leaving the loop entirely. Fixes: `f6d5a52392` ("conf: Delay handling -D option until after addresses are configured") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-14 09:14:12 +02:00
David Gibson	f6d5a52392	conf: Delay handling -D option until after addresses are configured add_dns[46]() rely on the gateway address and c->no_map_gw being already initialised, in order to properly handle DNS servers which need NAT to be accessed from the guest. Usually these are called from get_dns() which is well after the addresses are configured, so that's fine. However, they can also be called earlier if an explicit -D command line option is given. In this case no_map_gw and/or c->ip[46].gw may not get be initialised properly, leading to this doing the wrong thing. Luckily we already have a second pass of option parsing for things which need addresses to already be configured. Move handling of -D to there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-12 21:29:36 +02:00

1 2 3 4 5 ...

254 commits