passt

mirror of https://passt.top/passt synced 2025-07-05 19:48:44 +02:00

Author	SHA1	Message	Date
David Gibson	59cc89f4cc	udp, udp_flow: Track our specific address on socket interfaces So far for UDP flows (like TCP connections) we didn't record our address (oaddr) in the flow table entry for socket based pifs. That's because we didn't have that information when a flow was initiated by a datagram coming to a "listening" socket with 0.0.0.0 or :: address. Even when we did have the information, we didn't record it, to simplify address matching on lookups. This meant that in some circumstances we could send replies on a UDP flow from a different address than the originating request came to, which is surprising and breaks certain setups. We now have code in udp_peek_addr() which does determine our address for incoming UDP datagrams. We can use that information to properly populate oaddr in the flow table for flow initiated from a socket. In order to be able to consistently match datagrams to flows, we must always have a specific oaddr, not an unspecified address (that's how the flow hash table works). So, we also need to fill in oaddr correctly for flows we initiate to sockets. Our forwarding logic doesn't specify oaddr here, letting the kernel decide based on the routing table. In this case we need to call getsockname() after connect()ing the socket to find which local address the kernel picked. This adds getsockname() to our seccomp profile for all variants. Link: https://bugs.passt.top/show_bug.cgi?id=99 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-10 19:46:16 +02:00
David Gibson	9725e79888	udp_flow: Don't discard packets that arrive between bind() and connect() When we establish a new UDP flow we create connect()ed sockets that will only handle datagrams for this flow. However, there is a race between bind() and connect() where they might get some packets queued for a different flow. Currently we handle this by simply discarding any queued datagrams after the connect. UDP protocols should be able to handle such packet loss, but it's not ideal. We now have the tools we need to handle this better, by redirecting any datagrams received during that race to the appropriate flow. We need to use a deferred handler for this to avoid unexpectedly re-ordering datagrams in some edge cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Update comment to udp_flow_defer()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:44:31 +02:00
David Gibson	159beefa36	udp_flow: Take pif and port as explicit parameters to udp_flow_from_sock() Currently udp_flow_from_sock() is only used when receiving a datagram from a "listening" socket. It takes the listening socket's epoll reference to get the interface and port on which the datagram arrived. We have some upcoming cases where we want to use this in different contexts, so make it take the pif and port as direct parameters instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Drop @ref from comment to udp_flow_from_sock()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:43:53 +02:00
David Gibson	d74b5a7c10	udp: Use connect()ed sockets for initiating side Currently we have an asymmetry in how we handle UDP sockets. For flows where the target side is a socket, we create a new connect()ed socket - the "reply socket" specifically for that flow used for sending and receiving datagrams on that flow and only that flow. For flows where the initiating side is a socket, we continue to use the "listening" socket (or rather, a dup() of it). This has some disadvantages: * We need a hash lookup for every datagram on the listening socket in order to work out what flow it belongs to * The dup() keeps the socket alive even if automatic forwarding removes the listening socket. However, the epoll data remains the same including containing the now stale original fd. This causes bug 103. * We can't (easily) set flow-specific options on an initiating side socket, because that could affect other flows as well Alter the code to use a connect()ed socket on the initiating side as well as the target side. There's no way to "clone and connect" the listening socket (a loose equivalent of accept() for UDP), so we have to create a new socket. We have to bind() this socket before we connect() it, which is allowed thanks to SO_REUSEADDR, but does leave a small window where it could receive datagrams not intended for this flow. For now we handle this by simply discarding any datagrams received between bind() and connect(), but I intend to improve this in a later patch. Link: https://bugs.passt.top/show_bug.cgi?id=103 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:24:36 +02:00
Jon Maloy	a7775e9550	udp: support traceroute in direction tap-socket Now that ICMP pass-through from socket-to-tap is in place, it is easy to support UDP based traceroute functionality in direction tap-to-socket. We fix that in this commit. Link: https://bugs.passt.top/show_bug.cgi?id=64 Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:23:35 +02:00
David Gibson	77883fbdd1	udp: Add helper function for creating connected UDP socket Currently udp_flow_new() open codes creating and connecting a socket to use for reply messages. We have in mind some more places to use this logic, plus it just makes for a rather large function. Split this handling out into a new udp_flow_sock() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:34 +01:00
David Gibson	37d78c9ef3	udp: Always hash socket facing flowsides For UDP packets from the tap interface (like TCP) we use a hash table to look up which flow they belong to. Unlike TCP, we sometimes also create a hash table entry for the socket side of UDP flows. We need that when we receive a UDP packet from a "listening" socket which isn't specific to a single flow. At present we only do this for the initiating side of flows, which re-use the listening socket. For the target side we use a connected "reply" socket specific to the single flow. We have in mind changes that maye introduce some edge cases were we could receive UDP packets on a non flow specific socket more often. To allow for those changes - and slightly simplifying things in the meantime - always put both sides of a UDP flow - tap or socket - in the hash table. It's not that costly, and means we always have the option of falling back to a hash lookup. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:32 +01:00
David Gibson	adb46c11d0	flow: Add flow_perror() helper Our general logging helpers include a number of _perror() variants which, like perror(3) include the description of the current errno. We didn't have those for our flow specific logging helpers, though. Fill this gap with flow_perror() and flow_dbg_perror(), and use them where it's useful. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 13:33:12 +01:00
Stefano Brivio	71fa736277	tcp_splice, udp_flow: fcntl64() support on PPC64 depends on glibc version I explicitly added fcntl64() to the list of allowed system calls for PPC64 a while ago, and now it turns out it's not available in recent Debian builds. The warning from seccomp.sh is harmless because we unconditionally try to enable fcntl() anyway, but take care of it anyway. Link: https://buildd.debian.org/status/fetch.php?pkg=passt&arch=ppc64&ver=0.0%7Egit20250121.4f2c8e7-1&stamp=1737477147&raw=0 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-02-03 22:43:35 +01:00
David Gibson	0349cf637f	util: Rename and make global vu_remove_watch() vu_remove_watch() is used in vhost_user.c to remove an fd from the global epoll set. There's nothing really vhost user specific about it though, so rename, move to util.c and use it in a bunch of places outside vhost_user.c where it makes things marginally more readable. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-03 07:32:51 +01:00
Stefano Brivio	324233bd9b	udp_flow: Don't block multicast and broadcast messages It was reported that SSDP notifications sent from a container (with e.g. minidlna) stopped appearing on the network starting from commit `1db4f773e8` ("udp: Improve detail of UDP endpoint sanity checking"). As a minimal reproducer using minidlnad(8): $ mkdir /tmp/minidlna $ cat conf media_dir=/tmp/minidlna db_dir=/tmp/minidlna $ ./pasta -d --config-net -- sh -c '/usr/sbin/minidlnad -p 31337 -S -f conf -P /dev/null & (sleep 1; killall minidlnad)' [...] 1.0327: Flow 0 (NEW): FREE -> NEW 1.0327: Flow 0 (INI): NEW -> INI 1.0327: Flow 0 (INI): TAP [88.198.0.164]:54185 -> [239.255.255.250]:1900 => ? 1.0327: Flow 0 (INI): Invalid endpoint on UDP packet 1.0327: Flow 0 (FREE): INI -> FREE 1.0328: Flow 0 (FREE): TAP [88.198.0.164]:54185 -> [239.255.255.250]:1900 => ? 1.0328: Dropping datagram with no flow TAP 88.198.0.164:54185 -> 239.255.255.250:1900 This is an actual regression as there's no particular reason to block outbound multicast UDP packets. And even if we don't handle multicast groups in any particular way (https://bugs.passt.top/show_bug.cgi?id=2, "Add IGMP/MLD proxy"), there's no reason to block inbound multicast or broadcast packets either, should they ever be somehow delivered to passt or pasta. Let multicast and broadcast packets through, refusing only to establish flows with unspecified endpoint, as those would actually cause havoc in the flow table. IP-wise, SSDP notifications look like this (after this patch), inside and outside: $ pasta -p /tmp/minidlna.pcap --config-net -- sh -c '/usr/sbin/minidlnad -p 31337 -S -f minidlna.conf -P /dev/null & (sleep 1; killall minidlnad)' [...] $ tshark -a packets:1 -r /tmp/minidlna.pcap ssdp 2 0.074808 88.198.0.164 ? 239.255.255.250 SSDP 200 NOTIFY * HTTP/1.1 # tshark -i ens3 -a packets:1 multicast 2>/dev/null 1 0.000000000 88.198.0.164 ? 239.255.255.250 SSDP 200 NOTIFY * HTTP/1.1 Link: https://github.com/containers/podman/issues/24871 Fixes: `1db4f773e8` ("udp: Improve detail of UDP endpoint sanity checking") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-01-05 23:48:59 +01:00
Stefano Brivio	09478d55fe	treewide: Dodge dynamic memory allocation in strerror() from glibc > 2.40 With glibc commit 25a5eb4010df ("string: strerror, strsignal cannot use buffer after dlmopen (bug 32026)"), strerror() now needs, at least on x86, the getrandom() and brk() system calls, in order to fill in the locale-translated error message. But getrandom() and brk() are not allowed by our seccomp profiles. This became visible on Fedora Rawhide with the "podman login and logout" Podman tests, defined at test/e2e/login_logout_test.go in the Podman source tree, where pasta would terminate upon printing error descriptions (at least the ones related to the SO_ERROR queue for spliced connections). Avoid dynamic memory allocation by calling strerrordesc_np() instead, which is a GNU function returning a static, untranslated version of the error description. If it's not available, keep calling strerror(), which at that point should be simple enough as to be usable (at least, that's currently the case for musl). Reported-by: Paul Holzinger <pholzing@redhat.com> Link: https://github.com/containers/podman/issues/24804 Analysed-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Paul Holzinger <pholzing@redhat.com>	2024-12-11 12:21:23 +01:00
David Gibson	1db4f773e8	udp: Improve detail of UDP endpoint sanity checking In udp_flow_new() we reject a flow if the endpoint isn't unicast, or it has a zero endpoint port. Those conditions aren't strictly illegal, but we can't safely handle them at present: * Multicast UDP endpoints are certainly possible, but our current flow tracking only makes sense for simple unicast flows - we'll need different handling if we want to handle multicast flows in future * It's not entirely clear if port 0 is RFC-ishly correct, but for socket interfaces port 0 sometimes has a special meaning such as "pick the port for me, kernel". That makes flows on port 0 unsafe to forward in the usual way. For the same reason we also can't safely handle port 0 as our port. In principle that's also true for our address, however in the case of flows initiated from a socket, we may not know our address since the socket could be bound to 0.0.0.0 or ::, so we can only verify that our address is unicast for flows initiated from the tap side. Refine the current check in udp_flow_new() to slightly more detailed checks in udp_flow_from_sock() and udp_flow_from_tap() to make what is and isn't handled clearer. This makes this checking more similar to what we do for TCP connections. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-12-05 21:08:58 +01:00
David Gibson	1166401c2f	udp: Allow UDP flows to be prematurely closed Unlike TCP, UDP has no in-band signalling for the end of a flow. So the only way we remove flows is on a timer if they have no activity for 180s. However, we've started to investigate some error conditions in which we want to prematurely abort / abandon a UDP flow. We can call udp_flow_close(), which will make the flow inert (sockets closed, no epoll events, can't be looked up in hash). However it will still wait 3 minutes to clear away the stale entry. Clean this up by adding an explicit 'closed' flag which will cause a flow to be more promptly cleaned up. We also publish udp_flow_close() so it can be called from other places to abort UDP flows(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-09-06 12:53:24 +02:00
Stefano Brivio	7291b70ba7	udp_flow: Add missing unistd.h include for close() For some reason, this is reported only with musl, and older glibc versions (2.31, at least). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-08-21 12:03:34 +02:00
Stefano Brivio	2aea1da143	treewide: Allow additional system calls for i386/i686 I haven't tested i386 for a long time (after playing with some openSUSE i586 image a couple of years ago). It turns out that a number of system calls we actually need were denied by the seccomp filter, and not even basic functionality works. Add some system calls that glibc started using with the 64-bit time ("t64") transition, see also: https://wiki.debian.org/ReleaseGoals/64bit-time that is: clock_gettime64, timerfd_gettime64, fcntl64, and recvmmsg_time64. Add further system calls that are needed regardless of time_t width, that is, mmap2 (valgrind profile only), _llseek and sigreturn (common outside x86_64), and socketcall (same as s390x). I validated this against an almost full run of the test suite, with just a few selected tests skipped. Fixes needed to run most tests on i386/i686, and other assorted fixes for tests, are included in upcoming patches. Reported-by: Uroš Knupleš <uros@knuples.net> Analysed-by: Faidon Liambotis <paravoid@debian.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1078981 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-08-21 12:00:43 +02:00
Laurent Vivier	e877f905e5	udp_flow: move all udp_flow functions to udp_flow.c No code change. They need to be exported to be available by the vhost-user version of passt. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-05 17:38:17 +02:00

17 commits