passt

mirror of https://passt.top/passt synced 2025-08-13 10:33:13 +02:00

Author	SHA1	Message	Date
David Gibson	59cc89f4cc	udp, udp_flow: Track our specific address on socket interfaces So far for UDP flows (like TCP connections) we didn't record our address (oaddr) in the flow table entry for socket based pifs. That's because we didn't have that information when a flow was initiated by a datagram coming to a "listening" socket with 0.0.0.0 or :: address. Even when we did have the information, we didn't record it, to simplify address matching on lookups. This meant that in some circumstances we could send replies on a UDP flow from a different address than the originating request came to, which is surprising and breaks certain setups. We now have code in udp_peek_addr() which does determine our address for incoming UDP datagrams. We can use that information to properly populate oaddr in the flow table for flow initiated from a socket. In order to be able to consistently match datagrams to flows, we must always have a specific oaddr, not an unspecified address (that's how the flow hash table works). So, we also need to fill in oaddr correctly for flows we initiate to sockets. Our forwarding logic doesn't specify oaddr here, letting the kernel decide based on the routing table. In this case we need to call getsockname() after connect()ing the socket to find which local address the kernel picked. This adds getsockname() to our seccomp profile for all variants. Link: https://bugs.passt.top/show_bug.cgi?id=99 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-10 19:46:16 +02:00
Stefano Brivio	06ef64cdb7	udp_flow: Save 8 bytes in struct udp_flow on 64-bit architectures Shuffle the fields just added by commits `a7775e9550` ("udp: support traceroute in direction tap-socket") and `9725e79888` ("udp_flow: Don't discard packets that arrive between bind() and connect()"). On x86_64, as reported by pahole(1), before: struct udp_flow { struct flow_common f; /* 0 76 */ /* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */ _Bool closed:1; /* 76: 0 1 */ /* XXX 7 bits hole, try to pack */ _Bool flush0; /* 77 1 */ _Bool flush1:1; /* 78: 0 1 */ /* XXX 7 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ time_t ts; /* 80 8 */ int s[2]; /* 88 8 */ uint8_t ttl[2]; /* 96 2 */ /* size: 104, cachelines: 2, members: 7 */ /* sum members: 95, holes: 1, sum holes: 1 */ /* sum bitfield members: 2 bits, bit holes: 2, sum bit holes: 14 bits */ /* padding: 6 */ /* last cacheline: 40 bytes */ }; and after: struct udp_flow { struct flow_common f; /* 0 76 */ /* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */ uint8_t ttl[2]; /* 76 2 */ _Bool closed:1; /* 78: 0 1 */ _Bool flush0:1; /* 78: 1 1 */ _Bool flush1:1; /* 78: 2 1 */ /* XXX 5 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ time_t ts; /* 80 8 */ int s[2]; /* 88 8 */ /* size: 96, cachelines: 2, members: 7 */ /* sum members: 94, holes: 1, sum holes: 1 */ /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */ /* last cacheline: 32 bytes */ }; It doesn't matter much because anyway the typical storage for struct udp_flow is given by union flow: union flow { struct flow_common f; /* 0 76 */ struct flow_free_cluster free; /* 0 84 */ struct tcp_tap_conn tcp; /* 0 120 */ struct tcp_splice_conn tcp_splice; /* 0 120 */ struct icmp_ping_flow ping; /* 0 96 */ struct udp_flow udp; /* 0 96 */ }; but it still improves data locality somewhat, so let me fix this up now that commits are fresh. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-04-09 22:52:32 +02:00
David Gibson	9725e79888	udp_flow: Don't discard packets that arrive between bind() and connect() When we establish a new UDP flow we create connect()ed sockets that will only handle datagrams for this flow. However, there is a race between bind() and connect() where they might get some packets queued for a different flow. Currently we handle this by simply discarding any queued datagrams after the connect. UDP protocols should be able to handle such packet loss, but it's not ideal. We now have the tools we need to handle this better, by redirecting any datagrams received during that race to the appropriate flow. We need to use a deferred handler for this to avoid unexpectedly re-ordering datagrams in some edge cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Update comment to udp_flow_defer()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:44:31 +02:00
David Gibson	159beefa36	udp_flow: Take pif and port as explicit parameters to udp_flow_from_sock() Currently udp_flow_from_sock() is only used when receiving a datagram from a "listening" socket. It takes the listening socket's epoll reference to get the interface and port on which the datagram arrived. We have some upcoming cases where we want to use this in different contexts, so make it take the pif and port as direct parameters instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Drop @ref from comment to udp_flow_from_sock()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:43:53 +02:00
Jon Maloy	a7775e9550	udp: support traceroute in direction tap-socket Now that ICMP pass-through from socket-to-tap is in place, it is easy to support UDP based traceroute functionality in direction tap-to-socket. We fix that in this commit. Link: https://bugs.passt.top/show_bug.cgi?id=64 Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:23:35 +02:00
David Gibson	1166401c2f	udp: Allow UDP flows to be prematurely closed Unlike TCP, UDP has no in-band signalling for the end of a flow. So the only way we remove flows is on a timer if they have no activity for 180s. However, we've started to investigate some error conditions in which we want to prematurely abort / abandon a UDP flow. We can call udp_flow_close(), which will make the flow inert (sockets closed, no epoll events, can't be looked up in hash). However it will still wait 3 minutes to clear away the stale entry. Clean this up by adding an explicit 'closed' flag which will cause a flow to be more promptly cleaned up. We also publish udp_flow_close() so it can be called from other places to abort UDP flows. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-09-06 12:53:24 +02:00
Laurent Vivier	e877f905e5	udp_flow: move all udp_flow functions to udp_flow.c No code change. They need to be exported to be available by the vhost-user version of passt. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-05 17:38:17 +02:00
David Gibson	e0647ad80c	udp: Handle "spliced" datagrams with per-flow sockets When forwarding a datagram to a socket, we need to find a socket with a suitable local address to send it. Currently we keep track of such sockets in an array indexed by local port, but this can't properly handle cases where we have multiple local addresses in active use. For "spliced" (socket to socket) cases, improve this by instead opening a socket specifically for the target side of the flow. We connect() as well as bind()ing that socket, so that it will only receive the flow's reply packets, not anything else. We direct datagrams sent via that socket using the addresses from the flow table, effectively replacing bespoke addressing logic with the unified logic in fwd.c. When we create the flow, we also take a duplicate of the originating socket, and use that to deliver reply datagrams back to the origin, again using addresses from the flow table entry. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-19 18:33:42 +02:00
David Gibson	a45a7e9798	udp: Create flows for datagrams from originating sockets This implements the first steps of tracking UDP packets with the flow table rather than its own (buggy) set of port maps. Specifically we create flow table entries for datagrams received from a socket (PIF_HOST or PIF_SPLICE). When splitting datagrams from sockets into batches, we group by the flow as well as splicesrc. This may result in smaller batches, but makes things easier down the line. We can re-optimise this later if necessary. For now we don't do anything else with the flow, not even match reply packets to the same flow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-19 18:33:39 +02:00
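
Commit 59cc89f4cc describes calling getsockname() after connect()ing a UDP socket to learn which local address the kernel picked. A minimal sketch of that technique follows, assuming IPv4 only; udp_connect_get_local() is a hypothetical helper written for illustration and is not code from passt.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not passt code): connect() a UDP socket to
 * dst_ip:dst_port and report in *local the address and port the kernel
 * chose for our side. Returns the socket on success, -1 on failure.
 */
static int udp_connect_get_local(const char *dst_ip, uint16_t dst_port,
				 struct sockaddr_in *local)
{
	struct sockaddr_in dst;
	socklen_t len = sizeof(*local);
	int s;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(dst_port);
	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		return -1;

	/* On a UDP socket, connect() sends nothing: it records the peer
	 * and lets the kernel select a source address from the routing
	 * table, plus an ephemeral port. getsockname() then reads back
	 * that choice instead of an unspecified 0.0.0.0.
	 */
	if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0 ||
	    getsockname(s, (struct sockaddr *)local, &len) < 0) {
		close(s);
		return -1;
	}

	return s;
}
```

A caller could pass, say, "127.0.0.1" and any port, then read local->sin_addr and local->sin_port to get a specific (not unspecified) address suitable as a lookup key. The caller owns the returned socket and should close() it.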

9 commits
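
The field reordering in commit 06ef64cdb7 can be reproduced in isolation. The sketch below is an illustration only: struct flow_common_stub is a hypothetical 76-byte, 4-byte-aligned stand-in for the real struct flow_common, and the two struct names are invented here. Bit-field layout is implementation-defined, so the exact sizes depend on the ABI; on x86_64 with GCC or Clang they should correspond to the 104 and 96 bytes quoted from pahole(1).

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for struct flow_common: 76 bytes, 4-byte aligned */
struct flow_common_stub {
	uint32_t opaque[19];
};

/* Field order before the commit, following the pahole(1) listing */
struct udp_flow_before {
	struct flow_common_stub f;
	_Bool closed:1;
	_Bool flush0;		/* not a bit-field: occupies a full byte */
	_Bool flush1:1;
	time_t ts;
	int s[2];
	uint8_t ttl[2];		/* trails the struct, forcing tail padding */
};

/* Field order after the commit: small members fill the gap before ts */
struct udp_flow_after {
	struct flow_common_stub f;
	uint8_t ttl[2];
	_Bool closed:1;
	_Bool flush0:1;
	_Bool flush1:1;
	time_t ts;
	int s[2];
};

/* Bytes saved by the reordering on the ABI this is compiled for */
static size_t udp_flow_bytes_saved(void)
{
	return sizeof(struct udp_flow_before) - sizeof(struct udp_flow_after);
}
```

Printing sizeof() for both structs, or running pahole(1) on the compiled object, shows where the padding goes. The saving comes from moving ttl[] and the flag bits into the hole that alignment of ts leaves after the 76-byte common part.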