1
0
Fork 0
mirror of https://passt.top/passt synced 2025-08-13 10:33:13 +02:00
Commit graph

9 commits

Author SHA1 Message Date
David Gibson
59cc89f4cc udp, udp_flow: Track our specific address on socket interfaces
So far for UDP flows (like TCP connections) we didn't record our address
(oaddr) in the flow table entry for socket based pifs.  That's because we
didn't have that information when a flow was initiated by a datagram coming
to a "listening" socket with 0.0.0.0 or :: address.  Even when we did have
the information, we didn't record it, to simplify address matching on
lookups.

This meant that in some circumstances we could send replies on a UDP flow
from a different address than the originating request came to, which is
surprising and breaks certain setups.

We now have code in udp_peek_addr() which does determine our address for
incoming UDP datagrams.  We can use that information to properly populate
oaddr in the flow table for flow initiated from a socket.

In order to be able to consistently match datagrams to flows, we must
*always* have a specific oaddr, not an unspecified address (that's how the
flow hash table works).  So, we also need to fill in oaddr correctly for
flows we initiate *to* sockets.  Our forwarding logic doesn't specify
oaddr here, letting the kernel decide based on the routing table.  In this
case we need to call getsockname() after connect()ing the socket to find
which local address the kernel picked.

This adds getsockname() to our seccomp profile for all variants.

Link: https://bugs.passt.top/show_bug.cgi?id=99
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2025-04-10 19:46:16 +02:00
Stefano Brivio
06ef64cdb7 udp_flow: Save 8 bytes in struct udp_flow on 64-bit architectures
Shuffle the fields just added by commits a7775e9550 ("udp: support
traceroute in direction tap-socket") and 9725e79888 ("udp_flow:
Don't discard packets that arrive between bind() and connect()").

On x86_64, as reported by pahole(1), before:

struct udp_flow {
        struct flow_common         f;                    /*     0    76 */
        /* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */
        _Bool                      closed:1;             /*    76: 0  1 */

        /* XXX 7 bits hole, try to pack */

        _Bool                      flush0;               /*    77     1 */
        _Bool                      flush1:1;             /*    78: 0  1 */

        /* XXX 7 bits hole, try to pack */
        /* XXX 1 byte hole, try to pack */

        time_t                     ts;                   /*    80     8 */
        int                        s[2];                 /*    88     8 */
        uint8_t                    ttl[2];               /*    96     2 */

        /* size: 104, cachelines: 2, members: 7 */
        /* sum members: 95, holes: 1, sum holes: 1 */
        /* sum bitfield members: 2 bits, bit holes: 2, sum bit holes: 14 bits */
        /* padding: 6 */
        /* last cacheline: 40 bytes */
};

and after:

struct udp_flow {
        struct flow_common         f;                    /*     0    76 */
        /* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */
        uint8_t                    ttl[2];               /*    76     2 */
        _Bool                      closed:1;             /*    78: 0  1 */
        _Bool                      flush0:1;             /*    78: 1  1 */
        _Bool                      flush1:1;             /*    78: 2  1 */

        /* XXX 5 bits hole, try to pack */
        /* XXX 1 byte hole, try to pack */

        time_t                     ts;                   /*    80     8 */
        int                        s[2];                 /*    88     8 */

        /* size: 96, cachelines: 2, members: 7 */
        /* sum members: 94, holes: 1, sum holes: 1 */
        /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
        /* last cacheline: 32 bytes */
};

It doesn't matter much because anyway the typical storage for struct
udp_flow is given by union flow:

union flow {
        struct flow_common         f;                  /*     0    76 */
        struct flow_free_cluster   free;               /*     0    84 */
        struct tcp_tap_conn        tcp;                /*     0   120 */
        struct tcp_splice_conn     tcp_splice;         /*     0   120 */
        struct icmp_ping_flow      ping;               /*     0    96 */
        struct udp_flow            udp;                /*     0    96 */
};

but it still improves data locality somewhat, so let me fix this up
now that commits are fresh.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2025-04-09 22:52:32 +02:00
David Gibson
9725e79888 udp_flow: Don't discard packets that arrive between bind() and connect()
When we establish a new UDP flow we create connect()ed sockets that will
only handle datagrams for this flow.  However, there is a race between
bind() and connect() where they might get some packets queued for a
different flow.  Currently we handle this by simply discarding any
queued datagrams after the connect.  UDP protocols should be able to handle
such packet loss, but it's not ideal.

We now have the tools we need to handle this better, by redirecting any
datagrams received during that race to the appropriate flow.  We need to
use a deferred handler for this to avoid unexpectedly re-ordering datagrams
in some edge cases.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update comment to udp_flow_defer()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2025-04-07 21:44:31 +02:00
David Gibson
159beefa36 udp_flow: Take pif and port as explicit parameters to udp_flow_from_sock()
Currently udp_flow_from_sock() is only used when receiving a datagram
from a "listening" socket.  It takes the listening socket's epoll
reference to get the interface and port on which the datagram arrived.

We have some upcoming cases where we want to use this in different
contexts, so make it take the pif and port as direct parameters instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Drop @ref from comment to udp_flow_from_sock()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2025-04-07 21:43:53 +02:00
Jon Maloy
a7775e9550 udp: support traceroute in direction tap-socket
Now that ICMP pass-through from socket-to-tap is in place, it is
easy to support UDP based traceroute functionality in direction
tap-to-socket.

We fix that in this commit.

Link: https://bugs.passt.top/show_bug.cgi?id=64
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2025-04-07 21:23:35 +02:00
David Gibson
1166401c2f udp: Allow UDP flows to be prematurely closed
Unlike TCP, UDP has no in-band signalling for the end of a flow.  So the
only way we remove flows is on a timer if they have no activity for 180s.

However, we've started to investigate some error conditions in which we
want to prematurely abort / abandon a UDP flow.  We can call
udp_flow_close(), which will make the flow inert (sockets closed, no epoll
events, can't be looked up in hash).  However it will still wait 3 minutes
to clear away the stale entry.

Clean this up by adding an explicit 'closed' flag which will cause a flow
to be more promptly cleaned up.  We also publish udp_flow_close() so it
can be called from other places to abort UDP flows().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-09-06 12:53:24 +02:00
Laurent Vivier
e877f905e5 udp_flow: move all udp_flow functions to udp_flow.c
No code change.

They need to be exported to be available by the vhost-user version of
passt.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-08-05 17:38:17 +02:00
David Gibson
e0647ad80c udp: Handle "spliced" datagrams with per-flow sockets
When forwarding a datagram to a socket, we need to find a socket with a
suitable local address to send it.  Currently we keep track of such sockets
in an array indexed by local port, but this can't properly handle cases
where we have multiple local addresses in active use.

For "spliced" (socket to socket) cases, improve this by instead opening
a socket specifically for the target side of the flow.  We connect() as
well as bind()ing that socket, so that it will only receive the flow's
reply packets, not anything else.  We direct datagrams sent via that socket
using the addresses from the flow table, effectively replacing bespoke
addressing logic with the unified logic in fwd.c

When we create the flow, we also take a duplicate of the originating
socket, and use that to deliver reply datagrams back to the origin, again
using addresses from the flow table entry.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:42 +02:00
David Gibson
a45a7e9798 udp: Create flows for datagrams from originating sockets
This implements the first steps of tracking UDP packets with the flow table
rather than its own (buggy) set of port maps.  Specifically we create flow
table entries for datagrams received from a socket (PIF_HOST or
PIF_SPLICE).

When splitting datagrams from sockets into batches, we group by the flow
as well as splicesrc.  This may result in smaller batches, but makes things
easier down the line.  We can re-optimise this later if necessary.  For now
we don't do anything else with the flow, not even match reply packets to
the same flow.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-19 18:33:39 +02:00