So far for UDP flows (like TCP connections) we didn't record our address
(oaddr) in the flow table entry for socket based pifs. That's because we
didn't have that information when a flow was initiated by a datagram coming
to a "listening" socket with 0.0.0.0 or :: address. Even when we did have
the information, we didn't record it, to simplify address matching on
lookups.
This meant that in some circumstances we could send replies on a UDP flow
from a different address than the originating request came to, which is
surprising and breaks certain setups.
We now have code in udp_peek_addr() which does determine our address for
incoming UDP datagrams. We can use that information to properly populate
oaddr in the flow table for flow initiated from a socket.
In order to be able to consistently match datagrams to flows, we must
*always* have a specific oaddr, not an unspecified address (that's how the
flow hash table works). So, we also need to fill in oaddr correctly for
flows we initiate *to* sockets. Our forwarding logic doesn't specify
oaddr here, letting the kernel decide based on the routing table. In this
case we need to call getsockname() after connect()ing the socket to find
which local address the kernel picked.
This adds getsockname() to our seccomp profile for all variants.
Link: https://bugs.passt.top/show_bug.cgi?id=99
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When we establish a new UDP flow we create connect()ed sockets that will
only handle datagrams for this flow. However, there is a race between
bind() and connect() where they might get some packets queued for a
different flow. Currently we handle this by simply discarding any
queued datagrams after the connect. UDP protocols should be able to handle
such packet loss, but it's not ideal.
We now have the tools we need to handle this better, by redirecting any
datagrams received during that race to the appropriate flow. We need to
use a deferred handler for this to avoid unexpectedly re-ordering datagrams
in some edge cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update comment to udp_flow_defer()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently udp_flow_from_sock() is only used when receiving a datagram
from a "listening" socket. It takes the listening socket's epoll
reference to get the interface and port on which the datagram arrived.
We have some upcoming cases where we want to use this in different
contexts, so make it take the pif and port as direct parameters instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Drop @ref from comment to udp_flow_from_sock()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Now that ICMP pass-through from socket-to-tap is in place, it is
easy to support UDP based traceroute functionality in direction
tap-to-socket.
We fix that in this commit.
Link: https://bugs.passt.top/show_bug.cgi?id=64
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Unlike TCP, UDP has no in-band signalling for the end of a flow. So the
only way we remove flows is on a timer if they have no activity for 180s.
However, we've started to investigate some error conditions in which we
want to prematurely abort / abandon a UDP flow. We can call
udp_flow_close(), which will make the flow inert (sockets closed, no epoll
events, can't be looked up in hash). However it will still wait 3 minutes
to clear away the stale entry.
Clean this up by adding an explicit 'closed' flag which will cause a flow
to be more promptly cleaned up. We also publish udp_flow_close() so it
can be called from other places to abort UDP flows().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
No code change.
They need to be exported to be available by the vhost-user version of
passt.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When forwarding a datagram to a socket, we need to find a socket with a
suitable local address to send it. Currently we keep track of such sockets
in an array indexed by local port, but this can't properly handle cases
where we have multiple local addresses in active use.
For "spliced" (socket to socket) cases, improve this by instead opening
a socket specifically for the target side of the flow. We connect() as
well as bind()ing that socket, so that it will only receive the flow's
reply packets, not anything else. We direct datagrams sent via that socket
using the addresses from the flow table, effectively replacing bespoke
addressing logic with the unified logic in fwd.c
When we create the flow, we also take a duplicate of the originating
socket, and use that to deliver reply datagrams back to the origin, again
using addresses from the flow table entry.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This implements the first steps of tracking UDP packets with the flow table
rather than its own (buggy) set of port maps. Specifically we create flow
table entries for datagrams received from a socket (PIF_HOST or
PIF_SPLICE).
When splitting datagrams from sockets into batches, we group by the flow
as well as splicesrc. This may result in smaller batches, but makes things
easier down the line. We can re-optimise this later if necessary. For now
we don't do anything else with the flow, not even match reply packets to
the same flow.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>