Commit graph

105 commits

Author SHA1 Message Date
David Gibson
97d1ca2ed6 udp: Use abstracted tap header
Update the UDP code to use the tap layer abstractions for initializing and
updating the L2 and lower headers.  This will make adding other tap
backends in future easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:59 +01:00
David Gibson
716a926ef4 util: Parameterize ethernet header initializer macro
We have separate IPv4 and IPv6 versions of a macro to construct an
initializer for ethernet headers.  However, now that we have htons_constant
it's easy to simply paramterize this with the ethernet protocol number.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:46 +01:00
David Gibson
67afaab411 tcp, udp: Use named field initializers in iov_init functions
Both the TCP and UDP iov_init functions have some large structure literals
defined in "field order" style.  These are pretty hard to read since it's
not obvious what value corresponds to what field.  Use named field style
initializers instead to make this clearer.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:44 +01:00
David Gibson
b93d025d50 udp: Don't use separate sockets to listen for spliced packets
Currently, when ports are forwarded inbound in pasta mode, we open two
sockets for incoming traffic: one listens on the public IP address and will
forward packets to the tuntap interface.  The other listens on localhost
and forwards via "splicing" (resending directly via sockets in the ns).

Now that we've improved the logic about whether we "splice" any individual
packet, we don't need this.  Instead we can have a single socket bound to
0.0.0.0 or ::, marked as able to splice and udp_sock_handler() will deal
with each packet as appropriate.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:07:12 +01:00
David Gibson
8d503e825f udp: Decide whether to "splice" per datagram rather than per socket
Currently we have special sockets for receiving datagrams from locahost
which can use the optimized "splice" path rather than going across the tap
interface.

We want to loosen this so that sockets can receive sockets that will be
forwarded by both the spliced and non-spliced paths.  To do this, we alter
the meaning of the @splice bit in the reference to mean that packets
receieved on this socket *can* be spliced, not that they *will* be spliced.
They'll only actually be spliced if they come from 127.0.0.1 or ::1.

We can't (for now) remove the splice bit entirely, unlike with TCP.  Our
gateway mapping means that if the ns initiates communication to the gw
address, we'll translate that to target 127.0.0.1 on the host side.  Reply
packets will therefore have source address 127.0.0.1 when received on the
host, but these need to go via the tap path where that will be translated
back to the gateway address.  We need the @splice bit to distinguish that
case from packets going from localhost to a port mapped explicitly with
-u which should be spliced.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:07:09 +01:00
David Gibson
8a10f23720 udp: Unify udp_sock_handler_splice() with udp_sock_handler()
These two functions now have a very similar structure, and their first
part (calling recvmmsg()) is functionally identical.  So, merge the two
functions into one.

This does have the side effect of meaning we no longer receive multiple
packets at once for splice (we already didn't for tap).  This does hurt
throughput for small spliced packets, but improves it for large spliced
packets and tap packets.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:07:06 +01:00
David Gibson
f1ed8dbfa7 udp: Pre-populate msg_names with local address
udp_splice_namebuf is now used only for spliced sending, and so it is
only ever populated with the localhost address, either IPv4 or IPv6.
So, replace the awkward initialization in udp_sock_handler_splice()
with statically initialized versions for IPv4 and IPv6.  We then just
need to update the port number in udp_sock_handler_splice().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:07:03 +01:00
David Gibson
c9e193b5ae udp: Don't handle tap receive batch size calculation within a #define
UDP_MAX_FRAMES gives the maximum number of datagrams we'll ever handle as a
batch for sizing our buffers and control structures.  The subtly different
UDP_TAP_FRAMES gives the maximum number of datagrams we'll actually try to
receive at once for tap packets in the current configuration.

This depends on the mode, meaning that the macro has a non-obvious
dependency on the usual 'c' context variable being available.  We only use
it in one place, so it makes more sense to open code this.  Add an
explanatory comment while we're there.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:07:01 +01:00
David Gibson
4eb54fd2e7 udp: Split receive from preparation and send in udp_sock_handler()
The receive part of udp_sock_handler() and udp_sock_handler_splice() is now
almost identical.  In preparation for merging that, split the receive part
of udp_sock_handler() from the part preparing and sending the frames for
sending on the tap interface.  The latter goes into a new udp_tap_send()
function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:06:58 +01:00
David Gibson
09c00f1d2a udp: Split sending to passt tap interface into separate function
The last part of udp_sock_handler() does the actual sending of frames
to the tap interface.  For pasta that's just a call to
udp_tap_send_pasta() but for passt, it's moderately complex and open
coded.

For symmetry, move the passt send path into its own function,
udp_tap_send_passt().  This will make it easier to abstract the tap
interface in future (e.g. when we want to add vhost-user).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:06:55 +01:00
David Gibson
a6e919a951 udp: Move sending pasta tap frames to the end of udp_sock_handler()
udp_sock_handler() has a surprising difference in flow between pasta and
passt mode: For pasta we send each frame to the tap interface as we prepare
it.  For passt, though, we prepare all the frames, then send them with a
single sendmmsg().

Alter the pasta path to also prepare all the frames, then send them at the
end.  We already have a suitable data structure for the passt case.  This
will make it easier to abstract out the tap backend difference in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-13 01:06:51 +01:00
David Gibson
310bdbdcf4 udp: Factor out control structure management from udp_sock_fill_data_v[46]
The main purpose of udp_sock_fill_data_v[46]() is to construct the IP, UDP
and other headers we'll need to forward data onto the tap interface.  In
addition they update the control structures (iovec and mmsghdr) we'll need
to send the messages, and in the case of pasta actually sends it.

This leads the control structure management and the send itself awkwardly
split between udp_sock_fill_data_v[46]() and their caller
udp_sock_handler().  In addition, this tail part of udp_sock_fill_datav[46]
is essentially common between the IPv4 and IPv6 versions, apart from which
control array we're working on.

Clean this up by reducing these functions to just construct the headers
and renaming them to udp_update_hdr[46]() accordingly.  The control
structure updates are now all in the caller, and common for IPv4 and IPv6.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:42:22 +01:00
David Gibson
4b2d227c86 udp: Preadjust udp[46]_l2_iov_tap[].iov_base for pasta mode
Currently, we always populate udp[46]_l2_iov_tap[].iov_base with the
very start of the header buffers, including space for the qemu vnet_len
tag suitable for passt mode.  That's ok because we don't actually use these
iovecs for pasta mode.

However, we do know the mode in udp_sock[46]_iov_init() so adjust these
to the beginning of the headers we'll actually need for the mode: including
the vnet_len tag for passt, but excluding it for pasta.

This allows a slightly nicer way to locate the right buffer to send in the
pasta case, and will allow some additional cleanups later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:42:19 +01:00
David Gibson
d533807027 udp: Better factor IPv4 and IPv6 paths in udp_sock_handler()
Apart from which mh array they're operating on the recvmmsg() calls in
udp_sock_handler() are identical between the IPv4 and IPv6 paths, as are
some of the control structure updates.

By using some local variables to refer to the IP version specific control
arrays, make some more logic common between the IPv4 and IPv6 paths.  As
well as slightly reducing the code size, this makes it less likely that
we'll accidentally use the IPv4 arrays in the IPv6 path or vice versa as we
did in a recently fixed bug.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:42:16 +01:00
David Gibson
6af7ee74cf udp: Fix incorrect use of IPv6 mh buffers in IPv4 path
udp_sock_handler() incorrectly uses udp6_l2_mh_tap[] on the IPv4 path.  In
fact this is harmless because this assignment is redundant (the 0th entry
msg_hdr will always point to the 0th iov entry for both IPv4 and IPv6 and
won't change).

There is also an incorrect usage of udp6_l2_mh_tap[] in
udp_sock_fill_data_v4.  This one can cause real problems, because we'll
use stale iov_len values if we send multiple messages to the qemu socket.
Most of the time that will be relatively harmless - we're likely to either
drop UDP packets, or send duplicates.  However, if the stale iov_len we
use ends up referencing an uninitialized buffer we could desynchronize the
qemu stream socket.

Correct both these bugs.  The UDP6 path appears to be correct, but it does
have some comments that incorrectly reference the IPv4 versions, so fix
those as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:42:07 +01:00
David Gibson
34764ea4f3 udp: Correct splice forwarding when receiving from multiple sources
udp_sock_handler_splice() reads a whole batch of datagrams at once with
recvmmsg().  It then forwards them all via a single socket on the other
side, based on the source port.

However, it's entirely possible that the datagrams in the set have
different source ports, and thus ought to be forwarded via different
sockets on the destination side.  In fact this situation arises with the
iperf -P4 throughput tests in our own test suite.  AFAICT we only get away
with this because iperf3 is strictly one way and doesn't send reply packets
which would be misdirected because of the incorrect source ports.

Alter udp_sock_handler_splice() to split the packets it receives into
batches with the same source address and send each batch with a separate
sendmmsg().

For now we only look for already contiguous batches, which means that if
there are multiple active flows interleaved this is likely to degenerate
to batches of size 1.  For now this is the simplest way to correct the
behaviour and we can try to optimize later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:42:03 +01:00
David Gibson
2dec914209 udp: Split send half of udp_sock_handler_splice() from the receive half
Move the part of udp_sock_handler_splice() concerned with sending out the
datagrams into a new udp_splice_sendfrom() helper.  This will make later
cleanups easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:42:00 +01:00
David Gibson
fc7f91e709 udp: Unify buffers for tap and splice paths
We maintain a set of buffers for UDP packets to be forwarded via the tap
interface in udp[46]_l2_buf.  We then have a separate set of buffers for
packets to be "spliced" in udp_splice_buf[].  However, we only use one of
these at a time, so we can share the buffer space.

For the receiving splice packets we can not only re-use the data buffers
but also the udp[46]_l2_iov_sock and udp[46]_l2_mh_sock control structures.

For sending the splice packets we keep the same data buffers, but we need
specific control structures.  We create udp[46]_iov_splice - we can't
reuse udp_l2_iov_sock[] because we need to write iov_len as we're writing
spliced packets, but the tap path expects iov_len to remain the same (it
only uses it for receive).  Likewise we create udp[46]_mh_splice with the
mmsghdr structures for sending spliced packets.  As well as needing to
reference different iovs, these need to all reference udp_splice_namebuf
instead of individual msg_name fields for each slot.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:56 +01:00
David Gibson
c52ca4aecf udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
udp_sock_handler_splice() has a somewhat clunky if to extract the port from
a socket address which could be either IPv4 or IPv6.  Future changes are
going to make this even more clunky, so introduce a helper function to
do this extraction.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:53 +01:00
David Gibson
e7e2a321ba udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing
These two constants have the same value, and there's not a lot of reason
they'd ever need to be different.  Future changes will further integrate
the spliced and "tap" paths so that these need to be the same.  So, merge
them into UDP_MAX_FRAMES.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:51 +01:00
David Gibson
5b0027f942 udp: Simplify udp_sock_handler_splice
Previous cleanups mean that we can now rework some complex ifs in
udp_sock_handler_splice() into a simpler set.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:48 +01:00
David Gibson
71d2595a8f udp: Update UDP "connection" timestamps in both directions
A UDP pseudo-connection between port A in the init namespace and port B in
the pasta guest namespace involves two sockets: udp_splice_init[v6][B]
and udp_splice_ns[v6][A].  The socket which originated this "connection"
will be permanent but the other one will be closed on a timeout.

When we get a packet from the originating socket, we update the timeout on
the other socket, but we don't do the same when we get a reply packet from
the other socket.  However any activity on the "connection" probably
indicates that it's still in use.  Without this we could incorrectly time
out a "connection" if it's using a protocol which involves a single
initiating packet, but which then gets continuing replies from the target.

Correct this by updating the timeout on both sockets for a packet in either
direction.  This also updates the timestamps for the permanent originating
sockets which is unnecessary, but harmless.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:45 +01:00
David Gibson
7610034fef udp: Don't explicitly track originating socket for spliced "connections"
When we look up udp_splice_to_ns[][].orig_sock in udp_sock_handler_splice()
we're finding the socket on which the originating packet for the
"connection" was received on.  However, we don't specifically need this
socket to be the originating one - we just need one that's bound to the
the source port of this reply packet in the init namespace.  We can look
this up in udp_splice_to_init[v6][src].target_sock, whose defining
characteristic is exactly that.  The same applies with init and ns swapped.

In practice, of course, the port we locate this way will always be the
originating port, since we couldn't have started this "connection" if it
wasn't.

Change this, and we no longer need the @orig_sock field at all. That
leaves just @target_sock which we rename to simply @sock.  The whole
udp_splice_flow structure now more represents a single bound port than
a "flow" per se, so rename and recomment it accordingly.  Likewise the
udp_splice_to_{ns,init} names are now misleading, since the ports in
those maps are used in both directions.  Rename them to
udp_splice_{ns,init} indicating the location where the described
socket is bound.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:42 +01:00
David Gibson
27bfebb061 udp: Re-use fixed bound sockets for packet forwarding when possible
When we look up udp_splice_to_ns[v6][src].target_sock in
udp_sock_handler_splice, all we really require of the socket is that it
be bound to port src in the pasta guest namespace.  Similarly for
udp_splice_to_init but bound in the init namespace.

Usually these sockets are created temporarily by udp_splice_connect() and
cleaned up by udp_timer().  However, depending on the -u and -U options its
possible we have a permanent socket bound to the relevant port created by
udp_sock_init().  If such a socket exists, we could use it instead of
creating a temporary one.  In fact we *must* use it, because we'll fail
trying to bind() a temporary one to the same port.

So allow this, store permanently bound sockets into udp_splice_to_{ns,init}
in udp_sock_init().  These won't get incorrectly removed by the timer
because we don't put a corresponding entry in the udp_act[] structure
which directs the timer what to clean up.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:40 +01:00
David Gibson
c277c6dd7d udp: Don't create double sockets for -U port
For each IP version udp_socket() has 3 possible calls to sock_l4().  One
is for the "non-spliced" bound socket in the init namespace, one for the
"spliced" bound socket in the init namespace and one for the "spliced"
bound socket in the pasta namespace.

However when this is called to create a socket in the pasta namspeace there
is a logic error which causes it to take the path for the init side spliced
socket as well as the ns socket.  This essentially tries to create two
identical sockets on the ns side.  Unsurprisingly the second bind() call
fails according to strace.

Correct this to only attempt to open one socket within the ns.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:35 +01:00
David Gibson
d9394eb9b7 udp: Split splice field in udp_epoll_ref into (mostly) independent bits
The @splice field in union udp_epoll_ref can have a number of values for
different types of "spliced" packet flows.  Split it into several single
bit fields with more or less independent meanings.  The new @splice field
is just a boolean indicating whether the socket is associated with a
spliced flow, making it identical to the @splice fiend in tcp_epoll_ref.

The new bit @orig, indicates whether this is a socket which can originate
new udp packet flows (created with -u or -U) or a socket created on the
fly to handle reply socket.  @ns indicates whether the socket lives in the
init namespace or the pasta namespace.

Making these bits more orthogonal to each other will simplify some future
cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:31 +01:00
David Gibson
8517239243 udp: Remove the @bound field from union udp_epoll_ref
We set this field, but nothing ever checked it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:28 +01:00
David Gibson
1cd684b09b udp: Don't connect "forward" sockets for spliced flows
Currently we connect() the socket we use to forward spliced UDP flows.
However, we now only ever use sendto() rather than send() on this socket
so there's not actually any need to connect it.  Don't do so.

Rename a number of things that referred to "connect" or "conn" since that
would now be misleading.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:25 +01:00
David Gibson
9ef31b7619 udp: Always use sendto() rather than send() for forwarding spliced packets
udp_sock_handler_splice() has two different ways of sending out packets
once it has determined the correct destination socket.  For the originating
sockets (which are not connected) it uses sendto() to specify a specific
address.  For the forward socket (which is connected) we use send().

However we know the correct destination address even for the forward socket
we do also know the correct destination address.  We can use this to use
sendto() instead of send(), removing the need for two different paths and
some staging data structures.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:22 +01:00
David Gibson
729edc241d udp: Separate tracking of inbound and outbound packet flows
Each entry udp_splice_map[v6][N] keeps information about two essentially
unrelated packet flows. @ns_conn_sock, @ns_conn_ts and @init_bound_sock
track a packet flow from port N in the host init namespace to some other
port in the pasta namespace (the one @ns_conn_sock is connected to).
@init_conn_sock, @init_conn_ts and @ns_bound_sock track packet flow from
port N in the pasta namespace to some other port in the host init namespace
(the one @init_conn_sock is connected to).

Split udp_splice_map[][] into two separate tables for the two directions.
Each entry in each table is a 'struct udp_splice_flow' with @orig_sock
(previously the bound socket), @target_sock (previously the connected
socket) and @ts (the timeout for the target socket).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:41:18 +01:00
David Gibson
4ebb4905e9 udp: Also bind() connected ports for "splice" forwarding
pasta handles "spliced" port forwarding by resending datagrams received on
a bound socket in the init namespace to a connected socket in the guest
namespace.  This means there are actually three ports associated with each
"connection".  First there's the source and destination ports of the
originating datagram.  That's also the destination port of the forwarded
datagram, but the source port of the forwarded datagram is the kernel
allocated bound address of the connected socket.

However, by bind()ing as well as connect()ing the forwarding socket we can
choose the source port of the forwarded datagrams.  By choosing it to match
the original source port we remove that surprising third port number and
no longer need to store port numbers in struct udp_splice_port.

As a bonus this means that the recipient of the packets will see the
original source port if they call getpeername().  This rarely matters, but
it can't hurt.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-12-06 07:40:56 +01:00
Stefano Brivio
3a2afde87d conf, udp: Drop mostly duplicated dns_send arrays, rename related fields
Given that we use just the first valid DNS resolver address
configured, or read from resolv.conf(5) on the host, to forward DNS
queries to, in case --dns-forward is used, we don't need to duplicate
dns[] to dns_send[]:

- rename dns_send[] back to dns[]: those are the resolvers we
  advertise to the guest/container

- for forwarding purposes, instead of dns[], use a single field (for
  each protocol version): dns_host

- and rename dns_fwd to dns_match, so that it's clear this is the
  address we are matching DNS queries against, to decide if they need
  to be forwarded

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-11-16 15:09:31 +01:00
Stefano Brivio
817eedc28a tcp, udp: Don't initialise IPv6/IPv4 sockets if IPv4/IPv6 are not enabled
If we disable a given IP version automatically (no corresponding
default route on host) or administratively (--ipv4-only or
--ipv6-only options), we don't initialise related buffers and
services (DHCP for IPv4, NDP and DHCPv6 for IPv6). The "tap"
handlers will also ignore packets with a disabled IP version.

However, in commit 3c6ae62510 ("conf, tcp, udp: Allow address
specification for forwarded ports") I happily changed socket
initialisation functions to take AF_UNSPEC meaning "any enabled
IP version", but I forgot to add checks back for the "enabled"
part.

Reported by Paul: on a host without default IPv6 route, but IPv6
enabled, connect, using IPv6, to a port handled by pasta, which
tries to send data to a tap device without initialised buffers
for that IP version and exits because the resulting write() fails.

Simpler way to reproduce: pasta -6 and inbound IPv4 connection, or
pasta -4 and inbound IPv6 connection.

Reported-by: Paul Holzinger <pholzing@redhat.com>
Fixes: 3c6ae62510 ("conf, tcp, udp: Allow address specification for forwarded ports")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-11-10 11:17:50 +01:00
Stefano Brivio
db74679f98 udp: Check for answers to forwarded DNS queries before handling local redirects
Now that we allow loopback DNS addresses to be used as targets for
forwarding, we need to check if DNS answers come from those targets,
before deciding to eventually remap traffic for local redirects.

Otherwise, the source address won't match the one configured as
forwarder, which means that the guest or the container will refuse
those responses.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:32 +01:00
David Gibson
7c7b68dbe0 Use typing to reduce chances of IPv4 endianness errors
We recently corrected some errors handling the endianness of IPv4
addresses.  These are very easy errors to make since although we mostly
store them in network endianness, we sometimes need to manipulate them in
host endianness.

To reduce the chances of making such mistakes again, change to always using
a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network
endian addresses.  This makes it harder to accidentally do arithmetic or
comparisons on such addresses as if they were host endian.

We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to
directly work with struct in_addr values.  This has the additional benefit
of making the IPv4 and IPv6 paths more visually similar.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:24 +01:00
David Gibson
dd3470d9a9 Use IPV4_IS_LOOPBACK more widely
This macro checks if an IPv4 address is in the loopback network
(127.0.0.0/8).  There are two places where we open code an identical check,
use the macro instead.

There are also a number of places we specifically exclude the loopback
address (127.0.0.1), but we should actually be excluding anything in the
loopback network.  Change those sites to use the macro as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:21 +01:00
Stefano Brivio
346da48fe6 udp: Fix port and address checks for DNS forwarder
First off, as we swap endianness for source ports in
udp_fill_data_v{4,6}(), we want host endianness, not network
endianness. It doesn't actually matter if we use htons() or ntohs()
here, but the current version is confusing.

In the IPv4 path, when we remap DNS answers, we already swapped the
endianness as needed for the source port: don't swap it again,
otherwise we'll not map DNS answers for IPv4.

In the IPv6 path, when we remap DNS answers, we want to check that
they came from our upstream DNS server, not the one configured via
--dns-forward (which doesn't even need to exist for this
functionality to work).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
c1eff9a3c6 conf, tcp, udp: Allow specification of interface to bind to
Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable
SO_BINDTODEVICE for non-root users"), we can bind sockets to
interfaces, if they haven't been bound yet (as in bind()).

Introduce an optional interface specification for forwarded ports,
prefixed by %, that can be passed together with an address.

Reported use case: running local services that use ports we want
to have externally forwarded:
  https://github.com/containers/podman/issues/14425

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-15 02:10:36 +02:00
Stefano Brivio
da152331cf Move logging functions to a new file, log.c
Logging to file is going to add some further complexity that we don't
want to squeeze into util.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:25 +02:00
Stefano Brivio
5290b6f13e udp: Replace pragma to ignore bogus stringop-overread warning with workaround
Commit c318ffcb4c ("udp: Ignore bogus -Wstringop-overread for
write() from gcc 12.1") uses a gcc pragma to ignore a bogus warning,
which started appearing on gcc 12.1 (aarch64 and x86_64) due to:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483

...but gcc 12.2 has the same issue. Not just that: if LTO is enabled,
the pragma itself is ignored (this wasn't the case with gcc 12.1),
because of:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922

Drop the pragma, and assign a frame (in the networking sense) pointer
from the offset of the Ethernet header in the buffer, then pass it to
write() and pcap(), so that gcc doesn't obsess anymore with the fact
that an Ethernet header is 14 bytes and we're sending more than that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-09-29 12:23:10 +02:00
David Gibson
d5b80ccc72 Fix widespread off-by-one error dealing with port numbers
Port numbers (for both TCP and UDP) are 16-bit, and so fit exactly into a
'short'.  USHRT_MAX is therefore the maximum port number and this is widely
used in the code.  Unfortunately, a lot of those places don't actually
want the maximum port number (USHRT_MAX == 65535), they want the total
number of ports (65536).  This leads to a number of potentially nasty
consequences:

 * We have buffer overruns on the port_fwd::delta array if we try to use
   port 65535
 * We have similar potential overruns for the tcp_sock_* arrays
 * Interestingly udp_act had the correct size, but we can calculate it in
   a more direct manner
 * We have a logical overrun of the ports bitmap as well, although it will
   just use an unused bit in the last byte so isnt harmful
 * Many loops don't consider port 65535 (which does mitigate some but not
   all of the buffer overruns above)
 * In udp_invert_portmap() we incorrectly compute the reverse port
   translation for return packets

Correct all these by using a new NUM_PORTS defined explicitly for this
purpose.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
3ede07aac9 Treat port numbers as unsigned
Port numbers are unsigned values, but we're storing them in (signed) int
variables in some places.  This isn't actually harmful, because int is
large enough to hold the entire range of ports.  However in places we don't
want to use an in_port_t (usually to avoid overflow on the last iteration
of a loop) it makes more conceptual sense to use an unsigned int. This will
also avoid some problems with later cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
f5a31ee94c Don't use indirect remap functions for conf_ports()
Now that we've delayed initialization of the UDP specific "reverse" map
until udp_init(), the only difference between the various 'remap' functions
used in conf_ports() is which array they target.  So, simplify by open
coding the logic into conf_ports() with a pointer to the correct mapping
array.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
1467a35b5a udp: Delay initialization of UDP reversed port mapping table
Because it's connectionless, when mapping UDP ports we need, in addition
to the table of deltas for destination ports needed by TCP, we need an
inverted table to translate the source ports on return packets.

Currently we fill out the inverted table at the same time we construct the
main table in udp_remap_to_tap() and udp_remap_to_init().  However, we
don't use either table until after we've initialized UDP, so we can delay
the construction of the reverse table to udp_init().  This makes the
configuration more symmetric between TCP and UDP which will enable further
cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
163dc5f188 Consolidate port forwarding configuration into a common structure
The configuration for how to forward ports in and out of the guest/ns is
divided between several different variables.  For each connect direction
and protocol we have a mode in the udp/tcp context structure, a bitmap
of which ports to forward also in the context structure and an array of
deltas to apply if the outward facing and inward facing port numbers are
different.  This last is a separate global variable, rather than being in
the context structure, for no particular reason.  UDP also requires an
additional array which has the reverse mapping used for return packets.

Consolidate these into a re-used substructure in the context structure.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-24 14:48:35 +02:00
David Gibson
af9c98af5f udp: Don't drop zero-length outbound UDP packets
udp_tap_handler() currently skips outbound packets if they have a payload
length of zero.  This is not correct, since in a datagram protocol zero
length packets still have meaning.

Adjust this to correctly forward the zero-length packets by using a msghdr
with msg_iovlen == 0.

Bugzilla: https://bugs.passt.top/show_bug.cgi?id=19

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 11:14:29 +02:00
David Gibson
484652c632 udp: Don't pre-initialize msghdr array
In udp_tap_handler() the array of msghdr structures, mm[], is initialized
to zero.  Since UIO_MAXIOV is 1024, this can be quite a large zero, which
is expensive if we only end up using a few of its entries.  It also makes
it less obvious how we're setting all the control fields at the point we
actually invoke sendmmsg().

Rather than pre-initializing it, just initialize each element as we use it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 11:14:29 +02:00
David Gibson
16f5586bb8 Make substructures for IPv4 and IPv6 specific context information
The context structure contains a batch of fields specific to IPv4 and to
IPv6 connectivity.  Split those out into a sub-structure.

This allows the conf_ip4() and conf_ip6() functions, which take the
entire context but touch very little of it, to be given more specific
parameters, making it clearer what it affects without stepping through the
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 22:14:07 +02:00
David Gibson
5e12d23acb Separate IPv4 and IPv6 configuration
After recent changes, conf_ip() now has essentially entirely disjoint paths
for IPv4 and IPv6 configuration.  So, it's cleaner to split them out into
different functions conf_ip4() and conf_ip6().

Splitting these out also lets us make the interface a bit nicer, having
them return success or failure directly, rather than manipulating c->v4
and c->v6 to indicate success/failure of the two versions.

Since these functions may also initialize the interface index for each
protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing
tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since
a 0 interface index is never valid).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Whitespace fixes]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-30 22:12:50 +02:00
Stefano Brivio
c318ffcb4c udp: Ignore bogus -Wstringop-overread for write() from gcc 12.1
With current OpenSUSE Tumbleweed on aarch64 (gcc-12-1.3.aarch64) and
on x86_64 (gcc-12-1.4.x86_64), but curiously not on armv7hl
(gcc-12-1.3.armv7hl), gcc warns about using the _pointer_ to the
802.3 header to write the whole frame to the tap descriptor:
  reading between 62 and 4294967357 bytes from a region of size 14

which is bogus:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483

Probably declaring udp_sock_fill_data_v{4,6}() as noinline would
"fix" this, but that's on the data path, so I'd rather not. Use
a gcc pragma instead.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-05-19 16:27:20 +02:00