When we receive a ping packet from the tap interface, we currently locate
the correct flow entry (if present) using an anciliary data structure, the
icmp_id_map[] tables. However, we can look this up using the flow hash
table - that's what it's for.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
icmp_sock_handler() obtains the guest address from it's most recently
observed IP. However, this can now be obtained from the common flowside
information.
icmp_tap_handler() builds its socket address for sendto() directly
from the destination address supplied by the incoming tap packet.
This can instead be generated from the flow.
Using the flowsides as the common source of truth here prepares us for
allowing more flexible NAT and forwarding by properly initialising
that flowside information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
struct icmp_ping_flow contains a field for the ICMP id of the ping, but
this is now redundant, since the id is also stored as the "port" in the
common flowsides.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Require the address and port information for the target (non
initiating) side to be populated when a flow enters TGT state.
Implement that for TCP and ICMP. For now this leaves some information
redundantly recorded in both generic and type specific fields. We'll
fix that in later patches.
For TCP we now use the information from the flow to construct the
destination socket address in both tcp_conn_from_tap() and
tcp_splice_connect().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Handling of each protocol needs some degree of tracking of the
addresses and ports at the end of each connection or flow. Sometimes
that's explicit (as in the guest visible addresses for TCP
connections), sometimes implicit (the bound and connected addresses of
sockets).
To allow more consistent handling across protocols we want to
uniformly track the address and port at each end of the connection.
Furthermore, because we allow port remapping, and we sometimes need to
apply NAT, the addresses and ports can be different as seen by the
guest/namespace and as by the host.
Introduce 'struct flowside' to keep track of address and port
information related to one side of a flow. Store two of these in the
common fields of a flow to track that information for both sides.
For now we only populate the initiating side, requiring that
information be completed when a flows enter INI. Later patches will
populate the target side.
For now this leaves some information redundantly recorded in both generic
and type specific fields. We'll fix that in later patches.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
TCP (both regular and spliced) and ICMP both have macros to retrieve the
relevant protcol specific flow structure from a flow index. In most cases
what we actually want is to get the specific flow from a sidx. Replace
those simple macros with a more precise inline, which also asserts that
the flow is of the type we expect.
While we're they're also add a pif_at_sidx() helper to get the interface of
a specific flow & side, which is useful in some places.
Finally, fix some minor style issues in the comments on some of the
existing sidx related helpers.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
sock_l4() creates a socket of the given IP protocol number, and adds it to
the epoll state. Currently it determines the correct tag for the epoll
data based on the protocol. However, we have some future cases where we
might want different semantics, and therefore epoll types, for sockets of
the same protocol. So, change sock_l4() to take the epoll type as an
explicit parameter, and determine the protocol from that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we have no generic information flows apart from the type and
state, everything else is specific to the flow type. Start introducing
generic flow information by recording the pifs which the flow connects.
To keep track of what information is valid, introduce new flow states:
INI for when the initiating side information is complete, and TGT for
when both sides information is complete, but we haven't chosen the
flow type yet. For now, these states don't do an awful lot, but
they'll become more important as we add more generic information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Each flow in the flow table has two sides, 0 and 1, representing the
two interfaces between which passt/pasta will forward data for that flow.
Which side is which is currently up to the protocol specific code: TCP
uses side 0 for the host/"sock" side and 1 for the guest/"tap" side, except
for spliced connections where it uses 0 for the initiating side and 1 for
the target side. ICMP also uses 0 for the host/"sock" side and 1 for the
guest/"tap" side, but in its case the latter is always also the initiating
side.
Make this generically consistent by always using side 0 for the initiating
side and 1 for the target side. This doesn't simplify a lot for now, and
arguably makes TCP slightly more complex, since we add an extra field to
the connection structure to record which is the guest facing side. This is
an interim change, which we'll be able to remove later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>q
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Flows move over several different states in their lifetime. The rules for
these are documented in comments, but they're pretty complex and a number
of the transitions are implicit, which makes this pretty fragile and
error prone.
Change the code to explicitly track the states in a field. Make all
transitions explicit and logged. To the extent that it's practical in C,
enforce what can and can't be done in various states with ASSERT()s.
While we're at it, tweak the docs to clarify the restrictions on each state
a bit.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The flow dispatches deferred and timer handling for flows centrally, but
needs to call into protocol specific code for the handling of individual
flows. Currently this passes a general union flow *. It makes more sense
to pass the specific relevant flow type structure. That brings the check
on the flow type adjacent to casting to the union variant which it tags.
Arguably, this is a slight abstraction violation since it involves the
generic flow code using protocol specific types. It's already calling into
protocol specific functions, so I don't think this really makes any
difference.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
At various points we need to track the lengths of a packet including or
excluding various different sets of headers. We don't always use the same
variable names for doing so. Worse in some places we use the same name
for different things: e.g. tcp_fill_headers[46]() use ip_len for the
length including the IP headers, but then tcp_send_flag() which calls it
uses it to mean the IP payload length only.
To improve clarity, standardise on these names:
dlen: L4 protocol payload length ("data length")
l4len: plen + length of L4 protocol header
l3len: l4len + length of IPv4/IPv6 header
l2len: l3len + length of L2 (ethernet) header
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently ping sockets use a custom epoll reference type which includes
the ICMP id. However, now that we have entries in the flow table for
ping flows, finding that is sufficient to get everything else we want,
including the id. Therefore remove the icmp_epoll_ref type and use the
general 'flowside' field for ping sockets.
Having done this we no longer need separate EPOLL_TYPE_ICMP and
EPOLL_TYPE_ICMPV6 reference types, because we can easily determine
which case we have from the flow type. Merge both types into
EPOLL_TYPE_PING.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Use flow_dbg() and flow_err() helpers to generate flow-linked error
messages in most places. Make a few small improvements to the messages
while we're at it. This allows us to avoid the awkward 'pname' variables
since whether we're dealing with ICMP or ICMPv6 is already built into the
flow type which these helpers include.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Coding style fix in icmp_tap_handler()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently icmp_id_map[][] stores information about ping sockets in a
bespoke structure. Move the same information into new types of flow
in the flow table. To match that change, replace the existing ICMP
timer with a flow-based timer for expiring ping sockets. This has the
advantage that we only need to scan the active flows, not all possible
ids.
We convert icmp_id_map[][] to point to the flow table entries, rather
than containing its own information. We do still use that array for
locating the right ping flows, rather than using a "flow native" form
of lookup for the time being.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update id_sock description in comment to icmp_ping_new()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Introduce ip.[ch] file to encapsulate IP protocol handling functions and
structures. Modify various files to include the new header ip.h when
it's needed.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-5-lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
There are a number of places where we want to handle either a
sockaddr_in or a sockaddr_in6. In some of those we use a void *,
which works ok and matches some standard library interfaces, but
doesn't give a signature level hint that we're dealing with only
sockaddr_in or sockaddr_in6, not (say) sockaddr_un or another type of
socket address. Other places we use a sockaddr_storage, which also
works, but has the same problem in addition to allocating more on the
stack than we need to.
Introduce union sockaddr_inany to explictly handle this case: it has
variants for sockaddr_in and sockaddr_in6. Use it in a number of
places where it's easy to do so.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Sometimes we use sa_family_t for variables and parameters containing a
socket address family, other times we use a plain int. Since sa_family_t
is what's actually used in struct sockaddr and friends, standardise on
that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
ICMP sockets are cleaned up on a timeout implemented in icmp_timer_one(),
and the logic to do that cleanup is open coded in that function. Similarly
new sockets are opened when we discover we don't have an existing one in
icmp_tap_handler(), and again the logic is open-coded.
That's not the worst thing, but it's a bit cleaner to have dedicated
functions for the creation and destruction of ping sockets. This will also
make things a bit easier for future changes we have in mind.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We access fields of packets received from ping sockets assuming they're
echo replies, without actually checking that. Of course, we don't expect
anything else from the kernel, but it's probably best to verify.
While we're at it, also check for short packets, or a receive address of
the wrong family.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we silently ignore an errors receiving a packet from a ping
socket. We don't expect that to happen, so it's probably worth reporting
if it does.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we have separate handlers for ICMP and ICMPv6 ping replies.
Although there are a number of points of difference, with some creative
refactoring we can combine these together sensibly. Although it doesn't
save a vast amount of code, it does make it clearer that we're performing
basically the same steps for each case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently icmp_tap_handler() consists of two almost disjoint paths for the
IPv4 and IPv6 cases. The only thing they share is an error message.
We can use some intermediate variables to refactor this to share some more
code between those paths.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we use icmp_act[] to scan for ICMP ids which might have an open
socket which could time out. However icmp_act[] contains no information
that's not already in icmp_id_map[] - it's just an "index" which allows
scanning for relevant entries with less cache footprint.
We only scan for ICMP socket expiry every 1s, though, so it's not clear
that cache footprint really matters. Furthermore, there's no strong reason
we need to scan even that often - the timeout is fairly arbitrary and
approximate.
So, eliminate icmp_act[] in favour of directly scanning icmp_id_map[] and
compensate for the cache impact by reducing the scan frequency to once
every 10s.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
icmp_id_map[] contains, amongst other things, fds for "ping" sockets
associated with various ICMP echo ids. However, we only lazily open()
those sockets, so many will be missing. We currently represent that with
a 0, which isn't great, since that's technically a valid fd. Use -1
instead. This does require initializing the fields in icmp_id_map[] but
we already have an obvious place to do that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When forwarding pings from tap, currently we create a ping socket with
a socket address whose port is set to the ID of the ping received from the
guest. This causes the socket to send pings with the same ID on the host.
Although this seems look a good idea for maximum transparency, it's
probably unwise.
First, it's fallible - the bind() could fail, and we already have fallback
logic which will overwrite the packets with the expected guest id if the
id we get on replies doesn't already match. We might as well do that
unconditionally.
But more importantly, we don't know what else on the host might be using
ping sockets, so we could end up with an ID that's the same as an existing
socket. You'd expect that to fail the bind() with EADDRINUSE, which would
be fine: we'd fall back to rewriting the reply ids. However it appears the
kernel (v6.6.3 at least), does *not* fail the bind() and instead it's
"last socket wins" in terms of who gets the replies. So we could
accidentally intercept ping replies for something else on the host.
So, instead of using bind() to set the id, just let the kernel pick one
and expect to translate the replies back. Although theoretically this
makes the passt/pasta link a bit less "transparent", essentially nothing
cares about specific ping IDs, much like TCP source ports, which we also
don't preserve.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Linux ICMP "ping" sockets are very specific in what they do. They let
userspace send ping requests (ICMP_ECHO or ICMP6_ECHO_REQUEST), and receive
matching replies (ICMP_ECHOREPLY or ICMP6_ECHO_REPLY). They don't let you
intercept or handle incoming ping requests.
In the case of passt/pasta that means we can process echo requests from tap
and forward them to a ping socket, then take the replies from the ping
socket and forward them to tap. We can't do the reverse: take echo
requests from the host and somehow forward them to the guest. There's
really no way for something outside to initiate a ping to a passt/pasta
connected guest and if there was we'd need an entirely different mechanism
to handle it.
However, we have some logic to deal with packets going in that reverse
direction. Remove it, since it can't ever be used that way. While we're
there use defines for the ICMPv6 types, instead of open coded type values.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We initialise the address portion of the sockaddr for sendto() to the
unspecified address, but then always overwrite it with the actual
destination address before we call the sendto().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We set the port to the ICMP id on the sendto() address when using ICMP
ping sockets. However, this has no effect: the ICMP id the kernel
uses is determined only by the "port" on the socket's *bound* address
(which is constructed inside sock_l4(), using the id we also pass to
it).
For unclear reasons this change triggers cppcheck 2.13.0 to give new
"variable could be const pointer" warnings, so make *ih const as well to
fix that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In a number of places we pass around a struct timespec representing the
(more or less) current time. Sometimes we call it 'now', and sometimes we
call it 'ts'. Standardise on the more informative 'now'.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
sock_l4() takes NULL for ifname if you don't want to bind the socket to a
particular interface. However, for a number of the callers, it's more
natural to use an empty string for that case. Change sock_l4() to accept
either NULL or an empty string equivalently, and simplify some callers
using that change.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We go to some trouble, if the configured output address is unspecified, to
pass NULL to sock_l4(). But while passing NULL is one way to get sock_l4()
not to specify a bind address, passing the "any" address explicitly works
too. Use this to simplify icmp_tap_handler().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We already define IN4ADDR_LOOPBACK_INIT to initialise a struct in_addr to
the loopback address, make a similar one for the unspecified / any address.
This avoids messying things with the internal structure of struct in_addr
where we don't care about it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For now, packets passed to the various *_tap_handler() functions always
come from the single "tap" interface. We want to allow the possibility to
broaden that in future. As preparation for that, have the code in tap.c
pass the pif id of the originating interface to each of those handler
functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have several workarounds for a clang-tidy bug where the checker doesn't
recognize that a number of system calls write to - and therefore initialise
- a socket address. We can't neatly use a suppression, because the bogus
warning shows up some time after the actual system call, when we access
a field of the socket address which clang-tidy erroneously thinks is
uninitialised.
Consolidate these workarounds into one place by using macros to implement
wrappers around affected system calls which add a memset() of the sockaddr
to silence clang-tidy. This removes the need for the individual memset()
workarounds at the callers - and the somewhat longwinded explanatory
comments.
We can then use a #define to not include the hack in "real" builds, but
only consider it for clang-tidy.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The tap code passes the IPv4 or IPv6 destination address of packets it
receives to the protocol specific code. Currently that protocol code
doesn't use the source address, but we want it to in future. So, in
preparation, pass the IPv4/IPv6 source address of tap packets to those
functions as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have different epoll type values for ICMP and ICMPv6 sockets, but they
both call the same handler function, icmp_sock_handler(). However that
function does essentially nothing in common for the two cases. So, split
it into icmp_sock_handler() and icmpv6_sock_handler() and dispatch them
separately from the top level.
While we're there remove some parameters that the function was never using
anyway. Also move the test for c->no_icmp into the functions, so that all
the logic specific to ICMP is within the handler, rather than in the top
level dispatch code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The epoll_ref type includes fields for the IP protocol of a socket, and the
socket fd. However, we already have a few things in the epoll which aren't
protocol sockets, and we may have more in future. Rename these fields to
an abstract "fd type" and file descriptor for more generality.
Similarly, rather than using existing IP protocol numbers for the type,
introduce our own number space. For now these just correspond to the
supported protocols, but we'll expand on that in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
union epoll_ref has a deeply nested set of structs and unions to let us
subdivide it into the various different fields we want. This means that
referencing elements can involve an awkward long string of intermediate
fields.
Using C11 anonymous structs and unions lets us do this less clumsily.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.
Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.
Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I didn't notice earlier: libslirp (and slirp4netns) supports binding
outbound sockets to specific IPv4 and IPv6 addresses, to force the
source addresse selection. If we want to claim feature parity, we
should implement that as well.
Further, Podman supports specifying outbound interfaces as well, but
this is simply done by resolving the primary address for an interface
when the network back-end is started. However, since kernel version
5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for
non-root users"), we can actually bind to a specific interface name,
which doesn't need to be validated in advance.
Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses,
and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets
to given interfaces.
Given that it probably makes little sense to select addresses and
routes from interfaces different than the ones given for outbound
sockets, also assign those as "template" interfaces, by default,
unless explicitly overridden by '-i'.
For ICMP and UDP, we call sock_l4() to open outbound sockets, as we
already needed to bind to given ports or echo identifiers, and we
can bind() a socket only once: there, pass address (if any) and
interface (if any) for the existing bind() and setsockopt() calls.
For TCP, in general, we wouldn't otherwise bind sockets. Add a
specific helper to do that.
For UDP outbound sockets, we need to know if the final destination
of the socket is a loopback address, before we decide whether it
makes sense to bind the socket at all: move the block mangling the
address destination before the creation of the socket in the IPv4
path. This was already the case for the IPv6 path.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We recently corrected some errors handling the endianness of IPv4
addresses. These are very easy errors to make since although we mostly
store them in network endianness, we sometimes need to manipulate them in
host endianness.
To reduce the chances of making such mistakes again, change to always using
a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network
endian addresses. This makes it harder to accidentally do arithmetic or
comparisons on such addresses as if they were host endian.
We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to
directly work with struct in_addr values. This has the additional benefit
of making the IPv4 and IPv6 paths more visually similar.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The INADDR_LOOPBACK constant is in host endianness, and similarly the
IN_MULTICAST macro expects a host endian address. However, there are some
places in passt where we use those with network endian values. This means
that passt will incorrectly allow you to set 127.0.0.1 or a multicast
address as the guest address or DNS forwarding address. Add the necessary
conversions to correct this.
INADDR_ANY and INADDR_BROADCAST logically behave the same way, although
because they're palindromes it doesn't have an effect in practice. Change
them to be logically correct while we're there, though.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any
reply we send: we're on the same host as the target, after all. We
discard them by comparing the last sequence we sent with the sequence
we receive.
However, on the first reply for a given identifier, the sequence
might be zero, depending on the implementation of ping(8): we need
another value to indicate we haven't sent any sequence number, yet.
Use -1 as initialiser in the echo identifier map.
This is visible with Busybox's ping, and was reported by Paul on the
integration at https://github.com/containers/podman/pull/16141, with:
$ podman run --net=pasta alpine ping -c 2 192.168.188.1
...where only the second reply would be routed back.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Fixes: 33482d5bf2 ("passt: Add PASTA mode, major rework")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
tap_ip4_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation. By using a suitable
helper we can split it into tap_udp4_send() and tap_icmp4_send() functions
without greatly increasing the code size, this removing that layering
violation.
We make some small changes to the interface while there. In both cases
we make the destination IPv4 address a parameter, which will be useful
later. For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header. For the ICMP variant we pass in the ICMP
header as before. The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tap_ip6_send() has special case logic to compute the checksums for UDP
and ICMP packets, which is a mild layering violation. By using a suitable
helper we can split it into tap_udp6_send() and tap_icmp6_send() functions
without greatly increasing the code size, this removing that layering
violation.
We make some small changes to the interface while there. In both cases
we make the destination IPv6 address a parameter, which will be useful
later. For the UDP variant we make it take just the UDP payload, and it
will generate the UDP header. For the ICMP variant we pass in the ICMP
header as before. The inconsistency is because that's what seems to be
the more natural way to invoke the function in the callers in each case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and
it turns out that every caller (statically) knows if it is using IPv4 or
IPv6. So split into separate tap_ip4_send() and tap_ip6_send() functions.
Use a new tap_l2_hdr() function for the very small common part.
While we're there, make some minor cleanups:
- We were double writing some fields in the IPv6 header, so that it
temporary matched the pseudo-header for checksum calculation. With
recent checksum reworks, this isn't neccessary any more.
- We don't use any IPv4 header options, so use some sizeof() constructs
instead of some open coded values for header length.
- The comment used to say that the flow label was for TCP over IPv6, but
in fact the only thing we used it for was DHCPv6 over UDP traffic
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If we ping a link-local address, we need to pass this to sendto(), as
it will obviously fail with -EINVAL otherwise.
If we ping other addresses, it's probably a good idea anyway to
specify the configured outbound interface here.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable
SO_BINDTODEVICE for non-root users"), we can bind sockets to
interfaces, if they haven't been bound yet (as in bind()).
Introduce an optional interface specification for forwarded ports,
prefixed by %, that can be passed together with an address.
Reported use case: running local services that use ports we want
to have externally forwarded:
https://github.com/containers/podman/issues/14425
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>