passt

Author	SHA1	Message	Date
David Gibson	fbd78b6f3e	udp: Rename IOV and mmsghdr arrays Make the salient points about these various arrays clearer with renames: * udp_l2_iov_sock and udp[46]_l2_mh_sock don't really have anything to do with L2. They are, however, specific to receiving not sending. Rename to udp_iov_recv and udp[46]_mh_recv. * udp[46]_l2_iov_tap is redundant - "tap" implies L2 and vice versa. Rename to udp[46]_l2_iov * udp[46]_localname are (for now) pre-populated with the local address but the more salient point is that these are the destination address for the splice arrays. Rename to udp[46]_splice_to Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-05 15:26:30 +02:00
David Gibson	f62c33d85f	udp: Pass full epoll reference through more of sock handler path udp_buf_sock_handler() takes the epoll reference from the receiving socket, and passes the UDP relevant part on to several other functions. Future changes are going to need several different epoll types for UDP, and to pass that information through to some of those functions. To avoid extra noise in the patches making the real changes, change those functions now to take the full epoll reference, rather than just the UDP part. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-05 15:26:28 +02:00
David Gibson	74c1c5efcf	util: sock_l4() determine protocol from epoll type rather than the reverse sock_l4() creates a socket of the given IP protocol number, and adds it to the epoll state. Currently it determines the correct tag for the epoll data based on the protocol. However, we have some future cases where we might want different semantics, and therefore epoll types, for sockets of the same protocol. So, change sock_l4() to take the epoll type as an explicit parameter, and determine the protocol from that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-05 15:26:09 +02:00
Stefano Brivio	1ee2ecade3	udp: Reduce scope of rport in udp_invert_portmap() cppcheck 2.14 warns that the scope of the rport variable could be reduced: do that, as reverted commit `c80fa6a6bb` ("udp: Make rport calculation more local") did, but keep the temporary variable of in_port_t type, otherwise the sum gets promoted to int. While at it, add a comment explaining why we calculate rport like this instead of directly using the sum as array index. Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-06-24 15:41:38 +02:00
Stefano Brivio	054697598f	Revert "udp: Make rport calculation more local" This reverts commit `c80fa6a6bb`, as it reintroduces the issue fixed by commit `1e6f92b995` ("udp: Fix 16-bit overflow in udp_invert_portmap()"). Reported-by: Laurent Jacquot <jk@lutty.net> Link: https://bugs.passt.top/show_bug.cgi?id=80 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-06-24 15:41:38 +02:00
David Gibson	7e87bd98ac	udp: Move management of udp[46]_localname into udp_splice_send() Mostly, udp_sock_handler() is independent of how the datagrams it processes will be forwarded (tap or splice). However, it also updates the msg_name fields for spliced sends, which doesn't really make sense here. Move it into udp_splice_send() which is all about spliced sends. This does potentially mean we'll update the field to the same value several times, but we're going to need this in future anyway: with the extensions the flow table allows, it might not be the same value each time after all. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-14 12:11:46 +02:00
David Gibson	ff57f8ddc6	udp: Rework how we divide queued datagrams between sending methods udp_sock_handler() takes a number of datagrams from sockets that depending on their addresses could be forwarded either to the L2 interface ("tap") or to another socket ("spliced"). In the latter case we can also only send packets together if they have the same source port, and therefore are sent via the same socket. To reduce the total number of system calls we gather contiguous batches of datagrams with the same destination interface and socket where applicable. The determination of what the target is is made by udp_mmh_splice_port(). It returns the source port for splice packets and -1 for "tap" packets. We find batches by looking ahead in our queue until we find a datagram whose "splicefrom" port doesn't match the first in our current batch. udp_mmh_splice_port() is moderately expensive, and unfortunately we can call it twice on the same datagram: once as the (last + 1) entry in one batch (to check it's not in that batch), then again as the first entry in the next batch. Avoid this by keeping track of the "splice port" in the metadata structure, and filling it in one entry ahead of the one we're currently considering. This is a bit subtle, but not that hard. It will also generalise better when we have more complex possibilities based on the flow table. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-14 12:11:42 +02:00
David Gibson	63db7dcdbf	udp: Fold checking of splice flag into udp_mmh_splice_port() udp_mmh_splice_port() is used to determine if a UDP datagram can be "spliced" (forwarded via a socket instead of tap). We only invoke it if the origin socket has the 'splice' flag set. Fold the checking of the flag into the helper itself, which makes the caller simpler. It does mean we have a loop looking for a batch of spliceable or non-spliceable packets even in the case where the flag is clear. This shouldn't be that expensive though, since each call to udp_mmh_splice_port() will return without accessing memory in that case. In any case we're going to need a similar loop in more cases with upcoming flow table work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-14 12:11:39 +02:00
Laurent Vivier	0c335d751a	vhost-user: compare mode MODE_PASTA and not MODE_PASST As we are going to introduce the MODE_VU that will act like the mode MODE_PASST, compare to MODE_PASTA rather than to add a comparison to MODE_VU when we check for MODE_PASST. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-13 15:45:38 +02:00
Laurent Vivier	377b666dc9	udp: rename udp_sock_handler() to udp_buf_sock_handler() We are going to introduce a variant of the function to use vhost-user buffers rather than passt internal buffers. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-13 15:45:34 +02:00
Laurent Vivier	e7ac995217	udp: refactor UDP header update functions This commit refactors the udp_update_hdr4() and udp_update_hdr6() functions to improve code portability by replacing the udp_meta_t parameter with more specific parameters for the IPv4 and IPv6 headers (iphdr/ipv6hdr) and the source socket address (sockaddr_in/sockaddr_in6). It also moves the tap_hdr_update() function call inside the udp_tap_send() function not to have to pass the TAP header to udp_update_hdr4() and udp_update_hdr6() This refactor reduces complexity by making the functions more modular and ensuring that each function operates on more narrowly scoped data structures. This will facilitate future backend introduction like vhost-user. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-13 15:45:32 +02:00
David Gibson	c80fa6a6bb	udp: Make rport calculation more local cppcheck 2.14.1 complains about the rport variable not being in as small as scope as it could be. It's also only used once, so we might as well just open code the calculation for it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-07 20:44:44 +02:00
David Gibson	1ba76c9e8c	udp: Single buffer for IPv4, IPv6 headers and metadata Currently we have separate arrays for IPv4 and IPv6 which contain the headers for guest-bound packets, and also the originating socket address. We can combine these into a single array of "metadata" structures with space for both pre-cooked IPv4 and IPv6 headers, as well as shared space for the tap specific header and socket address (using sockaddr_inany). Because we're using IOVs to separately address the pieces of each frame, these structures don't need to be packed to keep the headers contiguous so we can more naturally arrange for the alignment we want. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:52 +02:00
David Gibson	d4598e1d18	udp: Use the same buffer for the L2 header for all frames Currently each tap-bound frame buffer has room for its own ethernet header. However the ethernet header is always the same for such frames, so now that we're indirectly referencing the ethernet header via iov, we can use a single buffer for all of them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:49 +02:00
David Gibson	6170688616	udp: Share payload buffers between IPv4 and IPv6 Currently the IPv4 and IPv6 paths unnecessarily use different buffers for the UDP payload. Now that we're handling the various pieces of the UDP packets with an iov, we can split the payload part of the buffers off into its own array shared between IPv4 and IPv6. As well as saving a little memory, this allows the payload buffers to be neatly page aligned. With the buffers merged, udp[46]_l2_iov_sock contain exactly the same thing as each other and can also be merged. Likewise udp[46]_iov_splice can be merged together. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:46 +02:00
David Gibson	2d16946bac	udp: Explicitly set checksum in guest-bound UDP headers For IPv4, UDP checksums are optional and can just be set to 0. udp_update_hdr4() ignores the checksum field entirely. Since these are set to 0 during startup, this works as intended for now. However, we'd like to share payload and UDP header buffers betweem IPv4 and IPv6, which does calculate UDP checksums. Therefore, for robustness, we should explicitly set the checksum field to 0 for guest-bound UDP packets. In the tap_udp4_send() slow path, however, we do allow IPv4 UDP checksums to be calculated as a compile time option. For consistency, use the same thing in the udp_update_hdr4() path, which will typically initialize to 0, but calculate a real checksum if configured to do so. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:44 +02:00
David Gibson	6c4d26a364	udp: Combine initialisation of IPv4 and IPv6 iovs We're going to introduce more sharing between the IPv4 and IPv6 buffer structures. Prepare for this by combinng the initialisation functions. While we're at it remove the misleading "sock" from the name since these initialise both tap side and sock side structures. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:41 +02:00
David Gibson	3f9bd867b5	udp: Split tap-bound UDP packets into multiple buffers using io vector When sending to the tap device, currently we assemble the headers and payload into a single contiguous buffer. Those are described by a single struct iovec, then a batch of frames is sent to the device with tap_send_frames(). In order to better integrate the IPv4 and IPv6 paths, we want the IP header in a different buffer that might not be contiguous with the payload. To prepare for that, split the UDP packet into an iovec of buffers. We use the same split that Laurent recently introduced for TCP for convenience. This removes the last use of tap_hdr_len_(), tap_frame_base() and tap_frame_len(), so remove those too. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:38 +02:00
David Gibson	3559899586	iov: Helper macro to construct iovs covering existing variables or fields Laurent's recent changes mean we use IO vectors much more heavily in the TCP code. In many of those cases, and few others around the code base, individual iovs of these vectors are constructed to exactly cover existing variables or fields. We can make initializing such iovs shorter and clearer with a macro for the purpose. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:31 +02:00
David Gibson	5566386f5f	treewide: Standardise variable names for various packet lengths At various points we need to track the lengths of a packet including or excluding various different sets of headers. We don't always use the same variable names for doing so. Worse in some places we use the same name for different things: e.g. tcp_fill_headers[46]() use ip_len for the length including the IP headers, but then tcp_send_flag() which calls it uses it to mean the IP payload length only. To improve clarity, standardise on these names: dlen: L4 protocol payload length ("data length") l4len: plen + length of L4 protocol header l3len: l4len + length of IPv4/IPv6 header l2len: l3len + length of L2 (ethernet) header Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:23 +02:00
David Gibson	9e22c53aa9	checksum: Make csum_ip4_header() take a host endian length csum_ip4_header() takes the packet length as a network endian value. In general it's very error-prone to pass non-native-endian values as a raw integer. It's particularly bad here because this differs from other checksum functions (e.g. proto_ipv4_header_psum()) which take host native lengths. It turns out all the callers have easy access to the native endian value, so switch it to use host order like everything else. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:21 +02:00
David Gibson	34fb381b5a	tap: Split tap specific and L2 (ethernet) headers In some places (well, actually only UDP now) we use struct tap_hdr to represent both tap backend specific and L2 ethernet headers. Handling these together seemed like a good idea at the time, but Laurent's changes in the TCP code working towards vhost-user support suggest that treating them separately is more useful, more often. Alter struct tap_hdr to represent only the TAP backend specific headers. Updated related helpers and the UDP code to match. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-05-02 16:13:08 +02:00
David Gibson	0804fdbc28	udp: Correctly look up outbound socket with port remappings Commit `bb9bf0bb` ("tcp, udp: Don't precompute port remappings in epoll references") changed the epoll reference for UDP sockets to include the bound port as seen by the socket itself, rather than the bound port it would be translated to on the guest side. As a side effect, it also means that udp_tap_map[] is indexed by the bound port on the host side, rather than on the guest side. This is consistent and a good idea, however we forgot to account for it when finding the correct outgoing socket for packets originating in the guest. This means that if forwarding UDP inbound with a port number change, reply packets would be misdirected. Fix this by applying the reverse mapping before looking up the socket in udp_tap_handler(). While we're at it, use 'port' directly instead of 'uref.port' in udp_sock_init(). Those now always have the same value - failing to realise that is the same error as above. Reported-by: Laurent Jacquot <jk@lutty.net> Link: https://bugs.passt.top/show_bug.cgi?id=87 Fixes: `bb9bf0bb8f` ("tcp, udp: Don't precompute port remappings in epoll references") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-04-25 00:00:34 +02:00
Stefano Brivio	d989eae308	udp: Translate source address of resolver only for DNS remapped queries Paul reports that if pasta is configured with --dns-forward, and the container queries a resolver which is configured on the host directly, without using the address given for --dns-forward, we'll translate the source address of the response pretending it's coming from the address passed as --dns-forward, and the client will discard the reply. That is, $ cat /etc/resolv.conf 198.51.100.1 $ pasta --config-net --dns-forward 192.0.2.1 nslookup passt.top will not work, because we change the source address of the reply from 198.51.100.1 to 192.0.2.1. But the client contacted 198.51.100.1, and it's from that address that it expects an answer. Add a PORT_DNS_FWD flag for tap-facing ports, which is triggered by activity in the opposite direction as the other flags. If the tap-facing port was seen sending a DNS query that was remapped, we'll remap the source address of the response, otherwise we'll leave it unaffected. Reported-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-03-18 08:57:40 +01:00
David Gibson	d3eb0d7b59	tap: Rename tap_iov_{base,len} These two functions are typically used to calculate values to go into the iov_base and iov_len fields of a struct iovec. They don't have to be used for that, though. Rename them in terms of what they actually do: calculate the base address and total length of the complete frame, including both L2 and tap specific headers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-14 16:57:43 +01:00
David Gibson	2d0e0084b6	tap: Extend tap_send_frames() to allow multi-buffer frames tap_send_frames() takes a vector of buffers and requires exactly one frame per buffer. We have future plans where we want to have multiple buffers per frame in some circumstances, so extend tap_send_frames() to take the number of buffers per frame as a parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Improve comment to rembufs calculation] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-14 16:57:28 +01:00
David Gibson	413c15988e	udp: Use existing helper for UDP checksum on inbound IPv6 packets Currently we open code the calculation of the UDP checksum in udp_update_hdr6(). We calling a helper to handle the IPv6 pseudo-header, and preset the checksum field to 0 so an uninitialised value doesn't get folded in. We already have a helper to do this: csum_udp6() which we use in some slow paths. Use it here as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:25 +01:00
David Gibson	ae69838db0	udp: Avoid unnecessary pointer in udp_update_hdr4() We carry around the source address as a pointer to a constant struct in_addr. But it's silly to carry around a 4 or 8 byte pointer to a 4 byte IPv4 address. Just copy the IPv4 address around by value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:21 +01:00
David Gibson	b0419d150a	udp: Re-order udp_update_hdr[46] for clarity and brevity The order of things in these functions is a bit odd for historical reasons. We initialise some IP header fields early, the more later after making some tests. Likewise we declare some variables without initialisation, but then unconditionally set them to values we could calculate at the start of the function. Previous cleanups have removed the reasons for some of these choices, so reorder for clarity, and where possible move the first assignment into an initialiser. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:19 +01:00
David Gibson	8a842e03cd	udp: Pass data length explicitly to to udp_update_hdr[46] These functions take an index to the L2 buffer whose header information to update. They use that for two things: to locate the buffer pointer itself, and to retrieve the length of the received message from the paralllel udp[46]_l2_mh_sock array. The latter is arguably a failure to separate concerns. Change these functions to explicitly take a buffer pointer and payload length as parameters. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:17 +01:00
David Gibson	76571ae869	udp: Consistent port variable names in udp_update_hdr[46] In these functions we have 'dstport' for the destination port, but 'src_port' for the source port. Change the latter to 'srcport' for consistency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:37:15 +01:00
David Gibson	205b140dec	udp: Refactor udp_sock[46]_iov_init() Each of these functions have 3 essentially identical loops in a row. Merge the loops into a single common udp_sock_iov_init() function, calling udp_sock[46]_iov_init_one() helpers to initialize each "slot" in the various parallel arrays. This is slightly neater now, and more naturally allows changes we want to make where more initialization will become common between IPv4 and IPv6. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-13 14:36:59 +01:00
Laurent Vivier	6b22e10a26	tap: make tap_update_mac() generic Use ethhdr rather than tap_hdr. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-9-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-06 08:03:49 +01:00
Laurent Vivier	7df624e79a	checksum: introduce functions to compute the header part checksum for TCP/UDP The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-ID: <20240303135114.1023026-8-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-06 08:03:47 +01:00
Laurent Vivier	feb4900c25	checksum: use csum_ip4_header() in udp.c and tcp.c We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-7-lvivier@redhat.com> [dwg: Fix weird cppcheck regression; it appears to be a problem in pre-existing code, but somehow this patch is exposing it] Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-06 08:03:44 +01:00
Laurent Vivier	e82b4fe5fc	udp: little cleanup in udp_update_hdrX() to prepare future changes in udp_update_hdr4(): Assign the source address to src, either b->s_in.sin_addr, c->ip4.dns_match or c->ip4.gw and then set b->iph.saddr to src->s_addr. in udp_update_hdr6(): Assign the source address to src, either b->s_in6.sin6_addr, c->ip6.dns_match, c->ip6.gw or c->ip6.addr_ll. Assign the destination to dst, either c->ip6.addr_seen or &c->ip6.addr_ll_seen. Then set dst to b->ip6h.daddr and src to b->ip6h.saddr. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-ID: <20240303135114.1023026-6-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-06 08:03:41 +01:00
Laurent Vivier	324bd46782	util: move IP stuff from util.[ch] to ip.[ch] Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-5-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-03-06 08:03:38 +01:00
David Gibson	3b9098aa49	fwd: Rename port_fwd.[ch] and their contents Currently port_fwd.[ch] contains helpers related to port forwarding, particular automatic port forwarding. We're planning to allow much more flexible sorts of forwarding, including both port translation and NAT based on the flow table. This will subsume the existing port forwarding logic, so rename port_fwd.[ch] to fwd.[ch] with matching updates to all the names within. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 09:48:27 +01:00
David Gibson	bb9bf0bb8f	tcp, udp: Don't precompute port remappings in epoll references The epoll references for both TCP listening sockets and UDP sockets includes a port number. This gives the destination port that traffic to that socket will be sent to on the other side. That will usually be the same as the socket's bound port, but might not if the -t, -u, -T or -U options are given with different original and forwarded port numbers. As we move towards a more flexible forwarding model for passt, it's going to become possible for that destination port to vary depending on more things (for example the source or destination address). So, it will no longer make sense to have a fixed value for a listening socket. Change to simpler semantics where this field in the reference gives the bound port of the socket. We apply the translations to the correct destination port later on, when we're actually forwarding. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 09:47:40 +01:00
David Gibson	0cf6b2d89d	inany: Provide more conveniently typed constants for special addresses Our inany_addr type is used in some places to represent either IPv4 or IPv6 addresses, and we plan to use it more widely. We don't yet provide constants of this type for special addresses (loopback and "any"). Add some of these, both the IPv4 and IPv6 variants of those addresses, but typed as union inany_addr. To avoid actually adding more things to .data we can use some macros and casting to overlay the IPv6 versions of these with the standard library's in6addr_loopback and in6addr_any. For the IPv4 versions we need to create new constant globals. For complicated historical reasons, the standard library doesn't provide constants for IPv4 loopback and any addresses as struct in_addr. It just has macros of type in_addr_t == uint32_t, which has some gotchas w.r.t. endianness. We can use some more macros to address this lack, using macros to effectively create these IPv4 constants as pieces of the inany constants above. We use this last to avoid some awkward temporary variables just used to get an address of an IPv4 loopback address. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 09:47:28 +01:00
David Gibson	90f1d3b354	udp: Remove unnecessary test for unspecified addr_out If the configured output address is unspecified, we don't set the bind address to it when creating a new socket in udp_tap_handler(). That sounds sensible, but what we're leaving the bind address as is, exactly, the unspecified address, so this test makes no difference. Remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 06:23:47 +01:00
David Gibson	745fa38169	udp: Fix incorrect usage of IPv6 state in IPv4 path When forwarding IPv4 packets in udp_tap_handler(), we incorrectly use an IPv6 address test on our IPv4 address (which could cause an out of bounds access), and possibly set our bind interface to the IPv6 interface based on it. Adjust to correctly look at the IPv4 address and IPv4 interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 05:41:03 +01:00
David Gibson	deea5a8437	udp: Small streamline to udp_update_hdr4() Streamline the logic here slightly, by introducing a 'src' temporary for brevity. We also transform the logic for setting/clearing PORT_LOOPBACK. This makes udp_update_hdr4() more closely match the corresponding logic from udp_update_udp6(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 05:41:01 +01:00
David Gibson	bc2d0d381c	udp: Set pif in epoll reference for ephemeral host sockets The udp_epoll_ref contains a field for the pif to which the socket belongs. We fill this in for permanent sockets created with udp_sock_init() and for spliced sockets, however, we omit it for ephemeral sockets created for tap originated flows. This is a bug, although we currently get away with it, because we don't consult that field for such flows. Correctly fill it in. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 05:40:59 +01:00
David Gibson	720d777a69	udp: Don't attempt to translate a 0.0.0.0 source address If an incoming packet has a source address of 0.0.0.0 we translate that to the gateway address. This doesn't really make sense, because we have no way to do a reverse translation for reply packets. Certain UDP protocols do use an unspecified source address in some circumstances (e.g. DHCP). These generally either require no reply, a multicast reply, or provide a suitable reply address by other means. In none of those cases does translating it in passt/pasta make sense. The best we can really do here is just leave it as is. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-29 05:40:46 +01:00
David Gibson	4e08d9b9c6	treewide: Use sa_family_t for address family variables Sometimes we use sa_family_t for variables and parameters containing a socket address family, other times we use a plain int. Since sa_family_t is what's actually used in struct sockaddr and friends, standardise on that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-27 12:52:02 +01:00
David Gibson	1e6f92b995	udp: Fix 16-bit overflow in udp_invert_portmap() The code in udp_invert_portmap() is written based on an incorrect understanding of C's (arcane) integer promotion rules. We calculate '(in_port_t)i + delta' expecting the result to be of type in_port_t (16 bits). However "small integer types" (those narrower than 'int') are always promoted to int for expressions, meaning this calculation can overrun the rdelta[] array. Fix this, and use a new intermediate for the index, to make it very clear what it's type is. We also change i to unsigned, to avoid any possible confusion from mixing signed and unsigned types. Link: https://bugs.passt.top/show_bug.cgi?id=80 Reported-by: Laurent Jacquot <jk@lutty.net> Suggested-by: Laurent Jacquot <jk@lutty.net> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-20 08:14:28 +01:00
David Gibson	8954c4a91b	udp: Assertion in udp_invert_portmap() can be calculated at compile time All the values in this ASSERT() are known at compile time, so this can be converted to a static_assert(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-20 08:14:08 +01:00
David Gibson	927cb84fff	udp: udp_sock_init_ns() partially duplicats udp_port_rebind_outbound() Usually automatically forwarded UDP outbound ports are set up by udp_port_rebind_outbound() called from udp_timer(). However, the very first time they're created and bound is by udp_sock_init_ns() called from udp_init(). udp_sock_init_ns() is essentially an unnecessary cut down version of udp_port_rebind_outbound(), so we can jusat remove it. Doing so does require moving udp_init() below udp_port_rebind_outbound()'s definition. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-14 03:24:23 +01:00
David Gibson	96ad5c5acd	udp: Don't prematurely (and incorrectly) set up automatic inbound forwards For automated inbound port forwarding in pasta mode we scan bound ports within the guest namespace via /proc and bind matching ports on the host to listen for packets. For UDP this is usually handled by udp_timer() which calls port_fwd_scan_udp() followed by udp_port_rebind(). However there's one initial scan before the the UDP timer is started: we call port_fwd_scan_udp() from port_fwd_init(), and actually bind the resulting ports in udp_sock_init_init() called from udp_init(). Unfortunately, the version in udp_sock_init_init() isn't correct. It unconditionally opens a new socket for every forwarded port, even if a socket has already been explicit created with the -u option. If the explicitly forwarded ports have particular configuration (such as a specific bound address address, or one implied by the -o option) those will not be replicated in the new socket. We essentially leak the original correctly configured socket, replacing it with one which might not be right. We could make udp_sock_init_init() use udp_port_rebind() to get that right, but there's actually no point doing so: * The initial bind was introduced by `ccf6d2a7b4` ("udp: Actually bind detected namespace ports in init namespace") at which time we didn't periodically scan for bound UDP ports. Periodic scanning was introduced in `457ff122e` ("udp,pasta: Periodically scan for ports to automatically forward") making the bind from udp_init() redundant. * At the time of udp_init(), programs in the guest namespace are likely not to have started yet (unless attaching a pre-existing namespace) so there's likely not anything to scan for anyway. So, simply remove the initial, broken socket create/bind, allowing automatic port forwards to be created the first time udp_timer() runs. Reported-by: Laurent Jacquot <jk@lutty.net> Suggested-by: Laurent Jacquot <jk@lutty.net> Link: https://bugs.passt.top/show_bug.cgi?id=79 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-02-14 03:24:01 +01:00

1 2 3 4

189 commits