passt

mirror of https://passt.top/passt synced 2025-05-30 04:45:34 +02:00

Author	SHA1	Message	Date
Laurent Vivier	3262c9b088	iov: Standardize function comment headers Update function comment headers in iov.c to a consistent and standardized format. This change ensures: - Comment blocks for functions consistently start with /**. - Function names in the comment summary line include parentheses (). This improves overall comment clarity and uniformity within the file. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-16 18:27:13 +02:00
Laurent Vivier	b915375a42	virtio: Correct and align comment headers Standardize and fix issues in `virtio.c` and `virtio.h` comment headers. Improvements include: - Added `()` to function names in comment summaries. - Added colons after parameter and enum member tags. - Changed `/` to `/*` for `virtq_avail_event()` comment. - Fixed typos (e.g., "file"->"fill", "virqueue"->"virtqueue"). - Added missing `Return:` tag for `vu_queue_rewind()`. - Corrected parameter names in `virtio.h` comments to match code. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-16 18:27:11 +02:00
Laurent Vivier	2fd0944f21	vhost_user: Correct and align function comment headers This commit cleans up function comment headers in vhost_user.c to ensure accuracy and consistency with the code. Changes include correcting parameter names in comments and signatures (e.g., standardizing on vmsg for vhost messages, fixing dev to vdev), updating function names in comment descriptions, and removing/rectifying erroneous parameter documentation. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-16 18:27:08 +02:00
Laurent Vivier	2046976866	codespell: Correct typos in comments and error message This commit addresses several spelling errors identified by the `codespell` tool. The corrections apply to: - Code comments in `fwd.c`, `ip.h`, `isolation.c`, and `log.c`. - An error message string in `vhost_user.c`. Specifically, the following misspellings were corrected: - "adddress" to "address" - "capabilites" to "capabilities" - "Musn't" to "Mustn't" - "calculatd" to "calculated" - "Invalide" to "Invalid" Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-15 18:06:30 +02:00
Laurent Vivier	4234ace84c	test: Display count of skipped tests in status and summary This commit enhances test reporting by tracking and displaying the number of skipped tests. The skipped test count is now visible in the tmux status bar during execution and included in the final test summary log. This provides a more complete overview of test suite results. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-15 18:06:19 +02:00
Laurent Vivier	2d3d69c5c3	flow: Fix clang error (clang-analyzer-security.PointerSub) Fixes the following clang-analyzer warning: flow_table.h:96:25: note: Subtraction of two pointers that do not point into the same array is undefined behavior 96 | return (union flow *)f - flowtab; The `flow_idx()` function is called via `FLOW_IDX()` from `flow_foreach_slot()`, where `f` is set to `&flowtab[idx].f`. Therefore, `f` and `flowtab` do point to the same array. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-14 17:51:37 +02:00
Laurent Vivier	0f7bf10b0a	ndp: Fix Clang analyzer warning (clang-analyzer-security.PointerSub) Addresses Clang warning: "Subtraction of two pointers that do not point into the same array is undefined behavior" for the line: `ndp_send(c, dst, &ra, ptr - (unsigned char *)&ra);` Here, `ptr` is `&ra.var[0]`. The subtraction calculates the offset of `var[0]` within the `struct ra_options ra`. Since `ptr` points inside `ra`, this pointer arithmetic is well-defined for calculating the size of the data to send, even if `ptr` and `&ra` are not strictly considered part of the same "array" by the analyzer. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-14 17:51:35 +02:00
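The offset calculation discussed in 0f7bf10b0a can be illustrated with a small sketch. The structure and function names here are hypothetical stand-ins, not the ones in ndp.c; the point is only that subtracting the base address of an object from a cursor inside that same object yields the number of bytes used so far.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ra_options: fixed header, then options */
struct ra_like {
	unsigned char hdr[16];
	unsigned char var[64];
};

/* Bytes used so far, given a cursor pointing into ra->var (or one past it) */
size_t ra_used_len(const struct ra_like *ra, const unsigned char *ptr)
{
	assert(ra && ptr);

	/* Both pointers are viewed as unsigned char * into the same object */
	return (size_t)(ptr - (const unsigned char *)ra);
}
```

With the cursor at `&ra.var[0]`, the result equals `offsetof(struct ra_like, var)`, which is the length of the fixed part.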
Laurent Vivier	a6b9832e49	virtio: Fix Clang warning (bugprone-sizeof-expression, cert-arr39-c) In `virtqueue_read_indirect_desc()`, the pointer arithmetic involving `desc` is intentional. We add the length in bytes (`read_len`) divided by the size of `struct vring_desc` to `desc`, which is an array of `struct vring_desc`. This correctly calculates the offset in terms of the number of `struct vring_desc` elements. Clang issues the following warning due to this explicit scaling: virtio.c:238:8: error: suspicious usage of 'sizeof(...)' in pointer arithmetic; this scaled value will be scaled again by the '+=' operator [bugprone-sizeof-expression,cert-arr39-c,-Werror] 238 | desc += read_len / sizeof(struct vring_desc); | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~ virtio.c:238:8: note: '+=' in pointer arithmetic internally scales with 'sizeof(struct vring_desc)' == 16 This behavior is intended, so the warning can be considered a false positive in this context. The code correctly advances the pointer by the desired number of descriptor entries. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-14 17:51:32 +02:00
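The scaling described in a6b9832e49 can be sketched as follows. `struct desc` is a hypothetical 16-byte stand-in for `struct vring_desc`, and `desc_advance()` is an illustrative helper, not a function from virtio.c. Dividing a byte count by the element size gives an element count, which pointer arithmetic then scales back up by the element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vring_desc: 16 bytes per entry */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Advance a descriptor pointer by a length given in bytes */
struct desc *desc_advance(struct desc *d, size_t read_len)
{
	/* The byte count must cover a whole number of descriptors */
	assert(read_len % sizeof(*d) == 0);

	/* Bytes -> entries here; '+' scales entries back to bytes */
	return d + read_len / sizeof(*d);
}
```

For a table `t`, `desc_advance(t, 32)` points at `&t[2]`, since 32 bytes correspond to two 16-byte entries.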
Laurent Vivier	570e7b4454	dhcpv6: fix GCC error (unterminated-string-initialization) The string STR_NOTONLINK is intentionally not NUL-terminated. Ignore the GCC error using __attribute__((nonstring)). This error is reported by GCC 15.1.1 on Fedora 42. However, Clang 20.1.3 does not support __attribute__((nonstring)). Therefore, NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) is also added to suppress Clang's unknown attribute warning. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-14 17:51:20 +02:00
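As a rough illustration of 570e7b4454, the sketch below declares a character array that is intentionally not NUL-terminated. The names `NONSTRING`, `tag` and `tag_matches()` are hypothetical. Note that it guards the attribute with `__has_attribute`, which is a different approach from the NOLINTNEXTLINE comment the commit message describes; it is shown only as one possible way to keep both compilers quiet.

```c
#include <assert.h>
#include <string.h>

/* Fall back gracefully on compilers without __has_attribute */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(nonstring)
#define NONSTRING __attribute__((nonstring))
#else
#define NONSTRING
#endif

/* Hypothetical 4-byte tag: the initializer fills the array, leaving no NUL */
static const char tag[4] NONSTRING = "abcd";

/* Compare using an explicit length, never strcmp() or strlen() */
int tag_matches(const void *buf, size_t len)
{
	assert(buf);
	return len >= sizeof(tag) && memcmp(buf, tag, sizeof(tag)) == 0;
}
```

Any code handling such an array has to carry the length separately, as `tag_matches()` does with `sizeof(tag)`.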
Laurent Vivier	8ec134109e	flow: close socket fd on error In `eea8a76caf` ("flow: fix podman issue #26073"), we unregister the fd from epoll_ctl() in case of error, but we also need to close it. As flowside_sock_l4() also calls sock_l4_sa() via flowside_sock_splice() we can do it unconditionally. Fixes: `eea8a76caf` ("flow: fix podman issue #26073") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-12 21:04:57 +02:00
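The error path described in 8ec134109e and eea8a76caf can be sketched with plain Linux system calls. `flow_sock_open()` is a hypothetical function, and it assumes the socket is created and registered in one place; passt's own helpers (`sock_l4_sa()`, `epoll_del()`) are not reproduced here. If connect() fails after the socket was added to the epoll set, the socket is both unregistered and closed before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket, register it with epoll, then connect it.
 * Returns the socket on success, negative errno on failure.
 */
int flow_sock_open(int epfd, const struct sockaddr *sa, socklen_t sl)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int s, rc;

	assert(sa);

	s = socket(sa->sa_family, SOCK_DGRAM | SOCK_NONBLOCK, IPPROTO_UDP);
	if (s < 0)
		return -errno;

	ev.data.fd = s;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev) < 0) {
		rc = -errno;
		close(s);
		return rc;
	}

	if (connect(s, sa, sl) < 0) {
		rc = -errno;	/* save before cleanup can change errno */

		/* Undo both steps: unregister from epoll, then close */
		epoll_ctl(epfd, EPOLL_CTL_DEL, s, &ev);
		close(s);
		return rc;
	}

	return s;
}
```

Without the cleanup, a stale epoll reference could later be delivered for a file descriptor number that has since been reused, which matches the failure mode the two commits describe.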
Laurent Vivier	92d5d68013	flow: fix wrong macro name in comments The macro giving the maximum number of flows is named FLOW_MAX, not MAX_FLOW. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-08 13:24:14 +02:00
Laurent Vivier	eea8a76caf	flow: fix podman issue #26073 While running pasta, we trigger the following assert: ASSERTION FAILED in udp_at_sidx (udp_flow.c:35): flow->f.type == FLOW_UDP in udp_at_sidx() in the following path: 902 void udp_sock_handler(const struct ctx c, union epoll_ref ref, 903 uint32_t events, const struct timespec now) 904 { 905 struct udp_flow uflow = udp_at_sidx(ref.flowside); The invalid sidx is coming from the epoll_ref provided by epoll_wait(). This assert follows the following error: Couldn't connect flow socket: Permission denied It appears that an error happens in udp_flow_sock() and the recently created fd is not removed from the epoll_ctl() pool: 71 static int udp_flow_sock(const struct ctx c, 72 struct udp_flow *uflow, unsigned sidei) 73 { ... 82 s = flowside_sock_l4(c, EPOLL_TYPE_UDP, pif, side, fref.data); 83 if (s < 0) { 84 flow_dbg_perror(uflow, "Couldn't open flow specific socket"); 85 return s; 86 } 87 88 if (flowside_connect(c, s, pif, side) < 0) { 89 int rc = -errno; 90 flow_dbg_perror(uflow, "Couldn't connect flow socket"); 91 return rc; 92 } ... flowside_sock_l4() calls sock_l4_sa() that adds 's' to the epoll_ctl() pool. So to cleanly manage the error of flowside_connect() we need to remove 's' from the epoll_ctl() pool using epoll_del(). Link: https://github.com/containers/podman/issues/26073 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-07 14:42:48 +02:00
Stefano Brivio	587980ca1e	udp: Actually discard datagrams we can't forward Given that udp_sock_fwd() now loops on udp_peek_addr() to get endpoint addresses for datagrams, if we can't forward one of these datagrams, we need to make sure we actually discard it. Otherwise, with MSG_PEEK, we won't dequeue and loop on it forever. For example, if we fail to create a socket for a new flow, because, say, the destination of an inbound packet is multicast, and we can't bind() to a multicast address, the loop will look like this: 18.0563: Flow 0 (NEW): FREE -> NEW 18.0563: Flow 0 (INI): NEW -> INI 18.0563: Flow 0 (INI): HOST [127.0.0.1]:42487 -> [127.0.0.1]:9997 => ? 18.0563: Flow 0 (TGT): INI -> TGT 18.0563: Flow 0 (TGT): HOST [127.0.0.1]:42487 -> [ff02::c]:9997 => SPLICE [0.0.0.0]:42487 -> [88.198.0.164]:9997 18.0563: Flow 0 (UDP flow): TGT -> TYPED 18.0564: Flow 0 (UDP flow): HOST [127.0.0.1]:42487 -> [ff02::c]:9997 => SPLICE [0.0.0.0]:42487 -> [88.198.0.164]:9997 18.0564: Flow 0 (UDP flow): Couldn't open flow specific socket: Invalid argument 18.0564: Flow 0 (FREE): TYPED -> FREE 18.0564: Flow 0 (FREE): HOST [127.0.0.1]:42487 -> [ff02::c]:9997 => SPLICE [0.0.0.0]:42487 -> [88.198.0.164]:9997 18.0564: Discarding datagram without flow 18.0564: Flow 0 (NEW): FREE -> NEW 18.0564: Flow 0 (INI): NEW -> INI 18.0564: Flow 0 (INI): HOST [127.0.0.1]:42487 -> [127.0.0.1]:9997 => ? 
18.0564: Flow 0 (TGT): INI -> TGT 18.0564: Flow 0 (TGT): HOST [127.0.0.1]:42487 -> [ff02::c]:9997 => SPLICE [0.0.0.0]:42487 -> [88.198.0.164]:9997 18.0564: Flow 0 (UDP flow): TGT -> TYPED 18.0564: Flow 0 (UDP flow): HOST [127.0.0.1]:42487 -> [ff02::c]:9997 => SPLICE [0.0.0.0]:42487 -> [88.198.0.164]:9997 18.0564: Flow 0 (UDP flow): Couldn't open flow specific socket: Invalid argument 18.0564: Flow 0 (FREE): TYPED -> FREE 18.0564: Flow 0 (FREE): HOST [127.0.0.1]:42487 -> [ff02::c]:9997 => SPLICE [0.0.0.0]:42487 -> [88.198.0.164]:9997 18.0564: Discarding datagram without flow and seen from strace: epoll_wait(3, [{events=EPOLLIN, data=0x1076c00000705}], 8, 1000) = 1 recvmsg(7, {msg_name={sa_family=AF_INET6, sin6_port=htons(55899), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::26e8:53ff:fef3:13b6", &sin6_addr), sin6_scope_id=if_nametoindex("wlp4s0")}, msg_namelen=28, msg_iov=NULL, msg_iovlen=0, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x03\x00\x00\x00"}], msg_controllen=40, msg_flags=MSG_TRUNC}, MSG_PEEK\|MSG_DONTWAIT) = 0 socket(AF_INET6, SOCK_DGRAM\|SOCK_NONBLOCK, IPPROTO_UDP) = 12 setsockopt(12, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0 setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 setsockopt(12, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0 setsockopt(12, SOL_IPV6, IPV6_RECVPKTINFO, [1], 4) = 0 bind(12, {sa_family=AF_INET6, sin6_port=htons(1900), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "ff02::c", &sin6_addr), sin6_scope_id=0}, 28) = -1 EINVAL (Invalid argument) close(12) = 0 recvmsg(7, {msg_name={sa_family=AF_INET6, sin6_port=htons(55899), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fe80::26e8:53ff:fef3:13b6", &sin6_addr), sin6_scope_id=if_nametoindex("wlp4s0")}, msg_namelen=28, msg_iov=NULL, msg_iovlen=0, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32, cmsg_data="\xff\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x03\x00\x00\x00"}], 
msg_controllen=40, msg_flags=MSG_TRUNC}, MSG_PEEK\|MSG_DONTWAIT) = 0 socket(AF_INET6, SOCK_DGRAM\|SOCK_NONBLOCK, IPPROTO_UDP) = 12 setsockopt(12, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0 setsockopt(12, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 setsockopt(12, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0 setsockopt(12, SOL_IPV6, IPV6_RECVPKTINFO, [1], 4) = 0 bind(12, {sa_family=AF_INET6, sin6_port=htons(1900), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "ff02::c", &sin6_addr), sin6_scope_id=0}, 28) = -1 EINVAL (Invalid argument) close(12) = 0 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-05-03 10:21:20 +02:00
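The behaviour behind 587980ca1e can be sketched with a minimal helper. `discard_datagram()` is a hypothetical name, and the zero-length receive is an assumption about one way to drop a datagram on Linux, not necessarily what udp.c does. A receive with MSG_PEEK leaves the datagram queued, so a loop that only peeks sees the same datagram again; a receive without MSG_PEEK dequeues it, even if the buffer length is zero.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Dequeue and drop one datagram from socket s.
 * Returns 0 on success, negative errno on failure.
 */
int discard_datagram(int s)
{
	char dummy;
	ssize_t rc;

	assert(s >= 0);

	/* No MSG_PEEK: the datagram is removed from the queue, and
	 * anything beyond the (zero-length) buffer is truncated.
	 */
	do
		rc = recv(s, &dummy, 0, MSG_DONTWAIT);
	while (rc < 0 && errno == EINTR);

	return rc < 0 ? -errno : 0;
}
```

After a call to `discard_datagram()`, the next peek returns the following datagram rather than the one that could not be forwarded, which is what breaks the loop shown in the log above.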
Emanuel Valasiadis	f0021f9e1d	fwd: fix doc typo Signed-off-by: Emanuel Valasiadis <emanuel@valasiadis.space> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-03 03:42:51 +02:00
Janne Grunau	93394f4ef0	selinux: Add getattr to class udp_socket Commit `59cc89f` ("udp, udp_flow: Track our specific address on socket interfaces") added a getsockname() call in udp_flow_new(). This requires getattr. Fixes "Flow 0 (UDP flow): Unable to determine local address: Permission denied" errors in muvm/passt on Fedora Linux 42 with SELinux. The SELinux audit message is | type=AVC msg=audit(1746083799.606:235): avc: denied { getattr } for | pid=2961 comm="passt" laddr=127.0.0.1 lport=49221 | faddr=127.0.0.53 fport=53 | scontext=unconfined_u:unconfined_r:passt_t:s0-s0:c0.c1023 | tcontext=unconfined_u:unconfined_r:passt_t:s0-s0:c0.c1023 | tclass=udp_socket permissive=0 Fixes: `59cc89f4cc` ("udp, udp_flow: Track our specific address on socket interfaces") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2363238 Signed-off-by: Janne Grunau <janne-psst@jannau.net> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-02 12:00:51 +02:00
Laurent Vivier	11be695f5c	flow: fix podman issue #25959 While running piHole using podman, traffic can trigger the following assert: ASSSERTION FAILED in flow_alloc (flow.c:521): flow->f.state == FLOW_STATE_FREE Backtrace shows that this happens in flow_defer_handler(): #4 0x00005610d6f5b481 flow_alloc (passt + 0xb481) #5 0x00005610d6f74f86 udp_flow_from_sock (passt + 0x24f86) #6 0x00005610d6f737c3 udp_sock_fwd (passt + 0x237c3) #7 0x00005610d6f74c07 udp_flush_flow (passt + 0x24c07) #8 0x00005610d6f752c2 udp_flow_defer (passt + 0x252c2) #9 0x00005610d6f5bce1 flow_defer_handler (passt + 0xbce1) We are trying to allocate a new flow inside the loop freeing them. Inside the loop free_head points to the first free flow entry in the current cluster. But if we allocate a new entry during the loop, free_head is not updated and can point now to the entry we have just allocated. We can fix the problem by splitting the loop in two parts: - first part where we can close some of them and allocate some new flow entries, - second part where we free the entries closed in the previous loop and we aggregate the free entries to merge consecutive clusters. Reported-by: Martin Rijntjes <bugs@air-global.nl> Link: https://github.com/containers/podman/issues/25959 Fixes: `9725e79888` ("udp_flow: Don't discard packets that arrive between bind() and connect()") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-05-02 11:58:25 +02:00
Stefano Brivio	6a96cd97a5	util: Fix typo, ASSSERTION -> ASSERTION Fixes: `9153aca15b` ("util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-05-02 11:58:10 +02:00
Stefano Brivio	ea0a1240df	passt-repair: Hide bogus gcc warning from -Og When building with gcc 13 and -Og, we get: passt-repair.c: In function ‘main’: passt-repair.c:161:23: warning: ‘ev’ may be used uninitialized [-Wmaybe-uninitialized] 161 | if (ev->len > NAME_MAX + 1 || ev->name[ev->len - 1] != '\0') { | ~~^~~~~ but that can't actually happen, because we only exit the preceding while loop if 'found' is true, and that only happens, in turn, as we assign 'ev'. Get rid of the warning by (redundantly) initialising ev to NULL. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-30 16:58:58 +02:00
Alyssa Ross	aa1cc89228	conf: allow --fd 0 inetd-style socket passing traditionally starts a service with a connected socket on file descriptors 0 and 1. passt disallowing obtaining its socket from either of these descriptors made it difficult to use with super-servers providing this interface — in my case I wanted to use passt with s6-ipcserver[1]. Since (as far as I can tell) passt does not use standard input for anything else (unlike standard output), it should be safe to relax the restrictions on --fd to allow setting it to 0, enabling this use case. Link: https://skarnet.org/software/s6/s6-ipcserver.html [1] Signed-off-by: Alyssa Ross <hi@alyssa.is> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-28 14:01:17 +02:00
David Gibson	436afc3044	udp: Translate offender addresses for ICMP messages We've recently added support for propagating ICMP errors related to a UDP flow from the host to the guest, by handling the extended UDP error on the socket and synthesizing a suitable ICMP on the tap interface. Currently we create that ICMP with a source address of the "offender" from the extended error information - the source of the ICMP error received on the host. However, we don't translate this address for cases where we NAT between host and guest. This means (amongst other things) that we won't get a "Connection refused" error as expected if we send data from the guest to the --map-host-loopback address. The error comes from 127.0.0.1 on the host, which doesn't make sense on the tap interface and will be discarded by the guest. Because ICMP errors can be sent by an intermediate host, not just by the endpoints of the flow, we can't handle this translation purely with the information in the flow table entry. We need to explicitly translate this address by our NAT rules, which we can do with the nat_inbound() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-22 12:42:05 +02:00
David Gibson	08e617ec2b	udp: Rework offender address handling in udp_sock_recverr() Make a number of changes to udp_sock_recverr() to improve the robustness of how we handle addresses. * Get the "offender" address (source of the ICMP packet) using the SO_EE_OFFENDER() macro, reducing assumptions about structure layout. * Parse the offender sockaddr using inany_from_sockaddr() * Check explicitly that the source and destination pifs are what we expect. Previously we checked something that was probably equivalent in practice, but isn't strictly speaking what we require for the rest of the code. * Verify that for an ICMPv4 error we also have an IPv4 source/offender and destination/endpoint address * Verify that for an ICMPv6 error we have an IPv6 endpoint * Improve debug reporting of any failures Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-22 12:42:03 +02:00
David Gibson	4668e91378	treewide: Improve robustness against sockaddrs of unexpected family inany_from_sockaddr() expects a socket address of family AF_INET or AF_INET6 and ASSERT()s if it gets anything else. In many of the callers we can handle an unexpected family more gracefully, though, e.g. by failing a single flow rather than killing passt. Change inany_from_sockaddr() to return an error instead of ASSERT()ing, and handle those errors in the callers. Improve the reporting of any such errors while we're at it. With this greater robustness, allow inany_from_sockaddr() to take a void * rather than specifically a union sockaddr_inany *. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-22 12:42:00 +02:00
David Gibson	9128f6e8f4	fwd: Split out helpers for port-independent NAT Currently the functions fwd_nat_from_*() make some address translations based on both the IP address and protocol port numbers, and others based only on the address. We have some upcoming cases where it's useful to use the IP-address-only translations separately, so split them out into helper functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-22 12:41:47 +02:00
David Gibson	2340bbf867	udp: Propagate errors on listening and brand new sockets udp_sock_recverr() processes errors on UDP sockets and attempts to propagate them as ICMP packets on the tap interface. To do this it currently requires the flow with which the error is associated as a parameter. If that's missing it will clear the error condition, but not propagate it. That means that we largely ignore errors on "listening" sockets. It also means we may discard some errors on flow specific sockets if they occur very shortly after the socket is created. In udp_flush_flow() we need to clear any datagrams received between bind() and connect() which might not be associated with the "final" flow for the socket. If we get errors before that point we'll ignore them in the same way because we don't know the flow they're associated with in advance. This can happen in practice if we have errors which occur almost immediately after connect(), such as ECONNREFUSED when we connect() to a local address where nothing is listening. Between the extended error message itself and the PKTINFO information we do actually have enough information to find the correct flow. So, rather than ignoring errors where we don't have a flow "hint", determine the flow the hard way in udp_sock_recverr(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Change warn() to debug() in udp_sock_recverr()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:56:16 +02:00
David Gibson	cfc0ee145a	udp: Minor re-organisation of udp_sock_recverr() Usually we work with the "exit early" flow style, where we return early on "error" conditions in functions. We don't currently do this in udp_sock_recverr() for the case where we don't have a flow to associate the error with. Reorganise to use the "exit early" style, which will make some subsequent changes less awkward. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:49:06 +02:00
David Gibson	f107a86cc0	udp: Add udp_pktinfo() helper Currently we open code parsing the control message for IP_PKTINFO in udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO in another place, so split this out into a helper function. While we're there, make the parsing a bit more robust: scan all cmsgs to look for the one we want, rather than assuming there's only one. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: udp_pktinfo(): Fix typo in comment and change err() to debug()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:48:35 +02:00
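The "scan all cmsgs" approach mentioned in f107a86cc0 can be sketched with the standard control message macros. `find_pktinfo()` is a hypothetical helper covering only IPv4 (`IP_PKTINFO`); it is not the `udp_pktinfo()` function from udp.c, whose signature is not shown in this log.

```c
#define _GNU_SOURCE	/* for struct in_pktinfo with glibc */
#include <assert.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Look for an IP_PKTINFO control message anywhere in mh's control data.
 * Returns 0 and fills *out if found, -1 otherwise.
 */
int find_pktinfo(struct msghdr *mh, struct in_pktinfo *out)
{
	struct cmsghdr *h;

	assert(mh && out);

	/* Walk every control message instead of assuming a single one */
	for (h = CMSG_FIRSTHDR(mh); h; h = CMSG_NXTHDR(mh, h)) {
		if (h->cmsg_level != IPPROTO_IP || h->cmsg_type != IP_PKTINFO)
			continue;
		if (h->cmsg_len < CMSG_LEN(sizeof(*out)))
			continue;	/* too short to hold the payload */

		/* Copy out: CMSG_DATA() is not guaranteed to be aligned */
		memcpy(out, CMSG_DATA(h), sizeof(*out));
		return 0;
	}

	return -1;
}
```

The same loop shape applies when looking for other message types, such as the extended error message on the error queue, because the kernel may deliver several control messages in any order.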
David Gibson	04984578b0	udp: Deal with errors as we go in udp_sock_fwd() When we get an epoll event on a listening socket, we first deal with any errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()). However, it's theoretically possible that new errors could get flagged on the socket after we call udp_sock_errs(), in which case we could get errors returned in udp_sock_fwd() -> udp_peek_addr() -> recvmsg(). In fact, we do deal with this correctly, although the path is somewhat non-obvious. The recvmsg() error will cause us to bail out of udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come back here next epoll loop and call udp_sock_errs(). Except.. we call udp_sock_fwd() from udp_flush_flow() as well as from epoll events. This is to deal with any packets that arrived between bind() and connect(), and so might not be associated with the socket's intended flow. This expects udp_sock_fwd() to flush _all_ queued datagrams, so that anything received later must be for the correct flow. At the moment, udp_sock_errs() might fail to flush all datagrams if errors occur. In particular this can happen in practice for locally reported errors which occur immediately after connect() (e.g. connecting to a local port with nothing listening). We can deal with the problem case, and also make the flow a little more natural for the common case by having udp_sock_fwd() call udp_sock_errs() to handle errors as they occur, rather than trying to deal with all errors in advance. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:45:19 +02:00
David Gibson	3f995586b3	udp: Pass socket & flow information direction to error handling functions udp_sock_recverr() and udp_sock_errs() take an epoll reference from which they obtain both the socket fd to receive errors from, and - for flow specific sockets - the flow and side the socket is associated with. We have some upcoming cases where we want to clear errors when we're not directly associated with receiving an epoll event, so it's not natural to have an epoll reference. Therefore, make these functions take the socket and flow from explicit parameters. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:45:09 +02:00
David Gibson	1bb8145c22	udp: Be quieter about errors on UDP receive If we get an error on UDP receive, either in udp_peek_addr() or udp_sock_recv(), we'll print an error message. However, this could be a perfectly routine UDP error triggered by an ICMP, which need not go to the error log. This doesn't usually happen, because before receiving we typically clear the error queue from udp_sock_errs(). However, it's possible an error could be flagged after udp_sock_errs() but before we receive. So it's better to handle this error "silently" (trace level only). We'll bail out of the receive, return to the epoll loop, and get an EPOLLERR where we'll handle and report the error properly. In particular there's one situation that can trigger this case much more easily. If we start a new outbound UDP flow to a local destination with nothing listening, we'll get a more or less immediate connection refused error. So, we'll get that error on the very first receive after the connect(). That will occur in udp_flow_defer() -> udp_flush_flow() -> udp_sock_fwd() -> udp_peek_addr() -> recvmsg(). This path doesn't call udp_sock_errs() first, so isn't (imperfectly) protected the way we are most of the time. Fixes: `84ab1305fa` ("udp: Polish udp_vu_sock_info() and remove from vu specific code") Fixes: `69e5393c37` ("udp: Move some more of sock_handler tasks into sub-functions") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:43:56 +02:00
David Gibson	baf049f8e0	udp: Fix breakage of UDP error handling by PKTINFO support We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our UDP sockets. This lets us obtain and properly handle the specific local address used when we're "listening" with a socket on 0.0.0.0 or ::. However, the PKTINFO cmsgs this option generates appear on error queue messages as well as regular datagrams. udp_sock_recverr() doesn't expect this and so flags an unrecoverable error when it can't parse the control message. Correct this by adding space in udp_sock_recverr()s control buffer for the additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather than only looking at the first one. Link: https://bugs.passt.top/show_bug.cgi?id=99 Fixes: `f4b0dd8b06` ("udp: Use PKTINFO cmsgs to get destination address for received datagrams") Reported-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-15 19:43:00 +02:00
Stefano Brivio	50249086a9	conf: Honour --dns-forward for local resolver even with --no-map-gw If the first resolver listed in the host's /etc/resolv.conf is a loopback address, and --no-map-gw is given, we automatically conclude that the resolver is not reachable, discard it, and, if it's the only nameserver listed in /etc/resolv.conf, we'll warn that we: Couldn't get any nameserver address However, this isn't true in a general case: the user might have passed --dns-forward, and in that case, while we won't map the address of the default gateway to the host, we're still supposed to map that particular address. Otherwise, in this common Podman usage: pasta --config-net --dns-forward 169.254.1.1 -t none -u none -T none -U none --no-map-gw --netns /run/user/1000/netns/netns-c02a8d8f-6ee3-902e-33c5-317e0f24e0af --map-guest-addr 169.254.1.2 and with a loopback address in /etc/resolv.conf, we'll unexpectedly refuse to forward DNS queries: # nslookup passt.top 169.254.1.1 ;; connection timed out; no servers could be reached To fix this, make an exception for --dns-forward: if &c->ip4.dns_match or &c->ip6.dns_match are set in add_dns_resolv4() / add_dns_resolv6(), use that address as guest-facing resolver. We already set 'dns_host' to the address we found in /etc/resolv.conf, that's correct in this case and it makes us forward queries as expected. I'm not changing the man page as the current description of --dns-forward is already consistent with the new behaviour: there's no described way in which --no-map-gw should affect it. Reported-by: Andrew Sayers <andrew-bugs.passt.top@pileofstuff.org> Link: https://bugs.passt.top/show_bug.cgi?id=111 Suggested-by: Paul Holzinger <pholzing@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-04-15 19:42:59 +02:00
Stefano Brivio	bbff3653d6	conf: Split add_dns_resolv() into separate IPv4 and IPv6 versions Not really valuable by itself, but dropping one level of nested blocks makes the next change more convenient. No functional changes intended. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Paul Holzinger <pholzing@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-04-15 19:42:57 +02:00
David Gibson	59cc89f4cc	udp, udp_flow: Track our specific address on socket interfaces So far for UDP flows (like TCP connections) we didn't record our address (oaddr) in the flow table entry for socket based pifs. That's because we didn't have that information when a flow was initiated by a datagram coming to a "listening" socket with 0.0.0.0 or :: address. Even when we did have the information, we didn't record it, to simplify address matching on lookups. This meant that in some circumstances we could send replies on a UDP flow from a different address than the originating request came to, which is surprising and breaks certain setups. We now have code in udp_peek_addr() which does determine our address for incoming UDP datagrams. We can use that information to properly populate oaddr in the flow table for flow initiated from a socket. In order to be able to consistently match datagrams to flows, we must always have a specific oaddr, not an unspecified address (that's how the flow hash table works). So, we also need to fill in oaddr correctly for flows we initiate to sockets. Our forwarding logic doesn't specify oaddr here, letting the kernel decide based on the routing table. In this case we need to call getsockname() after connect()ing the socket to find which local address the kernel picked. This adds getsockname() to our seccomp profile for all variants. Link: https://bugs.passt.top/show_bug.cgi?id=99 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-10 19:46:16 +02:00
David Gibson	695c62396e	inany: Improve ASSERT message for bad socket family inany_from_sockaddr() can only handle sockaddrs of family AF_INET or AF_INET6 and asserts if given something else. I hit this assertion while debugging something else, and wanted to see what the bad sockaddr family was. Now that we have ASSERT_WITH_MSG() it's easy to add this information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-10 19:46:13 +02:00
David Gibson	f4b0dd8b06	udp: Use PKTINFO cmsgs to get destination address for received datagrams Currently we get the source address for received datagrams from recvmsg(), but we don't get the local destination address. Sometimes we implicitly know this because the receiving socket is bound to a specific address, but when listening on 0.0.0.0 or ::, we don't. We need this information to properly direct replies to flows which come in to a non-default local address. So, enable the IP_PKTINFO and IPV6_PKTINFO control messages to obtain this information in udp_peek_addr(). For now we log a trace messages but don't do anything more with the information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-10 19:45:59 +02:00
David Gibson	6693fa1158	tcp_splice: Don't clobber errno before checking for EAGAIN Like many places, tcp_splice_sock_handler() needs to handle EAGAIN specially, in this case for both of its splice() calls. Unfortunately it tests for EAGAIN some time after those calls. In between there has been at least a flow_trace() which could have clobbered errno. Move the test on errno closer to the relevant system calls to avoid this problem. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-09 22:57:27 +02:00
David Gibson	d3f33f3b8e	tcp_splice: Don't double count bytes read on EINTR In tcp_splice_sock_handler(), if we get an EINTR on our second splice() (pipe to output socket) we - as we should - go back and retry it. However, we do so after we've already updated our byte counters. That does no harm for the conn->written[] counter - since the second splice() returned an error it will be advanced by 0. However we also advance the conn->read[] counter, and then do so again when the splice() succeeds. This results in the counters being out of sync, and us thinking we have remaining data in the pipe when we don't, which can leave us in an infinite loop once the stream finishes. Fix this by moving the EINTR handling to directly next to the splice() call (which is what we usually do for EINTR). As a bonus this removes one mildly confusing goto. For symmetry, also rework the EINTR handling on the first splice() the same way, although that doesn't (as far as I can tell) have buggy side effects. Link: https://github.com/containers/podman/issues/23686#issuecomment-2779347687 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-09 22:57:16 +02:00
Stefano Brivio	ffbef85e97	conf: Add missing return in conf_nat(), fix --map-guest-addr none As reported by somebody on IRC: $ pasta --map-guest-addr none Invalid address to remap to host: none that's because once we parsed "none", we try to parse it as an address as well. But we already handled it, so stop once we're done. Fixes: `e813a4df7d` ("conf: Allow address remapped to host to be configured") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-04-09 22:52:49 +02:00
Stefano Brivio	06ef64cdb7	udp_flow: Save 8 bytes in struct udp_flow on 64-bit architectures Shuffle the fields just added by commits `a7775e9550` ("udp: support traceroute in direction tap-socket") and `9725e79888` ("udp_flow: Don't discard packets that arrive between bind() and connect()"). On x86_64, as reported by pahole(1), before: struct udp_flow { struct flow_common f; /* 0 76 */ /* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */ _Bool closed:1; /* 76: 0 1 */ /* XXX 7 bits hole, try to pack */ _Bool flush0; /* 77 1 */ _Bool flush1:1; /* 78: 0 1 */ /* XXX 7 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ time_t ts; /* 80 8 */ int s[2]; /* 88 8 */ uint8_t ttl[2]; /* 96 2 */ /* size: 104, cachelines: 2, members: 7 */ /* sum members: 95, holes: 1, sum holes: 1 */ /* sum bitfield members: 2 bits, bit holes: 2, sum bit holes: 14 bits */ /* padding: 6 */ /* last cacheline: 40 bytes */ }; and after: struct udp_flow { struct flow_common f; /* 0 76 */ /* --- cacheline 1 boundary (64 bytes) was 12 bytes ago --- */ uint8_t ttl[2]; /* 76 2 */ _Bool closed:1; /* 78: 0 1 */ _Bool flush0:1; /* 78: 1 1 */ _Bool flush1:1; /* 78: 2 1 */ /* XXX 5 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ time_t ts; /* 80 8 */ int s[2]; /* 88 8 */ /* size: 96, cachelines: 2, members: 7 */ /* sum members: 94, holes: 1, sum holes: 1 */ /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */ /* last cacheline: 32 bytes */ }; It doesn't matter much because anyway the typical storage for struct udp_flow is given by union flow: union flow { struct flow_common f; /* 0 76 */ struct flow_free_cluster free; /* 0 84 */ struct tcp_tap_conn tcp; /* 0 120 */ struct tcp_splice_conn tcp_splice; /* 0 120 */ struct icmp_ping_flow ping; /* 0 96 */ struct udp_flow udp; /* 0 96 */ }; but it still improves data locality somewhat, so let me fix this up now that commits are fresh. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-04-09 22:52:32 +02:00
David Gibson	9725e79888	udp_flow: Don't discard packets that arrive between bind() and connect() When we establish a new UDP flow we create connect()ed sockets that will only handle datagrams for this flow. However, there is a race between bind() and connect() where they might get some packets queued for a different flow. Currently we handle this by simply discarding any queued datagrams after the connect. UDP protocols should be able to handle such packet loss, but it's not ideal. We now have the tools we need to handle this better, by redirecting any datagrams received during that race to the appropriate flow. We need to use a deferred handler for this to avoid unexpectedly re-ordering datagrams in some edge cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Update comment to udp_flow_defer()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:44:31 +02:00
David Gibson	9eb5406260	udp: Fold udp_splice_prepare and udp_splice_send into udp_sock_to_sock udp_splice_prepare() and udp_splice_send() are both quite simple functions that now have only one caller: udp_sock_to_sock(). Fold them both into that caller. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:44:31 +02:00
David Gibson	bd6a41ee76	udp: Rework udp_listen_sock_data() into udp_sock_fwd() udp_listen_sock_data() forwards datagrams from a "listening" socket until there are no more (for now). We have an upcoming use case where we want to do that for a socket that's not a "listening" socket, and uses a different epoll reference. So, adjust the function to take the pieces it needs from the reference as direct parameters and rename to udp_sock_fwd(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:43:53 +02:00
David Gibson	159beefa36	udp_flow: Take pif and port as explicit parameters to udp_flow_from_sock() Currently udp_flow_from_sock() is only used when receiving a datagram from a "listening" socket. It takes the listening socket's epoll reference to get the interface and port on which the datagram arrived. We have some upcoming cases where we want to use this in different contexts, so make it take the pif and port as direct parameters instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Drop @ref from comment to udp_flow_from_sock()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:43:53 +02:00
David Gibson	fd844a90bc	udp: Move UDP_MAX_FRAMES to udp.c Recent changes mean that this define is no longer used anywhere except in udp.c. Move it back into udp.c from udp_internal.h. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:43:53 +02:00
David Gibson	fc6ee68ad3	udp: Merge vhost-user and "buf" listening socket paths udp_buf_listen_sock_data() and udp_vu_listen_sock_data() now have effectively identical structure. The forwarding functions used for flow specific sockets (udp_buf_sock_to_tap(), udp_vu_sock_to_tap() and udp_sock_to_sock()) also now take a number of datagrams. This means we can re-use them for the listening socket path, just passing '1' so they handle a single datagram at a time. This allows us to merge both the vhost-user and flow specific paths into a single, simpler udp_listen_sock_data(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:43:52 +02:00
David Gibson	0304dd9c34	udp: Split spliced forwarding path from udp_buf_reply_sock_data() udp_buf_reply_sock_data() can handle forwarding data either from socket to socket ("splicing") or from socket to tap. It has a test on each datagram for which case we're in, but that will be the same for everything in the batch. Split out the spliced path into a separate udp_sock_to_sock() function. This leaves udp_{buf,vu}_reply_sock_data() handling only forwards from socket to tap, so rename and simplify them accordingly. This makes the code slightly longer for now, but will allow future cleanups to shrink it back down again. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Fix typos in comments to udp_sock_recv() and udp_vu_listen_sock_data()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:41:32 +02:00
David Gibson	5221e177e1	udp: Parameterize number of datagrams handled by udp_*_reply_sock_data() Both udp_buf_reply_sock_data() and udp_vu_reply_sock_data() internally decide what the maximum number of datagrams they will forward is. We have some upcoming reasons to allow the caller to decide that instead, so make the maximum number of datagrams a parameter for both of them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:31:54 +02:00
David Gibson	3a0881dfd0	udp: Don't bother to batch datagrams from "listening" socket A "listening" UDP socket can receive datagrams from multiple flows. So, we currently have some quite subtle and complex code in udp_buf_listen_sock_data() to group contiguously received packets for the same flow into batches for forwarding. However, since we are now always using flow specific connect()ed sockets once a flow is established, handling of datagrams on listening sockets is essentially a slow path. Given that, it's not worth the complexity. Substantially simplify the code by using an approach more like vhost-user, and "peeking" at the address of the next datagram, one at a time to determine the correct flow before we actually receive the data. This removes all meaningful use of the s_in and tosidx fields in udp_meta_t, so they too can be removed, along with setting of msg_name and msg_namelen in the msghdr arrays which referenced them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:30:17 +02:00
David Gibson	84ab1305fa	udp: Polish udp_vu_sock_info() and remove from vu specific code udp_vu_sock_info() uses MSG_PEEK to look ahead at the next datagram to be received and gets its source address. Currently we only use it in the vhost-user path, but there's nothing inherently vhost-user specific about it. We have upcoming uses for it elsewhere so rename and move to udp.c. While we're there, polish its error reporting a little. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Drop excess newline before udp_sock_recv()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:29:23 +02:00
David Gibson	1d7bbb101a	udp: Make udp_sock_recv() take max number of frames as a parameter Currently udp_sock_recv() decides the maximum number of frames it is willing to receive based on the mode. However, we have upcoming use cases where we will have different criteria for how many frames we want with information that's not naturally available here but is in the caller. So make the maximum number of frames a parameter. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Fix typo in comment in udp_buf_reply_sock_data()] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-04-07 21:25:33 +02:00
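
The field shuffle described in commit 06ef64cdb7 can be reproduced in isolation. The following is a minimal sketch, not the passt source: `struct flow_common_stub` is a hypothetical 76-byte, 4-byte-aligned stand-in for `struct flow_common`, and `udp_flow_before`, `udp_flow_after` and `udp_flow_bytes_saved()` are illustrative names. The member order follows the pahole(1) output quoted in the commit message; the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for struct flow_common: 76 bytes, 4-byte aligned */
struct flow_common_stub {
	uint32_t words[19];
};

/* Member order before the commit: flags first, ttl[] at the end */
struct udp_flow_before {
	struct flow_common_stub f;
	bool closed :1;
	bool flush0;		/* plain bool, occupies a whole byte */
	bool flush1 :1;		/* starts a new bitfield storage unit */
	time_t ts;
	int s[2];
	uint8_t ttl[2];		/* forces trailing padding after it */
};

/* Member order after the commit: ttl[] and all flags share the gap
 * between the end of f and the alignment boundary of ts
 */
struct udp_flow_after {
	struct flow_common_stub f;
	uint8_t ttl[2];
	bool closed :1;
	bool flush0 :1;
	bool flush1 :1;
	time_t ts;
	int s[2];
};

/* Number of bytes saved by the reordering on the current ABI */
static size_t udp_flow_bytes_saved(void)
{
	return sizeof(struct udp_flow_before) - sizeof(struct udp_flow_after);
}
```

Calling `udp_flow_bytes_saved()` on a typical x86_64 Linux toolchain should report the 8 bytes mentioned in the commit subject, assuming the stand-in has the same size and alignment as the real `struct flow_common`.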
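
Commits 6693fa1158 and d3f33f3b8e both move error handling directly next to the system call it belongs to. The sketch below illustrates that pattern under stated assumptions: it uses read() on a pipe instead of splice(), and `read_retry()` and `set_nonblock()` are hypothetical helpers, not functions from passt. The `fprintf()` call stands in for a trace call such as flow_trace(), which may overwrite errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch a file descriptor to non-blocking mode */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read() with EINTR and EAGAIN handled at the call site.
 * Returns the read() result; *again is set if the call would have blocked.
 */
static ssize_t read_retry(int fd, void *buf, size_t len, bool *again)
{
	ssize_t n;

	/* Retry on EINTR right here, before any counter could be updated */
	do
		n = read(fd, buf, len);
	while (n < 0 && errno == EINTR);

	/* Check errno immediately, before any call that might overwrite it */
	*again = n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);

	/* Stand-in for a trace call: from here on errno is not reliable */
	fprintf(stderr, "read() returned %zd\n", n);

	return n;
}
```

A caller would update its byte counters only with the value returned by `read_retry()`, so an interrupted call never contributes to them twice.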

2036 commits