passt

mirror of https://passt.top/passt synced 2025-06-01 22:05:43 +02:00

Author	SHA1	Message	Date
David Gibson	71a16dbc49	tcp: Move tcp_l2_buf_fill_headers() to tcp_buf.c This function only has callers in tcp_buf.c. More importantly, it's inherently tied to the "buf" path, because it uses internal knowledge of how we lay out the various headers across our locally allocated buffers. Therefore, move it to tcp_buf.c. Slightly reformat the prototypes while we're at it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-11-15 10:55:53 +01:00
David Gibson	3958736de5	tcp_vu: Share more header construction between IPv4 and IPv6 paths tcp_vu_send_flag() and tcp_vu_prepare() both need to do some different things for IPv4 vs. IPv6. However the two paths have a number of lines of duplicated code. We can share those at the expense of an additional conditional (which we might be able to simplify again later). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-11-15 10:55:53 +01:00
Stefano Brivio	9392ea7e5a	test: Add tests for passt in vhost-user mode Run functional and performance tests for vhost-user mode as well. For functional tests, we add passt_vu and passt_vu_in_ns as symbolic links to their non-vhost-user counterparts, as no differences are intended but we want to distinguish them in test logs. For performance tests, instead, we add separate perf/passt_vu_tcp and perf/passt_vu_udp files, as we need longer test duration, as well as higher UDP sending bandwidths and larger TCP windows, to actually get the highest throughput vhost-user mode offers. For valgrind tests, vhost-user mode needs two extra system calls: statx and readlink. Add them as EXTRA_SYSCALLS for the valgrind target. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-15 10:55:53 +01:00
Laurent Vivier	92fe7e967a	vhost-user: add vhost-user add virtio and vhost-user functions to connect with QEMU. $ ./passt --vhost-user and # qemu-system-x86_64 ... -m 4G \ -object memory-backend-memfd,id=memfd0,share=on,size=4G \ -numa node,memdev=memfd0 \ -chardev socket,id=chr0,path=/tmp/passt_1.socket \ -netdev vhost-user,id=netdev0,chardev=chr0 \ -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \ ... Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-11-15 10:55:53 +01:00
Laurent Vivier	007af94bb9	passt: rename tap_sock_init() to tap_backend_init() Extract pool storage initialization loop to tap_sock_update_pool(), extract QEMU hints to tap_backend_show_hints(). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-15 10:54:01 +01:00
Laurent Vivier	1ceee36c57	tcp: Export headers functions Export tcp_fill_headers[4|6]() and tcp_update_check_tcp[4|6](). They'll be needed by vhost-user. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-15 10:53:40 +01:00
Laurent Vivier	7f6b184fb8	udp: Prepare udp.c to be shared with vhost-user Export udp_payload_t, udp_update_hdr4(), udp_update_hdr6() and udp_sock_errs(). Rename udp_listen_sock_handler() to udp_buf_listen_sock_handler() and udp_reply_sock_handler to udp_buf_reply_sock_handler(). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-15 10:53:40 +01:00
Laurent Vivier	23cc8f892f	vhost-user: introduce vhost-user API Add vhost_user.c and vhost_user.h that define the functions needed to implement vhost-user backend. Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-11-15 10:53:40 +01:00
Laurent Vivier	119b45358c	vhost-user: introduce virtio API Add virtio.c and virtio.h that define the functions needed to manage virtqueues. Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2024-11-15 10:53:40 +01:00
Laurent Vivier	8ac20f4795	packet: replace struct desc by struct iovec To be able to manage buffers inside a shared memory provided by a VM via a vhost-user interface, we cannot rely on the fact that buffers are located in a pre-defined memory area and use a base address and a 32bit offset to address them. We need a 64bit address, so replace struct desc by struct iovec and update range checking. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-15 10:53:40 +01:00
Stefano Brivio	7f6c10626d	selinux: Use auth_read_passwd() interface for all our getpwnam() needs If passt or pasta are started as root, we need to read the passwd file (be it /etc/passwd or whatever sssd provides) to find out UID and GID of 'nobody' so that we can switch to it. Instead of a bunch of allow rules for passwd_file_t and sssd macros, use the more convenient auth_read_passwd() interface which should cover our usage of getpwnam(). The existing rules weren't actually enough: # strace -e openat passt -f [...] Started as root, will change to nobody. openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = 4 openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied) openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied) openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 4 with corresponding SELinux warnings logged in audit.log. Reported-by: Minxi Hou <mhou@redhat.com> Analysed-by: Miloš Malik <mmalik@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 23:41:52 +01:00
David Gibson	6e1e44293e	ndp: Send unsolicited Router Advertisements Currently, our NDP implementation only sends Router Advertisements (RA) when it receives a Router Solicitation (RS) from the guest. However, RFC 4861 requires that we periodically send unsolicited RAs. Linux as a guest also requires this: it will send an RS when a link first comes up, but the route it gets from this will have a finite lifetime (we set this to 65535s, the maximum allowed, around 18 hours). When that expires the guest will not send a new RS, but instead expects the route to have been renewed (if still valid) by an unsolicited RA. Implement sending unsolicited RAs on a partially randomised timer, as required by RFC 4861. The RFC also specifies that solicited RAs should also be delayed, or even omitted, if the next unsolicited RA is soon enough. For now we don't do that, always sending an immediate RA in response to an RS. We can get away with this because in our use cases we expect to just have passt itself and the guest on the link, rather than a large broadcast domain. Link: https://github.com/kubevirt/kubevirt/issues/13191 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:40 +01:00
David Gibson	b39760cc7d	passt: Seed libc's pseudo random number generator We have an upcoming case where we need pseudo-random numbers to scatter timings, but we don't need cryptographically strong random numbers. libc's built in random() is fine for this purpose, but we should seed it. Extend secret_init() - the only current user of random numbers - to do this as well as generating the SipHash secret. Using /dev/random for a PRNG seed is probably overkill, but it's simple and we only do it once, so we might as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:38 +01:00
David Gibson	71d5deed5e	util: Add general low-level random bytes helper Currently secret_init() open codes getting good quality random bytes from the OS, either via getrandom(2) or reading /dev/random. We're going to add at least one more place that needs random data in future, so make a general helper for getting random bytes. While we're there, fix a number of minor bugs: - getrandom() can theoretically return a "short read", so handle that case - getrandom() as well as read can return a transient EINTR - We would attempt to read data from /dev/random if we failed to open it (open() returns -1), but not if we opened it as fd 0 (unlikely, but ok) - More specific error reporting Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:36 +01:00
David Gibson	a60703e899	ndp: Make route lifetime a #define Currently we open-code the lifetime of the route we advertise via NDP to be 65535s (the maximum). Change it to a #define. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:34 +01:00
David Gibson	36c070e6e3	ndp: Use struct assignment in preference to memcpy() for IPv6 addresses There are a number of places we can simply assign IPv6 addresses about, rather than the current mildly ugly memcpy(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:31 +01:00
David Gibson	cbc83e14df	ndp: Split out helpers for sending specific NDP message types Currently the large ndp() function responds to all NDP messages we handle, both parsing the message as necessary and sending the response. Split out the code to construct and send specific message types into ndp_na() (to send NA messages) and ndp_ra() (to send RA messages). As well as breaking up an excessively large function, this is a first step to being able to send unsolicited NDP messages. While we're there, remove a slightly ugly goto. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:29 +01:00
David Gibson	4e47167035	ndp: Add ndp_send() helper ndp() has a conditional on message type generating the reply message, then a tiny amount of common code, then another conditional to send the reply with slightly different parameters. We can make this a bit neater by making a helper function for sending the reply, and call it from each of the different message type paths. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:28 +01:00
David Gibson	71f228d04b	ndp: Remove redundant update to addr_seen ndp() updates addr_seen or addr_ll_seen based on the source address of the received packet. This is redundant since tap6_handler() has already updated addr_seen for any type of packet, not just NDP. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-14 19:00:13 +01:00
David Gibson	0588163b1f	cppcheck: Don't check the system headers We pass -I options to cppcheck so that it will find the system headers. Then we need to pass a bunch more options to suppress the zillions of cppcheck errors found in those headers. It turns out, however, that it's not recommended to give the system headers to cppcheck anyway. Instead it has built-in knowledge of the ANSI libc and uses that as the basis of its checks. We do need to suppress missingIncludeSystem warnings instead though. Not bothering with the system headers makes the cppcheck runtime go from ~37s to ~14s on my machine, which is a pretty nice win. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-08 08:26:21 +01:00
David Gibson	14dd70e2b3	linux_dep: Fix CLOSE_RANGE_UNSHARE availability handling If CLOSE_RANGE_UNSHARE isn't defined, we define a fallback version of close_range() which is a (successful) no-op. This is broken in several ways: * It doesn't actually fix compile if using old kernel headers, because the caller of close_range() still directly uses CLOSE_RANGE_UNSHARE unprotected by ifdefs * Even if it did fix the compile, it means inconsistent behaviour between a compile time failure to find the value (we silently don't close files) and a runtime failure (we die with an error from close_range()) * Silently not closing the files we intend to close for security reasons is probably not a good idea in any case We don't want to simply error if close_range() or CLOSE_RANGE_UNSHARE isn't available, because that would require running on kernel >= 5.9. On the other hand there's not really any other way to flush all possible fds leaked by the parent (close() in a loop takes over a minute). So in this case print a warning and carry on. As bonus this fixes a cppcheck error I see with some different options I'm looking to apply in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-08 08:26:17 +01:00
David Gibson	d64f257243	linux_dep: Move close_range() conditional handling to linux_dep.h util.h has some #ifdefs and weak definitions to handle compatibility with various kernel versions. Move this to linux_dep.h which handles several other similar cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-08 08:26:15 +01:00
David Gibson	b84cd05098	log: Only check for FALLOC_FL_COLLAPSE_RANGE availability at runtime log.c has several #ifdefs on FALLOC_FL_COLLAPSE_RANGE that won't attempt to use it if not defined. But even if the value is defined at compile time, it might not be available in the runtime kernel, so we need to check for errors from a fallocate() call and fall back to other methods. Simplify this to only need the runtime check by using linux_dep.h to define FALLOC_FL_COLLAPSE_RANGE if it's not in the kernel headers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-08 08:25:58 +01:00
Stefano Brivio	58fa5508bd	tap, tcp, util: Add some missing SOCK_CLOEXEC flags I have no idea why, but these are reported by clang-tidy (19.2.1) on Alpine (x86) only: /home/sbrivio/passt/tap.c:1139:38: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors] 1139 | int fd = socket(AF_UNIX, SOCK_STREAM, 0); | ^ | | SOCK_CLOEXEC /home/sbrivio/passt/tap.c:1158:51: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors] 1158 | ex = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0); | ^ | | SOCK_CLOEXEC /home/sbrivio/passt/tcp.c:1413:44: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors] 1413 | s = socket(af, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP); | ^ | | SOCK_CLOEXEC /home/sbrivio/passt/util.c:188:38: error: 'socket' should use SOCK_CLOEXEC where possible [android-cloexec-socket,-warnings-as-errors] 188 | if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) { | ^ | | SOCK_CLOEXEC Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-08 08:24:58 +01:00
Stefano Brivio	71869e2912	passt: Use NOLINT clang-tidy block instead of NOLINTNEXTLINE For some reason, this is only reported by clang-tidy 19.1.2 on Alpine: /home/sbrivio/passt/passt.c:314:53: error: conditional operator with identical true and false expressions [bugprone-branch-clone,-warnings-as-errors] 314 | nfds = epoll_wait(c.epollfd, events, EPOLL_EVENTS, TIMER_INTERVAL); | ^ We do have a suppression, but not on the line preceding it, because we also need a cppcheck suppression there. Use NOLINTBEGIN/NOLINTEND for the clang-tidy suppression. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-08 08:24:52 +01:00
Stefano Brivio	d4f09c9b96	util: Define small and big thresholds for socket buffers as unsigned long long On 32-bit architectures, clang-tidy reports: /home/pi/passt/tcp.c:728:11: error: performing an implicit widening conversion to type 'uint64_t' (aka 'unsigned long long') of a multiplication performed in type 'unsigned long' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors] 728 | if (v >= SNDBUF_BIG) | ^ /home/pi/passt/util.h:158:22: note: expanded from macro 'SNDBUF_BIG' 158 | #define SNDBUF_BIG (4UL * 1024 * 1024) | ^ /home/pi/passt/tcp.c:728:11: note: make conversion explicit to silence this warning 728 | if (v >= SNDBUF_BIG) | ^ /home/pi/passt/util.h:158:22: note: expanded from macro 'SNDBUF_BIG' 158 | #define SNDBUF_BIG (4UL * 1024 * 1024) | ^~~~~~~~~~~~~~~~~ /home/pi/passt/tcp.c:728:11: note: perform multiplication in a wider type 728 | if (v >= SNDBUF_BIG) | ^ /home/pi/passt/util.h:158:22: note: expanded from macro 'SNDBUF_BIG' 158 | #define SNDBUF_BIG (4UL * 1024 * 1024) | ^~~~~~~~~~ /home/pi/passt/tcp.c:730:15: error: performing an implicit widening conversion to type 'uint64_t' (aka 'unsigned long long') of a multiplication performed in type 'unsigned long' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors] 730 | else if (v > SNDBUF_SMALL) | ^ /home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL' 159 | #define SNDBUF_SMALL (128UL * 1024) | ^ /home/pi/passt/tcp.c:730:15: note: make conversion explicit to silence this warning 730 | else if (v > SNDBUF_SMALL) | ^ /home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL' 159 | #define SNDBUF_SMALL (128UL * 1024) | ^~~~~~~~~~~~ /home/pi/passt/tcp.c:730:15: note: perform multiplication in a wider type 730 | else if (v > SNDBUF_SMALL) | ^ /home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL' 159 | #define SNDBUF_SMALL (128UL * 1024) | ^~~~~ /home/pi/passt/tcp.c:731:17: error: performing an implicit widening conversion to type 'uint64_t' (aka 'unsigned long long') of a multiplication performed in type 'unsigned long' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors] 731 | v -= v * (v - SNDBUF_SMALL) / (SNDBUF_BIG - SNDBUF_SMALL) / 2; | ^ /home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL' 159 | #define SNDBUF_SMALL (128UL * 1024) | ^ /home/pi/passt/tcp.c:731:17: note: make conversion explicit to silence this warning 731 | v -= v * (v - SNDBUF_SMALL) / (SNDBUF_BIG - SNDBUF_SMALL) / 2; | ^ /home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL' 159 | #define SNDBUF_SMALL (128UL * 1024) | ^~~~~~~~~~~~ /home/pi/passt/tcp.c:731:17: note: perform multiplication in a wider type 731 | v -= v * (v - SNDBUF_SMALL) / (SNDBUF_BIG - SNDBUF_SMALL) / 2; | ^ /home/pi/passt/util.h:159:24: note: expanded from macro 'SNDBUF_SMALL' 159 | #define SNDBUF_SMALL (128UL * 1024) | ^~~~~ because, wherever we use those thresholds, we define the other term of comparison as uint64_t. Define the thresholds as unsigned long long as well, to make sure we match types. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-08 08:24:49 +01:00
Stefano Brivio	87940f9aa7	tap: Cast TAP_BUF_BYTES - ETH_MAX_MTU to ssize_t, not TAP_BUF_BYTES Given that we're comparing against 'n', which is signed, we cast TAP_BUF_BYTES to ssize_t so that the maximum buffer usage, calculated as the difference between TAP_BUF_BYTES and ETH_MAX_MTU, will also be signed. This doesn't necessarily happen on 32-bit architectures, though. On armhf and i686, clang-tidy 18.1.8 and 19.1.2 report: /home/pi/passt/tap.c:1087:16: error: comparison of integers of different signs: 'ssize_t' (aka 'int') and 'unsigned int' [clang-diagnostic-sign-compare,-warnings-as-errors] 1087 | for (n = 0; n <= (ssize_t)TAP_BUF_BYTES - ETH_MAX_MTU; n += len) { | ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cast the whole difference to ssize_t, as we know it's going to be positive anyway, instead of relying on that side effect. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-08 08:24:45 +01:00
Stefano Brivio	1feb90fe62	dhcpv6: Turn some option headers pointers to const cppcheck 2.14.2 on Alpine reports: dhcpv6.c:431:32: style: Variable 'client_id' can be declared as pointer to const [constVariablePointer] struct opt_hdr ia, bad_ia, *client_id; ^ It's not only 'client_id': we can declare 'ia' as const pointer too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-08 08:24:41 +01:00
Stefano Brivio	5f5e814cfc	dhcpv6: Use for loop instead of goto to avoid false positive cppcheck warning cppcheck 2.16.0 reports: dhcpv6.c:334:14: style: The comparison 'ia_type == 3' is always true. [knownConditionTrueFalse] if (ia_type == OPT_IA_NA) { ^ dhcpv6.c:306:12: note: 'ia_type' is assigned value '3' here. ia_type = OPT_IA_NA; ^ dhcpv6.c:334:14: note: The comparison 'ia_type == 3' is always true. if (ia_type == OPT_IA_NA) { ^ this is not really the case as we set ia_type to OPT_IA_TA and then jump back. Anyway, there's no particular reason to use a goto here: add a trivial foreach() macro to go through elements of an array and use it instead. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2024-11-08 08:24:11 +01:00
Jon Maloy	78da088f7b	tcp: unify payload and flags l2 frames array In order to reduce static memory and code footprint, we merge the array for l2 flag frames into the one for payload frames. This change also ensures that no flag message will be sent out over the l2 media bypassing already queued payload messages. Performance measurements with iperf3, where we force all traffic via the tap queue, show no significant difference: Dual traffic both directions simultaneously, with patch: ======================================================== host->ns: -------- [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 36.3 GBytes 3.12 Gbits/sec 4759 sender [ 5] 0.00-100.04 sec 36.3 GBytes 3.11 Gbits/sec receiver ns->host: --------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 321 GBytes 27.6 Gbits/sec receiver Dual traffic both directions simultaneously, without patch: ============================================================ host->ns: -------- [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 35.0 GBytes 3.01 Gbits/sec 6001 sender [ 5] 0.00-100.04 sec 34.8 GBytes 2.99 Gbits/sec receiver ns->host -------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 345 GBytes 29.6 Gbits/sec receiver Single connection, with patch: ============================== host->ns: --------- [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 138 GBytes 11.8 Gbits/sec 922 sender [ 5] 0.00-100.04 sec 138 GBytes 11.8 Gbits/sec receiver ns->host: ----------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 430 GBytes 36.9 Gbits/sec receiver Single connection, without patch: ================================= host->ns: ------------ [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 139 GBytes 11.9 Gbits/sec 900 sender [ 5] 0.00-100.04 sec 139 GBytes 11.9 Gbits/sec receiver ns->host: --------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 440 GBytes 37.8 Gbits/sec receiver Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:41 +01:00
David Gibson	9a0e544f05	test: Improve test for NDP assigned prefix In the NDP tests we search explicitly for a guest address with prefix length 64. AFAICT this is an attempt to specifically find the SLAAC assigned address, rather than something assigned by other means. However, we can do that more explicitly by checking for .protocol == "kernel_ra". The SLAAC prefixes we assign will always be 64-bit, that's hard-coded into our NDP implementation. RFC4862 doesn't really allow anything else since the interface identifiers for an Ethernet-like link are 64-bits. Let's actually verify that, rather than just assuming it, by extracting the prefix length assigned in the guest and checking it as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:37 +01:00
David Gibson	910f4f9103	test: Don't require 64-bit prefixes in perf tests When determining the namespace's IPv6 address in the perf test setup, we explicitly filter for addresses with a 64-bit prefix length. There's no real reason we need that - as long as it's a global address we can use it. I suspect this was copied without thinking from a similar example in the NDP tests, where the 64-bit prefix length _is_ meaningful (though it's not entirely clear if the handling is correct there either). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:34 +01:00
David Gibson	1699083f29	test: Make nstool hold robust against interruptions to control clients Currently nstool die()s on essentially any error. In most cases that's fine for our purposes. However, it's a problem when in "hold" mode and getting an IO error on an accept()ed socket. This could just indicate that the control client aborted prematurely, in which case we don't want to kill off the namespace we're holding. Adjust these to print an error, close() the control client socket and carry on. In addition, we need to explicitly ignore SIGPIPE in order not to be killed by an abruptly closed client connection. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:30 +01:00
David Gibson	b456ee1b53	test: Rename propagating signal handler nstool in "exec" mode will propagate some signals (specifically SIGTERM) to the process in the namespace it executes. The signal handler which accomplishes this is called simply sig_handler(). However, it turns out we're going to need some other signal handlers, so rename this to the more specific sig_propagate(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:27 +01:00
David Gibson	867db07fcf	util: Work around cppcheck bug 6936 While experimenting with cppcheck options, I hit several false positives caused by this bug: https://trac.cppcheck.net/ticket/13227 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:24 +01:00
David Gibson	6f913b3af0	udp: Don't dereference uflow before NULL check in udp_reply_sock_handler() We have an ASSERT() verifying that we're able to look up the flow in udp_reply_sock_handler(). However, we dereference uflow before that in an initializer, rather defeating the point. Rearrange to avoid that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:22 +01:00
David Gibson	d8e05a3fe0	ndp: Use const pointer for ndp_ns packet We don't modify this structure at all. For some reason cppcheck doesn't catch this with our current options, but did when I was experimenting with some different options. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:19 +01:00
David Gibson	0d7b8201ed	linux_dep: Generalise tcp_info.h to handling Linux extension compatibility tcp_info.h exists just to contain a modern enough version of struct tcp_info for our needs, removing compile time dependency on the version of kernel headers. There are several other cases where we can remove similar compile time dependencies on kernel version. Prepare for that by renaming tcp_info.h to linux_dep.h. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:16 +01:00
David Gibson	c5f4e4d146	fwd: Squash different-signedness comparison warning On certain architectures we get a warning about comparison between different signedness integers in fwd_probe_ephemeral(). This is because NUM_PORTS evaluates to an unsigned integer. It's a fixed value, though and we know it will fit in a signed long on anything reasonable, so add a cast to suppress the warning. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:14 +01:00
David Gibson	1e76a19895	util: Remove unused ffsl() function We supply a weak alias for ffsl() in case it's not defined in our libc. Except.. we don't have any users for it any more, so remove it. make cppcheck doesn't spot this at present for complicated reasons, but it might with tweaks to the options I'm experimenting with. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:11 +01:00
David Gibson	1d7cff3779	clang: Add rudimentary clangd configuration clangd's default configuration seems to try to treat .h files as C++ not C. There are many more spurious warnings generated at present, but this removes some of the most egregious ones. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:07 +01:00
David Gibson	c560e2f65b	Makefile: Don't attempt to auto-detect stack size We probe the available stack limit in the Makefile using rlimit, then use that to set the size of the stack when we clone() extra threads. But the rlimit at compile time need not be the same as the rlimit at runtime, so that's not particularly sensible. Ideally, we'd set the stack size based on an estimate of the actual maximum stack usage of all our clone()ed functions. We don't have that at the moment, but to keep things simple just set it to 1MiB - that's what the current probe will set things to on my default configuration Fedora 40, so it's likely to be fine in most cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:03 +01:00
David Gibson	13fc6d511e	Makefile: Use -DARCH for qrap only We insert -DARCH for all compiles, based on TARGET_ARCH determined in the Makefile. However, this is only used in qrap.c, not anywhere else in passt or pasta. Only supply this -D when compiling qrap specifically. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:59 +01:00
David Gibson	7917159005	seccomp: Simplify handling of AUDIT_ARCH Currently we construct the AUDIT_ARCH variable in the Makefile, then pass it into the C code with -D. The only place that uses it, though is the BPF filter generated by seccomp.sh. seccomp.sh already needs to do things differently depending on the arch, so it might as well just insert the expanded AUDIT_ARCH directly into the generated code, rather than using a #define. Arguably this is better, even, since it ensures more locally that the arch the BPF checks for matches the arch seccomp.sh built the filter for. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:55 +01:00
David Gibson	93bce404c1	Makefile: Move NETNS_RUN_DIR definition to C code NETNS_RUN_DIR is set in the Makefile, then passed into the C code with -D. But NETNS_RUN_DIR is just a fixed string, it doesn't depend on any make probes or variables, so there's really no reason to handle it via the Makefile. Just move it to a plain #define in conf.c. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:52 +01:00
David Gibson	c938d8a93e	netlink: RTA_PAYLOAD() returns int, not size_t Since it's the size of a chunk of memory it would seem logical that RTA_PAYLOAD() returns size_t. However, it doesn't - it explicitly casts its result to an int. RTNH_OK(), which often takes the result of RTA_PAYLOAD() as a parameter compares it to an int, so using size_t can result in comparison of different-signed integer warnings from clang. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:48 +01:00
David Gibson	f6b546c6e4	flow: Correct type of flowside_at_sidx() Due to a copy-pasta error, this returns 'PIF_NONE' instead of NULL on the failure case. PIF_NONE expands to 0, which turns into NULL, but it's still confusing, so fix it. This removes a clang warning. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:44 +01:00
David Gibson	30b4f88167	arch: Avoid explicit access to 'environ' We pass 'environ' to execve() in arch_avc2_exec(), so that we retain the environment in the current process. But the declaration of 'environ' is a bit weird - it doesn't seem to be in a standard header, requiring a manual explicit declaration. But, we can avoid needing to reference it explicitly by using execv() instead of execve(). This removes a clang warning. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:29 +01:00
David Gibson	b78e72da0b	clang: Move clang-tidy configuration from Makefile to .clang-tidy Currently we configure clang-tidy with a very long command line spelled out in the Makefile (mostly a big list of lints to disable). Move it from here into a .clang-tidy configuration file, so that the config is accessible if clang-tidy is invoked in other ways (e.g. via clangd) as well. As a bonus this also means that we can move the bulky comments about why we're suppressing various tests inline with the relevant config lines. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:19 +01:00
David Gibson	8346216c9a	Makefile: Simplify exclusion of qrap from static checks There are things in qrap.c that clang-tidy complains about that aren't worth fixing. So, we currently exclude it using $(filter-out). However, we already have a make variable which has just the passt sources, excluding qrap, so we can use that instead of the awkward filter-out expression. Currently, we still include qrap.c for cppcheck, but there's not much point doing so: it's, well, qrap, so we don't care that much about lints. Exclude it from cppcheck as well, for consistency. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:46:07 +01:00
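
The fixes listed in commit 71d5deed5e ("util: Add general low-level random bytes helper") can be illustrated with a minimal sketch. This is not passt's actual helper: the name fill_random() and its signature are hypothetical, and the /dev/random fallback mentioned in the commit is left out. It only shows the two getrandom(2) cases the commit message calls out, a short read and a transient EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical helper (name and signature are illustrative, not passt's).
 * Fill buf with len random bytes using getrandom(2).
 * Returns 0 on success, -1 on failure with errno set by getrandom().
 */
static int fill_random(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t done = 0;

	assert(buf || !len);

	while (done < len) {
		ssize_t n = getrandom(p + done, len - done, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* any other error is fatal here */
		}

		/* getrandom() may return fewer bytes than requested, so
		 * keep looping until the whole buffer has been filled.
		 */
		done += (size_t)n;
	}

	return 0;
}
```

A caller would pass a buffer and its size, for example fill_random(&secret, sizeof(secret)) where secret is the caller's own storage, and treat a -1 return as a hard error.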

1783 commits