passt

mirror of https://passt.top/passt synced 2025-06-26 08:05:33 +02:00

Author	SHA1	Message	Date
Julian Wundrak	664c588be7	build: normalize arm targets Linux distributions use different dumpmachine outputs for the ARM architecture. arm, armv6l, armv7l. For the syscall annotation, these variants are standardized to “arm”. Link: https://bugs.passt.top/show_bug.cgi?id=117 Signed-off-by: Julian Wundrak <julian@wundrak.net> [sbrivio: Fix typo: assign from TARGET_ARCH, not from TARGET] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:43:22 +01:00
David Gibson	77883fbdd1	udp: Add helper function for creating connected UDP socket Currently udp_flow_new() open codes creating and connecting a socket to use for reply messages. We have in mind some more places to use this logic, plus it just makes for a rather large function. Split this handling out into a new udp_flow_sock() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:34 +01:00
David Gibson	37d78c9ef3	udp: Always hash socket facing flowsides For UDP packets from the tap interface (like TCP) we use a hash table to look up which flow they belong to. Unlike TCP, we sometimes also create a hash table entry for the socket side of UDP flows. We need that when we receive a UDP packet from a "listening" socket which isn't specific to a single flow. At present we only do this for the initiating side of flows, which re-use the listening socket. For the target side we use a connected "reply" socket specific to the single flow. We have in mind changes that may introduce some edge cases where we could receive UDP packets on a non flow specific socket more often. To allow for those changes - and slightly simplifying things in the meantime - always put both sides of a UDP flow - tap or socket - in the hash table. It's not that costly, and means we always have the option of falling back to a hash lookup. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:32 +01:00
David Gibson	f67c488b81	udp: Better handling of failure to forward from reply socket In udp_reply_sock_handler() if we're unable to forward the datagrams we just print an error. Generally this means we have an unsupported pair of pifs in the flow table, though, and that hasn't changed. So, next time we get a matching packet we'll just get the same failure. In vhost-user mode we don't even dequeue the incoming packets which triggered this so we're likely to get the same failure immediately. Instead, close the flow, in the same way we do for an unrecoverable error. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:30 +01:00
David Gibson	269cf6a12a	udp: Share more logic between vu and non-vu reply socket paths Share some additional miscellaneous logic between the vhost-user and "buf" paths for data on udp reply sockets. The biggest piece is error handling of cases where we can't forward between the two pifs of the flow. We also make common some more simple logic locating the correct flow and its parameters. This adds some lines of code due to extra comment lines, but nonetheless reduces logic duplication. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:28 +01:00
David Gibson	d924b7dfc4	udp_vu: Factor things out of udp_vu_reply_sock_data() loop At the start of every cycle of the loop in udp_vu_reply_sock_data() we: - ASSERT that uflow is not NULL - Check if the target pif is PIF_TAP - Initialize the v6 boolean However, all of these depend only on the flow, which doesn't change across the loop. This is probably a duplication from udp_vu_listen_sock_data(), where the flow can be different for each packet. For the reply socket case, however, factor that logic out of the loop. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:26 +01:00
David Gibson	5a977c2f4e	udp: Simplify checking of epoll event bits udp_{listen,reply}_sock_handler() can accept both EPOLLERR and EPOLLIN events. However, unlike most epoll event handlers we don't check the event bits right there. EPOLLERR is checked within udp_sock_errs() which we call unconditionally. Checking EPOLLIN is still more buried: it is checked within both udp_sock_recv() and udp_vu_sock_recv(). We can simplify the logic and pass less extraneous parameters around by moving the checking of the event bits to the top level event handlers. This makes udp_{buf,vu}_{listen,reply}_sock_handler() no longer general event handlers, but specific to EPOLLIN events, meaning new data. So, rename those functions to udp_{buf,vu}_{listen,reply}_sock_data() to better reflect their function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:23 +01:00
David Gibson	89b203b851	udp: Common invocation of udp_sock_errs() for vhost-user and "buf" paths The vhost-user and non-vhost-user paths for both udp_listen_sock_handler() and udp_reply_sock_handler() are more or less completely separate. Both, however, start with essentially the same invocation of udp_sock_errs(), so that can be made common. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-26 21:34:11 +01:00
David Gibson	cf4d3f05c9	packet: Upgrade severity of most packet errors All errors from packet_range_check(), packet_add() and packet_get() are trace level. However, these are for the most part actual error conditions. They're states that should not happen, in many cases indicating a bug in the caller or elsewhere. We don't promote these to err() or ASSERT() level, for fear of a localised bug on very specific input crashing the entire program, or flooding the logs, but we can at least upgrade them to debug level. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:30 +01:00
David Gibson	0857515c94	packet: ASSERT on signs of pool corruption If packet_check_range() fails in packet_get_try_do() we just return NULL. But this check only takes place after we've already validated the given range against the packet it's in. That means that if packet_check_range() fails, the packet pool is already in a corrupted state (we should have made strictly stronger checks when the packet was added). Simply returning NULL and logging a trace() level message isn't really adequate for that situation; ASSERT instead. Similarly we check the given idx against both p->count and p->size. The latter should be redundant, because count should always be <= size. If that's not the case then, again, the pool is already in a corrupted state and we may have overwritten unknown memory. Assert for this case too. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:27 +01:00
David Gibson	9153aca15b	util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers We already have the ASSERT() macro which will abort() passt based on a condition. It always has a fixed error message based on its location and the asserted expression. We have some upcoming cases where we want to customise the message when hitting an assert. Add abort_with_msg() and ASSERT_WITH_MSG() helpers to allow this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:24 +01:00
David Gibson	38bcce9977	packet: Rework packet_get() versus packet_get_try() Most failures of packet_get() indicate a serious problem, and log messages accordingly. However, a few callers expect failures here, because they're probing for a certain range which might or might not be in a packet. They use packet_get_try() which passes a NULL func to packet_get_do() to suppress the logging which is unwanted in this case. However, this doesn't just suppress the log when packet_get_do() finds the requested region isn't in the packet. It suppresses logging for all other errors too, which do indicate serious problems, even for the callers of packet_get_try(). Worse it will pass the NULL func on to packet_check_range() which doesn't expect it, meaning we'll get unhelpful messages from there if there is a failure. Fix this by making packet_get_try_do() the primary function which doesn't log for the case of a range outside the packet. packet_get_do() becomes a trivial wrapper around that which logs a message if packet_get_try_do() returns NULL. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:22 +01:00
David Gibson	961aa6a0eb	packet: Move checks against PACKET_MAX_LEN to packet_check_range() Both the callers of packet_check_range() separately verify that the given length does not exceed PACKET_MAX_LEN. Fold that check into packet_check_range() instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:20 +01:00
David Gibson	37d9f374d9	packet: Avoid integer overflows in packet_get_do() In packet_get_do() both offset and len are essentially untrusted. We do some validation of len (check it's < PACKET_MAX_LEN), but that's not enough to ensure that (len + offset) doesn't overflow. Rearrange our calculation to make sure it's safe regardless of the given offset & len values. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:18 +01:00
David Gibson	c48331ca51	packet: Correct type of PACKET_MAX_LEN PACKET_MAX_LEN is usually involved in calculations on size_t values - the type of the iov_len field in struct iovec. However, being defined bare as UINT16_MAX, the compiler is likely to assign it a shorter type. This can lead to unexpected promotions (or lack thereof). Add a cast to force the type to be what we expect. Fixes: `c43972ad6` ("packet: Give explicit name to maximum packet size") Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:15 +01:00
David Gibson	9866d146e6	tap: Clarify calculation of TAP_MSGS The rationale behind the calculation of TAP_MSGS isn't necessarily obvious. It's supposed to be the maximum number of packets that can fit in pkt_buf. However, the calculation is wrong in several ways: * It's based on ETH_ZLEN which isn't meaningful for virtual devices * It always includes the qemu socket header which isn't used for pasta * The size of pkt_buf isn't relevant for vhost-user We've already made sure this is just a tuning parameter, not a hard limit. Clarify what we're calculating here and why. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:12 +01:00
David Gibson	a41d6d125e	tap: Make size of pool_tap[46] purely a tuning parameter Currently we attempt to size pool_tap[46] so they have room for the maximum possible number of packets that could fit in pkt_buf (TAP_MSGS). However, the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as the minimum possible L2 frame size. But ETH_ZLEN is based on physical constraints of Ethernet, which don't apply to our virtual devices. It is possible to generate a legitimate frame smaller than this, for example an empty payload UDP/IPv4 frame on the 'pasta' backend is only 42 bytes long. Furthermore, the same limit applies for vhost-user, which is not limited by the size of pkt_buf like the other backends. In that case we don't even have full control of the maximum buffer size, so we can't really calculate how many packets could fit in there. If we do exceed TAP_MSGS we'll drop packets, not just use more batches, which is moderately bad. The fact that this needs to be sized just so for correctness not merely for tuning is a fairly non-obvious coupling between different parts of the code. To make this more robust, alter the tap code so it doesn't rely on everything fitting in a single batch of TAP_MSGS packets, instead breaking into multiple batches as necessary. This leaves TAP_MSGS as purely a tuning parameter, which we can freely adjust based on performance measures. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:09 +01:00
David Gibson	e43e00719d	packet: More cautious checks to avoid pointer arithmetic UB packet_check_range() and vu_packet_check_range() verify that the packet or section of packet we're interested in lies in the packet buffer pool we expect it to. However, in doing so it doesn't avoid the possibility of an integer overflow while performing pointer arithmetic, which is UB. In fact, AFAICT it's UB even to use arbitrary pointer arithmetic to construct a pointer outside of a known valid buffer. To do this safely, we can't calculate the end of a memory region with pointer addition when the length is untrusted. Instead we must work out the offset of one memory region within another using pointer subtraction, then do integer checks against the length of the outer region. We then need to be careful about the order of checks so that those integer checks can't themselves overflow. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:33:06 +01:00
David Gibson	4592719a74	vu_common: Tighten vu_packet_check_range() This function verifies that the given packet is within the mmap()ed memory region of the vhost-user device. We can do better, however. The packet should be not only within the mmap()ed range, but specifically in the subsection of that range set aside for shared buffers, which starts at dev_region->mmap_offset within there. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-20 20:32:50 +01:00
Stefano Brivio	32f6212551	Makefile: Enable -Wformat-security It looks like an easy win to prevent a number of possible security flaws. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-20 05:50:53 +01:00
Stefano Brivio	07c2d584b3	conf: Include libgen.h for basename(), fix build against musl Fixes: `4b17d042c7` ("conf: Move mode detection into helper function") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-20 05:50:49 +01:00
Stefano Brivio	ebdd46367c	tcp: Flush socket before checking for more data in active close state Otherwise, if all the pending data is acknowledged: - tcp_update_seqack_from_tap() updates the current tap-side ACK sequence (conn->seq_ack_from_tap) - next, we compare the sequence we sent (conn->seq_to_tap) to the ACK sequence (conn->seq_ack_from_tap) in tcp_data_from_sock() to understand if there's more data we can send. If they match, we conclude that we haven't sent any of that data, and keep re-sending it. We need, instead, to flush the socket (drop acknowledged data) before calling tcp_update_seqack_from_tap(), so that once we update conn->seq_ack_from_tap, we can be sure that all data until there is gone from the socket. Link: https://bugs.passt.top/show_bug.cgi?id=114 Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Fixes: `30f1e082c3` ("tcp: Keep updating window and checking for socket data after FIN from guest") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-20 05:50:43 +01:00
David Gibson	c250ffc5c1	migrate: Bump migration version number v1 of the migration stream format, had some flaws: it didn't properly handle endianness of the MSS field, and it didn't transfer the RFC7323 timestamp. We've now fixed those bugs, but it requires incompatible changes to the stream format. Because of the timestamps in particular, v1 is not really usable, so there is little point maintaining compatible support for it. However, v1 is in released packages, both upstream and downstream (RHEL at least). Just updating the stream format without bumping the version would lead to very cryptic errors if anyone did attempt to migrate between an old and new passt. So, bump the migration version to v2, so we'll get a clear error message if anyone attempts this. We don't attempt to maintain backwards compatibility with v1, however: we'll simply fail if given a v1 stream. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-19 17:17:18 +01:00
David Gibson	cfb3740568	migrate, tcp: Migrate RFC 7323 timestamp Currently our migration of the state of TCP sockets omits the RFC 7323 timestamp. In some circumstances that can result in data sent from the target machine not being received, because it is discarded on the peer due to PAWS checking. Add code to dump and restore the timestamp across migration. Link: https://bugs.passt.top/show_bug.cgi?id=115 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor style fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-19 15:27:27 +01:00
David Gibson	28772ee91a	migrate, tcp: More careful marshalling of mss parameter during migration During migration we extract the limit on segment size using TCP_MAXSEG, and set it on the other side with TCP_REPAIR_OPTIONS. However, unlike most 32-bit values we transfer, we transfer it in native endian, not network endian. This is not correct; add it to the list of endian fixups we make. In addition, while MAXSEG will be 32-bits in practice, and is given as such to TCP_REPAIR_OPTIONS, the TCP_MAXSEG sockopt treats it as an 'int'. It's not strictly safe to pass a uint32_t to a getsockopt() expecting an int, although we'll get away with it on most (maybe all) platforms. Correct this as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Minor coding style fix] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-19 15:25:12 +01:00
Stefano Brivio	51f3c071a7	passt-repair: Fix build with -Werror=format-security Fixes: `0470170247` ("passt-repair: Add directory watch") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-18 17:18:47 +01:00
David Gibson	cb5b593563	tcp, flow: Better use flow specific logging helpers A number of places in the TCP code use general logging functions, instead of the flow specific ones. That includes a few older ones as well as many places in the new migration code. Thus they either don't identify which flow the problem happened on, or identify it in a non-standard way. Convert many of these to use the existing flow specific helpers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-14 23:40:40 +01:00
David Gibson	96fe5548cb	conf: Unify several paths in conf_ports() In conf_ports() we have three different paths which actually do the setup of an individual forwarded port: one for the "all" case, one for the exclusions only case and one for the range of ports with possible exclusions case. We can unify those cases using a new helper which handles a single range of ports, with a bitmap of exclusions. Although this is slightly longer (largely due to the new helper's function comment), it reduces duplicated logic. It will also make future improvements to the tracking of port forwards easier. The new conf_ports_range_except() function has a pretty prodigious parameter list, but I still think it's an overall improvement in conceptual complexity. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-14 23:40:23 +01:00
David Gibson	78f1f0fdfc	test/perf: Simplify iperf3 server lifetime management After we start the iperf3 server in the background, we have a sleep to make sure it's ready to receive connections. We can simplify this slightly by using the -D option to have iperf3 background itself rather than backgrounding it manually. That won't return until the server is ready to use. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	26df8a3608	conf: Limit maximum MTU based on backend frame size The -m option controls the MTU, that is the maximum transmissible L3 datagram, not including L2 headers. We currently limit it to ETH_MAX_MTU which sounds like it makes sense. But ETH_MAX_MTU is confusing: it's not consistently used as to whether it means the maximum L3 datagram size or the maximum L2 frame size. Even within conf() we explicitly account for the L2 header size when computing the default --mtu value, but not when we compute the maximum --mtu value. Clean this up by reworking the maximum MTU computation to be the minimum of IP_MAX_MTU (65535) and the maximum sized IP datagram which can fit into our L2 frames when we account for the L2 header. The latter can vary depending on our tap backend, although it doesn't right now. Link: https://bugs.passt.top/show_bug.cgi?id=66 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	9d1a6b3eba	pcap: Correctly set snaplen based on tap backend type The pcap header includes a value indicating how much of each frame is captured. We always capture the entire frame, so we want to set this to the maximum possible frame size. Currently we do that by setting it to ETH_MAX_MTU, but that's a confusingly named constant which might not always be correct depending on the details of our tap backend. Instead add a tap_l2_max_len() function that explicitly returns the maximum frame size for the current mode and use that to set snaplen. While we're there, there's no particular need for the pcap header to be defined in a global; make it local to pcap_init() instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	b6945e0553	Simplify sizing of pkt_buf We define the size of pkt_buf as large enough to hold 128 maximum size packets. Well, approximately, since we round down to the page size. We don't have any specific reliance on how many packets can fit in the buffer, we just want it to be big enough to allow reasonable batching. The current definition relies on the confusingly named ETH_MAX_MTU and adds in sizeof(uint32_t) rather non-obviously for the pseudo-physical header used by the qemu socket (passt mode) protocol. Instead, just define it to be 8MiB, which is what that complex calculation works out to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	c4bfa3339c	tap: Use explicit defines for maximum length of L2 frame Currently in tap.c we (mostly) use ETH_MAX_MTU as the maximum length of an L2 frame. This define comes from the kernel, but it's badly named and used confusingly. First, it doesn't really have anything to do with Ethernet, which has no structural limit on frame lengths. It comes more from either a) IP which imposes a 64k datagram limit or b) from internal buffers used in various places in the kernel (and in passt). Worse, MTU generally means the maximum size of the IP (L3) datagram which may be transferred, _not_ counting the L2 headers. In the kernel ETH_MAX_MTU is sometimes used that way, but sometimes seems to be used as a maximum frame length, _including_ L2 headers. In tap.c we're mostly using it in the second way. Finally, each of our tap backends could have different limits on the frame size imposed by the mechanisms they're using. Start clearing up this confusion by replacing it in tap.c with new L2_MAX_LEN_* defines which specifically refer to the maximum L2 frame length for each backend. Signed-off-by: David Gibson <dgibson@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	1eda8de438	packet: Remove redundant TAP_BUF_BYTES define Currently we define both TAP_BUF_BYTES and PKT_BUF_BYTES as essentially the same thing. They'll be different only if TAP_BUF_BYTES is negative, which makes no sense. So, remove TAP_BUF_BYTES and just use PKT_BUF_BYTES. In addition, most places we use this to just mean the size of the main packet buffer (pkt_buf) for which we can just directly use sizeof. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	c43972ad67	packet: Give explicit name to maximum packet size We verify that every packet we store in a pool (and every partial packet we retrieve from it) has a length no longer than UINT16_MAX. This originated in the older packet pool implementation which stored packet lengths in a uint16_t. Now that packets are represented by a struct iovec with its size_t length, this check serves only as a sanity / security check that we don't have some wildly out of range length due to a bug elsewhere. We may have reasons to (slightly) increase this limit in future, so in preparation, give this quantity an explicit name - PACKET_MAX_LEN. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	74cd82adc8	conf: Detect vhost-user mode earlier We detect our operating mode in conf_mode(), unless we're using vhost-user mode, in which case we change it later when we parse the --vhost-user option. That means we need to delay parsing the --repair-path option (for vhost-user only) until still later. However, there are many other places in the main option parsing loop which also rely on mode. We get away with those, because they happen to be able to treat passt and vhost-user modes identically. This is potentially confusing, though. So, move setting of MODE_VU into conf_mode() so c->mode always has its final value from that point onwards. To match, we move the parsing of --repair-path back into the main option parsing loop. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	4b17d042c7	conf: Move mode detection into helper function One of the first things we need to do is determine if we're in passt mode or pasta mode. Currently this is open-coded in main(), by examining argv[0]. We want to complexify this a bit in future to cover vhost-user mode as well. Prepare for this, by moving the mode detection into a new conf_mode() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	bb00a0499f	conf: Use the same optstring for passt and pasta modes Currently we rely on detecting our mode first and use different sets of (single character) options for each. This means that if you use an option valid in only one mode in another you'll get the generic usage() message. We can give more helpful errors with little extra effort by combining all the options into a single value of the option string and giving bespoke messages if an option for the wrong mode is used; in fact we already did this for some single mode options like '-1'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
Stefano Brivio	c8b520c062	flow, repair: Wait for a short while for passt-repair to connect ...and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-12 23:08:33 +01:00
Stefano Brivio	0470170247	passt-repair: Add directory watch It might not be feasible for users to start passt-repair after passt is started, on a migration target, but before the migration process starts. For instance, with libvirt, the guest domain (and, hence, passt) is started on the target as part of the migration process. At least for the moment being, there's no hook a libvirt user (including KubeVirt) can use to start passt-repair before the migration starts. Add a directory watch using inotify: if PATH is a directory, instead of connecting to it, we'll watch for a .repair socket file to appear in it, and then attempt to connect to that socket. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-12 21:34:36 +01:00
David Gibson	2b58b22845	cppcheck: Add suppressions for "logically" exported functions We have some functions in our headers which are definitely there on purpose. However, they're not yet used outside the files in which they're defined. That causes sufficiently recent cppcheck versions (2.17) to complain they should be static. Suppress the errors for these "logically" exported functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	a83c806d17	vhost_user: Don't export several functions vhost-user added several functions which are exposed in headers, but not used outside the file where they're defined. I can't tell if these are really internal functions, or if they're logically supposed to be exported, but we don't happen to have anything using them yet. For the time being, just remove the exports. We can add them back if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	27395e67c2	tcp: Don't export tcp_update_csum() tcp_update_csum() is exposed in tcp_internal.h, but is only used in tcp.c. Remove the unneeded prototype and make it static. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	12d5b36b2f	checksum: Don't export various functions Several of the exposed functions in checksum.h are no longer directly used. Remove them from the header, and make static. In particular sum_16b() should not be used outside: generally csum_unfolded() should be used which will automatically use either the AVX2 optimized version or sum_16b() as necessary. csum_fold() and csum() could have external uses, but they're not used right now. We can expose them again if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	e36c35c952	log: Don't export passt_vsyslog() passt_vsyslog() is an exposed function in log.h. However it shouldn't be called from outside log.c: it writes specifically to the system log, and most code should call passt's logging helpers which might go to the syslog or to a log file. Make passt_vsyslog() local to log.c. This requires a code motion to avoid a forward declaration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	57d2db370b	treewide: Mark assorted functions static This marks static a number of functions which are only used in their .c file, have no prototypes in a .h and were never intended to be globally exposed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
Jon Maloy	68b04182e0	udp: create and send ICMPv6 to local peer when applicable When a local peer sends a UDP message to a non-existing port on an existing remote host, that host will return an ICMPv6 message containing the error code ICMP6_DST_UNREACH_NOPORT, plus the IPv6 header, UDP header and the first 1232 bytes of the original message, if any. If the sender socket has been connected, it uses this message to issue a "Connection Refused" event to the user. Until now, we have only read such events from the externally facing socket, but we don't forward them back to the local sender because we cannot read the ICMP message directly to user space. Because of this, the local peer will hang and wait for a response that never arrives. We now fix this for IPv6 by recreating and forwarding a correct ICMP message back to the internal sender. We synthesize the message based on the information in the extended error structure, plus the returned part of the original message body. Note that for the sake of completeness, we even produce ICMP messages for other error types and codes. We have noticed that at least ICMP_PROT_UNREACH is propagated as an error event back to the user. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> [sbrivio: fix cppcheck warning, udp_send_conn_fail_icmp6() doesn't modify saddr which can be declared as const] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
Jon Maloy	87e6a46442	tap: break out building of udp header from tap_udp6_send function We will need to build the UDP header at other locations than in function tap_udp6_send(), so we break that part out to a separate function. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
Jon Maloy	55431f0077	udp: create and send ICMPv4 to local peer when applicable When a local peer sends a UDP message to a non-existing port on an existing remote host, that host will return an ICMP message containing the error code ICMP_PORT_UNREACH, plus the header and the first eight bytes of the original message. If the sender socket has been connected, it uses this message to issue a "Connection Refused" event to the user. Until now, we have only read such events from the externally facing socket, but we don't forward them back to the local sender because we cannot read the ICMP message directly to user space. Because of this, the local peer will hang and wait for a response that never arrives. We now fix this for IPv4 by recreating and forwarding a correct ICMP message back to the internal sender. We synthesize the message based on the information in the extended error structure, plus the returned part of the original message body. Note that for the sake of completeness, we even produce ICMP messages for other error codes. We have noticed that at least ICMP_PROT_UNREACH is propagated as an error event back to the user. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> [sbrivio: fix cppcheck warning: udp_send_conn_fail_icmp4() doesn't modify 'in', it can be declared as const] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:19 +01:00
Jon Maloy	82a839be98	tap: break out building of udp header from tap_udp4_send function We will need to build the UDP header at other locations than in function tap_udp4_send(), so we break that part out to a separate function. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-06 20:17:36 +01:00
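
The overflow-safe range checks described in commits 37d9f374d9 and e43e00719d above can be illustrated with a minimal sketch. This is not the code from those commits: the names range_in_bounds(), ptr_range_in_buf() and SKETCH_MAX_LEN are hypothetical, and the sketch assumes the pointer being checked was derived from the buffer it is checked against. The idea it shows is the one the commit messages state: never add an untrusted length to an offset or pointer, but subtract first and compare against what remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for this sketch, cast so that it has type size_t */
#define SKETCH_MAX_LEN ((size_t)UINT16_MAX)

/* Return true if [offset, offset + len) lies within a region of 'size' bytes.
 * offset + len is never computed, so no intermediate value can wrap around.
 */
static bool range_in_bounds(size_t size, size_t offset, size_t len)
{
	if (len > SKETCH_MAX_LEN)
		return false;

	if (offset > size)
		return false;

	/* size - offset can't underflow: offset <= size was checked above */
	return len <= size - offset;
}

/* Return true if [ptr, ptr + len) lies within [buf, buf + size).
 * Assumption: ptr points into, or one past the end of, the object at buf,
 * so that comparing and subtracting the two pointers is well defined.
 */
static bool ptr_range_in_buf(const char *buf, size_t size,
			     const char *ptr, size_t len)
{
	assert(buf != NULL);

	if (ptr < buf)
		return false;

	/* Pointer subtraction gives the offset; the rest is integer checks */
	return range_in_bounds(size, (size_t)(ptr - buf), len);
}
```

The order of the checks matters: the offset is validated before it is subtracted from the size, and the length is only ever compared, never added.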

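The maximum MTU rule described in commit 26df8a3608 above (the minimum of IP_MAX_MTU and the largest IP datagram that fits in an L2 frame once the L2 header is accounted for) can be sketched as follows. This is only an illustration, not the code from that commit: sketch_max_mtu() and the SKETCH_* constants are hypothetical names, the 14-byte header is the plain Ethernet header without a VLAN tag, and the per-backend frame limit is passed in as a parameter because its actual values are not given here.

```c
#include <stddef.h>

#define SKETCH_IP_MAX_MTU	65535U	/* largest IP datagram, per the commit message */
#define SKETCH_ETH_HLEN		14U	/* Ethernet header, no VLAN tag (assumption) */

/* Return the largest L3 datagram size usable with frames of at most
 * l2_max_len bytes, capped at SKETCH_IP_MAX_MTU. Returns 0 if the frame
 * can't even hold the L2 header plus one byte.
 */
static unsigned sketch_max_mtu(unsigned l2_max_len)
{
	unsigned l3_room;

	if (l2_max_len <= SKETCH_ETH_HLEN)
		return 0;

	l3_room = l2_max_len - SKETCH_ETH_HLEN;

	return l3_room < SKETCH_IP_MAX_MTU ? l3_room : SKETCH_IP_MAX_MTU;
}
```

With a hypothetical frame limit of 65535 bytes this yields 65521, while a limit larger than 65549 bytes is capped at 65535.
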

1971 commits