passt

mirror of https://passt.top/passt synced 2025-07-06 03:58:43 +02:00

Author	SHA1	Message	Date
David Gibson	78f1f0fdfc	test/perf: Simplify iperf3 server lifetime management After we start the iperf3 server in the background, we have a sleep to make sure it's ready to receive connections. We can simplify this slightly by using the -D option to have iperf3 background itself rather than backgrounding it manually. That won't return until the server is ready to use. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	26df8a3608	conf: Limit maximum MTU based on backend frame size The -m option controls the MTU, that is the maximum transmissible L3 datagram, not including L2 headers. We currently limit it to ETH_MAX_MTU which sounds like it makes sense. But ETH_MAX_MTU is confusing: it's not consistently used as to whether it means the maximum L3 datagram size or the maximum L2 frame size. Even within conf() we explicitly account for the L2 header size when computing the default --mtu value, but not when we compute the maximum --mtu value. Clean this up by reworking the maximum MTU computation to be the minimum of IP_MAX_MTU (65535) and the maximum sized IP datagram which can fit into our L2 frames when we account for the L2 header. The latter can vary depending on our tap backend, although it doesn't right now. Link: https://bugs.passt.top/show_bug.cgi?id=66 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	9d1a6b3eba	pcap: Correctly set snaplen based on tap backend type The pcap header includes a value indicating how much of each frame is captured. We always capture the entire frame, so we want to set this to the maximum possible frame size. Currently we do that by setting it to ETH_MAX_MTU, but that's a confusingly named constant which might not always be correct depending on the details of our tap backend. Instead add a tap_l2_max_len() function that explicitly returns the maximum frame size for the current mode and use that to set snaplen. While we're there, there's no particular need for the pcap header to be defined in a global; make it local to pcap_init() instead. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	b6945e0553	Simplify sizing of pkt_buf We define the size of pkt_buf as large enough to hold 128 maximum size packets. Well, approximately, since we round down to the page size. We don't have any specific reliance on how many packets can fit in the buffer, we just want it to be big enough to allow reasonable batching. The current definition relies on the confusingly named ETH_MAX_MTU and adds in sizeof(uint32_t) rather non-obviously for the pseudo-physical header used by the qemu socket (passt mode) protocol. Instead, just define it to be 8MiB, which is what that complex calculation works out to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	c4bfa3339c	tap: Use explicit defines for maximum length of L2 frame Currently in tap.c we (mostly) use ETH_MAX_MTU as the maximum length of an L2 frame. This define comes from the kernel, but it's badly named and used confusingly. First, it doesn't really have anything to do with Ethernet, which has no structural limit on frame lengths. It comes more from either a) IP which imposes a 64k datagram limit or b) from internal buffers used in various places in the kernel (and in passt). Worse, MTU generally means the maximum size of the IP (L3) datagram which may be transferred, _not_ counting the L2 headers. In the kernel ETH_MAX_MTU is sometimes used that way, but sometimes seems to be used as a maximum frame length, _including_ L2 headers. In tap.c we're mostly using it in the second way. Finally, each of our tap backends could have different limits on the frame size imposed by the mechanisms they're using. Start clearing up this confusion by replacing it in tap.c with new L2_MAX_LEN_* defines which specifically refer to the maximum L2 frame length for each backend. Signed-off-by: David Gibson <dgibson@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	1eda8de438	packet: Remove redundant TAP_BUF_BYTES define Currently we define both TAP_BUF_BYTES and PKT_BUF_BYTES as essentially the same thing. They'll be different only if TAP_BUF_BYTES is negative, which makes no sense. So, remove TAP_BUF_BYTES and just use PKT_BUF_BYTES. In addition, most places we use this to just mean the size of the main packet buffer (pkt_buf) for which we can just directly use sizeof. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	c43972ad67	packet: Give explicit name to maximum packet size We verify that every packet we store in a pool (and every partial packet we retreive from it) has a length no longer than UINT16_MAX. This originated in the older packet pool implementation which stored packet lengths in a uint16_t. Now, that packets are represented by a struct iovec with its size_t length, this check serves only as a sanity / security check that we don't have some wildly out of range length due to a bug elsewhere. We have may reasons to (slightly) increase this limit in future, so in preparation, give this quantity an explicit name - PACKET_MAX_LEN. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	74cd82adc8	conf: Detect vhost-user mode earlier We detect our operating mode in conf_mode(), unless we're using vhost-user mode, in which case we change it later when we parse the --vhost-user option. That means we need to delay parsing the --repair-path option (for vhost-user only) until still later. However, there are many other places in the main option parsing loop which also rely on mode. We get away with those, because they happen to be able to treat passt and vhost-user modes identically. This is potentially confusing, though. So, move setting of MODE_VU into conf_mode() so c->mode always has its final value from that point onwards. To match, we move the parsing of --repair-path back into the main option parsing loop. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	4b17d042c7	conf: Move mode detection into helper function One of the first things we need to do is determine if we're in passt mode or pasta mode. Currently this is open-coded in main(), by examining argv[0]. We want to complexify this a bit in future to cover vhost-user mode as well. Prepare for this, by moving the mode detection into a new conf_mode() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
David Gibson	bb00a0499f	conf: Use the same optstring for passt and pasta modes Currently we rely on detecting our mode first and use different sets of (single character) options for each. This means that if you use an option valid in only one mode in another you'll get the generic usage() message. We can give more helpful errors with little extra effort by combining all the options into a single value of the option string and giving bespoke messages if an option for the wrong mode is used; in fact we already did this for some single mode options like '-1'. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-12 23:08:33 +01:00
Stefano Brivio	c8b520c062	flow, repair: Wait for a short while for passt-repair to connect ...and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-12 23:08:33 +01:00
Stefano Brivio	0470170247	passt-repair: Add directory watch It might not be feasible for users to start passt-repair after passt is started, on a migration target, but before the migration process starts. For instance, with libvirt, the guest domain (and, hence, passt) is started on the target as part of the migration process. At least for the moment being, there's no hook a libvirt user (including KubeVirt) can use to start passt-repair before the migration starts. Add a directory watch using inotify: if PATH is a directory, instead of connecting to it, we'll watch for a .repair socket file to appear in it, and then attempt to connect to that socket. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2025-03-12 21:34:36 +01:00
David Gibson	2b58b22845	cppcheck: Add suppressions for "logically" exported functions We have some functions in our headers which are definitely there on purpose. However, they're not yet used outside the files in which they're defined. That causes sufficiently recent cppcheck versions (2.17) to complain they should be static. Suppress the errors for these "logically" exported functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	a83c806d17	vhost_user: Don't export several functions vhost-user added several functions which are exposed in headers, but not used outside the file where they're defined. I can't tell if these are really internal functions, or of they're logically supposed to be exported, but we don't happen to have anything using them yet. For the time being, just remove the exports. We can add them back if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	27395e67c2	tcp: Don't export tcp_update_csum() tcp_update_csum() is exposed in tcp_internal.h, but is only used in tcp.c. Remove the unneded prototype and make it static. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	12d5b36b2f	checksum: Don't export various functions Several of the exposed functions in checksum.h are no longer directly used. Remove them from the header, and make static. In particular sum_16b() should not be used outside: generally csum_unfolded() should be used which will automatically use either the AVX2 optimized version or sum_16b() as necessary. csum_fold() and csum() could have external uses, but they're not used right now. We can expose them again if we need to. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	e36c35c952	log: Don't export passt_vsyslog() passt_vsyslog() is an exposed function in log.h. However it shouldn't be called from outside log.c: it writes specifically to the system log, and most code should call passt's logging helpers which might go to the syslog or to a log file. Make passt_vsyslog() local to log.c. This requires a code motion to avoid a forward declaration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
David Gibson	57d2db370b	treewide: Mark assorted functions static This marks static a number of functions which are only used in their .c file, have no prototypes in a .h and were never intended to be globally exposed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
Jon Maloy	68b04182e0	udp: create and send ICMPv6 to local peer when applicable When a local peer sends a UDP message to a non-existing port on an existing remote host, that host will return an ICMPv6 message containing the error code ICMP6_DST_UNREACH_NOPORT, plus the IPv6 header, UDP header and the first 1232 bytes of the original message, if any. If the sender socket has been connected, it uses this message to issue a "Connection Refused" event to the user. Until now, we have only read such events from the externally facing socket, but we don't forward them back to the local sender because we cannot read the ICMP message directly to user space. Because of this, the local peer will hang and wait for a response that never arrives. We now fix this for IPv6 by recreating and forwarding a correct ICMP message back to the internal sender. We synthesize the message based on the information in the extended error structure, plus the returned part of the original message body. Note that for the sake of completeness, we even produce ICMP messages for other error types and codes. We have noticed that at least ICMP_PROT_UNREACH is propagated as an error event back to the user. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> [sbrivio: fix cppcheck warning, udp_send_conn_fail_icmp6() doesn't modify saddr which can be declared as const] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
Jon Maloy	87e6a46442	tap: break out building of udp header from tap_udp6_send function We will need to build the UDP header at other locations than in function tap_udp6_send(), so we break that part out to a separate function. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:24 +01:00
Jon Maloy	55431f0077	udp: create and send ICMPv4 to local peer when applicable When a local peer sends a UDP message to a non-existing port on an existing remote host, that host will return an ICMP message containing the error code ICMP_PORT_UNREACH, plus the header and the first eight bytes of the original message. If the sender socket has been connected, it uses this message to issue a "Connection Refused" event to the user. Until now, we have only read such events from the externally facing socket, but we don't forward them back to the local sender because we cannot read the ICMP message directly to user space. Because of this, the local peer will hang and wait for a response that never arrives. We now fix this for IPv4 by recreating and forwarding a correct ICMP message back to the internal sender. We synthesize the message based on the information in the extended error structure, plus the returned part of the original message body. Note that for the sake of completeness, we even produce ICMP messages for other error codes. We have noticed that at least ICMP_PROT_UNREACH is propagated as an error event back to the user. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> [sbrivio: fix cppcheck warning: udp_send_conn_fail_icmp4() doesn't modify 'in', it can be declared as const] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-07 02:21:19 +01:00
Jon Maloy	82a839be98	tap: break out building of udp header from tap_udp4_send function We will need to build the UDP header at other locations than in function tap_udp4_send(), so we break that part out to a separate function. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-06 20:17:36 +01:00
David Gibson	1924e25f07	conf: Be more precise about minimum MTUs Currently we reject the -m option if given a value less than ETH_MIN_MTU (68). That define is derived from the kernel, but its name is misleading: it doesn't really have anything to do with Ethernet per se, but is rather the minimum payload any L2 link must be able to handle in order to carry IPv4. For IPv6, it's not sufficient: that requires an MTU of at least 1280. Newer kernels have better named constants IPV4_MIN_MTU and IPv6_MIN_MTU. Copy and use those constants instead, along with some more specific error messages. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-06 20:17:23 +01:00
David Gibson	672d786de1	tcp: Send RST in response to guest packets that match no connection Currently, if a non-SYN TCP packet arrives which doesn't match any existing connection, we simply ignore it. However RFC 9293, section 3.10.7.1 says we should respond with an RST to a non-SYN, non-RST packet that's for a CLOSED (i.e. non-existent) connection. This can arise in practice with migration, in cases where some error means we have to discard a connection. We destroy the connection with tcp_rst() in that case, but because the guest is stopped, we may not be able to deliver the RST packet on the tap interface immediately. This change ensures an RST will be sent if the guest tries to use the connection again. A similar situation can arise if a passt/pasta instance is killed or crashes, but is then replaced with another attached to the same guest. This can leave the guest with stale connections that the new passt instance isn't aware of. It's better to send an RST so the guest knows quickly these are broken, rather than letting them linger until they time out. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-05 21:46:32 +01:00
David Gibson	1f236817ea	tap: Consider IPv6 flow label when building packet sequences To allow more batching, we group together related packets into "seqs" in the tap layer, before passing them to the L4 protocol layers. Currently we consider the IP protocol, both IP addresses and also the L4 ports when grouping things into seqs. We ignore the IPv6 flow label. We have some future cases where we want to consider the the flow label in the L4 code, which is awkward if we could be given a single batch with multiple labels. Add the flow label to tap6_l4_t and group by it as well as the other criteria. In future we could possibly use the flow label _instead_ of peeking into the L4 header for the ports, but we don't do so for now. The guest should use the same flow label for all packets in a low, but if it doesn't this change won't break anything, it just means we'll batch things a bit sub-optimally. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-05 21:46:29 +01:00
David Gibson	008175636c	ip: Helpers to access IPv6 flow label The flow label is a 20-bit field in the IPv6 header. The length and alignment make it awkward to pass around as is. Obviously, it can be packed into a 32-bit integer though, and we do this in two places. We have some further upcoming places where we want to manipulate the flow label, so make some helpers for marshalling and unmarshalling it to an integer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-03-05 21:46:17 +01:00
David Gibson	52419a64f2	migrate, tcp: Don't flow_alloc_cancel() during incoming migration In tcp_flow_migrate_target(), if we're unable to create and bind the new socket, we print an error, cancel the flow and carry on. This seems to make sense based on our policy of generally letting the migration complete even if some or all flows are lost in the process. But it doesn't quite work: the flow_alloc_cancel() means that the flows in the target's flow table are no longer one to one match to the flows which the source is sending data for. This means that data for later flows will be mismatched to a different flow. Most likely that will cause some nasty error later, but even worse it might appear to succeed but lead to data corruption due to incorrectly restoring one of the flows. Instead, we should leave the flow in the table until we've read all the data for it, then discard it. Technically removing the flow_alloc_cancel() would be enough for this: if tcp_flow_repair_socket() fails it leaves conn->sock == -1, which will cause the restore functions in tcp_flow_migrate_target_ext() to fail, discarding the flow. To make what's going on clearer (and with less extraneous error messages), put several explicit tests for a missing socket later in the migration path to read the data associated with the flow but explicitly discard it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-28 01:32:38 +01:00
David Gibson	b2708218a6	tcp: Unconditionally move to CLOSED state on tcp_rst() tcp_rst() attempts to send an RST packet to the guest, and if that succeeds moves the flow to CLOSED state. However, even if the tcp_send_flag() fails the flow is still dead: we've usually closed the socket already, and something has already gone irretrievably wrong. So we should still mark the flow as CLOSED. That will cause it to be cleaned up, meaning any future packets from the guest for it won't match a flow, so should generate new RSTs (they don't at the moment, but that's a separate bug). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-28 01:32:36 +01:00
David Gibson	56ce03ed0a	tcp: Correct error code handling from tcp_flow_repair_socket() There are two small bugs in error returns from tcp_low_repair_socket(), which is supposed to return a negative errno code: 1) On bind() failures, wedirectly pass on the return code from bind(), which is just 0 or -1, instead of an error code. 2) In the caller, tcp_flow_migrate_target() we call strerror_() directly on the negative error code, but strerror() requires a positive error code. Correct both of these. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-28 01:32:35 +01:00
David Gibson	39f85bce1a	migrate, flow: Don't attempt to migrate TCP flows without passt-repair Migrating TCP flows requires passt-repair in order to use TCP_REPAIR. If passt-repair is not started, our failure mode is pretty ugly though: we'll attempt the migration, hitting various problems when we can't enter repair mode. In some cases we may not roll back these changes properly, meaning we break network connections on the source. Our general approach is not to completely block migration if there are problems, but simply to break any flows we can't migrate. So, if we have no connection from passt-repair carry on with the migration, but don't attempt to migrate any TCP connections. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-28 01:32:33 +01:00
David Gibson	7b92f2e852	migrate, flow: Trivially succeed if migrating with no flows We could get a migration request when we have no active flows; or at least none that we need or are able to migrate. In this case after sending or receiving the number of flows we continue to step through various lists. In the target case, this could include communication with passt-repair. If passt-repair wasn't started that could cause further errors, but of course they shouldn't matter if we have nothing to repair. Make it more obvious that there's nothing to do and avoid such errors by short-circuiting flow_migrate_{source,target}() if there are no migratable flows. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-28 01:32:21 +01:00
Stefano Brivio	87471731e6	selinux: Fixes/workarounds for passt and passt-repair, mostly for libvirt usage Here are a bunch of workarounds and a couple of fixes for libvirt usage which are rather hard to split into single logical patches as there appear to be some obscure dependencies between some of them: - passt-repair needs to have an exec_type typeattribute (otherwise the policy for lsmd(1) causes a violation on getattr on its executable) file, and that typeattribute just happened to be there for passt as a result of init_daemon_domain(), but passt-repair isn't a daemon, so we need an explicit corecmd_executable_file() - passt-repair needs a workaround, which I'll revisit once https://github.com/fedora-selinux/selinux-policy/issues/2579 is solved, for usage with libvirt: allow it to use qemu_var_run_t and virt_var_run_t sockets - add 'bpf' and 'dac_read_search' capabilities for passt-repair: they are needed (for whatever reason I didn't investigate) to actually receive socket files via SCM_RIGHTS - passt needs further workarounds in the sense of https://github.com/fedora-selinux/selinux-policy/issues/2579: allow it to use map and use svirt_tmpfs_t (not just svirt_image_t): it depends on where the libvirt guest image is - ...it also needs to map /dev/null if <access mode='shared'/> is enabled in libvirt's XML for the memoryBacking object, for vhost-user operation - and 'ioctl' on the TCP socket appears to be actually needed, on top of 'getattr', to dump some socket parameters Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-28 01:14:01 +01:00
Michal Privoznik	be86232f72	seccomp.sh: Silence stty errors When printing list of allowed syscalls the width of terminal is obtained for nicer output (see commit below). The width is obtained by running 'stty'. While this works when building from a console, it doesn't work during rpmbuild/emerge/.. as stdout is usually not a console but a logfile and stdin is usually /dev/null or something. This results in stty reporting errors like this: stty: 'standard input': Inappropriate ioctl for device Redirect stty's stderr to /dev/null to silence it. Fixes: `712ca32353` ("seccomp.sh: Try to account for terminal width while formatting list of system calls") Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-24 18:46:28 +01:00
Jon Maloy	ea69ca6a20	tap: always set the no_frag flag in IPv4 headers When studying the Linux source code and Wireshark dumps it seems like the no_frag flag in the IPv4 header is always set. Following discussions in the Internet on this subject indicates that modern routers never fragment packets, and that it isn't even supported in many cases. Adding to this that incoming messages forwarded on the tap interface never even pass through a router it seems safe to always set this flag. This makes the IPv4 headers of forwarded messages identical to those sent by the external sockets, something we must consider desirable. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-20 12:43:00 +01:00
Stefano Brivio	4dac2351fa	contrib/fedora: Actually install passt-repair SELinux policy file Otherwise we build it, but we don't install it. Not an issue that warrants a a release right away as it's anyway usable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 23:33:53 +01:00
Stefano Brivio	16553c8280	dhcp: Add option code byte in calculation for OPT_MAX boundary check Otherwise we'll limit messages to 577 bytes, instead of 576 bytes as intended: $ fqdn="thirtytwocharactersforeachlabel.thirtytwocharactersforeachlabel.thirtytwocharactersforeachlabel.thirtytwocharactersforeachlabel.thirtytwocharactersforeachlabel.thirtytwocharactersforeachlabel.thirtytwocharactersforeachlabel.then_make_it_251_with_this" $ hostname="__eighteen_bytes__" $ ./pasta --fqdn ${fqdn} -H ${hostname} -p dhcp.pcap -- /sbin/dhclient -4 Saving packet capture to dhcp.pcap $ tshark -r dhcp.pcap -V -Y 'dhcp.option.value == 5' \| grep "Total Length" Total Length: 577 This was hidden by the issue fixed by commit `bcc4908c2b` ("dhcp Remove option 255 length byte") until now. Fixes: `31e8109a86` ("dhcp, dhcpv6: Add hostname and client fqdn ops") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Enrique Llorente <ellorent@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 23:33:19 +01:00
Stefano Brivio	183bedf478	Makefile: Use mmap2() as alternative for mmap() in valgrind extra syscalls ...instead of unconditionally trying to enable both: mmap2() is the 32-bit ARM variant for mmap() (and perhaps for other architectures), bot if mmap() is available, valgrind will use that one. This avoids seccomp.sh warning us about missing mmap2() if mmap() is present, and is consistent with what we do in vhost-user code. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 16:36:47 +01:00
David Gibson	1cc5d4c9fe	conf: Use 0 instead of -1 as "unassigned" mtu value On the command line -m 0 means "don't assign an MTU" (letting the guest use its default. However, internally we use (c->mtu == -1) to represent that state. We use (c->mtu == 0) to represent "the user didn't specify on the command line, so use the default" - but this is only used during conf(), never afterwards. This is unnecessarily confusing. We can instead just initialise c->mtu to its default (65520) before parsing options and use 0 on both the command line and internally to represent the "don't assign" special case. This ensures that c->mtu is always 0..65535, so we can store it in a uint16_t which is more natural. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:41 +01:00
David Gibson	3dc7da68a2	conf: More thorough error checking when parsing --mtu option We're a bit sloppy with parsing MTU which can lead to some surprising, though fairly harmless, results: * Passing a non-number like '-m xyz' will not give an error and act like -m 0 * Junk after a number (e.g. '-m 1500pqr') will be ignored rather than giving an error * We parse the MTU as a long, then immediately assign to an int, so on some platforms certain ludicrously out of bounds values will be silently truncated, rather than giving an error Be a bit more thorough with the error checking to avoid that. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:38 +01:00
David Gibson	65e317a8fc	flow: Clean up and generalise flow traversal macros The migration code introduced a number of 'foreach' macros to traverse the flow table. These aren't inherently tied to migration, so polish up their naming, move them to flow_table.h and also use in flow_defer_handler() which is the other place we need to traverse the whole table. For now we keep foreach_established_tcp_flow() as is. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:36 +01:00
David Gibson	b79a22d360	flow: Remove unneeded bound parameter from flow traversal macros The foreach macros used to step through flows each take a 'bound' parameter to only scan part of the flow table. Only one place actually passes a bound different from FLOW_MAX. So we can simplify every other invocation by having that one case manually handle the bound. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:34 +01:00
David Gibson	7ffca35fdd	flow: Remove unneeded index from foreach_* macros The foreach macros are odd in that they take two loop counters: an integer index, and a pointer to the flow. We nearly always want the latter, not the former, and we can get the index from the pointer trivially when we need it. So, rearrange the macros not to need the integer index. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-19 06:35:20 +01:00
David Gibson	adb46c11d0	flow: Add flow_perror() helper Our general logging helpers include a number of _perror() variants which, like perror(3) include the description of the current errno. We didn't have those for our flow specific logging helpers, though. Fill this gap with flow_perror() and flow_dbg_perror(), and use them where it's useful. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 13:33:12 +01:00
David Gibson	ba0823f8a0	tcp: Don't pass both flow pointer and flow index tcp_flow_migrate_source_ext() is passed both the index of the flow it operates on and the pointer to the connection structure. However, the former is trivially derived from the latter. Simplify the interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 13:33:10 +01:00
David Gibson	854bc7b1a3	tcp: Remove spurious prototype for tcp_flow_migrate_shrink_window This function existed in drafts of the migration code, but not the final version. Get rid of the prototype. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 13:33:08 +01:00
David Gibson	e56c8038fc	tcp: More type safety for tcp_flow_migrate_target_ext() tcp_flow_migrate_target_ext() takes a raw union flow , although it is TCP specific, and requires a FLOW_TYPE_TCP entry. Our usual convention is that such functions should take a struct tcp_tap_conn instead. Convert it to do so. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 13:32:52 +01:00
David Gibson	5a07eb3ccc	tcp_vu: head_cnt need not be global head_cnt is a global variable which tracks how many entries in head[] are currently used. The fact that it's global obscures the fact that the lifetime over which it has a meaningful value is quite short: a single call to of tcp_vu_data_from_sock(). Make it a local to tcp_vu_data_from_sock() to make that lifetime clearer. We keep the head[] array global for now - although technically it has the same valid lifetime - because it's large enough we might not want to put it on the stack. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 11:28:37 +01:00
David Gibson	6b4065153c	tap: Remove unused ETH_HDR_INIT() macro The uses of this macro were removed in `d4598e1d18` ("udp: Use the same buffer for the L2 header for all frames"). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 08:43:18 +01:00
David Gibson	354bc0bab1	packet: Don't pass start and offset separately to packet_check_range() Fundamentally what packet_check_range() does is to check whether a given memory range is within the allowed / expected memory set aside for packets from a particular pool. That range could represent a whole packet (from packet_add_do()) or part of a packet (from packet_get_do()), but it doesn't really matter which. However, we pass the start of the range as two parameters: @start which is the start of the packet, and @offset which is the offset within the packet of the range we're interested in. We never use these separately, only as (start + offset). Simplify the interface of packet_check_range() and vu_packet_check_range() to directly take the start of the relevant range. This will allow some additional future improvements. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 08:43:12 +01:00
David Gibson	0a51060f7a	packet: Use flexible array member in struct pool Currently we have a dummy pkt[1] array, which we alias with an array of a different size via various macros. However, we already require C11 which includes flexible array members, so we can do better. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2025-02-18 08:43:04 +01:00

1 2 3 4 5 ...

1943 commits