passt

Author	SHA1	Message	Date
Jon Maloy	78da088f7b	tcp: unify payload and flags l2 frames array In order to reduce static memory and code footprint, we merge the array for l2 flag frames into the one for payload frames. This change also ensures that no flag message will be sent out over the l2 media bypassing already queued payload messages. Performance measurements with iperf3, where we force all traffic via the tap queue, show no significant difference: Dual traffic both directions simultaneously, with patch: ======================================================== host->ns: -------- [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 36.3 GBytes 3.12 Gbits/sec 4759 sender [ 5] 0.00-100.04 sec 36.3 GBytes 3.11 Gbits/sec receiver ns->host: --------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 321 GBytes 27.6 Gbits/sec receiver Dual traffic both directions simultaneously, without patch: ============================================================ host->ns: -------- [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 35.0 GBytes 3.01 Gbits/sec 6001 sender [ 5] 0.00-100.04 sec 34.8 GBytes 2.99 Gbits/sec receiver ns->host -------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 345 GBytes 29.6 Gbits/sec receiver Single connection, with patch: ============================== host->ns: --------- [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 138 GBytes 11.8 Gbits/sec 922 sender [ 5] 0.00-100.04 sec 138 GBytes 11.8 Gbits/sec receiver ns->host: ----------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 430 GBytes 36.9 Gbits/sec receiver Single connection, without patch: ================================= host->ns: ------------ [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-100.00 sec 139 GBytes 11.9 Gbits/sec 900 sender [ 5] 0.00-100.04 sec 139 GBytes 11.9 Gbits/sec receiver ns->host: --------- [ ID] Interval Transfer Bitrate [ 5] 0.00-100.00 sec 440 GBytes 37.8 Gbits/sec receiver Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-11-07 12:47:41 +01:00
David Gibson	13f0291ede	tcp: Remove compile-time dependency on struct tcp_info version In the Makefile we probe to create several defines based on the presence of particular fields in struct tcp_info. These defines are used for two purposes, neither of which they accomplish well: 1) Determining if the tcp_info fields are available at runtime. For this purpose the defines are Just Plain Wrong, since the runtime kernel may not be the same as the compile time kernel. We corrected this for tcpi_snd_wnd, but not for tcpi_bytes_acked or tcpi_min_rtt. 2) Allowing the source to compile against older kernel headers which don't have the fields in question. This works in theory, but it does mean we won't be able to use the fields, even if later run against a newer kernel. Furthermore, it's quite fragile: without much more thorough tests of builds in different environments than we're currently set up for, it's very easy to miss cases where we're accessing a field without protection from an #ifdef. For example we currently access tcpi_snd_wnd without #ifdefs in tcp_update_seqack_wnd(). Improve this with a different approach, borrowed from qemu (which has many instances of similar problems). Don't compile against linux/tcp.h, using netinet/tcp.h instead. Then for when we need an extension field, define a struct tcp_info_linux, copied from the kernel, with all the fields we're interested in. That may need updating from future kernel versions, but only when we want to use a new extension, so it shouldn't be frequent. This allows us to remove the HAS_SND_WND define entirely. We keep HAS_BYTES_ACKED and HAS_MIN_RTT for now, since they're used for purpose (1); we'll fix that in a later patch. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Trivial grammar fixes in comments] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-25 14:26:48 +02:00
David Gibson	9e5df350d6	tcp: Use structures to construct initial TCP options As a rule, we prefer constructing packets with matching C structures, rather than building them byte by byte. However, one case we still build byte by byte is the TCP options we include in SYN packets (in fact the only time we generate TCP options on the tap interface). Rework this to use a structure and initialisers which make it a bit clearer what's going on. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-21 18:51:04 +02:00
Laurent Vivier	72e7d3024b	tcp: Use tcp_payload_t rather than tcphdr As tcp_update_check_tcp4() and tcp_update_check_tcp6() compute the checksum using the TCP header and the TCP payload, it is clearer to use a pointer to tcp_payload_t that includes tcphdr and payload rather than a pointer to tcphdr (and guessing TCP header is followed by the payload). Move tcp_payload_t and tcp_flags_t to tcp_internal.h. (They will be used also by vhost-user). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-10-04 14:50:46 +02:00
Laurent Vivier	8f8c4d27eb	tcp: Allow checksum to be disabled We may need to skip setting the TCP checksum. Add a parameter to tcp_fill_headers4() and tcp_fill_headers6() to disable it. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-09-18 17:15:28 +02:00
David Gibson	bb41901c71	tcp: Make tcp_update_seqack_wnd()'s force_seq parameter explicitly boolean This parameter is already treated as a boolean internally. Make it a 'bool' type for clarity. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-09-18 17:14:55 +02:00
David Gibson	4aff6f9392	tcp: Clean up tcpi_snd_wnd probing When available, we want to retrieve our socket peer's advertised window and forward that to the guest. That information has been available from the kernel via the TCP_INFO getsockopt() since kernel commit 8f7baad7f035. Currently our probing for this is a bit odd. The HAS_SND_WND define determines if our headers include the tcpi_snd_wnd field, but that doesn't necessarily mean the running kernel supports it. Currently we start by assuming it's _not_ available, but mark it as available if we ever see a non-zero value in the field. This is a bit hit and miss in two ways: * Zero is a perfectly possible window for the peer to report, so we can get false negatives * We're reading TCP_INFO into a local variable, which might not be zero initialised, so if the kernel _doesn't_ write it, it could have non-zero garbage, giving us false positives. We can use a more direct way of probing for this: getsockopt() reports the length of the information retrieved. So, check whether that's long enough to include the field. This lets us probe the availability of the field once and for all during initialisation. That in turn allows ctx to become a const pointer to tcp_prepare_flags() which cascades through many other functions. We also move the flag for the probe result from the ctx structure to a global, to match peek_offset_cap. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-09-18 17:14:47 +02:00
David Gibson	e6feb5a892	treewide: Use "our address" instead of "forwarding address" The term "forwarding address" to indicate the local-to-passt address was well-intentioned, but ends up being kinda confusing. As discussed on a recent call, let's try "our" instead. (While we're there, correct an error in flow_initiate_af()'s comments where we referred to parameters by the wrong name). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-08-21 11:59:29 +02:00
David Gibson	acca4235c4	flow, tcp: Generalise TCP hash table to general flow hash table Move the data structures and helper functions for the TCP hash table to flow.c, making it a general hash table indexing sides of flows. This is largely code motion and straightforward renames. There are two semantic changes: * flow_lookup_af() now needs to verify that the entry has a matching protocol and interface as well as matching addresses and ports. * We double the size of the hash table, because it's now at least theoretically possible for both sides of each flow to be hashed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-19 18:32:59 +02:00
David Gibson	52d45f1737	tcp: Obtain guest address from flowside Currently we always deliver inbound TCP packets to the guest's most recent observed IP address. This has the odd side effect that if the guest changes its IP address with active TCP connections we might deliver packets from old connections to the new address. That won't work; it will probably result in an RST from the guest. Worse, if the guest added a new address but also retains the old one, then we could break those old connections by redirecting them to the new address. Now that we maintain flowside information, we have a record of the correct guest side address and can just use it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-19 18:32:44 +02:00
David Gibson	f9fe212b1f	tcp, flow: Remove redundant information, repack connection structures Some information we explicitly store in the TCP connection is now duplicated in the common flow structure. Access it from there instead, and remove it from the TCP specific structure. With that done we can reorder both the "tap" and "splice" TCP structures a bit to get better packing for the new combined flow table entries. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-07-19 18:32:41 +02:00
Laurent Vivier	fba2b544b6	tcp: move buffers management functions to their own file Move all the TCP parts using internal buffers to tcp_buf.c and keep generic TCP management functions in tcp.c. Add tcp_internal.h to export needed functions from tcp.c and tcp_buf.h from tcp_buf.c. With this change we can use existing TCP functions with a different kind of memory storage, for instance the shared memory provided by the guest via vhost-user. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2024-06-13 15:45:05 +02:00
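
The probing approach described in commit 4aff6f9392 (check the length that getsockopt() reports for TCP_INFO rather than inspecting field values) can be sketched roughly as follows. This is a minimal illustration for Linux, not the code from the commit: the helper names tcp_info_len() and tcp_info_covers(), and the 1024-byte buffer size, are assumptions made for this example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the running kernel how many bytes of TCP_INFO
 * it fills in for a fresh TCP socket. Returns -1 on error.
 */
static ssize_t tcp_info_len(void)
{
	unsigned char buf[1024] = { 0 };	/* assumed larger than any tcp_info */
	socklen_t sl = sizeof(buf);
	int s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

	if (s < 0)
		return -1;

	if (getsockopt(s, IPPROTO_TCP, TCP_INFO, buf, &sl)) {
		close(s);
		return -1;
	}

	close(s);
	return (ssize_t)sl;	/* the kernel updates sl to the length written */
}

/* Hypothetical helper: true if a field at the given offset and size lies
 * entirely within the length the kernel reported.
 */
static bool tcp_info_covers(ssize_t len, size_t offset, size_t size)
{
	return len >= 0 && (size_t)len >= offset + size;
}
```

A caller would run tcp_info_len() once at initialisation and then test each extension field with offsetof() and sizeof() against a local copy of the kernel structure, which avoids both the false negatives and the false positives listed in the commit message.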

12 commits
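
As an illustration of the idea in commit 9e5df350d6 (building SYN options from a structure with initialisers instead of byte by byte), here is a small sketch. The structure layout, the names struct syn_opts and syn_opts_init(), and the choice of MSS plus window scaling options are assumptions for this example and are not taken from the passt sources.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* TCP option kinds as assigned in the TCP specifications */
#define OPT_NOP		1
#define OPT_MSS		2
#define OPT_WS		3

/* Hypothetical on-wire layout: MSS option, one NOP for padding, then the
 * window scale option, 8 bytes in total.
 */
struct syn_opts {
	struct {
		uint8_t kind;
		uint8_t len;
		uint16_t value;		/* network byte order */
	} __attribute__((packed)) mss;
	uint8_t nop;
	struct {
		uint8_t kind;
		uint8_t len;
		uint8_t shift;
	} __attribute__((packed)) ws;
} __attribute__((packed));

/* Fill in the options with designated initialisers */
static struct syn_opts syn_opts_init(uint16_t mss, uint8_t ws_shift)
{
	struct syn_opts o = {
		.mss = { .kind = OPT_MSS, .len = 4, .value = htons(mss) },
		.nop = OPT_NOP,
		.ws  = { .kind = OPT_WS, .len = 3, .shift = ws_shift },
	};

	return o;
}
```

Compared with writing individual bytes at computed offsets, the initialiser names each option field, and sizeof(struct syn_opts) gives the option length needed for the TCP data offset.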