Commit graph

5 commits

Jon Maloy
78da088f7b tcp: unify payload and flags l2 frames array
In order to reduce static memory and code footprint, we merge
the array for l2 flag frames into the one for payload frames.

This change also ensures that no flag message is sent out over the
l2 media ahead of already queued payload messages.
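
A minimal sketch of the resulting layout (names, types and sizes are
illustrative, not the actual passt structures): flag-only frames and
payload frames share a single descriptor array and are sent strictly
in queue order, so a flag frame can no longer overtake payload that is
already pending.

#include <stdbool.h>
#include <stddef.h>

#define TCP_FRAMES_MAX 128			/* assumed queue depth */

/* One descriptor per queued l2 frame, whether it carries payload or
 * only TCP flags (e.g. a pure ACK).
 */
struct l2_frame {
	bool flags_only;
	size_t len;
};

static struct l2_frame frames[TCP_FRAMES_MAX];	/* single shared array */
static unsigned int frames_used;

/* Queue a frame of either kind at the tail: send order is queue order,
 * so flags cannot bypass payload queued earlier.
 */
static int frame_queue(bool flags_only, size_t len)
{
	if (frames_used >= TCP_FRAMES_MAX)
		return -1;			/* caller must flush and retry */

	frames[frames_used].flags_only = flags_only;
	frames[frames_used].len = len;
	frames_used++;

	return 0;
}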

Performance measurements with iperf3, where we force all
traffic via the tap queue, show no significant difference:

Dual traffic, both directions simultaneously, with patch:
========================================================
host->ns:
--------
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec  36.3 GBytes  3.12 Gbits/sec  4759       sender
[  5]   0.00-100.04 sec  36.3 GBytes  3.11 Gbits/sec             receiver

ns->host:
---------
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-100.00 sec   321 GBytes  27.6 Gbits/sec            receiver

Dual traffic, both directions simultaneously, without patch:
============================================================
host->ns:
--------
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec  35.0 GBytes  3.01 Gbits/sec  6001       sender
[  5]   0.00-100.04 sec  34.8 GBytes  2.99 Gbits/sec            receiver

ns->host:
---------
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-100.00 sec   345 GBytes  29.6 Gbits/sec            receiver

Single connection, with patch:
==============================
host->ns:
---------
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec   138 GBytes  11.8 Gbits/sec  922       sender
[  5]   0.00-100.04 sec   138 GBytes  11.8 Gbits/sec            receiver

ns->host:
-----------
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-100.00 sec   430 GBytes  36.9 Gbits/sec            receiver

Single connection, without patch:
=================================
host->ns:
------------
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec   139 GBytes  11.9 Gbits/sec  900       sender
[  5]   0.00-100.04 sec   139 GBytes  11.9 Gbits/sec            receiver

ns->host:
---------
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-100.00 sec   440 GBytes  37.8 Gbits/sec            receiver

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-11-07 12:47:41 +01:00
Jon Maloy
ba38e67cf4 tcp: unify l2 TCPv4 and TCPv6 queues and structures
Following the preparations in the previous commit, we can now remove
the payload and flag queues dedicated for TCPv6 and TCPv4 and move all
traffic into common queues handling both protocol types.

Apart from reducing code and memory footprint, this change reduces a
potential risk of TCPv4 traffic starving out TCPv6 traffic: since we
previously always flushed the TCPv4 frame queue before the TCPv6 queue,
the latter would never be handled if the former failed to send all its
frames.
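
A rough illustration of the difference (a sketch with made-up names,
not the passt code): with a single queue holding frames for both IP
versions, a partial flush simply leaves the unsent tail, IPv4 and IPv6
alike, to be retried on the next pass, instead of a failed IPv4 flush
starving a separate IPv6 queue.

#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 128

/* A queued l2 frame; v6 distinguishes IPv6 from IPv4, but both kinds
 * live in the same queue and are sent in arrival order.
 */
struct l2_frame {
	int v6;
	size_t len;
};

static struct l2_frame queue[QUEUE_MAX];
static size_t queue_used;

/* Stub standing in for the real tap transmit path */
static int frame_send(const struct l2_frame *f)
{
	(void)f;
	return 0;
}

static void queue_flush(void)
{
	size_t i;

	for (i = 0; i < queue_used; i++) {
		if (frame_send(&queue[i]) < 0)
			break;	/* retry the rest later, v4 and v6 alike */
	}

	/* Keep unsent frames in order for the next flush */
	if (i < queue_used)
		memmove(queue, queue + i, (queue_used - i) * sizeof(*queue));
	queue_used -= i;
}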

Tests with iperf3 show no measurable performance difference after this
change.

Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-10-29 12:44:08 +01:00
David Gibson
4aff6f9392 tcp: Clean up tcpi_snd_wnd probing
When available, we want to retrieve our socket peer's advertised window and
forward that to the guest.  That information has been available from the
kernel via the TCP_INFO getsockopt() since kernel commit 8f7baad7f035.

Currently our probing for this is a bit odd.  The HAS_SND_WND define
determines if our headers include the tcpi_snd_wnd field, but that doesn't
necessarily mean the running kernel supports it.  Currently we start by
assuming it's _not_ available, but mark it as available if we ever see
a non-zero value in the field.  This is a bit hit and miss in two ways:
 * Zero is a perfectly possible window for the peer to report, so we can
   get false negatives
 * We're reading TCP_INFO into a local variable, which might not be zero
   initialised, so if the kernel _doesn't_ write it, it could contain
   non-zero garbage, giving us false positives.

We can use a more direct way of probing for this: getsockopt() reports the
length of the information retrieved.  So, check whether that's long enough
to include the field.  This lets us probe the availability of the field
once and for all during initialisation.  That in turn allows ctx to become
a const pointer to tcp_prepare_flags() which cascades through many other
functions.

We also move the flag for the probe result from the ctx structure to a
global, to match peek_offset_cap.
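
A minimal sketch of this probe, assuming headers that declare
tcpi_snd_wnd (the probe socket, flag name and error handling are
illustrative, not the passt implementation):

#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tcp.h>

static bool snd_wnd_cap;	/* probe result, set once at initialisation */

static void probe_snd_wnd(void)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);
	int s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);

	if (s < 0)
		return;

	/* getsockopt() reports how much of tcp_info the kernel filled in:
	 * if that covers tcpi_snd_wnd, the field is supported, whatever
	 * its (possibly zero) value.
	 */
	if (!getsockopt(s, IPPROTO_TCP, TCP_INFO, &info, &len) &&
	    len >= offsetof(struct tcp_info, tcpi_snd_wnd) +
		   sizeof(info.tcpi_snd_wnd))
		snd_wnd_cap = true;

	close(s);
}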

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-09-18 17:14:47 +02:00
David Gibson
a09aeb4bd6 tcp: Correctly update SO_PEEK_OFF when tcp_send_frames() drops frames
When using the new SO_PEEK_OFF feature on TCP sockets, we must adjust
the SO_PEEK_OFF value whenever we move conn->seq_to_tap backwards.
Although it was discussed during development, somewhere during the
shuffles we missed the case where we move the pointer backwards because
we lost frames while sending them to the guest.  This can happen, for
example, if the socket buffer on the Unix socket to qemu overflows.
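
In outline, the adjustment looks like the sketch below (struct fields,
the capability flag and the helper are illustrative, not the exact
passt identifiers): SO_PEEK_OFF is the offset at which the next
MSG_PEEK recv() starts, so when seq_to_tap is rewound it has to be set
back to the amount of data sent to the guest but not yet acknowledged,
otherwise the dropped bytes would never be peeked and sent again.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

struct conn {
	int sock;			/* socket towards the host peer */
	uint32_t seq_to_tap;		/* next sequence to send to the guest */
	uint32_t seq_ack_from_tap;	/* highest sequence acked by the guest */
};

static bool peek_offset_cap;		/* set at init if SO_PEEK_OFF is usable */

static void conn_revert_seq(struct conn *c, uint32_t new_seq_to_tap)
{
	int off;

	c->seq_to_tap = new_seq_to_tap;

	if (!peek_offset_cap)
		return;

	/* Bytes already sent to the guest but not yet acknowledged: the
	 * kernel must skip exactly this much on the next MSG_PEEK.
	 */
	off = (int)(c->seq_to_tap - c->seq_ack_from_tap);

	if (setsockopt(c->sock, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off)))
		perror("SO_PEEK_OFF");
}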

Fixing this is slightly complicated because we need to pass a non-const
context pointer to some places we previously didn't need it.  While we're
there also fix a small stylistic issue in the function comment for
tcp_revert_seq() - it was using spaces instead of tabs.

Fixes: e63d281871 ("tcp: leverage support of SO_PEEK_OFF socket option when available")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
2024-07-24 09:27:46 +02:00
Laurent Vivier
fba2b544b6 tcp: move buffers management functions to their own file
Move all the TCP parts using internal buffers to tcp_buf.c
and keep generic TCP management functions in tcp.c.
Add tcp_internal.h to export the needed functions from tcp.c, and
tcp_buf.h to export them from tcp_buf.c.

With this change we can use the existing TCP functions with a
different kind of memory storage, such as the shared memory
provided by the guest via vhost-user.
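
Sketched in terms of what the two headers export (the declarations
below are illustrative placeholders, not the actual passt prototypes):
tcp_internal.h gives any buffer backend access to the generic
connection handling kept in tcp.c, while tcp_buf.h only exposes the
entry points of the static-buffer backend in tcp_buf.c, so a backend
built on guest-shared memory can later plug in behind the same generic
code.

#include <stddef.h>

/* tcp_internal.h -- generic TCP handling exported from tcp.c
 * (hypothetical declarations, for illustration only)
 */
struct ctx;
struct tcp_tap_conn;

void tcp_conn_event_update(struct ctx *c, struct tcp_tap_conn *conn, int event);
size_t tcp_fill_headers(const struct tcp_tap_conn *conn, void *frame, size_t dlen);

/* tcp_buf.h -- static-buffer backend exported from tcp_buf.c
 * (again, illustrative names only)
 */
int tcp_buf_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn);
int tcp_buf_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags);
void tcp_buf_flush(const struct ctx *c);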

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-06-13 15:45:05 +02:00