When available, we want to retrieve our socket peer's advertised window and
forward that to the guest. That information has been available from the
kernel via the TCP_INFO getsockopt() since kernel commit 8f7baad7f035.
Currently our probing for this is a bit odd. The HAS_SND_WND define
determines if our headers include the tcp_snd_wnd field, but that doesn't
necessarily mean the running kernel supports it. Currently we start by
assuming it's _not_ available, but mark it as available if we ever see
a non-zero value in the field. This is a bit hit and miss in two ways:
* Zero is a perfectly possible window for the peer to report, so we can
get false negatives
* We're reading TCP_INFO into a local variable, which might not be zero
initialised, so if the kernel _doesn't_ write the field, it could contain
non-zero garbage, giving us false positives.
We can use a more direct way of probing for this: getsockopt() reports the
length of the information retrieved. So, check whether that's long enough
to include the field. This lets us probe the availability of the field
once and for all during initialisation. That in turn allows the ctx
argument of tcp_prepare_flags() to become a const pointer, which cascades
through many other functions.
We also move the flag for the probe result from the ctx structure to a
global, to match peek_offset_cap.
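A minimal sketch of the length-based probe described above, not the actual
patch. The helper names field_in_reply() and probe_tcp_info_field() are
hypothetical, and the field is described by a caller-supplied offset and size
so that the sketch does not depend on which headers define tcpi_snd_wnd:

```c
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>	/* TCP_INFO */

/* Hypothetical helper: true if a reply of 'len' bytes fully covers the
 * field starting at byte 'off' and spanning 'size' bytes.
 */
static bool field_in_reply(socklen_t len, size_t off, size_t size)
{
	return (size_t)len >= off + size;
}

/* Hypothetical probe: 1 if the running kernel reports the field,
 * 0 if the reply is too short, -1 if the probe itself failed.
 */
static int probe_tcp_info_field(size_t off, size_t size)
{
	unsigned char buf[1024] = { 0 };	/* zeroed, larger than any tcp_info */
	socklen_t len = sizeof(buf);
	int s, rc;

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (s < 0)
		return -1;

	/* The kernel updates 'len' to the number of bytes it actually wrote */
	rc = getsockopt(s, IPPROTO_TCP, TCP_INFO, buf, &len);
	close(s);
	if (rc < 0)
		return -1;

	return field_in_reply(len, off, size);
}
```

With headers that define the field, a caller would presumably pass
offsetof(struct tcp_info, tcpi_snd_wnd) and the size of that field, once, at
initialisation, and store the result in a global flag.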
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When using the new SO_PEEK_OFF feature on TCP sockets, we must adjust
the SO_PEEK_OFF value whenever we move conn->seq_to_tap backwards.
Although it was discussed during development, somewhere during the shuffles
we missed the case where we move the pointer backwards because we lost
frames while sending them to the guest. This can happen, for example, if
the socket buffer on the Unix socket to qemu overflows.
Fixing this is slightly complicated because we need to pass a non-const
context pointer to some places where we previously didn't need it. While
we're there, also fix a small stylistic issue in the function comment for
tcp_revert_seq() - it was using spaces instead of tabs.
Fixes: e63d281871 ("tcp: leverage support of SO_PEEK_OFF socket option when available")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Move all the TCP parts using internal buffers to tcp_buf.c
and keep generic TCP management functions in tcp.c.
Add tcp_internal.h to export the needed functions from tcp.c, and
tcp_buf.h to export those from tcp_buf.c.
With this change we can use existing TCP functions with a
different kind of memory storage, such as the shared
memory provided by the guest via vhost-user.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>