In the Makefile we probe to create several defines based on the presence
of particular fields in struct tcp_info. These defines are used for two
purposes, neither of which they accomplish well:
1) Determining if the tcp_info fields are available at runtime. For this
purpose the defines are Just Plain Wrong, since the runtime kernel may
not be the same as the compile time kernel. We corrected this for
tcp_snd_wnd, but not for tcpi_bytes_acked or tcpi_min_rtt
2) Allowing the source to compile against older kernel headers which don't
have the fields in question. This works in theory, but it does mean
we won't be able to use the fields, even if later run against a
newer kernel. Furthermore, it's quite fragile: without much more
thorough tests of builds in different environments that we're currently
set up for, it's very easy to miss cases where we're accessing a field
without protection from an #ifdef. For example we currently access
tcpi_snd_wnd without #ifdefs in tcp_update_seqack_wnd().
Improve this with a different approach, borrowed from qemu (which has many
instances of similar problems). Don't compile against linux/tcp.h, using
netinet/tcp.h instead. Then for when we need an extension field, define
a struct tcp_info_linux, copied from the kernel, with all the fields we're
interested in. That may need updating from future kernel versions, but
only when we want to use a new extension, so it shouldn't be frequent.
This allows us to remove the HAS_SND_WND define entirely. We keep
HAS_BYTES_ACKED and HAS_MIN_RTT now, since they're used for purpose (1),
we'll fix that in a later patch.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Trivial grammar fixes in comments]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As a rule, we prefer constructing packets with matching C structures,
rather than building them byte by byte. However, one case we still build
byte by byte is the TCP options we include in SYN packets (in fact the only
time we generate TCP options on the tap interface).
Rework this to use a structure and initialisers which make it a bit
clearer what's going on.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by; Stefano Brivio <sbrivio@redhat.com>
As tcp_update_check_tcp4() and tcp_update_check_tcp6() compute the
checksum using the TCP header and the TCP payload, it is clearer
to use a pointer to tcp_payload_t that includes tcphdr and payload
rather than a pointer to tcphdr (and guessing TCP header is
followed by the payload).
Move tcp_payload_t and tcp_flags_t to tcp_internal.h.
(They will be used also by vhost-user).
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We can need not to set TCP checksum. Add a parameter to
tcp_fill_headers4() and tcp_fill_headers6() to disable it.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This parameter is already treated as a boolean internally. Make it a
'bool' type for clarity.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When available, we want to retrieve our socket peer's advertised window and
forward that to the guest. That information has been available from the
kernel via the TCP_INFO getsockopt() since kernel commit 8f7baad7f035.
Currently our probing for this is a bit odd. The HAS_SND_WND define
determines if our headers include the tcp_snd_wnd field, but that doesn't
necessarily mean the running kernel supports it. Currently we start by
assuming it's _not_ available, but mark it as available if we ever see
a non-zero value in the field. This is a bit hit and miss in two ways:
* Zero is perfectly possible window the peer could report, so we can
get false negatives
* We're reading TCP_INFO into a local variable, which might not be zero
initialised, so if the kernel _doesn't_ write it it could have non-zero
garbage, giving us false positives.
We can use a more direct way of probing for this: getsockopt() reports the
length of the information retreived. So, check whether that's long enough
to include the field. This lets us probe the availability of the field
once and for all during initialisation. That in turn allows ctx to become
a const pointer to tcp_prepare_flags() which cascades through many other
functions.
We also move the flag for the probe result from the ctx structure to a
global, to match peek_offset_cap.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The term "forwarding address" to indicate the local-to-passt address was
well-intentioned, but ends up being kinda confusing. As discussed on a
recent call, let's try "our" instead.
(While we're there correct an error in flow_initiate_af()s comments where
we referred to parameters by the wrong name).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Move the data structures and helper functions for the TCP hash table to
flow.c, making it a general hash table indexing sides of flows. This is
largely code motion and straightforward renames. There are two semantic
changes:
* flow_lookup_af() now needs to verify that the entry has a matching
protocol and interface as well as matching addresses and ports.
* We double the size of the hash table, because it's now at least
theoretically possible for both sides of each flow to be hashed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we always deliver inbound TCP packets to the guest's most
recent observed IP address. This has the odd side effect that if the
guest changes its IP address with active TCP connections we might
deliver packets from old connections to the new address. That won't
work; it will probably result in an RST from the guest. Worse, if the
guest added a new address but also retains the old one, then we could
break those old connections by redirecting them to the new address.
Now that we maintain flowside information, we have a record of the correct
guest side address and can just use it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Some information we explicitly store in the TCP connection is now
duplicated in the common flow structure. Access it from there instead, and
remove it from the TCP specific structure. With that done we can reorder
both the "tap" and "splice" TCP structures a bit to get better packing for
the new combined flow table entries.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Move all the TCP parts using internal buffers to tcp_buf.c
and keep generic TCP management functions in tcp.c.
Add tcp_internal.h to export needed functions from tcp.c and
tcp_buf.h from tcp_buf.c
With this change we can use existing TCP functions with a
different kind of memory storage as for instance the shared
memory provided by the guest via vhost-user.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>