Commit graph

710 commits

Author SHA1 Message Date
Stefano Brivio
bb70811183 treewide: Packet abstraction with mandatory boundary checks
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
3e4c2d1098 util: Fix function declaration style of write_pidfile()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
415ccf6116 tcp, tcp_splice: Use less awkward syntax to swap in/out sockets from pools
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
f41f0416b8 dhcp: Minimum option length implied by RFC 951 is 60 bytes, not 62
In section 3 ("Packet Format"), "vend" is 64 bytes long, minus the
magic that's 60 bytes, not 62.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
54d9df3903 tcp: Fit struct tcp_conn into a single 64-byte cacheline
...by:

- storing the chained-hash next connection pointer as numeric
  reference rather than as pointer

- storing the MSS as 14-bit value, and rounding it

- using only the effective amount of bits needed to store the hash
  bucket number

- explicitly limiting window scaling factors to 4-bit values
  (maximum factor is 14, from RFC 7323)

- scaling SO_SNDBUF values, and using a 8-bit representation for
  the duplicate ACK sequence

- keeping window values unscaled, as received and sent

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
bc4ec1a8e9 README: Update Interfaces and Availability sections
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
e80f608710 README: Avoid "here" links
They look a bit lame: rephrase sentences to avoid them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
0ebaac9747 test/perf: Work-around for virtio_net hang before long streams from guest
I didn't have time to investigate the root cause for the virtio_net
TX hang yet. Add a quick work-around for the moment being.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
92074c16a8 tcp_splice: Close sockets right away on high number of open files
We can't take for granted that the hard limit for open files is
big enough as to allow to delay closing sockets to a timer.

Store the value of RTLIMIT_NOFILE we set at start, and use it to
understand if we're approaching the limit with pending, spliced
TCP connections. If that's the case, close sockets right away as
soon as they're not needed, instead of deferring this task to a
timer.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
be5bbb9b06 tcp: Rework timers to use timerfd instead of periodic bitmap scan
With a lot of concurrent connections, the bitmap scan approach is
not really sustainable.

Switch to per-connection timerfd timers, set based on events and on
two new flags, ACK_FROM_TAP_DUE and ACK_TO_TAP_DUE. Timers are added
to the common epoll list, and implement the existing timeouts.

While at it, drop the CONN_ prefix from flag names, otherwise they
get quite long, and fix the logic to decide if a connection has a
local, possibly unreachable endpoint: we shouldn't go through the
rest of tcp_conn_from_tap() if we reset the connection due to a
successful bind(2), and we'll get EACCES if the port number is low.

Suggested by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
3eb19cfd8a tcp, udp, util: Enforce 24-bit limit on socket numbers
This should never happen, but there are no formal guarantees: ensure
socket numbers are below SOCKET_MAX.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
66a95e331e test, seccomp, Makefile: Switch to valgrind runs for passt functional tests
Pass to seccomp.sh a list of additional syscalls valgrind needs as
EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding
support in seccomp.sh itself.

In test setup functions, start passt with valgrind, but not for
performance tests.

Add tests checking that valgrind exits without errors after all the
other tests in the group are done.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
a3bcca864e test: Add asciinema(1) as requirement for CI in README
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
10f1787edf Makefile: Enable a few hardening flags
They don't have a measurable performance impact and make things a
bit safer.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
79217b7689 udp: Use flags for local, loopback, and configured unicast binds
There's no value in keeping a separate timestamp for activity and for
aging of local binds, given that they have the same timeout. Reduce
that to a single timestamp, with a flag indicating the local bind.

Also use flags instead of separate int fields for loopback and
configured unicast address usage as source address.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
5ca555cf78 dhcpv6, tap, tcp: Use IN6_ARE_ADDR_EQUAL instead of open-coded memcmp()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
5eb7604203 udp: Split buffer queueing/writing parts of udp_sock_handler()
...it became too hard to follow: split it off to
udp_sock_fill_data_v{4,6}.

While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h,
instead of open-coded memcmp().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
0296a57242 udp: Drop _splice from recv, send, sendto static buffer names
It's already implied by the fact they don't have "l2" in their
names, and dropping it improves readability a bit.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
700ce1f875 test/lib/video: Fill in href attributes of video shortcuts
...so that they can be indexed.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
e5eefe7743 tcp: Refactor to use events instead of states, split out spliced implementation
Using events and flags instead of states makes the implementation
much more straightforward: actions are mostly centered on events
that occurred on the connection rather than states.

An example is given by the ESTABLISHED_SOCK_FIN_SENT and
FIN_WAIT_1_SOCK_FIN abominations: we don't actually care about
which side started closing the connection to handle closing of
connection halves.

Split out the spliced implementation, as it has very little in
common with the "regular" TCP path.

Refactor things here and there to improve clarity. Add helpers
to trace where resets and flag settings come from.

No functional changes intended.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
de0961c01c util: Use standard int types
...instead of kernel-like short notations.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-25 13:21:13 +01:00
Stefano Brivio
6a1150c026 util: Drop CHECK_SET_MIN_MAX{,_PROTO_FD} macros
...those were used when epoll references used to be socket numbers,
they should have gone away a long time ago.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-25 13:21:13 +01:00
Stefano Brivio
5c0426981e pcap: Fix mistake in printed string
Packets are saved *to* a file, not *at* it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-25 13:21:13 +01:00
Stefano Brivio
d2e40bb8d9 conf, util, tap: Implement --trace option for extra verbose logging
--debug can be a bit too noisy, especially as single packets or
socket messages are logged: implement a new option, --trace,
implying --debug, that enables all debug messages.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-25 13:21:13 +01:00
Stefano Brivio
14c4c0253c README: Make it somewhat readable on mobile devices
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-04 19:23:45 +01:00
Stefano Brivio
216a266a75 hooks, README: gzipped js snippets, webp alternatives for png
Upload gzipped js snippets for usage with gzip_static in nginx or
equivalent. Convert png drawings to webp for smaller size, use them
as alternatives in README.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-02 14:02:03 +01:00
Stefano Brivio
bec6d3e084 test/lib/setup: Unshare PID namespace in pasta_setup()
...otherwise, we'll leave processes (dhclient) around.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-02 05:00:21 +01:00
Stefano Brivio
71ab6d9972 README: Don't preload CI recording, show poster from end of run
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 22:31:42 +01:00
Stefano Brivio
628c4f0cae README: s/guest/namespace/ in pasta "Try it" section
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 21:43:41 +01:00
Stefano Brivio
06f8e4f960 Makefile, hooks: Static target precondition for pkgs, copy .avx2 builds
Convenience packages are anyway built from static builds.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 21:41:22 +01:00
Stefano Brivio
763f281155 demo/pasta: Clean up before rebuilding with -g
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 21:32:08 +01:00
Stefano Brivio
2fa1cef016 arp, dhcp: Fix strict aliasing warnings reported by gcc 4.9 with -Ofast
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 22:17:32 +01:00
Stefano Brivio
213c397492 passt, pasta: Run-time selection of AVX2 build
Build-time selection of AVX2 flags and routines is not practical for
distributions, but limiting AVX2 usage to checksum routines with
specific run-time detection doesn't allow for easy performance gains
from auto-vectorisation of batched packet handling routines.

For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple
wrapper replacing the current executable with the AVX2 build if it's
available, and if AVX2 is supported by the current CPU.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 16:46:28 +01:00
Stefano Brivio
deca1ebe50 test/distro/opensuse: Add Tumbleweed armv7l test
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 04:25:53 +01:00
Stefano Brivio
7992995d35 test/lib/term: Don't run demo when started as ./run
I changed this in a previous commit by mistake, restore the original
command.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 04:24:54 +01:00
Stefano Brivio
04fd94ab07 seccomp, tcp: Add fcntl64 to pasta syscalls for armv6l, armv7l
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 04:23:31 +01:00
Stefano Brivio
8d33ec62de hooks/pre-push: Keep original cast on gzip, fix uploading with dash
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 02:51:34 +01:00
Stefano Brivio
28fb960451 demo/pasta: Exit namespace in 'ns' pane before restarting pasta
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 02:50:25 +01:00
Stefano Brivio
6d661dc5b2 seccomp: Adjust list of allowed syscalls for armv6l, armv7l
It looks like glibc commonly implements clock_gettime(2) with
clock_gettime64(), and uses recv() instead of recvfrom(), send()
instead of sendto(), and sigreturn() instead of rt_sigreturn() on
armv6l and armv7l.

Adjust the list of system calls for armv6l and armv7l accordingly.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-26 23:39:19 +01:00
Stefano Brivio
a095fbc457 passt: Don't warn on failed madvise()
A kernel might not be configured with CONFIG_TRANSPARENT_HUGEPAGE,
especially on embedded systems. Ignore the error, it doesn't affect
functionality.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-26 23:37:05 +01:00
Stefano Brivio
6dc1ec3c7a Makefile: Fix up AUDIT_ARCH for armv6l, armv7l
There's a single AUDIT_ARCH_ARM define available (and big-endian
shouldn't be a concern with those).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-26 23:34:40 +01:00
Stefano Brivio
bd7340e815 tap: Cast ETH_MAX_MTU to signed in comparisons
At least gcc 8.3 and 10.2 emit a warning on armv6l and armv7l.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-26 23:32:50 +01:00
Stefano Brivio
601f7ee78e seccomp.sh: Handle syscall number defines in the (x + y) form
This is the case at least for current glibc headers on armv6l and
armv7l.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-26 23:32:02 +01:00
Stefano Brivio
eed6933e6c udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}
Not functionally needed, but gcc versions 7 to 9 (at least) will
issue a warning otherwise.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-25 22:54:35 +01:00
Stefano Brivio
9b61bd0b39 passt: Explicitly check return value of chdir()
...it doesn't actually matter as we're checking errno at the very
end, but, depending on build flags, chdir() might be declared with
warn_unused_result and the compiler issues a warning.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-25 22:42:36 +01:00
Stefano Brivio
e221ca7613 hooks: Uploaded compressed .cast files too
...to benefit from gzip_static in nginx or equivalent.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 13:28:26 +01:00
Stefano Brivio
03f7eb945b passt.1: Drop duplicate --dns section
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 13:22:08 +01:00
Stefano Brivio
e5bd8dbb24 conf, ndp: Disable router advertisements on --config-net
If we statically configure a default route, and also advertise it for
SLAAC, the kernel will try moments later to add the same route:

  ICMPv6: RA: ndisc_router_discovery failed to add default route

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 13:21:52 +01:00
Stefano Brivio
ed58ad1a59 netlink: Avoid left-over bytes in request on MTU configuration
When nl_link() configures the MTU, it shouldn't send extra bytes,
otherwise we'll get a kernel warning:

  netlink: 4 bytes leftover after parsing attributes in process `pasta'.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 13:21:52 +01:00
Stefano Brivio
08b7a2ec38 test: Fix name of CI asciinema player in perf links handler
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-23 13:21:52 +01:00