Commit graph

111 commits

Author SHA1 Message Date
Stefano Brivio
6655625c30 Makefile: Include seccomp.h in HEADERS and require it for static checkers
Targets running static checkers (cppcheck and clang-tidy) need
seccomp.h, but the latter is not included in HEADERS. Add it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-22 16:53:35 +02:00
Stefano Brivio
512f5b1aab Makefile: Allow define overrides by prepending, not appending, CFLAGS
If we append CFLAGS to the ones passed via command line (if any),
-D options we append will override -D options passed on command line
(if any).

For example, OpenSUSE build flags include -D_FORTIFY_SOURCE=3, and we
want to have -D_FORTIFY_SOURCE=2, if and only if not overridden. The
current behaviour implies we redefine _FORTIFY_SOURCE as 2, though.

Instead of appending CFLAGS, prepend them by adding all the default
build flags to another variable, a simply expanded one (defined with
:=), named FLAGS, and pass that *before* CFLAGS in targets, so that
defines from command line can override default flags.

Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Dario Faggioli <dfaggioli@suse.com>
2022-09-22 16:53:09 +02:00
David Gibson
d72a1e7bb9 Move self-isolation code into a separate file
passt/pasta contains a number of routines designed to isolate passt from
the rest of the system for security.  These are spread through util.c and
passt.c.  Move them together into a new isolation.c file.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-13 05:31:51 +02:00
Stefano Brivio
b2ee37ad38 Makefile: Honour LDFLAGS for binary targets
We don't set any, but we should use them if they are passed in the
environment. On a Fedora Rawhide package build, annocheck
(https://sourceware.org/annobin/) reports:

  Hardened: /usr/bin/passt: FAIL: bind-now test because not linked with -Wl,-z,now

...despite the build system exporting -Wl,-z,now in LDFLAGS.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-09-07 11:01:10 +02:00
Stefano Brivio
7b710946b1 Makefile: Use more GNU-style directory variables, explicit docdir for OpenSUSE
It turns out that, while on most distributions "docdir" would be
/usr/share/doc, it's /usr/share/doc/packages/ on OpenSUSE Tumbleweed.
Use an explicit docdir as shown in:
  https://en.opensuse.org/openSUSE:Build_Service_cross_distribution_howto

and don't unnecessarily hardcode directory variables in the Makefile.
Otherwise, RPM builds for OpenSUSE will fail now that we have a README
there.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-21 22:25:51 +02:00
Stefano Brivio
be0fe6502f Makefile: Install demo.sh too, uninstall stuff under /usr/share
Suggested-by: Benson Muite <benson_muite@emailplus.org>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-20 19:07:12 +02:00
Stefano Brivio
c5f4ba1b1b Makefile: Ugly hack to get a "plain" Markdown version of README
Distribution packages reasonably expect to have a human-readable
Markdown version of the README under /usr/share/doc/, but all we have
right now is a heavily web-oriented version.

Introduce a ugly hack to strip web-oriented parts from the current
README and install it.

It should probably work the other way around: a human-readable README
could be used as a source for the web page. But cgit needs a file
that's in the tree, not something that can be built, and
https://passt.top/ is based on cgit. It should eventually be doable
to work around this in cgit, instead.

Reported-by: Benson Muite <benson_muite@emailplus.org>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-08-20 19:07:12 +02:00
David Gibson
05dc1c65c1 valgrind needs futex
Some versions of valgrind (such as the version on my Fedora laptop -
valgrind-3.19.0-3.fc36.x86_64) use futexes.  But futex is currently not
allowed in the seccomp filter, even with the extra calls added for
valgrind builds.  Add it, to avoid spurious valgrind failures.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-22 19:41:42 +02:00
Stefano Brivio
1d223e4b4c passt: Allow exit_group() system call in seccomp profiles
We handle SIGQUIT and SIGTERM calling exit(), which is usually
implemented with the exit_group() system call.

If we don't allow exit_group(), we'll get a SIGSYS while handling
SIGQUIT and SIGTERM, which means a misleading non-zero exit code.

Reported-by: Wenli Quan <wquan@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2101990
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-07-14 01:36:05 +02:00
David Gibson
ed63892a16 Clean up passt.pid file
If the tests are interrupted at the right point a passt.pid file can be
left over.  Clean it up with "make clean" and add it to .gitignore so it
doesn't get accidentally committed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-14 01:32:42 +02:00
David Gibson
dab2c6ee1f Add cleaner line-by-line reading primitives
Two places in passt need to read files line by line (one parsing
resolv.conf, the other parsing /proc/net/*.  They can't use fgets()
because in glibc that can allocate memory.  Instead they use an
implementation line_read() in util.c.  This has some problems:

 * It has two completely separate modes of operation, one buffering
   and one not, the relation between these and how they're activated
   is subtle and confusing
 * At least in non-buffered mode, it will mishandle an empty line,
   folding them onto the start of the next non-empty line
 * In non-buffered mode it will use lseek() which prevents using this
   on non-regular files (we don't need that at present, but it's a
   surprising limitation)
 * It has a lot of difficult to read pointer mangling

Add a new cleaner implementation of allocation-free line-by-line
reading in lineread.c.  This one always buffers, using a state
structure to keep track of what we need.  This is larger than I'd
like, but it turns out handling all the edge cases of line-by-line
reading in C is surprisingly hard.

This just adds the code, subsequent patches will change the existing
users of line_read() to the new implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-06 08:10:55 +02:00
David Gibson
2c13f6bead Makefile: Don't create extraneous -.s file
In order to probe availability of certain features the Makefile test
compiles a handful of tiny snippets, feeding those in from stdin.  However
in one case - the one for -fstack-protector - it forgets to redirect the
output to stdout, meaning it creates a stray '-.s' file when make is
invoked (even make clean).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-06-18 09:06:00 +02:00
David Gibson
4f95db7945 Makefile: Tweak $(RM) usage
The use of rm commands in the clean and uninstall targets adds an explicit
leading - to ignore errors.  However the built-in RM variable in make is
actually "rm -f" which already ignores errors, so the - isn't neccessary.

Also replace ${RM} with $(RM) which is the more conventional form in
Makefiles.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-06-18 09:06:00 +02:00
David Gibson
ae92e77d5e Makefile: Simplify pasta* targets with a pattern rule
pasta, pasta.avx2 and pasta.1 are all generated as a link to the
corresponding passt file.  We can consolidate the 3 rules for these targets
into a single pattern rule.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-06-18 09:06:00 +02:00
David Gibson
25f515831c Makefile: Use $(BIN) and $(MANPAGES) variable to simplify several targets
There are several places which explicitly list the various generated
binaries, even though a $(BIN) variable already lists them.  There are
several more places that list all the manpage files, introduce a
$(MANPAGES) variable to remove that repetition as well.

Tweak the generation of pasta.1 as a link to passt.1 so it's not just made
as a side effect of the pasta target.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: add passt.1 and qrap.1 to guest files for distro tests]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-06-18 09:06:00 +02:00
David Gibson
08007d0b25 Makefile: Avoid using wildcard sources
The passt/pasta Makefile makes fairly heavy use of GNU make's $(wildcard)
function to locate the sources and headers to build.  Using wildcards for
the things to compile is usually a bad idea though: if somehow you end up
with a .c or .h file in your tree you didn't expect it can misbuild in an
exceedingly confusing way.  In particular this can sometimes happen if
switching between releases / branches where files have been added or
removed without 100% cleaning the tree.

It also makes life a bit complicated if building multiple different
binaries in the same tree: we already have some rather awkward
$(filter-out) constructions to avoid including qrap.c in the passt build.

Replace use of $(wildcard) with the more idiomatic approach of defining
variables listing all the relevant source files then using that throughout.
In the rule for seccomp.h there was also a bare "*.c" which caused make to
always rebuild that target.  Fix that as well.

Similarly, seccomp.sh uses a wildcard to locate the sources, which is
unwise for the same reasons.  Make it take the sources to examine on the
command line instead, and have the Makefile pass them in from the same
variables.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-06-18 09:06:00 +02:00
Stefano Brivio
721fa1bf5d Makefile: Suppress unusedStructMember Cppcheck warning in dhcp.c
New from Cppcheck 2.8: all the fields of struct msg that are not
directly manipulated are now reported as unused, which is kind of
correct as those fields are used as a blob "copied" from request
to response and not as separate fields.

However, keeping the message composition explicit is probably
desirable, and adding inline suppressions makes the whole thing
rather unreadable, so just suppress unusedStructMember warnings for
dhcp.c, while also adding a suppression for unmatched suppressions
to keep earlier versions of Cppcheck happy.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-06-18 09:06:00 +02:00
Stefano Brivio
33fc2dece2 Makefile: Allow implicit test for bugprone-suspicious-string-compare checker
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
62c3edd957 treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
66a95e331e test, seccomp, Makefile: Switch to valgrind runs for passt functional tests
Pass to seccomp.sh a list of additional syscalls valgrind needs as
EXTRA_SYSCALLS in a new 'valgrind' make target, and add corresponding
support in seccomp.sh itself.

In test setup functions, start passt with valgrind, but not for
performance tests.

Add tests checking that valgrind exits without errors after all the
other tests in the group are done.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
10f1787edf Makefile: Enable a few hardening flags
They don't have a measurable performance impact and make things a
bit safer.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-28 17:11:40 +02:00
Stefano Brivio
06f8e4f960 Makefile, hooks: Static target precondition for pkgs, copy .avx2 builds
Convenience packages are anyway built from static builds.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-01 21:41:22 +01:00
Stefano Brivio
213c397492 passt, pasta: Run-time selection of AVX2 build
Build-time selection of AVX2 flags and routines is not practical for
distributions, but limiting AVX2 usage to checksum routines with
specific run-time detection doesn't allow for easy performance gains
from auto-vectorisation of batched packet handling routines.

For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple
wrapper replacing the current executable with the AVX2 build if it's
available, and if AVX2 is supported by the current CPU.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-28 16:46:28 +01:00
Stefano Brivio
6dc1ec3c7a Makefile: Fix up AUDIT_ARCH for armv6l, armv7l
There's a single AUDIT_ARCH_ARM define available (and big-endian
shouldn't be a concern with those).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-26 23:34:40 +01:00
Stefano Brivio
745a9ba428 pasta: By default, quit if filesystem-bound net namespace goes away
This should be convenient for users managing filesystem-bound network
namespaces: monitor the base directory of the namespace and exit if
the namespace given as PATH or NAME target is deleted. We can't add
an inotify watch directly on the namespace directory, that won't work
with nsfs.

Add an option to disable this behaviour, --no-netns-quit.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
ce4e7b4d5d Makefile, conf, passt: Drop passt4netns references, explicit argc check
Nobody currently calls this as passt4netns, that was the name I used
before 'pasta', drop any reference before it's too late.

While at it, explicitly check that argc is bigger than or equal to
one, just as a defensive measure: argv[0] being NULL is not an issue
anyway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
292c185553 passt: Address new clang-tidy warnings from LLVM 13.0.1
clang-tidy from LLVM 13.0.1 reports some new warnings from these
checkers:

- altera-unroll-loops, altera-id-dependent-backward-branch: ignore
  for the moment being, add a TODO item

- bugprone-easily-swappable-parameters: ignore, nothing to do about
  those

- readability-function-cognitive-complexity: ignore for the moment
  being, add a TODO item

- altera-struct-pack-align: ignore, alignment is forced in protocol
  headers

- concurrency-mt-unsafe: ignore for the moment being, add a TODO
  item

Fix bugprone-implicit-widening-of-multiplication-result warnings,
though, that's doable and they seem to make sense.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-30 02:59:12 +01:00
Stefano Brivio
1776de0140 tcp, netlink, HAS{BYTES_ACKED,MIN_RTT,GETRANDOM} and NETLINK_GET_STRICT_CHK
tcpi_bytes_acked and tcpi_min_rtt are only available on recent
kernel versions: provide fall-back paths (incurring some grade of
performance penalty).

Support for getrandom() was introduced in Linux 3.17 and glibc 2.25:
provide an alternate mechanism for that as well, reading from
/dev/random.

Also check if NETLINK_GET_STRICT_CHK is defined before using it:
it's not strictly needed, we'll filter out irrelevant results from
netlink anyway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 16:30:59 +01:00
Stefano Brivio
fa7e2e7016 Makefile, seccomp: Fix build for i386, ppc64, ppc64le
On some distributions, on ppc64, ulimit -s returns 'unlimited': add a
reasonable default, and also make sure ulimit is invoked using the
default shell, which should ensure ulimit is actually implemented.

Also note that AUDIT_ARCH doesn't follow closely the naming reported
by 'uname -m': convert for i386 and ppc as needed.

While at it, move inclusion of seccomp.h after util.h, the former is
less generic (cosmetic/clang-tidy only).

Older kernel headers might lack a definition for AUDIT_ARCH_PPC64LE:
define that explicitly if it's not available.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 07:57:09 +01:00
Stefano Brivio
685b50c3ce Makefile: cppcheck target: Suppress unmatchedSuppression, pass CFLAGS
Some of those warnings don't trigger even on systems with very
similar toolchains, suppress unmatchedSuppression warnings, they're
basically useless.

While at it, pass CFLAGS to cppcheck.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 12:16:16 +02:00
Stefano Brivio
627e18fa8a passt: Add cppcheck target, test, and address resulting warnings
...mostly false positives, but a number of very relevant ones too,
in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 09:41:13 +02:00
Stefano Brivio
dd942eaa48 passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:26:08 +02:00
Stefano Brivio
849308d207 Makefile, tcp: Don't try to use tcpi_snd_wnd from tcp_info on pre-5.3 kernels
Detect missing tcpi_snd_wnd in struct tcp_info at build time,
otherwise build fails with a pre-5.3 linux/tcp.h header.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 01:19:27 +02:00
Stefano Brivio
12cfa6444c passt: Add clang-tidy Makefile target and test, take care of warnings
Most are just about style and form, but a few were actually
serious mistakes (NDP-related).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:34:22 +02:00
Stefano Brivio
1a563a0cbd passt: Address gcc 11 warnings
A mix of unchecked return values, a missing permission mask for
open(2) with O_CREAT, and some false positives from
-Wstringop-overflow and -Wmaybe-uninitialized.

Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
087b5f4dbb LICENSES: Add license text files, add missing notices, fix SPDX tags
SPDX tags don't replace license files. Some notices were missing and
some tags were not according to the SPDX specification, too.

Now reuse --lint from the REUSE tool (https://reuse.software/) passes.

Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
f154a0489a Makefile: Install man pages to /usr/share/man instead of /usr/man
Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Stefano Brivio
2725003d45 Makefile: Prefix installation paths with $(DESTDIR)
Martin reports that DESTDIR is ignored in install/uninstall targets,
see also:
	https://www.gnu.org/prep/standards/html_node/DESTDIR.html

Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-19 09:42:08 +02:00
Stefano Brivio
2c7d1ce088 passt: Static builds: don't redefine __vsyslog(), skip getpwnam() and initgroups()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-16 16:53:40 +02:00
Stefano Brivio
66d5930ec7 passt, pasta: Add seccomp support
List of allowed syscalls comes from comments in the form:
	#syscalls <list>

for syscalls needed both in passt and pasta mode, and:
	#syscalls:pasta <list>
	#syscalls:passt <list>

for syscalls specifically needed in pasta or passt mode only.

seccomp.sh builds a list of BPF statements from those comments,
prefixed by a binary search tree to keep lookup fast.

While at it, clean up a bit the Makefile using wildcards.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:15:46 +02:00
Stefano Brivio
675174d4ba conf, tap: Split netlink and pasta functions, allow interface configuration
Move netlink routines to their own file, and use netlink to configure
or fetch all the information we need, except for the TUNSETIFF ioctl.

Move pasta-specific functions to their own file as well, add
parameters and calls to configure the tap interface in the namespace.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-14 13:15:12 +02:00
Giuseppe Scrivano
9a175cc2ce pasta: Allow specifying paths and names of namespaces
Based on a patch from Giuseppe Scrivano, this adds the ability to:

- specify paths and names of target namespaces to join, instead of
  a PID, also for user namespaces, with --userns

- request to join or create a network namespace only, without
  entering or creating a user namespace, with --netns-only

- specify the base directory for netns mountpoints, with --nsrun-dir

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
[sbrivio: reworked logic to actually join the given namespaces when
 they're not created, implemented --netns-only and --nsrun-dir,
 updated pasta demo script and man page]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-07 04:05:15 +02:00
Stefano Brivio
2dbed699e7 passt: Align pkt_buf to PAGE_SIZE (start and size), try to fit in huge pages
If transparent huge pages are available, madvise() will do the trick.

While at it, decrease EPOLL_EVENTS for the main loop from 10 to 8,
for slightly better socket fairness.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
b216df04a1 Makefile: Visually separate CFLAGS from input files in resulting cc commands
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
f29c48db6b Makefile: Make sure destination directories exist on install
Mostly theoretical, but convenient for testing.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-01 17:00:27 +02:00
Stefano Brivio
77c72b31ed Makefile: Quick hack to build convenience Debian and RPM packages
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-01 17:00:27 +02:00
Stefano Brivio
b9c6fca469 Makefile: Add install, uninstall targets
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-01 17:00:27 +02:00
Stefano Brivio
1e49d194d0 passt, pasta: Introduce command-line options and port re-mapping
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-01 17:00:27 +02:00
Stefano Brivio
17765f8de0 checksum: Introduce AVX2 implementation, unify helpers
Provide an AVX2-based function using compiler intrinsics for
TCP/IP-style checksums. The load/unpack/add idea and implementation
is largely based on code from BESS (the Berkeley Extensible Software
Switch) licensed as 3-Clause BSD, with a number of modifications to
further decrease pipeline stalls and to minimise cache pollution.

This speeds up considerably data paths from sockets to tap
interfaces, decreasing overhead for checksum computation, with
16-64KiB packet buffers, from approximately 11% to 7%. The rest is
just syscalls at this point.

While at it, provide convenience targets in the Makefile for avx2,
avx2_debug, and debug targets -- these simply add target-specific
CFLAGS to the build.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-26 07:18:50 +02:00
Stefano Brivio
33482d5bf2 passt: Add PASTA mode, major rework
PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host
connectivity to an otherwise disconnected, unprivileged network
and user namespace, similarly to slirp4netns. Given that the
implementation is largely overlapping with PASST, no separate binary
is built: 'pasta' (and 'passt4netns' for clarity) both link to
'passt', and the mode of operation is selected depending on how the
binary is invoked. Usage example:

	$ unshare -rUn
	# echo $$
	1871759

	$ ./pasta 1871759	# From another terminal

	# udhcpc -i pasta0 2>/dev/null
	# ping -c1 pasta.pizza
	PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
	64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms

	--- pasta.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
	# ping -c1 spaghetti.pizza
	PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
	64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms

	--- spaghetti.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms

This entails a major rework, especially with regard to the storage of
tracked connections and to the semantics of epoll(7) references.

Indexing TCP and UDP bindings merely by socket proved to be
inflexible and unsuitable to handle different connection flows: pasta
also provides Layer-2 to Layer-2 socket mapping between init and a
separate namespace for local connections, using a pair of splice()
system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local
bindings. For instance, building on the previous example:

	# ip link set dev lo up
	# iperf3 -s

	$ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
	[SUM]   0.00-10.00  sec  52.3 GBytes  44.9 Gbits/sec  283             sender
	[SUM]   0.00-10.43  sec  52.3 GBytes  43.1 Gbits/sec                  receiver

	iperf Done.

epoll(7) references now include a generic part in order to
demultiplex data to the relevant protocol handler, using 24
bits for the socket number, and an opaque portion reserved for
usage by the single protocol handlers, in order to track sockets
back to corresponding connections and bindings.

A number of fixes pertaining to TCP state machine and congestion
window handling are also included here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-17 11:04:22 +02:00
Stefano Brivio
17337a736f passt: Introduce packet capture implementation
With -DDEBUG, passt now saves guest-side traffic captures in
pcap format at /tmp/passt_<ISO8601 timestamp>.pcap. The timestamp
refers to time and date of start-up.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-21 11:14:48 +02:00
Stefano Brivio
6f89dc3650 qrap: Find qemu command if not passed, patch command line
It might be impractical to pass options to qrap when using libvirt,
because the <emulator/> tag expects a path to an executable, without
further arguments.

If the first argument is not a plausible socket number, and the
second argument is not a valid executable, look up a qemu command
from a list of possible names, then start it patching the command line
to include the -netdev fd= parameter corresponding to the AF_UNIX
domain socket we just opened.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-10 12:38:50 +02:00
Stefano Brivio
4aa8e54a30 passt: Introduce a DHCPv6 server
This implementation, similarly to the IPv4 DHCP one, hands out a
single address, which is the same as the upstream address for the
host.

This avoids the need for address translation as long as the client
runs a DHCPv6 client. The NDP "Managed" flag is now set in Router
Advertisements.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-04-13 22:37:40 +02:00
Stefano Brivio
1d807fc720 passt: Introduce ICMP echo proxy
It's nice to be able to confirm connectivity using ICMP or ICMPv6
echo requests, and "ping" sockets on Linux (IPPROTO_ICMP datagram)
allow us to do that without any special capability.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-03-18 12:58:03 +01:00
Stefano Brivio
a418946837 tcp: Add siphash implementation for initial sequence numbers
Implement siphash routines for initial TCP sequence numbers (12 bytes
input for IPv4, 36 bytes input for IPv6), and while at it, also
functions we'll use later on for hash table indices and TCP timestamp
offsets (with 8, 20, 32 bytes of input).

Use these to set the initial sequence number, according to RFC 6528,
for connections originating either from the tap device or from
sockets.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-03-17 10:57:36 +01:00
Stefano Brivio
105b916361 passt: New design and implementation with native Layer 4 sockets
This is a reimplementation, partially building on the earlier draft,
that uses L4 sockets (SOCK_DGRAM, SOCK_STREAM) instead of SOCK_RAW,
providing L4-L2 translation functionality without requiring any
security capability.

Conceptually, this follows the design presented at:
	https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md

The most significant novelty here comes from TCP and UDP translation
layers. In particular, the TCP state and translation logic follows
the intent of being minimalistic, without reimplementing a full TCP
stack in either direction, and synchronising as much as possible the
TCP dynamic and flows between guest and host kernel.

Another important introduction concerns addressing, port translation
and forwarding. The Layer 4 implementations now attempt to bind on
all unbound ports, in order to forward connections in a transparent
way.

While at it:
- the qemu 'tap' back-end can't be used as-is by qrap anymore,
  because of explicit checks now introduced in qemu to ensure that
  the corresponding file descriptor is actually a tap device. For
  this reason, qrap now operates on a 'socket' back-end type,
  accounting for and building the additional header reporting
  frame length

- provide a demo script that sets up namespaces, addresses and
  routes, and starts the daemon. A virtual machine started in the
  network namespace, wrapped by qrap, will now directly interface
  with passt and communicate using Layer 4 sockets provided by the
  host kernel.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 09:28:55 +01:00
Stefano Brivio
d02e059ddc passt: Add IPv6 and NDP support, further fixes for IPv4 CT
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:58:05 +01:00
Stefano Brivio
6709ade2bd merd: Rename to PASST
Plug A Simple Socket Transport.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:58:01 +01:00
Stefano Brivio
b439984641 merd: ARP and DHCP handlers, connection tracking fixes
With this, merd provides a fully functional IPv4 environment to
guests, requiring a single capability, CAP_NET_RAW.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:57:57 +01:00
Stefano Brivio
fa2d20908d merd: Switch to AF_UNIX for qemu tap, provide wrapper
We can bypass a full-fledged network interface between qemu and merd by
connecting the qemu tap file descriptor to a provided UNIX domain
socket: this could be implemented in qemu eventually, qrap covers this
meanwhile.

This also avoids the need for the AF_PACKET socket towards the guest.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:57:51 +01:00
Stefano Brivio
cefcf0bc2c merd: Initial import
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:57:46 +01:00