Commit graph

36 commits

Author SHA1 Message Date
Stefano Brivio
099ace64ce treewide: Address cert-err33-c clang-tidy warnings for clock and timer functions
For clock_gettime(), we shouldn't ignore errors if they happen at
initialisation phase, because something is seriously wrong and it's
not helpful if we proceed as if nothing happened.

As we're up and running, though, it's probably better to report the
error and use a stale value than to terminate altogether. Make sure
we use a zero value if we don't have a stale one somewhere.

For timerfd_gettime() and timerfd_settime() failures, just report an
error, there isn't much else we can do.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-10-30 12:37:31 +01:00
Stefano Brivio
59fe34ee36 treewide: Suppress clang-tidy warning if we already use O_CLOEXEC
In pcap_init(), we should always open the packet capture file with
O_CLOEXEC, even if we're not running in foreground: O_CLOEXEC means
close-on-exec, not close-on-fork.

In logfile_init() and pidfile_open(), the fact that we pass a third
'mode' argument to open() seems to confuse the android-cloexec-open
checker in LLVM versions from 16 to 19 (at least).

The checker is suggesting to add O_CLOEXEC to 'mode', and not in
'flags', where we already have it.

Add a suppression for clang-tidy and a comment, and avoid repeating
those three times by adding a new helper, output_file_open().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-10-30 12:37:31 +01:00
Laurent Vivier
fd8334b25d pcap: Add an offset argument in pcap_iov()
The offset is passed directly to pcap_frame() and allows
any headers that are not part of the frame to
capture to be skipped.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-10-04 14:51:02 +02:00
David Gibson
bfc294b90d util: Add helper to write() all of a buffer
write(2) might not write all the data it is given.  Add a write_all_buf()
helper to keep calling it until all the given data is written, or we get an
error.

Currently we use write_remainder() to do this operation in pcap_frame().
That's a little awkward since it requires constructing an iovec, and future
changes we want to make to write_remainder() will be easier in terms of
this single buffer helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-09-18 17:14:59 +02:00
Stefano Brivio
dba7f0f5ce treewide: Replace strerror() calls
Now that we have logging functions embedding perror() functionality,
we can make _some_ calls more terse by using them. In many places,
the strerror() calls are still more convenient because, for example,
they are used in flow debugging functions, or because the return code
variable of interest is not 'errno'.

While at it, convert a few error messages from a scant perror style
to proper failure descriptions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-06-21 15:32:44 +02:00
David Gibson
5566386f5f treewide: Standardise variable names for various packet lengths
At various points we need to track the lengths of a packet including or
excluding various different sets of headers.  We don't always use the same
variable names for doing so.  Worse in some places we use the same name
for different things: e.g. tcp_fill_headers[46]() use ip_len for the
length including the IP headers, but then tcp_send_flag() which calls it
uses it to mean the IP payload length only.

To improve clarity, standardise on these names:
   dlen:		L4 protocol payload length ("data length")
   l4len:		plen + length of L4 protocol header
   l3len:		l4len + length of IPv4/IPv6 header
   l2len:		l3len + length of L2 (ethernet) header

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-05-02 16:13:23 +02:00
Stefano Brivio
3fe9878db7 pcap: Use clock_gettime() instead of gettimeofday()
POSIX.1-2008 declared gettimeofday() as obsolete, but I'm a dinosaur.

Usually, C libraries translate that to the clock_gettime() system
call anyway, but this doesn't happen in Jon's environment, and,
there, seccomp happily kills pasta(1) when started with --pcap,
because we didn't add gettimeofday() to our seccomp profiles.

Use clock_gettime() instead.

Reported-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-03-14 08:18:36 +01:00
Laurent Vivier
94502fa15e pcap: add pcap_iov()
Introduce a new function pcap_iov() to capture packet desribed by an IO
vector.

Update pcap_frame() to manage iovcnt > 1.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-ID: <20240303135114.1023026-2-lvivier@redhat.com>
[dwg: Fixed trivial cppcheck regressions]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-06 08:03:30 +01:00
David Gibson
dda7945ca9 pcap: Handle short writes in pcap_frame()
Currently pcap_frame() assumes that if write() doesn't return an error, it
has written everything we want.  That's not necessarily true, because it
could return a short write.  That's not likely to happen on a regular file,
but there's not a lot of reason not to be robust here; it's conceivable we
might want to direct the pcap fd at a named pipe or similar.

So, make pcap_frame() handle short frames by using the write_remainder()
helper.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Formatting fix, and avoid gcc warning in pcap_frame()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 06:35:01 +01:00
David Gibson
24410b37a4 pcap: Update pcap_frame() to take an iovec and offset
Update the low-level helper pcap_frame() to take a struct iovec and
offset within it, rather than an explicit pointer and length for the
frame.  This moves the handling of an offset (to skip vnet_len) from
pcap_multiple() to pcap_frame().

This doesn't accomplish a great deal immediately, but will make
subsequent changes easier.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-02-29 06:24:10 +01:00
Stefano Brivio
06559048e7 treewide: Use 'z' length modifier for size_t/ssize_t conversions
Types size_t and ssize_t are not necessarily long, it depends on the
architecture.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-12-02 03:54:42 +01:00
Stefano Brivio
ca2749e1bd passt: Relicense to GPL 2.0, or any later version
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.

Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.

Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-04-06 18:00:33 +02:00
David Gibson
54502cca7f udp: Use tap_send_frames()
To send frames on the tap interface, the UDP uses a fairly complicated two
level batching.  First multiple frames are gathered into a single "message"
for the qemu stream socket, then multiple messages are send with
sendmmsg().  We now have tap_send_frames() which already deals with sending
a number of frames, including batching and handling partial sends.  Use
that to considerably simplify things.

This does make a couple of behavioural changes:
  * We used to split messages to keep them under 32kiB (except when a
    single frame was longer than that).  The comments claim this is
    needed to stop qemu from closing the connection, but we don't have any
    equivalent logic for TCP.  I wasn't able to reproduce the problem with
    this series, although it was apparently easy to reproduce earlier.

    My suspicion is that there was never an inherent need to keep messages
    small, however with larger messages (and default kernel buffer sizes)
    the chances of needing more than one resend for partial send()s is
    greatly increased.  We used not to correctly handle that case of
    multiple resends, but now we do.

  * Previously when we got a partial send on UDP, we would resend the
    remainder of the entire "message", including multiple frames.  The
    common code now only resends the remainder of a single frame, simply
    dropping any frames which weren't even partially sent.  This is what
    TCP always did and is probably a better idea for UDP too.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:55:04 +01:00
David Gibson
d3089eb0ea pcap: Replace pcapm() with pcap_multiple()
pcapm() captures multiple frames from a msghdr, however the only thing it
cares about in the msghdr is the list of buffers, where it assumes there is
one frame to capture per buffer.  That's what we want for its single caller
but it's not the only obvious choice here (one frame per msghdr would
arguably make more sense in isolation).  In addition pcapm() has logic
that only makes sense in the context of the passt specific path its called
from: it skips the first 4 bytes of each buffer, because those have the
qemu vnet_len rather than the frame proper.

Make this clearer by replacing pcapm() with pcap_multiple() which more
explicitly takes one struct iovec per frame, and parameterizes how much of
each buffer to skip (i.e. the offset of the frame within the buffer).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:54:27 +01:00
David Gibson
cb3c1ce307 pcap: Introduce pcap_frame() helper
pcap(), pcapm() and pcapmm() duplicate some code, for the actual writing
to the capture file.  The purpose of pcapm() and pcapmm() not calling
pcap() seems to be to avoid repeatedly calling gettimeofday() and to avoid
printing errors for every packet in a batch if there's a problem.  We can
accomplish that while still sharing code by adding a new helper which
takes the packet timestamp as a parameter.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-01-23 18:48:51 +01:00
Stefano Brivio
da152331cf Move logging functions to a new file, log.c
Logging to file is going to add some further complexity that we don't
want to squeeze into util.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:25 +02:00
David Gibson
bf95322fc1 conf: Make the argument to --pcap option mandatory
The --pcap or -p option can be used with or without an argument.  If given,
the argument gives the name of the file to save a packet trace to.  If
omitted, we generate a default name in /tmp.

Generating the default name isn't particularly useful though, since making
a suitable name can easily be done by the caller.  Remove this feature.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-08-30 19:17:57 +02:00
Stefano Brivio
dbd0a7035c treewide: Invalid type in argument to printf format specifier, CWE-686
Harmless except for two bad debugging prints.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-04-05 18:47:04 +02:00
Stefano Brivio
62c3edd957 treewide: Fix android-cloexec-* clang-tidy warnings, re-enable checks
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
48582bf47f treewide: Mark constant references as const
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
5c0426981e pcap: Fix mistake in printed string
Packets are saved *to* a file, not *at* it.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-25 13:21:13 +01:00
Stefano Brivio
0515adceaa passt, pasta: Namespace-based sandboxing, defer seccomp policy application
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.

While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.

With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.

While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.

We have to open all the file handles we'll ever need before
sandboxing:

- the packet capture file can only be opened once, drop instance
  numbers from the default path and use the (pre-sandbox) PID instead

- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
  of bound ports in pasta mode, are now opened only once, before
  sandboxing, and their handles are stored in the execution context

- the UNIX domain socket for passt is also bound only once, before
  sandboxing: to reject clients after the first one, instead of
  closing the listening socket, keep it open, accept and immediately
  discard new connection if we already have a valid one

Clarify the (unchanged) behaviour for --netns-only in the man page.

To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.

For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.

We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.

Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
b93c2c1713 passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions
This is the only remaining Linux-specific include -- drop it to avoid
clang-tidy warnings and to make code more portable.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 07:57:09 +01:00
Stefano Brivio
000ae818d4 pcap: Fix failure check on write() in pcapm()
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 17:36:05 +02:00
Stefano Brivio
627e18fa8a passt: Add cppcheck target, test, and address resulting warnings
...mostly false positives, but a number of very relevant ones too,
in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 09:41:13 +02:00
Stefano Brivio
dd942eaa48 passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:26:08 +02:00
Stefano Brivio
1a563a0cbd passt: Address gcc 11 warnings
A mix of unchecked return values, a missing permission mask for
open(2) with O_CREAT, and some false positives from
-Wstringop-overflow and -Wmaybe-uninitialized.

Reported-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:29:30 +02:00
Giuseppe Scrivano
9a175cc2ce pasta: Allow specifying paths and names of namespaces
Based on a patch from Giuseppe Scrivano, this adds the ability to:

- specify paths and names of target namespaces to join, instead of
  a PID, also for user namespaces, with --userns

- request to join or create a network namespace only, without
  entering or creating a user namespace, with --netns-only

- specify the base directory for netns mountpoints, with --nsrun-dir

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
[sbrivio: reworked logic to actually join the given namespaces when
 they're not created, implemented --netns-only and --nsrun-dir,
 updated pasta demo script and man page]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-07 04:05:15 +02:00
Stefano Brivio
a491c8ee3c pcap: Drop O_DSYNC from pcap file descriptor
passt is stable enough, and dropping O_DSYNC makes reduces the impact
of capturing packets on timing, while running tests.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
11cc43feec pcap: Don't make pcap files world-readable
Even if it's just a debugging feature, it's not nice to leak packets
to everybody around.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
e4b94da0af pcap: Don't reinitialise packet capture if we already have one
If the guest disconnects, and a given name (without timestamp) for
the pcap file is passed, we would otherwise lose the packets
captured until that point.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
1e49d194d0 passt, pasta: Introduce command-line options and port re-mapping
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-01 17:00:27 +02:00
Stefano Brivio
64a0ba3b27 udp: Introduce recvmmsg()/sendmmsg(), zero-copy path from socket
Packets are received directly onto pre-cooked, static buffers
for IPv4 (with partial checksum pre-calculation) and IPv6 frames,
with pre-filled Ethernet addresses and, partially, IP headers,
and sent out from the same buffers with sendmmsg(), for both
passt and pasta (non-local traffic only) modes.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-21 12:01:04 +02:00
Stefano Brivio
33482d5bf2 passt: Add PASTA mode, major rework
PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host
connectivity to an otherwise disconnected, unprivileged network
and user namespace, similarly to slirp4netns. Given that the
implementation is largely overlapping with PASST, no separate binary
is built: 'pasta' (and 'passt4netns' for clarity) both link to
'passt', and the mode of operation is selected depending on how the
binary is invoked. Usage example:

	$ unshare -rUn
	# echo $$
	1871759

	$ ./pasta 1871759	# From another terminal

	# udhcpc -i pasta0 2>/dev/null
	# ping -c1 pasta.pizza
	PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
	64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms

	--- pasta.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
	# ping -c1 spaghetti.pizza
	PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
	64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms

	--- spaghetti.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms

This entails a major rework, especially with regard to the storage of
tracked connections and to the semantics of epoll(7) references.

Indexing TCP and UDP bindings merely by socket proved to be
inflexible and unsuitable to handle different connection flows: pasta
also provides Layer-2 to Layer-2 socket mapping between init and a
separate namespace for local connections, using a pair of splice()
system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local
bindings. For instance, building on the previous example:

	# ip link set dev lo up
	# iperf3 -s

	$ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
	[SUM]   0.00-10.00  sec  52.3 GBytes  44.9 Gbits/sec  283             sender
	[SUM]   0.00-10.43  sec  52.3 GBytes  43.1 Gbits/sec                  receiver

	iperf Done.

epoll(7) references now include a generic part in order to
demultiplex data to the relevant protocol handler, using 24
bits for the socket number, and an opaque portion reserved for
usage by the single protocol handlers, in order to track sockets
back to corresponding connections and bindings.

A number of fixes pertaining to TCP state machine and congestion
window handling are also included here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-17 11:04:22 +02:00
Stefano Brivio
19d254bbbb passt: Add support for multiple instances in different network namespaces
...sharing the same filesystem. Instead of a fixed path for the UNIX
domain socket, passt now uses a path with a counter, probing for
existing instances, and picking the first free one.

The demo script is updated accordingly -- it can now be started several
times to create multiple namespaces with an instance of passt each,
with addressing reflecting separate subnets, and NDP proxying between
them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-21 11:14:51 +02:00
Stefano Brivio
17337a736f passt: Introduce packet capture implementation
With -DDEBUG, passt now saves guest-side traffic captures in
pcap format at /tmp/passt_<ISO8601 timestamp>.pcap. The timestamp
refers to time and date of start-up.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-21 11:14:48 +02:00