Introduce the equivalent of the --api-socket option from slirp4netns:
spawn a subshell to handle requests, netcat binds to a UNIX domain
socket and jq parses messages.
Three minor differences compared to slirp4netns:
- IPv6 ports are forwarded too
- error messages are not as specific, for example we don't tell
apart malformed JSON requests from invalid parameters
- host addresses are always 0.0.0.0 and ::1, pasta doesn't bind on
specific addresses for different ports
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Nobody currently calls this as passt4netns, that was the name I used
before 'pasta', drop any reference before it's too late.
While at it, explicitly check that argc is bigger than or equal to
one, just as a defensive measure: argv[0] being NULL is not an issue
anyway.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Two effects:
- ptrace() on passt and pasta can only be done by root, so that even
if somebody gains access to the same user, they won't be able to
check data passed in syscalls anyway. No core dumps allowed either
- /proc/PID files are owned by root:root, and they can't be read by
the same user as the one passt or pasta are running with
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.
While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.
With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.
While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.
We have to open all the file handles we'll ever need before
sandboxing:
- the packet capture file can only be opened once, drop instance
numbers from the default path and use the (pre-sandbox) PID instead
- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
of bound ports in pasta mode, are now opened only once, before
sandboxing, and their handles are stored in the execution context
- the UNIX domain socket for passt is also bound only once, before
sandboxing: to reject clients after the first one, instead of
closing the listening socket, keep it open, accept and immediately
discard new connection if we already have a valid one
Clarify the (unchanged) behaviour for --netns-only in the man page.
To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.
For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.
We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.
Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...otherwise, we don't terminate pasta on regular exit, i.e.
on a read from the "exit" file descriptor.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
With a recent 5.15 kernel, passing a huge window size to iperf3 with
lower MTUs makes iperf3 stop sending packets after a few seconds --
I haven't investigated this in detail, but the window size will be
adjusted dynamically anyway and not passing it doesn't actually
affect throughput, so simply drop the option.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Oops. If *word & BITMAP_BIT(bit) is bigger than an int (which is the
case for half of the possible bits of a bitmap on 64-bit archs), we'll
return that as an int, that is, zero, even if the bit at hand is set.
Just return zero or one there, no callers are interested in the actual
bitmap as return value.
Issue found as pasta wouldn't automatically detect some bound ports.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Some recent change to xenial-updates broke dependencies for gcc,
it can't be installed anymore. Skipping apt-get update leaves gcc
dependencies in a consistent state, though.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The shell might report 'nc -6 -l -p 9999 > /tmp/ns_msg' as done
even after the subsequent 'echo' is done: wait one second before
reading out /tmp/ns_msg, to ensure we read that instead of the
"Done" message.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
clang-tidy from LLVM 13.0.1 reports some new warnings from these
checkers:
- altera-unroll-loops, altera-id-dependent-backward-branch: ignore
for the moment being, add a TODO item
- bugprone-easily-swappable-parameters: ignore, nothing to do about
those
- readability-function-cognitive-complexity: ignore for the moment
being, add a TODO item
- altera-struct-pack-align: ignore, alignment is forced in protocol
headers
- concurrency-mt-unsafe: ignore for the moment being, add a TODO
item
Fix bugprone-implicit-widening-of-multiplication-result warnings,
though, that's doable and they seem to make sense.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
passt can be used to implement user-mode networking for the Kata
Containers runtime, so that networking setup doesn't need elevated
privileges or capabilities.
This commit adds the patch for Kata Containers runtime and agent
to support passt as networking model and endpoint, and some basic
documentation.
See contrib/kata-containers/README.md for more details and setup
steps.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I'm about to add a new adaptation carrying out-of-tree patches
for a Kata Containers PoC -- move the existing out-of-tree patches
to their own directory to keep things easy to find in the main one.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The existing behaviour is not really practical: an automated agent in
charge of starting both qemu and passt would need to fork itself to
start passt, because passt won't fork to background until qemu
connects, and the agent needs to unblock to start qemu.
Instead of waiting for a connection to daemonise, do it right away as
soon as a socket is available: that can be considered an initialised
state already.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The new tests check build and a simple case with pasta sending a
short message in both directions (namespace to init, init to
namespace).
Tests cover a mix of Debian, Fedora, OpenSUSE and Ubuntu combinations
on aarch64, i386, ppc64, ppc64le, s390x, x86_64.
Builds tested starting from approximately glibc 2.19, gcc 4.7, and
actual functionality approximately from 4.4 kernels, glibc 2.25,
gcc 4.8, all the way up to current glibc/gcc/kernel versions.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For distribution tests, we'll repeat some tests frequently. Add a
'def' directive that starts a block, ended by 'endef', whose
execution can then be triggered by simply giving its name as a
directive itself.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We might have highlighting and slightly different prompts across
different distributions, allow a more reasonable set of prompt
strings to be accepted as prompts.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The throughput results in this test look quite variable, slightly
lower figures look reasonable anyway.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Debug information might be printed after a prompt is seen,
just wait those 3 seconds and be done with it.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Depending on the C library, but not necessarily in all the
functions we use, statx() might be used instead of stat(),
getdents() instead of getdents64(), readlinkat() instead of
readlink(), openat() instead of open().
On aarch64, it's clone() and not fork(), and dup3() instead of
dup2() -- just allow the existing alternative instead of dealing
with per-arch selections.
Since glibc commit 9a7565403758 ("posix: Consolidate fork
implementation"), we need to allow set_robust_list() for
fork()/clone(), even in a single-threaded context.
On some architectures, epoll_pwait() is provided instead of
epoll_wait(), but never both. Same with newfstat() and
fstat(), sigreturn() and rt_sigreturn(), getdents64() and
getdents(), readlink() and readlinkat(), unlink() and
unlinkat(), whereas pipe() might not be available, but
pipe2() always is, exclusively or not.
Seen on Fedora 34: newfstatat() is used on top of fstat().
syslog() is an actual system call on some glibc/arch combinations,
instead of a connect()/send() implementation.
On ppc64 and ppc64le, _llseek(), recv(), send() and getuid()
are used. For ppc64 only: ugetrlimit() for the getrlimit()
implementation, plus sigreturn() and fcntl64().
On s390x, additionally, we need to allow socketcall() (on top
of socket()), and sigreturn() also for passt (not just for
pasta).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Seen on PPC with some older kernel versions: we seemingly have bytes
left to read from the returned array of dirent structs, but d_reclen
is zero: this, and all the subsequent entries, are not valid.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The effect of this typo became visible in an IPv6-only environment,
where passt wouldn't work at all.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Bitmap manipulating functions would otherwise refer to inconsistent
sets of bits on big-endian architectures. While at it, fix up a
couple of casts.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcpi_bytes_acked and tcpi_min_rtt are only available on recent
kernel versions: provide fall-back paths (incurring some grade of
performance penalty).
Support for getrandom() was introduced in Linux 3.17 and glibc 2.25:
provide an alternate mechanism for that as well, reading from
/dev/random.
Also check if NETLINK_GET_STRICT_CHK is defined before using it:
it's not strictly needed, we'll filter out irrelevant results from
netlink anyway.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Some C library functions are commonly implemented by different syscalls
on different architectures. Add a mechanism to allow selected syscalls
for a single architecture, syntax in #syscalls comment is:
#syscalls <arch>:<name>
e.g. s390x:socketcall, given that socketcall() is commonly used there
instead of socket().
This is now implemented by a compiler probe for syscall numbers,
auditd tools (ausyscall) are not required anymore as a result.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
On some distributions, on ppc64, ulimit -s returns 'unlimited': add a
reasonable default, and also make sure ulimit is invoked using the
default shell, which should ensure ulimit is actually implemented.
Also note that AUDIT_ARCH doesn't follow closely the naming reported
by 'uname -m': convert for i386 and ppc as needed.
While at it, move inclusion of seccomp.h after util.h, the former is
less generic (cosmetic/clang-tidy only).
Older kernel headers might lack a definition for AUDIT_ARCH_PPC64LE:
define that explicitly if it's not available.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This is the only remaining Linux-specific include -- drop it to avoid
clang-tidy warnings and to make code more portable.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Based on an original patch by Giuseppe Scrivano: there's no need
to pass $0 to usage, drop that everywhere, and make it consistent.
Don't exit with error on -h, --help.
Suggested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...it looks like, on a recent Fedora installation, daemon() uses it.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...I broke this while playing with clang-tidy, and didn't add
tests for pasta's --config-net yet.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This is actually annoying: there's no way to make it fork into
background when running from a script. However, it's always
possible to keep it in foreground with -f. Make it simpler, and
always fork into background if -f is not given.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
An inline comment prefixed by a space doesn't mean the space
is dropped, and sleep(1) will get a blank in its argument.
Move the comment on its own line.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
They'll start DAD as we bring up the interface, and the DHCPv6
client might be unreasonably delayed if we start it too early.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>