Upload gzipped js snippets for usage with gzip_static in nginx or
equivalent. Convert png drawings to webp for smaller size, use them
as alternatives in README.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Build-time selection of AVX2 flags and routines is not practical for
distributions, but limiting AVX2 usage to checksum routines with
specific run-time detection doesn't allow for easy performance gains
from auto-vectorisation of batched packet handling routines.
For x86_64, build non-AVX2 and AVX2 binaries, and implement a simple
wrapper replacing the current executable with the AVX2 build if it's
available, and if AVX2 is supported by the current CPU.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
It looks like glibc commonly implements clock_gettime(2) with
clock_gettime64(), and uses recv() instead of recvfrom(), send()
instead of sendto(), and sigreturn() instead of rt_sigreturn() on
armv6l and armv7l.
Adjust the list of system calls for armv6l and armv7l accordingly.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
A kernel might not be configured with CONFIG_TRANSPARENT_HUGEPAGE,
especially on embedded systems. Ignore the error, it doesn't affect
functionality.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
There's a single AUDIT_ARCH_ARM define available (and big-endian
shouldn't be a concern with those).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...it doesn't actually matter as we're checking errno at the very
end, but, depending on build flags, chdir() might be declared with
warn_unused_result and the compiler issues a warning.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If we statically configure a default route, and also advertise it for
SLAAC, the kernel will try moments later to add the same route:
ICMPv6: RA: ndisc_router_discovery failed to add default route
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When nl_link() configures the MTU, it shouldn't send extra bytes,
otherwise we'll get a kernel warning:
netlink: 4 bytes leftover after parsing attributes in process `pasta'.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For demos, cool-retro-term(1) looked fancier, but several threads of
that and ffmpeg(1) are just messing up with performance testing.
The CI videos started getting really big as well, and they were
difficult to read.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...showing setup steps, some peculiarities as --net option, and a
general side-to-side comparison with slirp4netns(1), including
"quick" TCP and UDP throughput and latency benchmarks.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The patch introduces a "pasta" networking mode for rootless
container, similar to the existing slirp4netns mode. Notable
differences are described in the commit message.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
That test fails sometimes, it looks like iperf3 is still sending
initial messages that are too big. I'll need to figure out why,
but given that 256 bytes is not really an expected MTU, drop the
thresholds to zero for the moment being.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Removing the needrestart package doesn't seem to work anymore, and
I'm getting again prompts to restart services after installing gcc
and make: export DEBIAN_FRONTEND=noninteractive before installing
packages to avoid that.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This should be convenient for users managing filesystem-bound network
namespaces: monitor the base directory of the namespace and exit if
the namespace given as PATH or NAME target is deleted. We can't add
an inotify watch directly on the namespace directory, that won't work
with nsfs.
Add an option to disable this behaviour, --no-netns-quit.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In pasta mode, when we get data from sockets and write it as single
frames to the tap device, we batch receive operations considerably,
and then (conceptually) split the data in many smaller writes.
It looked like an obvious choice, but performance is actually better
if we receive data in many small frame-sized recvmsg()/recvmmsg().
The syscall overhead with the previous behaviour, observed by perf,
comes predominantly from write operations, but receiving data in
shorter chunks probably improves cache locality by a considerable
amount.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Likely for testing purposes only: allow connections from host to
guest or namespace using, as connection target, the configured,
possibly global unicast address.
In this case, we have to map the destination address to a link-local
address, and for port-based tracked responses, the source address
needs to be again the unicast address: not loopback, not link-local.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For compatibility with libslirp/slirp4netns users: introduce a
mechanism to map, in the UDP routines, an address facing guest or
namespace to the first IPv4 or IPv6 address resulting from
configuration as resolver. This can be enabled with the new
--dns-forward option.
This implies that sourcing and using DNS addresses and search lists,
passed via command line or read from /etc/resolv.conf, is not bound
anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just
want to use addresses from /etc/resolv.conf as mapping target, while
not passing DNS options via DHCP.
Reflect this in all the involved code paths by differentiating
DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new
options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns,
--no-dhcp-search for passt.
This should be the last bit to enable substantial compatibility
between slirp4netns.sh and slirp4netns(1): pass the --dns-forward
option from the script too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Provide a sane default, instead of /0, if an address is given, and it
doesn't correspond to any host address we could find via netlink.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Introduce the equivalent of the --api-socket option from slirp4netns:
spawn a subshell to handle requests, netcat binds to a UNIX domain
socket and jq parses messages.
Three minor differences compared to slirp4netns:
- IPv6 ports are forwarded too
- error messages are not as specific, for example we don't tell
apart malformed JSON requests from invalid parameters
- host addresses are always 0.0.0.0 and ::1, pasta doesn't bind on
specific addresses for different ports
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Nobody currently calls this as passt4netns, that was the name I used
before 'pasta', drop any reference before it's too late.
While at it, explicitly check that argc is bigger than or equal to
one, just as a defensive measure: argv[0] being NULL is not an issue
anyway.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Two effects:
- ptrace() on passt and pasta can only be done by root, so that even
if somebody gains access to the same user, they won't be able to
check data passed in syscalls anyway. No core dumps allowed either
- /proc/PID files are owned by root:root, and they can't be read by
the same user as the one passt or pasta are running with
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
To reach (at least) a conceptually equivalent security level as
implemented by --enable-sandbox in slirp4netns, we need to create a
new mount namespace and pivot_root() into a new (empty) mountpoint, so
that passt and pasta can't access any filesystem resource after
initialisation.
While at it, also detach IPC, PID (only for passt, to prevent
vulnerabilities based on the knowledge of a target PID), and UTS
namespaces.
With this approach, if we apply the seccomp filters right after the
configuration step, the number of allowed syscalls grows further. To
prevent this, defer the application of seccomp policies after the
initialisation phase, before the main loop, that's where we expect bad
things to happen, potentially. This way, we get back to 22 allowed
syscalls for passt and 34 for pasta, on x86_64.
While at it, move #syscalls notes to specific code paths wherever it
conceptually makes sense.
We have to open all the file handles we'll ever need before
sandboxing:
- the packet capture file can only be opened once, drop instance
numbers from the default path and use the (pre-sandbox) PID instead
- /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection
of bound ports in pasta mode, are now opened only once, before
sandboxing, and their handles are stored in the execution context
- the UNIX domain socket for passt is also bound only once, before
sandboxing: to reject clients after the first one, instead of
closing the listening socket, keep it open, accept and immediately
discard new connection if we already have a valid one
Clarify the (unchanged) behaviour for --netns-only in the man page.
To actually make passt and pasta processes run in a separate PID
namespace, we need to unshare(CLONE_NEWPID) before forking to
background (if configured to do so). Introduce a small daemon()
implementation, __daemon(), that additionally saves the PID file
before forking. While running in foreground, the process itself can't
move to a new PID namespace (a process can't change the notion of its
own PID): mention that in the man page.
For some reason, fork() in a detached PID namespace causes SIGTERM
and SIGQUIT to be ignored, even if the handler is still reported as
SIG_DFL: add a signal handler that just exits.
We can now drop most of the pasta_child_handler() implementation,
that took care of terminating all processes running in the same
namespace, if pasta started a shell: the shell itself is now the
init process in that namespace, and all children will terminate
once the init process exits.
Issuing 'echo $$' in a detached PID namespace won't return the
actual namespace PID as seen from the init namespace: adapt
demo and test setup scripts to reflect that.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...otherwise, we don't terminate pasta on regular exit, i.e.
on a read from the "exit" file descriptor.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
With a recent 5.15 kernel, passing a huge window size to iperf3 with
lower MTUs makes iperf3 stop sending packets after a few seconds --
I haven't investigated this in detail, but the window size will be
adjusted dynamically anyway and not passing it doesn't actually
affect throughput, so simply drop the option.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Oops. If *word & BITMAP_BIT(bit) is bigger than an int (which is the
case for half of the possible bits of a bitmap on 64-bit archs), we'll
return that as an int, that is, zero, even if the bit at hand is set.
Just return zero or one there, no callers are interested in the actual
bitmap as return value.
Issue found as pasta wouldn't automatically detect some bound ports.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Some recent change to xenial-updates broke dependencies for gcc,
it can't be installed anymore. Skipping apt-get update leaves gcc
dependencies in a consistent state, though.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>