Commit graph

1504 commits

Author SHA1 Message Date
Stefano Brivio
f3cd0f9e45 contrib: Drop Podman out-of-tree patch, integration is upstream now
See https://github.com/containers/podman/pull/16141, shipped in
Podman 4.4.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
d7272f1df8 tcp: Clamp MSS value when queueing data to tap, also for pasta
Tom reports that a pattern of repated ~1 MiB chunks downloads over
NNTP over TLS, on Podman 4.4 using pasta as network back-end, results
in pasta taking one full CPU thread after a while, and the download
never succeeds.

On that setup, we end up re-sending the same frame over and over,
with a consistent 65 534 bytes size, and never get an
acknowledgement from the tap-side client. This only happens for the
default MTU value (65 520 bytes) or for values that are slightly
smaller than that (down to 64 499 bytes).

We hit this condition because the MSS value we use in
tcp_data_from_sock(), only in pasta mode, is simply clamped to
USHRT_MAX, and not to the actual size of the buffers we pre-cooked
for sending, which is a bit less than that.

It looks like we got away with it until commit 0fb7b2b908 ("tap:
Use different io vector bases depending on tap type") fixed the
setting of iov_len.

Luckily, since it's pasta, we're queueing up to two frames at a time,
so the worst that can happen is a badly segmented TCP stream: we
always have some space at the tail of the buffer.

Clamp the MSS value to the appropriate maximum given by struct
tcp{4,6}_buf_data_t, no matter if we're running in pasta or passt
mode.

While at it, fix the comments to those structs to reflect the current
struct size. This is not really relevant for any further calculation
or consideration, but it's convenient to know while debugging this
kind of issues.

Thanks to Tom for reporting the issue in a very detailed way and for
providing a test setup.

Reported-by: Tom Mombourquette <tom@devnode.com>
Link: https://github.com/containers/podman/issues/17703
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
bb2b67cb35 conf: Terminate on EMFILE or ENFILE on sockets for port mapping
In general, we don't terminate or report failures if we fail to bind
to some ports out of a given port range specifier, to allow users to
conveniently specify big port ranges (or "all") without having to
care about ports that might already be in use.

However, running out of the open file descriptors quota is a
different story: we can't do what the user requested in a very
substantial way.

For example, if the user specifies '-t all' and we can only bind
1024 sockets, the behaviour is rather unexpected.

Fail whenever socket creation returns -ENFILE or -EMFILE.

Link: https://bugs.passt.top/show_bug.cgi?id=27
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
5aea2f88ab tcp, udp: Fix partial success return codes in {tcp,udp}_sock_init()
The comments say we should return 0 on partial success, and an error
code on complete failure. Rationale: if the user configures a port
forwarding, and we succeed to bind that port for IPv4 or IPv6 only,
that might actually be what the user intended.

Adjust the two functions to reflect the comments.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
73992c42ce tcp, udp, util: Pass socket creation errors all the way up
...starting from sock_l4(), pass negative error (errno) codes instead
of -1. They will only be used in two commits from now, no functional
changes intended here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
50687616e4 util: Carry own definition of __bswap_constant{16,32}
musl doesn't define those, use our own definition there. This is a
trivial implementation, similar to what's shipped by e.g. uClibc,
glibc, libiio.

Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Chris Kuhn
89e38f5540 treewide: Fix header includes to build with musl
Roughly inspired from a patch by Chris Kuhn: fix up includes so that
we can build against musl: glibc is more lenient as headers generally
include a larger amount of other headers.

Compared to the original patch, I only included what was needed
directly in C files, instead of adding blanket includes in local
header files. It's a bit more involved, but more consistent with the
current (not ideal) situation.

Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Chris Kuhn
5c58feab7b conf, passt: Rename stderr to force_stderr
While building against musl, gcc informs us that 'stderr' is a
protected keyword. This probably comes from a #define stderr (stderr)
in musl's stdio.h, to avoid a clash with extern FILE *const stderr,
but I didn't really track it down. Just rename it to force_stderr, it
makes more sense.

[sbrivio: Added commit message]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
fde8004ab0 netlink: Use 8 KiB * netlink message header size as response buffer
...instead of BUFSIZ. On musl, BUFSIZ is 1024, so we'll typically
truncate the response to the request we send in nl_link(). It's
usually 8192 or more with glibc.

There doesn't seem to be any macro defining the rtnetlink maximum
message size, and iproute2 just hardcodes 1024 * 1024 for the receive
buffer, but the example in netlink(7) makes somewhat sense, looking
at the kernel implementation.

It's not very clean, but we're very unlikely to hit that limit,
and if we do, we'll find out painlessly, because NLA_OK() will tell
us right away.

Reported-by: Chris Kuhn <kuhnchris+passt@kuhnchris.eu>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:44:21 +01:00
Stefano Brivio
a9c59dd91b conf, icmp, tcp, udp: Add options to bind to outbound address and interface
I didn't notice earlier: libslirp (and slirp4netns) supports binding
outbound sockets to specific IPv4 and IPv6 addresses, to force the
source addresse selection. If we want to claim feature parity, we
should implement that as well.

Further, Podman supports specifying outbound interfaces as well, but
this is simply done by resolving the primary address for an interface
when the network back-end is started. However, since kernel version
5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for
non-root users"), we can actually bind to a specific interface name,
which doesn't need to be validated in advance.

Implement -o / --outbound ADDR to bind to IPv4 and IPv6 addresses,
and --outbound-if4 and --outbound-if6 to bind IPv4 and IPv6 sockets
to given interfaces.

Given that it probably makes little sense to select addresses and
routes from interfaces different than the ones given for outbound
sockets, also assign those as "template" interfaces, by default,
unless explicitly overridden by '-i'.

For ICMP and UDP, we call sock_l4() to open outbound sockets, as we
already needed to bind to given ports or echo identifiers, and we
can bind() a socket only once: there, pass address (if any) and
interface (if any) for the existing bind() and setsockopt() calls.

For TCP, in general, we wouldn't otherwise bind sockets. Add a
specific helper to do that.

For UDP outbound sockets, we need to know if the final destination
of the socket is a loopback address, before we decide whether it
makes sense to bind the socket at all: move the block mangling the
address destination before the creation of the socket in the IPv4
path. This was already the case for the IPv6 path.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 03:43:59 +01:00
Stefano Brivio
70148ce5be conf, passt.h: Rename "outbound" interface to "template" interface
In preparation for the next patch, make it clear that the first
routable interface fetched via netlink, or the one configured via
-i/--interface, is simply used as template to copy addresses and
routes, not an interface we actually use to derive the source address
(which will be _bound to_) for outgoing packets.

The man page and usage message appear to be already clear enough.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-03-09 00:36:08 +01:00
Stefano Brivio
d361fe6e80 contrib/selinux: Let interface users set paths for log, PID, socket files
Even libvirt itself will configure passt to write log, PID and socket
files to different locations depending on whether the domain is
started as root (/var/log/libvirt/...) or as a regular user
(/var/log/<PID>/libvirt/...), and user_tmp_t would only cover the
latter.

Create interfaces for log and PID files, so that callers can specify
different file contexts for those, and modify the interface for the
UNIX socket file to allow different paths as well.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Laine Stump <laine@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2023-03-09 00:36:08 +01:00
Stefano Brivio
de9b0cb5fe contrib/selinux: Allow binding and connecting to all UDP and TCP ports
Laine reports that with a simple:

      <portForward proto='tcp'>
        <range start='2022' to='22'/>
      </portForward>

in libvirt's domain XML, passt won't start as it fails to bind
arbitrary ports. That was actually the intention behind passt_port_t:
the user or system administrator should have explicitly configured
allowed ports on a given machine. But it's probably not realistic, so
just allow any port to be bound and forwarded.

Also fix up some missing operations on sockets.

Reported-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Laine Stump <laine@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2023-03-09 00:36:08 +01:00
Stefano Brivio
41bc669866 contrib/selinux: Let passt write to stdout and stderr when it starts
Otherwise, it's unusable as stand-alone tool, or in foreground mode,
and it's also impossible to get output from --help or --version,
because for SELinux it's just a daemon.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Laine Stump <laine@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2023-03-09 00:36:08 +01:00
Stefano Brivio
009af75e45 contrib/selinux: Drop duplicate init_daemon_domain() rule
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Laine Stump <laine@redhat.com>
Reviewed-by: Laine Stump <laine@redhat.com>
2023-03-09 00:36:08 +01:00
Stefano Brivio
83236216c4 udp: Fix signedness warning on 32-bits architectures
When a ssize_t is an int:

udp.c: In function ‘udp_sock_handler’:
udp.c:774:23: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘ssize_t’ {aka ‘int’} [-Wsign-compare]
  774 |         for (i = 0; i < n; i += m) {
      |                       ^
udp.c:781:43: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘ssize_t’ {aka ‘int’} [-Wsign-compare]
  781 |                         for (m = 1; i + m < n; m++) {
      |

Change 'i' and 'm' counters in udp_sock_handler() to signed versions,
to match ssize_t n.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 00:36:06 +01:00
Stefano Brivio
f6b6b66a88 Makefile: Fix SuperH 4 builds: it's AUDIT_ARCH_SH, not AUDIT_ARCH_SH4
sh4 builds currently fail because of this:
  https://buildd.debian.org/status/fetch.php?pkg=passt&arch=sh4&ver=0.0%7Egit20230216.4663ccc-1&stamp=1677091350&raw=0

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-03-09 00:36:06 +01:00
Stefano Brivio
0d8c114aa2 Makefile, seccomp.sh: Fix cross-builds, adjust syscalls list to compiler
Debian cross-building automatic checks:

  http://crossqa.debian.net/src/passt

currently fail because we don't use the right target architecture and
compiler while building the system call lists and resolving their
numbers in seccomp.sh. Pass ARCH and CC to seccomp.sh and use them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-03-09 00:36:03 +01:00
Stefano Brivio
c538ee8d69 util: Add own prototype for __clone2() on ia64
ia64 needs to use __clone2() as clone() is not available, but glibc
doesn't export the prototype. Take it from clone(2) to avoid an
implicit declaration:

util.c: In function ‘do_clone’:
util.c:512:16: warning: implicit declaration of function ‘__clone2’ [-Wimplicit-function-declaration]
  512 |         return __clone2(fn, stack_area + stack_size / 2, stack_size / 2,
      |                ^~~~~~~~

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-27 18:56:37 +01:00
Stefano Brivio
b1b75bd73a contrib/apparmor: Split profile into abstractions, use them
One day, libvirt might actually support running passt to provide
guest connectivity. Should libvirtd (or virtqemud) start passt, it
will need to access socket and PID files in specific locations, and
passt needs to accept SIGTERM in case QEMU fails to start after passt
is already started.

To make this more convenient, split the current profile into two
abstractions, for passt and for pasta, so that external programmes
can include the bits they need (and especially not include the pasta
abstraction if they only need to start passt), plus whatever specific
adaptation is needed.

For stand-alone usage of passt and pasta, the 'passt' profile simply
includes both abstractions, plus rules to create and access PID and
capture files in default or reasonable ($HOME) locations.

Tested on Debian with libvirt 9.0.0 together with a local fix to start
passt as intended, namely libvirt commit c0efdbdb9f66 ("qemu_passt:
Avoid double daemonizing passt"). This is an example of how the
libvirtd profile (or virtqemud abstraction, or virtqemud profile) can
use this:

  # support for passt network back-end
  /usr/bin/passt Cx -> passt,
  profile passt {
    /usr/bin/passt r,

    owner @{run}/user/[0-9]*/libvirt/qemu/run/passt/* rw,
    signal (receive) set=("term") peer=/usr/sbin/libvirtd,
    signal (receive) set=("term") peer=libvirtd,

    include if exists <abstractions/passt>
  }

translated:

- when executing /usr/bin/passt, switch to the subprofile "passt"
  (not the "discrete", i.e. stand-alone profile), described below.
  Scrub the environment (e.g. LD_PRELOAD is dropped)

- in the "passt" subprofile:

  - allow reading the binary

  - allow read and write access to PID and socket files

  - make passt accept SIGTERM from /usr/sbin/libvirtd, and
    libvirtd peer names

  - include anything else that's needed by passt itself

Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-27 18:56:32 +01:00
Andrea Bolognani
0d547a5b0f qrap: Generate -netdev as JSON
While generating -device as JSON when JSON is in use is
mandatory, because not doing so can often prevent the VM from
starting up, using JSON for -netdev simply makes things a bit
nicer. No reason not to do it, though.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:56:29 +01:00
Andrea Bolognani
4f2341f31d qrap: Introduce machine-specific PCI address base
For pc machines, devices are placed directly on pci.0 with
addresses like

  bus=pci.0,addr=0xa

and in this case the existing code works correctly.

For q35 machines, however, a separate PCI bus is created for
each devices using a pcie-root-port, and the resulting
addresses look like

  bus=pci.9,addr=0x0

In this case, we need to treat PCI addresses as decimal, not
hexadecimal, both when parsing and generating them.

This issue has gone unnoticed for a long time because it only
shows up when enough PCI devices are present: for small
numbers, decimal and hexadecimal overlap, masking the issue.

Reported-by: Alona Paz <alkaplan@redhat.com>
Fixes: 5307faa059 ("qrap: Strip network devices from command line, set them up according to machine")
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:56:24 +01:00
Andrea Bolognani
8828a637ba qrap: Drop args in JSON format
When JSON support was introduced, the drop_args array has
not been updated accordingly.

Fixes: b944ca1855 ("qrap: Support JSON syntax for -device")
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:56:22 +01:00
Andrea Bolognani
16f19c87f5 qrap: Fix support for pc machines
The JSON version of the template is incorrect, making qrap
completely unusable when JSON arguments are used together
with the pc machine type.

Fixes: b944ca1855 ("qrap: Support JSON syntax for -device")
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:56:20 +01:00
Andrea Bolognani
9cec4309c9 qrap: Fix limits for PCI addresses
The pci.0 bus on a pc machine has 32 slots.

For q35 machines, we don't expect devices to be plugged into
pcie.0 directly, so technically we could have a very large
number of slots by adding many pcie-root-ports, but even in
this scenario 32 is a reasonable number.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:56:07 +01:00
Stefano Brivio
d2df763232 log, conf, tap: Define die() as err() plus exit(), drop cppcheck workarounds
If we define die() as a variadic macro, passing __VA_ARGS__ to err(),
and calling exit() outside err() itself, we can drop the workarounds
introduced in commit 36f0199f6e ("conf, tap: Silence two false
positive invalidFunctionArg from cppcheck").

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:55:57 +01:00
Stefano Brivio
fb05a71378 doc/demo: Fix and suppress ShellCheck warnings
ShellCheck reports (SC2034) that __qemu_arch is not used. Use it,
and silence the resulting SC2086 warning as we want word splitting on
options we pass with it.

While at it, silence SC2317 warnings for commands in cleanup() that
appear to be unreachable: cleanup() is only called as trap.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:55:39 +01:00
Stefano Brivio
26a0e4d6ee Fix definitions of SOCKET_MAX, TCP_MAX_CONNS
...and, given that I keep getting this wrong, add a convenience
macro, MAX_FROM_BITS().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:55:31 +01:00
Stefano Brivio
4f523c3276 tcp: Avoid (theoretical) resource leak (CWE-772) Coverity warning
If tcp_timer_ctl() gets a socket number greater than SOCKET_MAX
(2 ^ 24), we return error but we don't close the socket. This is a
rather formal issue given that, at least on Linux, socket numbers are
monotonic and we're in general not allowed to open so many sockets.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:55:20 +01:00
Stefano Brivio
a1d5537741 tcp: Avoid false (but convoluted) positive Coverity CWE-476 warning
If there are no TCP options in the header, tcp_tap_handler() will
pass the corresponding pointer, fetched via packet_get(), as NULL to
tcp_conn_from_sock_finish(), which in turn indirectly calls
tcp_opt_get().

If there are no options, tcp_opt_get() will stop right away because
the option length is indicated as zero. However, if the logic is
complicated enough to follow for static checkers, adding an explicit
check against NULL in tcp_opt_get() is probably a good idea.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:55:10 +01:00
Stefano Brivio
5474bc5485 tcp, tcp_splice: Get rid of false positive CWE-394 Coverity warning from fls()
We use the return value of fls() as array index for debug strings.

While fls() can return -1 (if no bit is set), Coverity Scan doesn't
see that we're first checking the return value of another fls() call
with the same bitmask, before using it.

Call fls() once, store its return value, check it, and use the stored
value as array index.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:54:38 +01:00
Stefano Brivio
a48c5c2abf treewide: Disable gcc strict aliasing rules as needed, drop workarounds
Recently, commit 4ddbcb9c0c ("tcp: Disable optimisations
for tcp_hash()") worked around yet another issue we hit with gcc 12
and '-flto -O2': some stores affecting the input data to siphash_20b()
were omitted altogether, and tcp_hash() wouldn't get the correct hash
for incoming connections.

Digging further into this revealed that, at least according to gcc's
interpretation of C99 aliasing rules, passing pointers to functions
with different types compared to the effective type of the object
(for example, a uint8_t pointer to an anonymous struct, as it happens
in tcp_hash()), doesn't guarantee that stores are not reordered
across the function call.

This means that, in general, our checksum and hash functions might
not see parts of input data that was intended to be provided by
callers.

Not even switching from uint8_t to character types, which should be
appropriate here, according to C99 (ISO/IEC 9899, TC3, draft N1256),
section 6.5, "Expressions", paragraph 7:

  An object shall have its stored value accessed only by an lvalue
  expression that has one of the following types:

  [...]

  — a character type.

does the trick. I guess this is also subject to interpretation:
casting passed pointers to character types, and then using those as
different types, might still violate (dubious) aliasing rules.

Disable gcc strict aliasing rules for potentially affected functions,
which, in turn, disables gcc's Type-Based Alias Analysis (TBAA)
optimisations based on those function arguments.

Drop the existing workarounds. Also the (seemingly?) bogus
'maybe-uninitialized' warning on the tcp_tap_handler() > tcp_hash() >
siphash_20b() path goes away with -fno-strict-aliasing on
siphash_20b().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:54:13 +01:00
Stefano Brivio
da46fdac36 tcp: Suppress knownConditionTrueFalse cppcheck false positive
cppcheck 2.10 reports:

tcp.c:1815:12: style: Condition 'wnd>prev_scaled' is always false [knownConditionTrueFalse]
  if ((wnd > prev_scaled && wnd * 99 / 100 < prev_scaled) ||
           ^
tcp.c:1808:8: note: Assignment 'wnd=((1<<(16+8))<(wnd))?(1<<(16+8)):(wnd)', assigned value is less than 1
 wnd = MIN(MAX_WINDOW, wnd);
       ^
tcp.c:1811:19: note: Assuming condition is false
  if (prev_scaled == wnd)
                  ^
tcp.c:1815:12: note: Condition 'wnd>prev_scaled' is always false
  if ((wnd > prev_scaled && wnd * 99 / 100 < prev_scaled) ||
           ^

but this is not actually the case: wnd is typically greater than 1,
and might very well be greater than prev_scaled as well.

I bisected this down to cppcheck commit b4d455df487c ("Fix 11349: FP
negativeIndex for clamped array index (#4627)") and reported findings
at https://github.com/danmar/cppcheck/pull/4627.

Suppress the warning for the moment being.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-27 18:53:45 +01:00
Stefano Brivio
834b9a3049 log: Send identifier string in log messages, openlog() won't work for us
openlog() can be used to set "ident" and have all the log messages
prefixed by it, but only if we call syslog() -- this is implemented
by C libraries.

We don't log messages with syslog(), though, as we have a custom
implementation to ensure we don't need dynamic memory allocation.
This means that it's perfectly useless to call openlog(), and that we
have to prefix every message we log by the identifier on our own.

Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:53:38 +01:00
Stefano Brivio
bad2526872 conf, udp: Allow any loopback address to be used as resolver
Andrea reports that with a Fedora 37 guest running on a Fedora 37
host, both using systemd-resolved, with passt connecting them,
running with default options, DNS queries don't work.

systemd-resolved on the host is reachable only at the loopback
address 127.0.0.53.

We advertise the default gateway address to the guest as resolver,
because our local address is of course unreachable from there, which
means we see DNS queries directed to the default gateway, and we
redirect them to 127.0.0.1. However, systemd-resolved doesn't answer
on 127.0.0.1.

To fix this, set @dns_match to the address of the default gateway,
unless a different resolver address is explicitly configured, so that
we know we explicitly have to map DNS queries, in this case, to the
address of the local resolver.

This means that in udp_tap_handler() we need to check, first, if
the destination address of packets matches @dns_match: even if it's
the address of the local gateway, we want to map that to a specific
address, which isn't necessarily 127.0.0.1.

Do the same for IPv6 for consistency, even though IPv6 defines a
single loopback address.

Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:52:56 +01:00
Stefano Brivio
8ca907a3f0 conf: Split add_dns{4,6}() out of get_dns()
The logic handling which resolvers we add, and whether to add them,
is getting rather cramped in get_dns(): split it into separate
functions.

No functional changes intended.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:52:46 +01:00
Stefano Brivio
ddf7097a71 udp: Actually use host resolver to forward DNS queries
Instead of the address of the first resolver we advertise to
the guest or namespace.

This was one of the intentions behind commit 3a2afde87d ("conf,
udp: Drop mostly duplicated dns_send arrays, rename related fields"),
but I forgot to implement this part. In practice, they are usually
the same thing, unless /etc/resolv.conf points to a loopback address.

Fixes: 3a2afde87d ("conf, udp: Drop mostly duplicated dns_send arrays, rename related fields")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-02-27 18:52:04 +01:00
Stefano Brivio
4ddbcb9c0c tcp: Disable optimisations for tcp_hash()
I'm not sure if we're breaking some aliasing rule here, but with gcc
12.2.1 on x86_64 and -flto, the siphash_20b() call in tcp_hash()
doesn't see the connection address -- it gets all zeroes instead.

Fix this temporarily by disabling optimisations for this tcp_hash().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-22 13:16:22 +01:00
Stefano Brivio
933aa1014b selinux/passt.te: Allow setting socket option on routing netlink socket
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-21 19:12:37 +01:00
Stefano Brivio
0c11355e83 selinux/passt.te: Allow /etc/resolv.conf symlinks to be followed
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-21 19:12:37 +01:00
Stefano Brivio
7d9150db0a selinux/passt.te: Allow setcap on the process itself
This is needed by the new functions in isolate.c, add the
corresponding rule.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-21 19:12:37 +01:00
Stefano Brivio
01801b131f selinux: Switch to a more reasonable model for PID and socket files
Instead of restricting PID files to /var/run/passt.pid, which is a
single file and unlikely to be used, use the user_tmp_t type which
should cover any reasonable need.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-21 19:12:37 +01:00
Stefano Brivio
4902447630 selinux: Define interfaces for libvirt and similar frameworks
Services running passt will commonly need to transition to its
domain, terminate it, connect and write to its socket.

The init_daemon_domain() macro now defines the default transition to
the passt_t domain, using the passt_exec_t type.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-21 19:12:37 +01:00
Stefano Brivio
9136f74015 selinux/passt.if: Fix typo in passt_read_data interface definition
This is an example interface, currently unused, so it went undetected:
m4 macros need a backtick at the beginning of a block instead of a
single quote.

Fixes: 1f4b7fa0d7 ("passt, pasta: Add examples of SELinux policy modules")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-21 19:12:30 +01:00
Stefano Brivio
4663ccc89a conf: Fix typo and logic in conf_ports() check for port binding
Ouch, I accidentally pushed the previous change without running the
tests:

- we need to check, in conf_ports(), that udp_sock_init()
  managed to bind at least a port, not the opposite

- for -T and -U, we have no way to know if we'll manage to bind
  the port later, so never report an error for those

Fixes: 3d0de2c1d7 ("conf, tcp, udp: Exit if we fail to bind sockets for all given ports")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 19:59:07 +01:00
Stefano Brivio
36f0199f6e conf, tap: Silence two false positive invalidFunctionArg from cppcheck
The newly introduced die() calls exit(), but cppcheck doesn't see it
and warns about possibly invalid arguments used after the check which
triggers die(). Add return statements to silence the warnings.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 19:19:23 +01:00
David Gibson
89e0fbfaa7 tcp: Remove 'zero_len' goto from tcp_data_from_sock
This goto exists purely to move this exception case out of line.  Although
that does make the "normal" path a little clearer, it comes at the cost of
not knowing how where control will flow after jumping to the zero_len
label.  The exceptional case isn't that long, so just put it inline.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 18:56:37 +01:00
David Gibson
83b2061ae7 tcp: Remove 'recvmsg' goto from tcp_data_from_sock
This goto can be handled just as simply and more clearly with a do while.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 18:56:35 +01:00
David Gibson
42bfd212b1 tap: Eliminate goto from tap_handler()
The goto here really doesn't improve clarity or brevity at all.  Use a
clearer construct.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 18:56:31 +01:00
David Gibson
b62ed9ca0e tap: Don't pcap frames that didn't get sent
In tap_send_frames() we send a number of frames to the tap device, then
also write them to the pcap capture file (if configured).  However the tap
send can partially fail (short write()s or similar), meaning that some
of the requested frames weren't actually sent, but we still write those
frames to the capture file.

We do give a debug message in this case, but it's misleading to add frames
that we know weren't sent to the capture file.  Rework to avoid this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-02-16 18:56:20 +01:00