One day, libvirt might actually support running passt to provide
guest connectivity. Should libvirtd (or virtqemud) start passt, it
will need to access socket and PID files in specific locations, and
passt needs to accept SIGTERM in case QEMU fails to start after passt
is already started.
To make this more convenient, split the current profile into two
abstractions, for passt and for pasta, so that external programmes
can include the bits they need (and especially not include the pasta
abstraction if they only need to start passt), plus whatever specific
adaptation is needed.
For stand-alone usage of passt and pasta, the 'passt' profile simply
includes both abstractions, plus rules to create and access PID and
capture files in default or reasonable ($HOME) locations.
Tested on Debian with libvirt 9.0.0 together with a local fix to start
passt as intended, namely libvirt commit c0efdbdb9f66 ("qemu_passt:
Avoid double daemonizing passt"). This is an example of how the
libvirtd profile (or virtqemud abstraction, or virtqemud profile) can
use this:
# support for passt network back-end
/usr/bin/passt Cx -> passt,
profile passt {
/usr/bin/passt r,
owner @{run}/user/[0-9]*/libvirt/qemu/run/passt/* rw,
signal (receive) set=("term") peer=/usr/sbin/libvirtd,
signal (receive) set=("term") peer=libvirtd,
include if exists <abstractions/passt>
}
translated:
- when executing /usr/bin/passt, switch to the subprofile "passt"
(not the "discrete", i.e. stand-alone profile), described below.
Scrub the environment (e.g. LD_PRELOAD is dropped)
- in the "passt" subprofile:
- allow reading the binary
- allow read and write access to PID and socket files
- make passt accept SIGTERM from /usr/sbin/libvirtd, and
libvirtd peer names
- include anything else that's needed by passt itself
Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While generating -device as JSON when JSON is in use is
mandatory, because not doing so can often prevent the VM from
starting up, using JSON for -netdev simply makes things a bit
nicer. No reason not to do it, though.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
For pc machines, devices are placed directly on pci.0 with
addresses like
bus=pci.0,addr=0xa
and in this case the existing code works correctly.
For q35 machines, however, a separate PCI bus is created for
each devices using a pcie-root-port, and the resulting
addresses look like
bus=pci.9,addr=0x0
In this case, we need to treat PCI addresses as decimal, not
hexadecimal, both when parsing and generating them.
This issue has gone unnoticed for a long time because it only
shows up when enough PCI devices are present: for small
numbers, decimal and hexadecimal overlap, masking the issue.
Reported-by: Alona Paz <alkaplan@redhat.com>
Fixes: 5307faa059 ("qrap: Strip network devices from command line, set them up according to machine")
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
When JSON support was introduced, the drop_args array has
not been updated accordingly.
Fixes: b944ca1855 ("qrap: Support JSON syntax for -device")
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
The JSON version of the template is incorrect, making qrap
completely unusable when JSON arguments are used together
with the pc machine type.
Fixes: b944ca1855 ("qrap: Support JSON syntax for -device")
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
The pci.0 bus on a pc machine has 32 slots.
For q35 machines, we don't expect devices to be plugged into
pcie.0 directly, so technically we could have a very large
number of slots by adding many pcie-root-ports, but even in
this scenario 32 is a reasonable number.
Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If we define die() as a variadic macro, passing __VA_ARGS__ to err(),
and calling exit() outside err() itself, we can drop the workarounds
introduced in commit 36f0199f6e ("conf, tap: Silence two false
positive invalidFunctionArg from cppcheck").
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
ShellCheck reports (SC2034) that __qemu_arch is not used. Use it,
and silence the resulting SC2086 warning as we want word splitting on
options we pass with it.
While at it, silence SC2317 warnings for commands in cleanup() that
appear to be unreachable: cleanup() is only called as trap.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
...and, given that I keep getting this wrong, add a convenience
macro, MAX_FROM_BITS().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If tcp_timer_ctl() gets a socket number greater than SOCKET_MAX
(2 ^ 24), we return error but we don't close the socket. This is a
rather formal issue given that, at least on Linux, socket numbers are
monotonic and we're in general not allowed to open so many sockets.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If there are no TCP options in the header, tcp_tap_handler() will
pass the corresponding pointer, fetched via packet_get(), as NULL to
tcp_conn_from_sock_finish(), which in turn indirectly calls
tcp_opt_get().
If there are no options, tcp_opt_get() will stop right away because
the option length is indicated as zero. However, if the logic is
complicated enough to follow for static checkers, adding an explicit
check against NULL in tcp_opt_get() is probably a good idea.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
We use the return value of fls() as array index for debug strings.
While fls() can return -1 (if no bit is set), Coverity Scan doesn't
see that we're first checking the return value of another fls() call
with the same bitmask, before using it.
Call fls() once, store its return value, check it, and use the stored
value as array index.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Recently, commit 4ddbcb9c0c ("tcp: Disable optimisations
for tcp_hash()") worked around yet another issue we hit with gcc 12
and '-flto -O2': some stores affecting the input data to siphash_20b()
were omitted altogether, and tcp_hash() wouldn't get the correct hash
for incoming connections.
Digging further into this revealed that, at least according to gcc's
interpretation of C99 aliasing rules, passing pointers to functions
with different types compared to the effective type of the object
(for example, a uint8_t pointer to an anonymous struct, as it happens
in tcp_hash()), doesn't guarantee that stores are not reordered
across the function call.
This means that, in general, our checksum and hash functions might
not see parts of input data that was intended to be provided by
callers.
Not even switching from uint8_t to character types, which should be
appropriate here, according to C99 (ISO/IEC 9899, TC3, draft N1256),
section 6.5, "Expressions", paragraph 7:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
[...]
— a character type.
does the trick. I guess this is also subject to interpretation:
casting passed pointers to character types, and then using those as
different types, might still violate (dubious) aliasing rules.
Disable gcc strict aliasing rules for potentially affected functions,
which, in turn, disables gcc's Type-Based Alias Analysis (TBAA)
optimisations based on those function arguments.
Drop the existing workarounds. Also the (seemingly?) bogus
'maybe-uninitialized' warning on the tcp_tap_handler() > tcp_hash() >
siphash_20b() path goes away with -fno-strict-aliasing on
siphash_20b().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
cppcheck 2.10 reports:
tcp.c:1815:12: style: Condition 'wnd>prev_scaled' is always false [knownConditionTrueFalse]
if ((wnd > prev_scaled && wnd * 99 / 100 < prev_scaled) ||
^
tcp.c:1808:8: note: Assignment 'wnd=((1<<(16+8))<(wnd))?(1<<(16+8)):(wnd)', assigned value is less than 1
wnd = MIN(MAX_WINDOW, wnd);
^
tcp.c:1811:19: note: Assuming condition is false
if (prev_scaled == wnd)
^
tcp.c:1815:12: note: Condition 'wnd>prev_scaled' is always false
if ((wnd > prev_scaled && wnd * 99 / 100 < prev_scaled) ||
^
but this is not actually the case: wnd is typically greater than 1,
and might very well be greater than prev_scaled as well.
I bisected this down to cppcheck commit b4d455df487c ("Fix 11349: FP
negativeIndex for clamped array index (#4627)") and reported findings
at https://github.com/danmar/cppcheck/pull/4627.
Suppress the warning for the moment being.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
openlog() can be used to set "ident" and have all the log messages
prefixed by it, but only if we call syslog() -- this is implemented
by C libraries.
We don't log messages with syslog(), though, as we have a custom
implementation to ensure we don't need dynamic memory allocation.
This means that it's perfectly useless to call openlog(), and that we
have to prefix every message we log by the identifier on our own.
Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Andrea reports that with a Fedora 37 guest running on a Fedora 37
host, both using systemd-resolved, with passt connecting them,
running with default options, DNS queries don't work.
systemd-resolved on the host is reachable only at the loopback
address 127.0.0.53.
We advertise the default gateway address to the guest as resolver,
because our local address is of course unreachable from there, which
means we see DNS queries directed to the default gateway, and we
redirect them to 127.0.0.1. However, systemd-resolved doesn't answer
on 127.0.0.1.
To fix this, set @dns_match to the address of the default gateway,
unless a different resolver address is explicitly configured, so that
we know we explicitly have to map DNS queries, in this case, to the
address of the local resolver.
This means that in udp_tap_handler() we need to check, first, if
the destination address of packets matches @dns_match: even if it's
the address of the local gateway, we want to map that to a specific
address, which isn't necessarily 127.0.0.1.
Do the same for IPv6 for consistency, even though IPv6 defines a
single loopback address.
Reported-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
The logic handling which resolvers we add, and whether to add them,
is getting rather cramped in get_dns(): split it into separate
functions.
No functional changes intended.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Instead of the address of the first resolver we advertise to
the guest or namespace.
This was one of the intentions behind commit 3a2afde87d ("conf,
udp: Drop mostly duplicated dns_send arrays, rename related fields"),
but I forgot to implement this part. In practice, they are usually
the same thing, unless /etc/resolv.conf points to a loopback address.
Fixes: 3a2afde87d ("conf, udp: Drop mostly duplicated dns_send arrays, rename related fields")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrea Bolognani <abologna@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
I'm not sure if we're breaking some aliasing rule here, but with gcc
12.2.1 on x86_64 and -flto, the siphash_20b() call in tcp_hash()
doesn't see the connection address -- it gets all zeroes instead.
Fix this temporarily by disabling optimisations for this tcp_hash().
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Instead of restricting PID files to /var/run/passt.pid, which is a
single file and unlikely to be used, use the user_tmp_t type which
should cover any reasonable need.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Services running passt will commonly need to transition to its
domain, terminate it, connect and write to its socket.
The init_daemon_domain() macro now defines the default transition to
the passt_t domain, using the passt_exec_t type.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This is an example interface, currently unused, so it went undetected:
m4 macros need a backtick at the beginning of a block instead of a
single quote.
Fixes: 1f4b7fa0d7 ("passt, pasta: Add examples of SELinux policy modules")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Ouch, I accidentally pushed the previous change without running the
tests:
- we need to check, in conf_ports(), that udp_sock_init()
managed to bind at least a port, not the opposite
- for -T and -U, we have no way to know if we'll manage to bind
the port later, so never report an error for those
Fixes: 3d0de2c1d7 ("conf, tcp, udp: Exit if we fail to bind sockets for all given ports")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The newly introduced die() calls exit(), but cppcheck doesn't see it
and warns about possibly invalid arguments used after the check which
triggers die(). Add return statements to silence the warnings.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This goto exists purely to move this exception case out of line. Although
that does make the "normal" path a little clearer, it comes at the cost of
not knowing how where control will flow after jumping to the zero_len
label. The exceptional case isn't that long, so just put it inline.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This goto can be handled just as simply and more clearly with a do while.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The goto here really doesn't improve clarity or brevity at all. Use a
clearer construct.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In tap_send_frames() we send a number of frames to the tap device, then
also write them to the pcap capture file (if configured). However the tap
send can partially fail (short write()s or similar), meaning that some
of the requested frames weren't actually sent, but we still write those
frames to the capture file.
We do give a debug message in this case, but it's misleading to add frames
that we know weren't sent to the capture file. Rework to avoid this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Based on a patch from Laine, and reports from Laine and Yalan: fix
the "22-80:32-90" example, and improve wording for the other ones:
instead of using "to" to denote the end of a range, use "between ...
and", so that it's clear we're *not* referring to target ports.
Reported-by: Laine Stump <laine@redhat.com>
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Fixes: da20f57f19 ("passt, qrap: Add man pages")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Similar to the conf_print() fix from commit 4129764eca ("conf: Fix
mask calculation from prefix_len in conf_print()"): to calculate an
IPv4 netmask from the prefix length, we need to left shift 32 all-one
bits by 32 minus the prefix length -- not by the prefix length
itself.
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Fixes: dd09cceaee ("Minor improvements to IPv4 netmask handling")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
David points out that using multiple counters to go over the iov
array, namely 'i' and 'iov', makes mistakes easier. We can't just use
'iov', unless we reserve an element with zero iov_len at the end,
which isn't really justified.
Simply use 'i' to iterate over the array.
Link: https://archives.passt.top/passt-dev/Y+mfenvLn3VJ7Dg5@yekko/
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
passt supports ranges of forwarded ports as well as 'all' for TCP and
UDP, so it might be convenient to proceed if we fail to bind only
some of the desired ports.
But if we fail to bind even a single port for a given specification,
we're clearly, unexpectedly, conflicting with another network
service. In that case, report failure and exit.
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Now that logging functions force printing messages to stderr before
passt forks to background, we'll have duplicate messages when running
from an interactive terminal, or if --stderr is passed, because at
some point we set LOG_PERROR in our __openlog() wrapper.
We could defer setting LOG_PERROR, but that would change option
semantics in other, unexpected ways. We could force calling
passt_vsyslog() as long as the mask is set to LOG_EMERG, but that
complicates the logic in logging functions even further.
Go the easy way for now: don't force printing to stderr with
LOG_EMERG if LOG_PERROR is already set. We should seriously consider a
rework of those logging functions at this point.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
This actually leaves us with 0 uses of err(), but someone could want
to use it in the future, so we may as well leave it around.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Again, it can then be made to return void, simplifying the caller.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As with conf_ports, this allows us to make the function return void.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Rather than having conf_ports() (possibly) log an error, and then
letting the caller log the entire usage() message and exit, just log
the errors and exit immediately (using die()).
For some errors, conf_ports would previously not log any specific
message, leaving it up to the user to determine the problem by
guessing. We replace all of those silent returns with die()
(logging a specific error), thus permitting us to make conf_ports()
return void, which simplifies the caller.
While modifying the two callers to conf_ports() to not check for a
return value, we can further simplify the code by removing the check
for a non-null optarg, as that is guaranteed to never happen (due to
prior calls to getopt_long() with "argument required" for all relevant
options - getopt_long() would have already caught this error).
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Nearly all of the calls to usage() in conf() occur immediately after
logging a more detailed error message, and the fact that these errors
are occuring indicates that the user has already seen the passt usage
message (or read the manpage). Spamming the logfile with the complete
contents of the usage message serves only to obscure the more detailed
error message. The only time when the full usage message should be output
is if the user explicitly asks for it with -h (or its synonyms)
As a start to eliminating the excessive calls to usage(), this patch
replaces most calls to err() followed by usage() with a call to die()
instead. A few other usage() calls remain, but their removal involves
bit more nuance that should be properly explained in separate commit
messages.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Almost all occurences of err() are either immediately followed by
exit(EXIT_FAILURE), usage(argv[0]) (which itself then calls
exit(EXIT_FAILURE), or that is what's done immediately after returning
from the function that calls err(). Modify the errfn macro so that its
instantiations can include exit(EXIT_FAILURE) at the end, and use that
to create a new function die() that will log an error and then
exit.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Once a log file (specified on the commandline) is opened, the logging
functions will stop sending error logs to stderr, which is annoying if
passt has been started by another process that wants to collect error
messages from stderr so it can report why passt failed to start. It
would be much nicer if passt continued sending all log messages to
stderr until it daemonizes itself (at which point the process that
started passt can assume that it was started successfully).
The system log mask is set to LOG_EMERG when the process starts, and
we're already using that to do "special" logging during the period
from process start until the log level requested on the commandline is
processed (setting the log mask to something else). This period
*almost* matches with "the time before the process is daemonized"; if
we just delay setting the log mask a tiny bit, then it will match
exactly, and we can use it to determine if we need to send log
messages to stderr even when a log file has been specified and opened.
This patch delays the setting of the log mask until immediately before
the call to __daemon(). It also modifies logfn() slightly, so that it
will log to stderr any time log mask is LOG_EMERG, even if a log file
has been opened.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Fedora 32-35 are now old enough that they're not on all mirrors. Fetch
them from the archive server instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The current debian cloud images no longer include ppc64. Change to using
the latest snapshot which does include ppc64.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When creating a new spliced connection, we need to get a socket in the
other ns from the originating one. To avoid excessive ns switches we
usually get these from a pool refilled on a timer. However, if the pool
runs out we need a fallback. Currently that's done by passing -1 as the
socket to tcp_splice_connnect() and running it in the target ns.
This means that tcp_splice_connect() itself needs to have different cases
depending on whether it's given an existing socket or not, which is
a separate concern from what it's mostly doing. We change it to require
a suitable open socket to be passed in, and ensuring in the caller that we
have one.
This requires adding the fallback paths to the caller, tcp_splice_new().
We use slightly different approaches for a socket in the init ns versus the
guest ns.
This also means that we no longer need to run tcp_splice_connect() itself
in the guest ns, which allows us to remove a bunch of boilerplate code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_conn_new_sock() first looks for a socket in a pre-opened pool, then if
that's empty creates a new socket in the init namespace. Both parts of
this are duplicated in other places: the pool lookup logic is duplicated in
tcp_splice_new(), and the socket opening logic is duplicated in
tcp_sock_refill_pool().
Split the function into separate parts so we can remove both these
duplications.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_splice.c has some explicit extern declarations to access the
socket pools. This is pretty dangerous - if we changed the type of
these variables in tcp.c, we'd have tcp.c and tcp_splice.c using the
same memory in different ways with no compiler error. So, move the
extern declarations to tcp_conn.h so they're visible to both tcp.c and
tcp_splice.c, but not the rest of pasta.
In fact the pools for the guest namespace are necessarily only used by
tcp_splice.c - we have no sockets on the guest side if we're not
splicing. So move those declarations and the functions that deal
exclusively with them to tcp_splice.c
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>