...not just for debug messages. Otherwise, timestamps in the log file
are consistent but the starting point is not zero.
Do this right away as we enter main(), so that the resulting
timestamps are as close as possible to the time we actually start.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
EPOLL_TYPE_UDP is now only used for "listening" sockets: long-lived
sockets which can initiate new flows. Rename it to EPOLL_TYPE_UDP_LISTEN,
and the associated functions to match. Along with that, remove the .orig
field from union udp_listen_epoll_ref, since it is now always true.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When forwarding a datagram to a socket, we need to find a socket with a
suitable local address to send it. Currently we keep track of such sockets
in an array indexed by local port, but this can't properly handle cases
where we have multiple local addresses in active use.
For "spliced" (socket to socket) cases, improve this by instead opening
a socket specifically for the target side of the flow. We connect() as
well as bind()ing that socket, so that it will only receive the flow's
reply packets, not anything else. We direct datagrams sent via that socket
using the addresses from the flow table, effectively replacing bespoke
addressing logic with the unified logic in fwd.c
When we create the flow, we also take a duplicate of the originating
socket, and use that to deliver reply datagrams back to the origin, again
using addresses from the flow table entry.
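For illustration, a minimal standalone sketch of the per-flow socket idea
(not the actual udp_flow code; names and types are simplified): the socket
is bound to the local address of the target side and connected to the peer,
so only that peer's reply datagrams are delivered to it.

    #include <errno.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Open a UDP socket dedicated to one flow: bound to our local address
     * for that flow and connected to the peer, so that only the peer's
     * reply datagrams land on it.  Illustrative sketch only.
     */
    int flow_sock(const struct sockaddr_in *local,
                  const struct sockaddr_in *peer)
    {
        int s = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC, 0);

        if (s < 0)
            return -errno;

        if (bind(s, (const struct sockaddr *)local, sizeof(*local)) ||
            connect(s, (const struct sockaddr *)peer, sizeof(*peer))) {
            int ret = -errno;

            close(s);
            return ret;
        }

        return s;    /* send() and recv() now apply to this flow only */
    }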
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If a log file is configured, we would otherwise open a connection to
the system logger (if any), print any message that we might have
before we initialise the log file, and then keep that connection
around for no particular reason.
Call __openlog() as an alternative to the log file setup, instead.
This way, we might skip printing some messages during the
initialisation phase, but they're probably not really valuable to
have in a system log, and we're going to print them to standard
error anyway.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Now that we have logging functions embedding perror() functionality,
we can make _some_ calls more terse by using them. In many places,
the strerror() calls are still more convenient because, for example,
they are used in flow debugging functions, or because the return code
variable of interest is not 'errno'.
While at it, convert a few error messages from a scant perror style
to proper failure descriptions.
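The pattern looks roughly like the standalone approximation below; the
helper is a stand-in for the idea, not passt's actual logging
implementation:

    #include <errno.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    /* Rough stand-in for a logging call with embedded perror() behaviour:
     * format the message and append the errno description in one call,
     * instead of an explicit strerror(errno) at every call site.
     */
    void err_perror(const char *fmt, ...)
    {
        char buf[256];
        va_list ap;

        va_start(ap, fmt);
        vsnprintf(buf, sizeof(buf), fmt, ap);
        va_end(ap);

        fprintf(stderr, "%s: %s\n", buf, strerror(errno));
    }

    int main(void)
    {
        if (!fopen("/nonexistent", "r"))
            err_perror("Failed to open %s", "/nonexistent");  /* terse call */

        return 0;
    }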
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
perror() prints directly to standard error, but in many cases standard
error might be already closed, or we might want to skip logging, based
on configuration. Our logging functions provide all that.
While at it, make errors more descriptive, replacing some of the
existing basic perror-style messages.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
After commit 15001b39ef ("conf: set the log level much earlier"), we
had a phase during initialisation when messages wouldn't be printed to
standard error anymore.
Commit f67238aa86 ("passt, log: Call __openlog() earlier, log to
stderr until we detach") fixed that, but only for the case where no
log files are given.
If a log file is configured, vlogmsg() will not call passt_vsyslog(),
but during initialisation, LOG_PERROR is set, so to avoid duplicated
prints (which would result from passt_vsyslog() printing to stderr),
we don't call fprintf() from vlogmsg() either.
This is getting a bit too complicated. Instead of abusing LOG_PERROR,
define an internal logging flag that clearly represents that we're not
done with the initialisation phase yet.
If this flag is not set, make sure we always print to stderr, if the
log mask matches.
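A much-reduced sketch of that idea follows; the flag name and the logging
function are simplified stand-ins, not the actual log.c code:

    #include <stdarg.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <syslog.h>

    static bool log_conf_done;    /* set once initialisation is over */
    static int log_mask = LOG_MASK(LOG_ERR) | LOG_MASK(LOG_WARNING);

    /* Before initialisation is done, always mirror messages to stderr if
     * they pass the log mask, regardless of log file or system logger.
     */
    void logmsg(int pri, const char *fmt, ...)
    {
        va_list ap;

        if (!(log_mask & LOG_MASK(pri)))
            return;

        /* ... write to the log file or to the system logger here ... */

        if (!log_conf_done) {
            va_start(ap, fmt);
            vfprintf(stderr, fmt, ap);
            va_end(ap);
        }
    }

    int main(void)
    {
        logmsg(LOG_ERR, "printed to stderr: still initialising\n");
        log_conf_done = true;
        logmsg(LOG_ERR, "no longer mirrored to stderr\n");
        return 0;
    }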
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
The original behaviour of printing messages to standard error by
default when running from a non-interactive terminal was introduced
because the first KubeVirt integration draft used to start passt in
foreground and get messages via standard error.
For development purposes, the system logger was more convenient at
that point, and passt was running from interactive terminals only if
not started by the KubeVirt integration.
This behaviour was introduced by 84a62b79a2 ("passt: Also log to
stderr, don't fork to background if not interactive").
Later, I added command-line options in 1e49d194d0 ("passt, pasta:
Introduce command-line options and port re-mapping") and accidentally
reversed this condition, which wasn't a problem as --stderr could
force printing to standard error anyway (and it was used by KubeVirt).
Nowadays, the KubeVirt integration uses a log file (requested via
libvirt configuration), and the same applies for Podman if one
actually needs to look at runtime logs. There are no use cases left,
as far as I know, where passt runs in foreground in non-interactive
terminals.
Seize the chance to reintroduce some sanity here. If we fork to
background, standard error is closed, so --stderr is useless in that
case.
If we run in foreground, there's no harm in printing messages to
standard error, and that accidentally became the default behaviour
anyway, so --stderr is not needed in that case.
It would be needed for non-interactive terminals, but there are no
use cases, and if there were, let's log to standard error anyway:
the user can always redirect standard error to /dev/null if needed.
Before we're up and running, we need to print to standard error anyway
if something happens; otherwise we can't report failure to start in
any kind of usage, stand-alone or in integrations.
So, make --stderr do nothing, and deprecate it.
While at it, drop a left-over comment about --foreground being the
default only for interactive terminals, because it's not the case
anymore.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If we don't run in foreground, we close standard error as we
daemonise, so it makes no sense to check if the controlling terminal
is an interactive terminal or if --force-stderr was given, to decide
if we want to log to standard error.
Make --force-stderr depend on --foreground.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
As we are going to introduce MODE_VU, which will act like MODE_PASST,
compare against MODE_PASTA rather than adding a comparison to MODE_VU
wherever we currently check for MODE_PASST.
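For illustration (a trimmed-down, hypothetical example, not an actual
passt code path):

    #include <stdio.h>

    enum passt_modes { MODE_PASST, MODE_PASTA };    /* MODE_VU to come */

    /* Tests that really mean "not pasta" are written against MODE_PASTA,
     * so a future MODE_VU takes the MODE_PASST path without every check
     * growing a second comparison.
     */
    const char *tap_backend(enum passt_modes mode)
    {
        if (mode == MODE_PASTA)          /* pasta-only path */
            return "tuntap device";

        return "UNIX domain socket";     /* MODE_PASST today, MODE_VU later */
    }

    int main(void)
    {
        printf("%s\n", tap_backend(MODE_PASST));
        return 0;
    }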
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We are going to introduce a variant of the function to use
vhost-user buffers rather than passt internal buffers.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The 'c' parameter is always passed exactly 'c'. The 'now' parameter is
always passed exactly 'now'.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Otherwise, if the user runs us as root, and gives us paths that are
only accessible by root, we'll fail to open them, which might in turn
encourage users to change permissions or ownerships: definitely a bad
idea in terms of security.
Reported-by: Minxi Hou <mhou@redhat.com>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
We won't call it from main() any longer: move it.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
This is in preparation for adding pidfile_open() in the next patch. The
function comment didn't match, by the way.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
This is a remnant from the time we kept access to the original
filesystem and we could reinitialise the listening AF_UNIX socket.
Since commit 0515adceaa ("passt, pasta: Namespace-based sandboxing,
defer seccomp policy application"), however, we can't re-bind the
listening socket once we're up and running.
Drop the -1 initialisation and the corresponding check.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
When I switched from 'uname -m' to 'gcc -dumpmachine' to fetch the
architecture name for, among others, seccomp.sh, I didn't realise
that "armv6l" and "armv7l" are just Linux kernel names -- compilers
just call that "arm".
Fix the "syscalls" annotation we use to define seccomp profiles
accordingly, otherwise pasta will be terminated on sigreturn() on
armv6l and armv7l.
Fixes: 213c397492 ("passt, pasta: Run-time selection of AVX2 build")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Paul reports that, with commit 15001b39ef ("conf: set the log level
much earlier"), early messages aren't reported to standard error
anymore.
The reason is that, once the log mask is changed from LOG_EARLY, we
don't force logging to stderr, and this mechanism was abused to have
early errors on stderr. Now that we drop LOG_EARLY earlier on, this
doesn't work anymore.
Call __openlog() as soon as we know the mode we're running as, using
LOG_PERROR. Then, once we detach, if we're not running from an
interactive terminal and logging to standard error is not forced,
drop LOG_PERROR from the options.
While at it, check if the standard error descriptor refers to a
terminal, instead of checking standard output: if the user redirects
standard output to /dev/null, they might still want to see messages
from standard error.
Further, make sure we don't print messages to standard error reporting
that we couldn't log to the system logger, if we didn't open a
connection yet. That's expected.
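A standalone approximation of that sequence, using the plain libc syslog
API rather than passt's __openlog() wrapper:

    #include <stdbool.h>
    #include <syslog.h>
    #include <unistd.h>

    int main(void)
    {
        bool force_stderr = false;    /* would come from the command line */

        /* As soon as the mode is known: copy messages to stderr as well */
        openlog("passt", LOG_PERROR | LOG_PID, LOG_DAEMON);

        syslog(LOG_INFO, "early message, also printed to stderr");

        /* On detach: keep stderr only if it refers to an interactive
         * terminal (checking stderr itself, not stdout) or if forced.
         */
        if (!isatty(STDERR_FILENO) && !force_stderr) {
            closelog();
            openlog("passt", LOG_PID, LOG_DAEMON);
        }

        syslog(LOG_INFO, "later message, system logger only");
        return 0;
    }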
Reported-by: Paul Holzinger <pholzing@redhat.com>
Fixes: 15001b39ef ("conf: set the log level much earlier")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Currently ping sockets use a custom epoll reference type which includes
the ICMP id. However, now that we have entries in the flow table for
ping flows, finding that is sufficient to get everything else we want,
including the id. Therefore remove the icmp_epoll_ref type and use the
general 'flowside' field for ping sockets.
Having done this we no longer need separate EPOLL_TYPE_ICMP and
EPOLL_TYPE_ICMPV6 reference types, because we can easily determine
which case we have from the flow type. Merge both types into
EPOLL_TYPE_PING.
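Roughly, the result looks like the sketch below; the types are heavily
simplified stand-ins for the real flow table entries and epoll reference:

    #include <stdio.h>

    /* Heavily simplified stand-ins for the flow table types */
    enum flow_type { FLOW_PING4, FLOW_PING6 };

    struct ping_flow {
        enum flow_type type;
        int sock;
        /* ... echo ID, sequence tracking, ... */
    };

    /* One handler for EPOLL_TYPE_PING: whether this is ICMP or ICMPv6
     * follows from the flow type found through the epoll reference, so
     * no separate ICMP/ICMPv6 epoll types are needed.
     */
    void ping_sock_handler(const struct ping_flow *pingf)
    {
        int v = (pingf->type == FLOW_PING4) ? 4 : 6;

        printf("reply on socket %d for an IPv%d ping flow\n",
               pingf->sock, v);
    }

    int main(void)
    {
        struct ping_flow f = { FLOW_PING6, 42 };

        ping_sock_handler(&f);
        return 0;
    }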
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently icmp_id_map[][] stores information about ping sockets in a
bespoke structure. Move the same information into new types of flow
in the flow table. To match that change, replace the existing ICMP
timer with a flow-based timer for expiring ping sockets. This has the
advantage that we only need to scan the active flows, not all possible
ids.
We convert icmp_id_map[][] to point to the flow table entries, rather
than containing its own information. We do still use that array for
locating the right ping flows, rather than using a "flow native" form
of lookup for the time being.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update id_sock description in comment to icmp_ping_new()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
--quiet is supposed to silence the "No routable interface" message, but
it does not work: the log level was set long after conf_ip4/6() was
called, which means those calls used the default level, which logs
everything.
To address this, move the log level logic directly after the option
parsing in conf().
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We don't know how frequently this happens, but hitting
fs.inotify.max_user_watches or similar sysctl limits is definitely
not out of the question, and Paul mentioned that, for example, Podman's
CI environments hit similar issues in the past.
Introduce a fallback mechanism based on a timer file descriptor: we
grab the directory handle at startup, and we can then use openat(),
triggered periodically, to check if the (network) namespace directory
still exists. If openat() fails at some point, exit.
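The fallback check itself is conceptually tiny; here is a standalone
sketch (the directory path and entry name in main() are just examples,
not what pasta actually watches):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Fallback check as described above: 'dir_fd' is a handle on the
     * parent directory grabbed at startup, 'name' the (network)
     * namespace directory entry.  Called periodically from a timerfd
     * handler.
     */
    void check_netns_dir(int dir_fd, const char *name)
    {
        int fd = openat(dir_fd, name, O_RDONLY | O_CLOEXEC);

        if (fd < 0) {    /* the directory is gone: exit */
            fprintf(stderr, "namespace directory %s is gone\n", name);
            exit(EXIT_SUCCESS);
        }

        close(fd);       /* still there, check again on the next tick */
    }

    int main(void)
    {
        int dir_fd = open("/run", O_RDONLY | O_DIRECTORY | O_CLOEXEC);

        if (dir_fd >= 0)
            check_netns_dir(dir_fd, "user");    /* example path only */
        return 0;
    }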
Link: https://github.com/containers/podman/pull/21563#issuecomment-1943505707
Reported-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Before commit 32d07f5e59 ("passt, pasta: Completely avoid dynamic
memory allocation"), we didn't store the current log mask in a
variable, and we fetched it using setlogmask(0) wherever needed.
But after that commit, we can use our log_mask copy instead. And we
should: with recent glibc versions, setlogmask(0) actually results in
a system call, which causes a substantial overhead with high transfer
rates: we use setlogmask(0) even to decide we don't want to print
debug messages.
Now that we rely on log_mask in early stages, before setlogmask() is
called, we need to initialise that variable to the special LOG_EMERG
mask value right away: define LOG_EARLY to make this clearer, and,
while at it, group conditions in vlogmsg() into something more terse.
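A minimal sketch of the idea; the values and helper names are
illustrative, not the actual log.c definitions:

    #include <stdio.h>
    #include <syslog.h>

    #define LOG_EARLY    LOG_MASK(LOG_EMERG)    /* mask before conf() runs */

    static int log_mask = LOG_EARLY;

    /* Deciding whether to print a debug message only looks at our own
     * copy of the mask: no setlogmask(0) round trip per message.
     */
    void debug(const char *msg)
    {
        if (!(log_mask & LOG_MASK(LOG_DEBUG)) && log_mask != LOG_EARLY)
            return;

        fprintf(stderr, "%s\n", msg);
    }

    int main(void)
    {
        debug("printed: still in the early phase");

        /* conf() eventually sets the real mask, mirrored in log_mask */
        log_mask = LOG_UPTO(LOG_INFO);
        setlogmask(log_mask);

        debug("dropped: debug is not enabled");
        return 0;
    }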
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Currently we have separate handlers for ICMP and ICMPv6 ping replies.
Although there are a number of points of difference, with some creative
refactoring we can combine these together sensibly. Although it doesn't
save a vast amount of code, it does make it clearer that we're performing
basically the same steps for each case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we always keep the flow table maximally compact: that is all the
active entries are contiguous at the start of the table. Doing this
sometimes requires moving an entry when one is freed. That's kind of
fiddly, and potentially expensive: it requires updating the hash table for
the new location and, depending on flow type, it may require
EPOLL_CTL_MOD system calls to update epoll tags with the new location
too.
Implement a new way of managing the flow table that doesn't ever move
entries. It attempts to maintain some compactness by always using the
first free slot for a new connection, and mitigates the effect of non
compactness by cheaply skipping over contiguous blocks of free entries.
See the "theory of operation" comment in flow.c for details.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: additional ASSERT(flow_first_free <= FLOW_MAX - 2) to avoid
Coverity Scan false positive]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently connected TCP sockets have the same epoll type, whether they're
for a "tap" connection or a spliced connection. This means that
tcp_sock_handler() has to do a secondary check on the type of the
connection to call the right function. We can avoid this by adding a new
epoll type and dispatching directly to the right thing.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As we already did for flow types, use an "EPOLL_NUM_TYPES" instead of
EPOLL_TYPE_MAX, which is a little bit safer and clearer. Add a static
assert on the size of the matching names array.
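The check is along these lines (trimmed-down enum, not the full list of
passt's epoll types):

    #include <assert.h>

    enum epoll_type {
        EPOLL_TYPE_TCP,
        EPOLL_TYPE_TCP_LISTEN,
        EPOLL_TYPE_UDP,
        /* ... */
        EPOLL_NUM_TYPES,    /* keep last */
    };

    static const char *epoll_type_str[] = {
        "connected TCP socket",
        "listening TCP socket",
        "UDP socket",
    };

    /* Catch a missing name at compile time whenever a new type is added */
    static_assert(sizeof(epoll_type_str) / sizeof(epoll_type_str[0]) ==
                  EPOLL_NUM_TYPES,
                  "epoll_type_str[] doesn't match enum epoll_type");

    int main(void)
    {
        return !epoll_type_str[EPOLL_TYPE_UDP];    /* 0: name is present */
    }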
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_timer() scans the flow table so that it can run tcp_splice_timer() on
each spliced connection. More generally, other flow types might want to
run similar timers in future.
We could add a flow_timer() analogous to tcp_timer(), udp_timer() etc.
However, this would need to scan the flow table, which we would have just
done in flow_defer_handler(). We'd prefer to just scan the flow table
once, dispatching both per-flow deferred events and per-flow timed events
if necessary.
So, extend flow_defer_handler() to do this. For now we use the same timer
interval for all flow types (1s). We can make that more flexible in future
if we need to.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_defer_handler(), amongst other things, scans the flow table and does
some processing for each TCP connection. When we add other protocols to
the flow table, they're likely to want some similar scanning. It makes
more sense for cache friendliness to perform a single scan of the flow
table and dispatch to the protocol specific handlers, rather than having
each protocol separately scan the table.
To that end, add a new flow_defer_handler() handling all flow-linked
deferred operations.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The TCP state structure includes a 128-bit hash_secret which we use for
SipHash calculations to mitigate attacks on the TCP hash table and initial
sequence number.
We have plans to use SipHash in places that aren't TCP related, and there's
no particular reason they'd need their own secret. So move the hash_secret
to the general context structure.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Because packets sent on the tap interface will always be going to the
guest/namespace, we more-or-less know what address they'll be going to. So
we pre-fill this destination address in our header buffers for IPv4. We
can't do the same for IPv6 because we could need either the global or
link-local address for the guest. In future we're going to want more
flexibility for the destination address, so this pre-filling will get in
the way.
Change the flow so we always fill in the IPv4 destination address for each
packet, rather than prefilling it from proto_update_l2_buf(). In fact for
TCP we already redundantly filled the destination for each packet anyway.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we have a single epoll event type for the "tap" fd, which could
be either a handle on a /dev/net/tun device (pasta) or a connected Unix
socket (passt). However for the two modes we call different handler
functions. Simplify this a little by using different epoll types and
dispatching directly to the correct handler function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tap_handler() actually handles events on three different types of object:
the /dev/tap character device (pasta), a connected Unix domain socket
(passt) or a listening Unix domain socket (passt).
The last, in particular, really has no handling in common with the others,
so split it into its own epoll type and directly dispatch to the relevant
handler from the top level.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_sock_handler() handles both listening TCP sockets, and connected TCP
sockets, but what it needs to do in those cases has essentially nothing in
common. Therefore, give listening sockets their own epoll_type value and
dispatch directly to their own handler from the top level. Furthermore,
the two handlers need essentially entirely different information from the
reference: we re-(ab)used the index field in the tcp_epoll_ref to indicate
the port for the listening socket, but that's not the same meaning. So,
switch listening sockets to their own reference type which we can lay out
as we please. That lets us remove the listen and outbound fields from the
normal (connected) tcp_epoll_ref, reducing it to just the connection table
index.
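Schematically, the split looks like this; the field layout is invented
for the example and is not passt's actual one:

    #include <stdint.h>
    #include <stdio.h>

    union tcp_epoll_ref {           /* connected socket: table index only */
        uint32_t index;
        uint32_t u32;
    };

    union tcp_listen_epoll_ref {    /* listening socket: laid out as needed */
        struct {
            uint16_t port;          /* bound port, no reuse of 'index' */
            uint16_t flags;         /* e.g. outbound/namespace bits */
        };
        uint32_t u32;
    };

    int main(void)
    {
        union tcp_listen_epoll_ref lref = { .port = 22, .flags = 0 };

        printf("listen ref for port %u fits in %zu bytes\n",
               (unsigned)lref.port, sizeof(lref.u32));
        return 0;
    }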
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
tcp_sock_handler() actually handles several different types of fd events.
This includes timerfds that aren't sockets at all. The handling of these
has essentially nothing in common with the other cases. So, give the
TCP timers their own epoll_type value and dispatch directly to their
handler. This also means we can remove the timer field from
tcp_epoll_ref, as the information it encoded is now implicit in the
epoll_type value.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Move the test for c->no_udp into the function itself, rather than
keeping it in the dispatching switch statement, to better localize the
UDP-specific logic and for greater consistency with other handler
functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We have different epoll type values for ICMP and ICMPv6 sockets, but they
both call the same handler function, icmp_sock_handler(). However that
function does essentially nothing in common for the two cases. So, split
it into icmp_sock_handler() and icmpv6_sock_handler() and dispatch them
separately from the top level.
While we're there remove some parameters that the function was never using
anyway. Also move the test for c->no_icmp into the functions, so that all
the logic specific to ICMP is within the handler, rather than in the top
level dispatch code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When we process events from epoll_wait(), we check for a number of special
cases before calling sock_handler() which then dispatches based on the
protocol type of the socket in the event.
Fold these cases together into a single switch on the fd type recorded in
the epoll data field.
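A much-reduced sketch of the resulting shape of the main loop; handler
names and the reference layout are simplified, not passt's exact
definitions:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/epoll.h>

    enum epoll_type { EPOLL_TYPE_TCP, EPOLL_TYPE_UDP, EPOLL_TYPE_TAP,
                      EPOLL_TYPE_NSQUIT };

    union epoll_ref {
        struct {
            unsigned int type:8;
            int fd:24;
        };
        uint64_t u64;    /* fits in epoll_event.data.u64 */
    };

    /* One switch on the type carried by the reference, instead of
     * special cases checked one by one before a protocol dispatch.
     */
    void dispatch(union epoll_ref ref)
    {
        switch (ref.type) {
        case EPOLL_TYPE_TCP:
            printf("TCP socket %d\n", ref.fd);
            break;
        case EPOLL_TYPE_UDP:
            printf("UDP socket %d\n", ref.fd);
            break;
        case EPOLL_TYPE_TAP:
            printf("tap fd %d\n", ref.fd);
            break;
        case EPOLL_TYPE_NSQUIT:
            printf("netns quit fd %d\n", ref.fd);
            break;
        }
    }

    int main(void)
    {
        struct epoll_event ev;
        union epoll_ref ref = { .u64 = 0 };

        ref.type = EPOLL_TYPE_TAP;
        ref.fd = 5;
        ev.data.u64 = ref.u64;    /* what epoll_ctl() would store */

        dispatch((union epoll_ref){ .u64 = ev.data.u64 });
        return 0;
    }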
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
epoll_ref contains a variety of information useful when handling epoll
events on our sockets, and we place it in the epoll_event data field
returned by epoll. However, for a few other things we use the 'fd' field
in the standard union of types for that data field.
This actually introduces a bug which is vanishingly unlikely to hit in
practice, but very nasty if it ever did: theoretically if we had a very
large file descriptor number for fd_tap or fd_tap_listen it could overflow
into bits that overlap with the 'proto' field in epoll_ref. With some
very bad luck this could mean that we mistakenly think an event on a
regular socket is an event on fd_tap or fd_tap_listen.
More practically, using different (but overlapping) fields of the
epoll_data means we can't unify dispatch for the various different objects
in the epoll. Therefore use the same epoll_ref as the data for the tap
fds and the netns quit fd, adding new fd type values to describe them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The epoll_ref type includes fields for the IP protocol of a socket, and the
socket fd. However, we already have a few things in the epoll which aren't
protocol sockets, and we may have more in future. Rename these fields to
an abstract "fd type" and file descriptor for more generality.
Similarly, rather than using existing IP protocol numbers for the type,
introduce our own number space. For now these just correspond to the
supported protocols, but we'll expand on that in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
union epoll_ref has a deeply nested set of structs and unions to let us
subdivide it into the various different fields we want. This means that
referencing elements can involve an awkward long string of intermediate
fields.
Using C11 anonymous structs and unions lets us do this less clumsily.
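A contrived comparison of the two styles (not passt's actual layout):

    #include <stdint.h>
    #include <stdio.h>

    union ref_nested {       /* every level needs a name... */
        struct {
            struct {
                uint16_t index;
            } tcp;
        } r;
        uint32_t u32;
    };

    union ref_anon {         /* ...anonymous members flatten the access */
        struct {
            uint16_t index;
        };
        uint32_t u32;
    };

    int main(void)
    {
        union ref_nested a = { .r.tcp.index = 7 };
        union ref_anon b = { .index = 7 };

        printf("%u %u\n", (unsigned)a.r.tcp.index, (unsigned)b.index);
        return 0;
    }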
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Valtteri reports that if SIGPIPE already has a disposition set by the
parent process, such as systemd with the default setting of
IgnoreSIGPIPE=yes, signal() will return the previous value, not zero,
and this is not an error: check for SIG_ERR instead.
While at it, split messages for failures of sigaction() and signal(),
and report the actual error.
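The distinction, in a reduced standalone example (what passt installs
for each signal differs, this just shows the return value convention):

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        struct sigaction sa = { .sa_handler = SIG_IGN };

        /* signal() returns the previous disposition on success, which
         * may well be non-zero (e.g. SIG_IGN inherited from systemd),
         * so only SIG_ERR indicates an error.
         */
        if (signal(SIGPIPE, SIG_IGN) == SIG_ERR) {
            perror("Couldn't set SIGPIPE to ignore");
            exit(EXIT_FAILURE);
        }

        /* sigaction(), on the other hand, does return 0 or -1 */
        if (sigaction(SIGCHLD, &sa, NULL)) {
            perror("Couldn't set SIGCHLD disposition");
            exit(EXIT_FAILURE);
        }

        return 0;
    }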
Reported-by: Valtteri Vuorikoski <vuori@notcom.org>
Fixes: 8534be076c ("Catch failures when installing signal handlers")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.
Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as far as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.
Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Roughly inspired by a patch from Chris Kuhn: fix up includes so that
we can build against musl: glibc is more lenient as headers generally
include a larger amount of other headers.
Compared to the original patch, I only included what was needed
directly in C files, instead of adding blanket includes in local
header files. It's a bit more involved, but more consistent with the
current (not ideal) situation.
Reported-by: Chris Kuhn <kuhnchris+github@kuhnchris.eu>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
While building against musl, gcc informs us that 'stderr' is a
protected keyword. This probably comes from a #define stderr (stderr)
in musl's stdio.h, to avoid a clash with extern FILE *const stderr,
but I didn't really track it down. Just rename it to force_stderr; it
makes more sense.
[sbrivio: Added commit message]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
This actually leaves us with 0 uses of err(), but someone could want
to use it in the future, so we may as well leave it around.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Once a log file (specified on the commandline) is opened, the logging
functions will stop sending error logs to stderr, which is annoying if
passt has been started by another process that wants to collect error
messages from stderr so it can report why passt failed to start. It
would be much nicer if passt continued sending all log messages to
stderr until it daemonizes itself (at which point the process that
started passt can assume that it was started successfully).
The system log mask is set to LOG_EMERG when the process starts, and
we're already using that to do "special" logging during the period
from process start until the log level requested on the commandline is
processed (setting the log mask to something else). This period
*almost* matches "the time before the process is daemonized"; if
we just delay setting the log mask a tiny bit, then it will match
exactly, and we can use it to determine if we need to send log
messages to stderr even when a log file has been specified and opened.
This patch delays the setting of the log mask until immediately before
the call to __daemon(). It also modifies logfn() slightly, so that it
will log to stderr any time the log mask is LOG_EMERG, even if a log file
has been opened.
Signed-off-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Adapted from a patch by Paul Holzinger: when pasta spawns a command,
operating without a pre-existing user and network namespace, it needs
to wait for the tap device to be configured and its handler ready,
before the command is actually executed.
Otherwise, something like:
pasta --config-net nslookup passt.top
usually fails as the nslookup command is issued before the network
interface is ready.
We can't adopt a simpler approach based on SIGSTOP and SIGCONT here:
the child runs in a separate PID namespace, so it can't send SIGSTOP
to itself as the kernel sees the child as init process and blocks
the delivery of the signal.
We could send SIGSTOP from the parent, but this wouldn't avoid the
possible condition where the child isn't ready to wait for it when
the parent sends it, also raised by Paul -- and SIGSTOP can't be
blocked, so it can never be pending.
Use SIGUSR1 instead: mask it before clone(), so that the child starts
with it blocked, and can safely wait for it. Once the parent is
ready, it sends SIGUSR1 to the child. If SIGUSR1 is sent before the
child is waiting for it, the kernel will queue it for us, because
it's blocked.
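The signal handling part, reduced to a standalone example using fork()
instead of pasta's clone() into new namespaces (the blocking and waiting
logic is the same):

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        sigset_t set;
        pid_t pid;

        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        sigprocmask(SIG_BLOCK, &set, NULL);    /* the child inherits this */

        pid = fork();
        if (pid == 0) {
            int sig;

            sigwait(&set, &sig);    /* queued for us even if sent early */
            printf("child: network is ready, exec() the command here\n");
            _exit(0);
        }

        /* parent: bring up the tap device and its handler, then: */
        kill(pid, SIGUSR1);
        waitpid(pid, NULL, 0);
        return 0;
    }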
Reported-by: Paul Holzinger <pholzing@redhat.com>
Fixes: 1392bc5ca0 ("Allow pasta to take a command to execute")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>