It might not be feasible for users to start passt-repair after passt
is started, on a migration target, but before the migration process
starts.
For instance, with libvirt, the guest domain (and, hence, passt) is
started on the target as part of the migration process. At least for
the moment being, there's no hook a libvirt user (including KubeVirt)
can use to start passt-repair before the migration starts.
Add a directory watch using inotify: if PATH is a directory, instead
of connecting to it, we'll watch for a .repair socket file to appear
in it, and then attempt to connect to that socket.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Here are a bunch of workarounds and a couple of fixes for libvirt
usage which are rather hard to split into single logical patches
as there appear to be some obscure dependencies between some of them:
- passt-repair needs to have an exec_type typeattribute (otherwise
the policy for lsmd(1) causes a violation on getattr on its
executable) file, and that typeattribute just happened to be there
for passt as a result of init_daemon_domain(), but passt-repair
isn't a daemon, so we need an explicit corecmd_executable_file()
- passt-repair needs a workaround, which I'll revisit once
https://github.com/fedora-selinux/selinux-policy/issues/2579 is
solved, for usage with libvirt: allow it to use qemu_var_run_t
and virt_var_run_t sockets
- add 'bpf' and 'dac_read_search' capabilities for passt-repair:
they are needed (for whatever reason I didn't investigate) to
actually receive socket files via SCM_RIGHTS
- passt needs further workarounds in the sense of
https://github.com/fedora-selinux/selinux-policy/issues/2579:
allow it to use map and use svirt_tmpfs_t (not just svirt_image_t):
it depends on where the libvirt guest image is
- ...it also needs to map /dev/null if <access mode='shared'/> is
enabled in libvirt's XML for the memoryBacking object, for
vhost-user operation
- and 'ioctl' on the TCP socket appears to be actually needed, on top
of 'getattr', to dump some socket parameters
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Otherwise we build it, but we don't install it. Not an issue that
warrants a a release right away as it's anyway usable.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This implements flow preparation on the source, transfer of data with
a format roughly inspired by struct tcp_tap_conn, plus a specific
structure for parameters that don't fit in the flow table, and flow
insertion on the target, with all the appropriate window options,
window scaling, MSS, etc.
Contents of pending queues are transferred as well.
The target side is rather convoluted because we first need to create
sockets and switch them to repair mode, before we can apply options
that are *not* stored in the flow table. This also means that, if
we're testing this on the same machine, in the same namespace, we need
to close the listening socket on the source before we can start moving
data.
Further, we need to connect() the socket on the target before we can
restore data queues, but we can't do that (again, on the same machine)
as long as the matching source socket is open, which implies an
arbitrary limit on queue sizes we can transfer, because we can only
dump pending queues on the source as long as the socket is open, of
course.
Co-authored-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This doesn't actually belong to passt's own policy: we should export
an interface and libvirt's policy should use it, because passt's
policy shouldn't be aware of svirt_image_t at all.
However, libvirt doesn't maintain its own policy, which makes policy
updates rather involved. Add this workaround to ensure --vhost-user
is working in combination with libvirt, as it might take ages before
we can get the proper rule in libvirt's policy.
Reported-by: Laine Stump <laine@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...other than being convenient, they might be reasonably
representative of typical stand-alone usage.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If libvirtd is triggered by an unprivileged user, the virt-aa-helper
mechanism doesn't work, because per-VM profiles can't be instantiated,
and as a result libvirtd runs unconfined.
This means passt can't start, because the passt subprofile from
libvirt's profile is not loaded either.
Example:
$ virsh start alpine
error: Failed to start domain 'alpine'
error: internal error: Child process (passt --one-off --socket /run/user/1000/libvirt/qemu/run/passt/1-alpine-net0.socket --pid /run/user/1000/libvirt/qemu/run/passt/1-alpine-net0-passt.pid --tcp-ports 40922:2) unexpected fatal signal 11
Add an annoying workaround for the moment being. Much better than
encouraging users to start guests as root, or to disable AppArmor
altogether.
Reported-by: Prafulla Giri <prafulla.giri@protonmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
On Fedora 41, without "allow pasta_t unconfined_t:dir read"
/usr/bin/pasta can't open /proc/[pid]/ns, which is required by
pasta_netns_quit_init().
This patch also remove one duplicate rule "allow pasta_t nsfs_t:file
read;", "allow pasta_t nsfs_t:file { open read };" at line 123 is
enough.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If passt or pasta are started as root, we need to read the passwd file
(be it /etc/passwd or whatever sssd provides) to find out UID and GID
of 'nobody' so that we can switch to it.
Instead of a bunch of allow rules for passwd_file_t and sssd macros,
use the more convenient auth_read_passwd() interface which should
cover our usage of getpwnam().
The existing rules weren't actually enough:
# strace -e openat passt -f
[...]
Started as root, will change to nobody.
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/lib64/libnss_sss.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/var/lib/sss/mc/passwd", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
with corresponding SELinux warnings logged in audit.log.
Reported-by: Minxi Hou <mhou@redhat.com>
Analysed-by: Miloš Malik <mmalik@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since commit eedc81b6ef ("fwd, conf: Probe host's ephemeral ports"),
we might need to read from /proc/sys/net/ipv4/ip_local_port_range in
both passt and pasta.
While pasta was already allowed to open and write /proc/sys/net
entries, read access was missing in SELinux's type enforcement: add
that.
In passt, instead, this is the first time we need to access an entry
there: add everything we need.
Fixes: eedc81b6ef ("fwd, conf: Probe host's ephemeral ports")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Allow access to user_devpts.
$ pasta --version
pasta 0^20240510.g7288448-1.fc40.x86_64
...
$ awk '' < /dev/null
$ pasta --version
$
While this might be a awk bug it appears pasta should still have access
to devpts.
Signed-off-by: Derek Schrock <dereks@lifeofadishwasher.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Now:
- we don't open the PID file in main() anymore
- PID file and AF_UNIX socket are opened by pidfile_open() and
tap_sock_unix_open()
- write_pidfile() becomes pidfile_write()
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Commit b686afa2 introduced the invalid apparmor rule
`mount options=(rw, runbindable) /,` since runbindable mount rules
cannot have a source.
Therefore running aa-logprof/aa-genprof will trigger errors (see
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2065685)
$ sudo aa-logprof
ERROR: Operation {'runbindable'} cannot have a source. Source = AARE('/')
This patch fixes it to the intended behavior.
Link: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/2065685
Fixes: b686afa23e ("apparmor: Explicitly pass options we use while remounting root filesystem")
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For some unknown reason "owner" makes it impossible to open bind mounted
netns references as apparmor denies it. In the kernel denied log entry
we see ouid=0 but it is not clear why that is as the actual file is
owned by the real (rootless) user id.
In abstractions/pasta there is already `@{run}/user/@{uid}/**` without
owner set for the same reason as this path contains the netns path by
default when running under Podman.
Fixes: 72884484b0 ("apparmor: allow read access on /tmp for pasta")
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The podman CI on debian runs tests based on /tmp but pasta is failing
there because it is unable to open the netns path as the open for read
access is denied.
Link: https://github.com/containers/podman/issues/22625
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
From an original patch by Danish Prakash.
With commit ff22a78d7b ("pasta: Don't try to watch namespaces in
procfs with inotify, use timer instead"), if a filesystem-bound
target namespace is passed on the command line, we'll grab a handle
on its parent directory. That commit, however, didn't introduce a
matching AppArmor rule. Add it here.
To access a network namespace procfs entry, we also need a 'ptrace'
rule. See commit 594dce66d3 ("isolation: keep CAP_SYS_PTRACE when
required") for details as to when we need this -- essentially, it's
about operation with Buildah.
Reported-by: Jörg Sonnenberger <joerg@bec.de>
Link: https://github.com/containers/buildah/issues/5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Fixes: ff22a78d7b ("pasta: Don't try to watch namespaces in procfs with inotify, use timer instead")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
With Podman's custom networks, pasta will typically need to open the
target network namespace at /run/user/<UID>/containers/networks:
grant access to anything under /run/user/<UID> instead of limiting it
to some subpath.
Note that in this case, Podman will need pasta to write out a PID
file, so we need write access, for similar locations, too.
Reported-by: Jörg Sonnenberger <joerg@bec.de>
Link: https://github.com/containers/buildah/issues/5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
For the policy to work as expected across either AppArmor commit
9d3f8c6cc05d ("parser: fix parsing of source as mount point for
propagation type flags") and commit 300889c3a4b7 ("parser: fix option
flag processing for single conditional rules"), we need one mount
rule with matching mount options as "source" (that is, without
source), and one without mount options and an explicit, empty source.
Link: https://github.com/containers/buildah/issues/5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The spec file patch by Dan Čermák was originally contributed at:
https://src.fedoraproject.org/rpms/passt/pull-request/1
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Partially equivalent to commit abf5ef6c22 ("apparmor: Allow pasta
to remount /proc, access entries under its own copy"): we should
allow pasta to remount /proc. It still works otherwise, but further
UID remapping in nested user namespaces (e.g. pasta in pasta) won't.
Reported-by: Laurent Jacquot <jk@lutty.net>
Link: https://bugs.passt.top/show_bug.cgi?id=79#c3
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This reverts commit 3fb3f0f7a5: it was
meant as a patch for Fedora 37 (and no later versions), not something
I should have merged upstream.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If passt is started with --fd to talk over a pre-opened UNIX domain
socket, we don't really know what label might be associated to it,
but at least for an unconfined_t socket, this bit of policy wouldn't
belong to anywhere else: enable that here.
This is rather loose, of course, but on the other hand passt will
sandbox itself into an empty filesystem, so we're not really adding
much to the attack surface except for what --fd is supposed to do.
Reported-by: Matej Hrica <mhrica@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2247221
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
With current selinux-policy-37.22-1.fc37.noarch, and presumably any
future update for Fedora 37, the user_namespace class is not
available, so statements using it prevent the policy from being
loaded.
If a class is not defined in the base policy, any related permission
is assumed to be enabled, so we can safely drop those.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2237996
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The hard link trick didn't actually fix the issue with SELinux file
contexts properly: as opposed to symbolic links, SELinux now
correctly associates types to the labels that are set -- except that
those labels are now shared, so we can end up (depending on how
rpm(8) extracts the archives) with /usr/bin/passt having a
pasta_exec_t context.
This got rather confusing as running restorecon(8) seemed to fix up
labels -- but that's simply toggling between passt_exec_t and
pasta_exec_t for both links, because each invocation will just "fix"
the file with the mismatching context.
Replace the hard links with two separate builds of the binary, as
suggested by David. The build is reproducible, so we pass "-pasta" in
the VERSION for pasta's build. This is wasteful but better than the
alternative.
Just copying the binary over would otherwise cause issues with
debuginfo packages due to duplicate Build-IDs -- and rpmbuild(8) also
warns about them.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If pasta and pasta.avx2 are hard links to passt and passt.avx2,
AppArmor will attach their own profiles on execution, and we can
restrict passt's profile to what it actually needs. Note that pasta
needs to access all the resources that passt needs, so the pasta
abstraction still includes passt's one.
I plan to push the adaptation required for the Debian package in
commit 5bb812e79143 ("debian/rules: Override pasta symbolic links
with hard links"), on Salsa. If other distributions need to support
AppArmor profiles they can follow a similar approach.
The profile itself will be installed, there, via dh_apparmor, in a
separate commit, b52557fedcb1 ("debian/rules: Install new pasta
profile using dh_apparmor").
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since commit b0e450aa85 ("pasta: Detach mount namespace, (re)mount
procfs before spawning command"), we need to explicitly permit mount
of /proc, and access to entries under /proc/PID/net (after remount,
that's what AppArmor sees as path).
Fixes: b0e450aa85 ("pasta: Detach mount namespace, (re)mount procfs before spawning command")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Starting with commit 770d1a4502 ("isolation: Initially Keep
CAP_SETFCAP if running as UID 0 in non-init"), the lack of this rule
became more apparent as pasta needs to access uid_map in procfs even
as non-root.
However, both passt and pasta needs this, in case they are started as
root, so add this directly to passt's abstraction (which is sourced
by pasta's profile too).
Fixes: 770d1a4502 ("isolation: Initially Keep CAP_SETFCAP if running as UID 0 in non-init")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As a result of AppArmor commit d4b0fef10a4a ("parser: fix rule flag
generation change_mount type rules"), we can't expect anymore to
get permission to mount() / read-write, with MS_REC | MS_UNBINDABLE
("runbindable", in AppArmor terms), if we don't explicitly pass those
flags as options. It used to work by mistake.
Now, the reasonable expectation would be that we could just change the
existing rule into:
mount options=(rw, runbindable) "" -> /,
...but this now fails to load too, I think as a result of AppArmor
commit 9d3f8c6cc05d ("parser: fix parsing of source as mount point
for propagation type flags"). It works with 'rw' alone, but
'runbindable' is indeed a propagation type flag.
Skip the source specification, it doesn't add anything meaningful to
the rule anyway.
Reported-by: Paul Holzinger <pholzing@redhat.com>
Link: https://github.com/containers/podman/pull/19751
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While abstractions/nameservice appeared too broad and overkill for
our simple need (read-only resolv.conf access), it properly deals
with symlinked resolv.conf files generated by systemd-resolved,
NetworkManager or suchlike.
If we just grant read-only access to /etc/resolv.conf, we'll fail to
read nameserver information in rather common configurations, because
AppArmor won't follow the symlink.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...now it gets ugly. If we use pasta without an existing target
namespace, and run commands directly or spawn a shell, and keep
the pasta_t domain when we do, they won't be able to do much: a
shell might even start, but it's not going to be usable, or to
even display a prompt.
Ideally, pasta should behave like a shell when it spawns a command:
start as unconfined_t and automatically transition to whatever
domain is associated in the specific policy for that command. But
we can't run as unconfined_t, of course.
It would seem natural to switch to unconfined_t "just before", so
that the default transitions happen. But transitions can only happen
when we execvp(), and that's one single transition -- not two.
That is, this approach would work for:
pasta -- sh -c 'ip address show'
but not for:
pasta -- ip address show
If we configure a transition to unconfined_t when we run ip(8), we'll
really try to start that as unconfined_t -- but unconfined_t isn't
allowed as entrypoint for ip(8) itself, and execvp() will fail.
However, there aren't many different types of binaries pasta might
commonly run -- for example, we're unlikely to see pasta used to run
a mount(8) command.
Explicitly set up domain transition for common stuff -- switching to
unconfined_t for bin_t and shells works just fine, ip(8), ping(8),
arping(8) and similar need a different treatment.
While at it, allow commands we spawn to inherit resource limits and
signal masks, because that's what happens by default, and don't
require AT_SECURE sanitisation of the environment (because that
won't happen by default). Slightly unrelated: we also need to
explicitly allow pasta_t to use TTYs, not just PTYs, otherwise
we can't keep stdin and stdout open for shells.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This is needed to monitor filesystem-bound namespaces and quit when
they're gone -- this feature never really worked with SELinux.
Fixes: 745a9ba428 ("pasta: By default, quit if filesystem-bound net namespace goes away")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
That's what we actually need to check networking-related sysctls,
to scan for bound ports, and to manipulate bits of network
configuration inside pasta's target namespaces.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Somehow most of this used to work on older kernels, but now we need
to explicitly permit setuid, setgid, and setcap capabilities, as well
as read-only access to passwd (as we support running under a given
login name) and sssd library facilities.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Kernel commit ed5d44d42c95 ("selinux: Implement userns_create hook")
seems to just introduce a new functionality, but given that SELinux
implements a form of mandatory access control, introducing the new
permission breaks any application (shipping with SELinux policies)
that needs to create user namespaces, such as passt and pasta for
sandboxing purposes.
Add the new 'allow' rules. They appear to be backward compatible,
kernel-wise, and the policy now requires the new 'user_namespace'
class to build, but that's something distributions already ship.
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
There's no reason to use wildcards, and we don't want any
similarly-named binary (not that I'm aware of any) to risk being
associated to passt_exec_t and pasta_exec_t by accident.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
The Makefile installs symbolic links by default, which actually
worked at some point (not by design) with SELinux, but at least on
recent kernel versions it doesn't anymore: override pasta (and
pasta.avx2) with hard links.
Otherwise, even if the links are labeled as pasta_exec_t, SELinux
will "resolve" them to passt_exec_t, and we'll have pasta running as
passt_t instead of pasta_t.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.
Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.
Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
That was meant to be an example, and I just dropped it in the
previous commit -- passt.if should be more than enough as a possible
example.
Reported-by: Carl G. <carlg@fedoraproject.org>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182145
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This was meant to be an example, but I managed to add syntax errors
to it. Drop it altogether.
Reported-by: Carl G. <carlg@fedoraproject.org>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182145
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Instead of:
https://fedoraproject.org/wiki/SELinux_Policy_Modules_Packaging_Draft
follow this:
https://fedoraproject.org/wiki/PackagingDrafts/SELinux_Independent_Policy
which seems to make more sense and fixes the issue that, on a fresh
install, without a reboot, the file contexts for the binaries are not
actually updated.
In detail:
- labels are refreshed using the selinux_relabel_pre and
selinux_relabel_post on install, upgrade, and uninstall
- use the selinux_modules_install and selinux_modules_uninstall
macros, instead of calling 'semodule' directly (no functional
changes in our case)
- require the -selinux package on SELinux-enabled environments and if
the current system policy is "targeted"
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...and in any case, this patch doesn't offer any advantage over the
current upstream integration.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Native support was introduced with commit 13c6be96618c, QEMU 7.2.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>