passt

mirror of https://passt.top/passt synced 2025-06-02 06:15:33 +02:00

Author	SHA1	Message	Date
Stefano Brivio	e76e65a36e	test/lib: Move screen-scraping setup and layout functions to _ugly files I'm going to add yet another one of those, for which I have no quick solution. It's a regression in some sense, but at least if we make this regression more observable and defined, it should be easier to find a comprehensive solution later, within this or another testing framework. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-11-04 12:01:05 +01:00
Stefano Brivio	ea5e046646	README: Add Podman, vhost-user links, and links to Bugzilla queries Unfortunately Bugzilla doesn't enable sharing of queries to unregistered users: https://bugzilla.mozilla.org/show_bug.cgi?id=400063 ...but we can still use ugly search links. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-27 22:41:37 +02:00
Stefano Brivio	10cabe3dbf	passt.1: Fix typo: "addressses", reported by Lintian Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-27 14:28:00 +02:00
Stefano Brivio	f212044940	icmp: Don't discard first reply sequence for a given echo ID In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any reply we send: we're on the same host as the target, after all. We discard them by comparing the last sequence we sent with the sequence we receive. However, on the first reply for a given identifier, the sequence might be zero, depending on the implementation of ping(8): we need another value to indicate we haven't sent any sequence number, yet. Use -1 as initialiser in the echo identifier map. This is visible with Busybox's ping, and was reported by Paul on the integration at https://github.com/containers/podman/pull/16141, with: $ podman run --net=pasta alpine ping -c 2 192.168.188.1 ...where only the second reply would be routed back. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `33482d5bf2` ("passt: Add PASTA mode, major rework") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-27 00:18:21 +02:00
Stefano Brivio	b062ee47d1	icmp: Add debugging messages for handled replies and requests ...instead of just reporting errors. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-27 00:18:18 +02:00
Stefano Brivio	947d756747	tap: Trace received (outbound) ICMP packets in debug mode, too This only worked for ICMPv6: ICMP packets have no TCP-style header, so they are handled as a special case before packet sequences are formed, and the call to tap_packet_debug() was missing. Fixes: `bb70811183` ("treewide: Packet abstraction with mandatory boundary checks") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-27 00:18:16 +02:00
Stefano Brivio	7402951658	conf, passt.1: Don't imply --foreground with --debug Having -f implied by -d (and --trace) usually saves some typing, but debug mode in background (with a log file) is quite useful if pasta is started by Podman, and is probably going to be handy for passt with libvirt later, too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-27 00:17:56 +02:00
Stefano Brivio	e4df8b0844	test/run: Temporarily disable distribution tests They're too slow to cope with current release cycles, and they haven't found bugs in months, also because clang-tidy and cppcheck would find most of them earlier. Disable them for the moment. We should pre-install gcc and make in non-x86 images, as those run on my test machine with qemu TCG, and that's the real slow-down here. Then we can re-enable them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-26 07:03:56 +02:00
Stefano Brivio	fb820ebb2e	hooks: Temporarily disable demo generation in pre-push The out-of-tree Podman patch needs to be rebased every second week or so, and I'm currently trying to get that upstream: https://github.com/containers/podman/pull/16141 Disable demo generation for the moment, so that I avoid wasting time with those rebases. We'll re-enable it later. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-26 06:56:25 +02:00
Stefano Brivio	d472476caa	test: Add log file tests for pasta plus corresponding layout and setup To test log files on a tmpfs mount, we need to unshare the mount namespace, which means using a context for the passt pane is not really practical at the moment, as we can't open a shell there, so we would have to encapsulate all the commands under 'unshare -rUm', plus the "inner" pasta command, running in turn a tcp_rr server. It might be worth fixing this by e.g. detecting we are trying to spawn an interactive shell and adding a special path in the context setup with some form of stdin redirection -- I'm not sure it's doable though. For this reason, add a new layout, using a context only for the host pane, while keeping the old command dispatch mechanism for the passt pane. We also need a new setup function that doesn't start pasta: we want to start and restart it with different options. Further, we need a 'pint' directive, to send an interrupt to the passt pane: add that in lib/test. All the tests before the one involving tmpfs and a detached mount namespace were also tested with the context mechanism. To make an eventual conversion easier, pass tcp_crr directly as a command on pasta's command line where feasible. While at it, fix the comment to the teardown_pasta() function. The new test set can be semi-conveniently run as: ./run pasta_options/log_to_file and it checks basic log creation, size of the log file after flooding it with debug entries, rotations, and basic consistency after rotations, on both an existing filesystem and a tmpfs, chosen as it doesn't support collapsing data ranges via fallocate(), hence triggering the fall-back mechanism for logging rotation. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-26 06:28:41 +02:00
Stefano Brivio	e67039f712	checksum: Fix calculation for ICMP checksum on IPv4 We need to zero out the checksum field before calculating the checksum, of course. I have no idea how this passed the "icmp" test set, looking into it. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `67ab617172` ("Add csum_icmp4() helper for calculating ICMP checksums") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-26 06:28:06 +02:00
Stefano Brivio	c11277b94f	conf: Don't pass leading ~ to parse_port_range() on exclusions Commit `84fec4e998` ("Clean up parsing of port ranges") drops the strspn() call before the parsing of excluded port ranges, because now we're checking against any stray characters at every step. However, that also has the effect of passing ~ as first character to the new parse_port_range(), which makes no sense: we already checked that ~ is the first character before the call, so skip it. Alona reported this output: Invalid port specifier ~15000,~15001,~15006,~15008,~15020,~15021,~15090 while the whole specifier is indeed valid. Reported-by: Alona Paz <alkaplan@redhat.com> Fixes: `84fec4e998` ("Clean up parsing of port ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-24 14:37:22 +02:00
Stefano Brivio	b68da100ba	util: Set NS_FN_STACK_SIZE to one eighth of ulimit-reported maximum stack size ...instead of one fourth. On the main() -> conf() -> nl_sock_init() call path, LTO from gcc 12 on (at least) x86_64 decides to inline... everything: nl_sock_init() is effectively part of main(), after commit `3e2eb4337b` ("conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()"). This means we exceed the maximum stack size, and we get SIGSEGV, under any condition, at start time, as reported by Andrea on a recent build for CentOS Stream 9. The calculation of NS_FN_STACK_SIZE, which is the stack size we reserve for clones, was previously obtained by dividing the maximum stack size by two, to avoid an explicit check on architecture (on PA-RISC, also known as hppa, the stack grows up, so we point the clone to the middle of this area), and then further divided by two to allow for any additional usage in the caller. Well, if there are essentially no function calls anymore, this is not enough. Divide it by eight, which is anyway much more than possibly needed by any clone()d callee. I think this is robust, so it's a fix in some sense. Strictly speaking, though, we have no formal guarantees that this isn't either too little or too much. What we should do, eventually: check cloned() callees, there are just thirteen of them at the moment. Note down any stack usage (they are mostly small helpers), bonus points for an automated way at build time, quadruple that or so, to allow for extreme clumsiness, and use as NS_FN_STACK_SIZE. Perhaps introduce a specific condition for hppa. Reported-by: Andrea Bolognani <abologna@redhat.com> Fixes: `3e2eb4337b` ("conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-22 08:46:57 +02:00
Andrea Bolognani	5715a297a7	Add git-publish configuration file Signed-off-by: Andrea Bolognani <abologna@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-22 03:45:50 +02:00
Andrea Bolognani	b944ca1855	qrap: Support JSON syntax for -device Starting with version 8.1.0, libvirt uses JSON syntax when generating the arguments to -device, so they will now look like {"driver":"virtio-scsi-pci","bus":"pci.3","addr":"0x0"} instead of virtio-scsi-pci,bus=pci.3,addr=0x0 qrap needs to parse these arguments and extract the bus number in order to figure out what address to use for the virtio-net device it adds, and the libvirt change described above has broken this parsing logic. Tweak the code so that both styles are accepted and handled correctly. Note that, when JSON is in use, qrap needs to generate its own command line options in that format as well or things will not work as expected. Signed-off-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-21 11:43:45 +02:00
David Gibson	c6845f60a0	dhcp: Use tap_udp4_send() helper in dhcp() The IPv4 specific dhcp() manually constructs L2 and IP headers to send its DHCP reply packet, unlike its IPv6 equivalent in dhcpv6.c which uses the tap_udp6_send() helper. Now that we've broadened the parameters to tap_udp4_send() we can use it in dhcp() to avoid some duplicated logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:35:00 +02:00
David Gibson	2dbc622f54	tap: Split tap_ip4_send() into UDP and ICMP variants tap_ip4_send() has special case logic to compute the checksums for UDP and ICMP packets, which is a mild layering violation. By using a suitable helper we can split it into tap_udp4_send() and tap_icmp4_send() functions without greatly increasing the code size, thus removing that layering violation. We make some small changes to the interface while there. In both cases we make the destination IPv4 address a parameter, which will be useful later. For the UDP variant we make it take just the UDP payload, and it will generate the UDP header. For the ICMP variant we pass in the ICMP header as before. The inconsistency is because that's what seems to be the more natural way to invoke the function in the callers in each case. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:56 +02:00
David Gibson	db07804d26	ndp: Use tap_icmp6_send() helper We send ICMPv6 packets to the guest from both icmp.c and from ndp.c. The case in ndp() manually constructs L2 and IPv6 headers, unlike the version in icmp.c which uses the tap_icmp6_send() helper from tap.c. Now that we've broadened the parameters of tap_icmp6_send() we can use it in ndp() as well, saving some duplicated logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:53 +02:00
David Gibson	cb1edae3b5	ndp: Remove unneeded eh_source parameter ndp() takes a parameter giving the ethernet source address of the packet it is to respond to, which it uses to determine the destination address to send the reply packet to. This is not necessary, because the address will always be the guest's MAC address. Even if the guest has just changed MAC address, then either tap_handler_passt() or tap_handler_pasta() - which are the only call paths leading to ndp() - will have updated c->mac_guest with the new value. So, remove the parameter, and just use c->mac_guest, making it more consistent with other paths where we construct packets to send inwards. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:51 +02:00
David Gibson	9d8dd8b6f4	tap: Split tap_ip6_send() into UDP and ICMP variants tap_ip6_send() has special case logic to compute the checksums for UDP and ICMP packets, which is a mild layering violation. By using a suitable helper we can split it into tap_udp6_send() and tap_icmp6_send() functions without greatly increasing the code size, thus removing that layering violation. We make some small changes to the interface while there. In both cases we make the destination IPv6 address a parameter, which will be useful later. For the UDP variant we make it take just the UDP payload, and it will generate the UDP header. For the ICMP variant we pass in the ICMP header as before. The inconsistency is because that's what seems to be the more natural way to invoke the function in the callers in each case. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:48 +02:00
David Gibson	f616ca231e	Split tap_ip_send() into IPv4 and IPv6 specific functions The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and it turns out that every caller (statically) knows if it is using IPv4 or IPv6. So split into separate tap_ip4_send() and tap_ip6_send() functions. Use a new tap_l2_hdr() function for the very small common part. While we're there, make some minor cleanups: - We were double writing some fields in the IPv6 header, so that it temporarily matched the pseudo-header for checksum calculation. With recent checksum reworks, this isn't necessary any more. - We don't use any IPv4 header options, so use some sizeof() constructs instead of some open coded values for header length. - The comment used to say that the flow label was for TCP over IPv6, but in fact the only thing we used it for was DHCPv6 over UDP traffic Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:45 +02:00
David Gibson	fb5d1c5d7d	tap: Remove unhelpful vnet_pre optimization from tap_send() Callers of tap_send() can optionally use a small optimization by adding extra space for the 4 byte length header used on the qemu socket interface. tap_ip_send() is currently the only user of this, but this is used only for "slow path" ICMP and DHCP packets, so there's not a lot of value to the optimization. Worse, having the two paths here complicates the interface and makes future cleanups difficult, so just remove it. I have some plans to bring back the optimization in a more general way in future, but for now it's just in the way. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:43 +02:00
David Gibson	f72b63e92f	Remove support for TCP packets from tap_ip_send() tap_ip_send() is never used for TCP packets, we're unlikely to use it for that in future, and the handling of TCP packets makes other cleanups unnecessarily awkward. Remove it. This is the only user of csum_tcp4(), so we can remove that as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:40 +02:00
David Gibson	a2eb2d310a	Add helpers for normal inbound packet destination addresses tap_ip_send() doesn't take a destination address, because it's specifically for inbound packets, and the IP addresses of the guest/namespace are already known to us. Rather than open-coding this destination address logic, make helper functions for it which will enable some later cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:38 +02:00
David Gibson	3d8ccb44a6	Add csum_ip4_header() helper to calculate IPv4 header checksums We calculate IPv4 header checksums in at least two places, in dhcp() and in tap_ip_send. Add a helper to handle this calculation in both places. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:34 +02:00
David Gibson	bd4be308fc	Add csum_udp4() helper for calculating UDP over IPv4 checksums At least two places in passt fill in UDP over IPv4 checksums, although since UDP checksums are optional with IPv4 that just amounts to storing a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in dhcp()). For consistency, add a helper for this "calculation". Just for the heck of it, add the option (compile time disabled for now) to calculate real UDP checksums. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:32 +02:00
David Gibson	6905ac75ec	Add csum_udp6() helper for calculating UDP over IPv6 checksums Add a helper for calculating UDP checksums when used over IPv6 For future flexibility, the new helper takes parameters for the fields in the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to be explicitly constructed. It also allows the UDP header and payload to be in separate buffers, although we don't use this yet. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:29 +02:00
David Gibson	67ab617172	Add csum_icmp4() helper for calculating ICMP checksums Although tap_ip_send() is currently the only place calculating ICMP checksums, create a helper function for symmetry with ICMPv6. For future flexibility it allows the ICMP header and payload to be in separate buffers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:26 +02:00
David Gibson	7abd2b0d72	Add csum_icmp6() helper for calculating ICMPv6 checksums At least two places in passt calculate ICMPv6 checksums, ndp() and tap_ip_send(). Add a helper to handle this calculation in both places. For future flexibility, the new helper takes parameters for the fields in the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to be explicitly constructed. It also allows the ICMPv6 header and payload to be in separate buffers, although we don't use this yet. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-19 03:34:21 +02:00
Stefano Brivio	b3f359167b	passt.1: Add David to AUTHORS I just realised while reading the man page. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	3e2eb4337b	conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user() Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in the target user namespace as we isolate the process, which means we're unable to bind to low ports at that point. Bind inbound ports, and only those, before isolate_user(). Keep the handling of outbound ports (for pasta mode only) after the setup of the namespace, because that's where we'll bind them. To this end, initialise the netlink socket for the init namespace before isolate_user() as well, as we actually need to know the addresses of the upstream interface before binding ports, in case they're not explicitly passed by the user. As we now call nl_sock_init() twice, checking its return code from conf() twice looks a bit heavy: make it exit(), instead, as we can't do much if we don't have netlink sockets. While at it: - move the v4_only && v6_only options check just after the first option processing loop, as this is more strictly related to option parsing proper - update the man page, explaining that CAP_NET_BIND_SERVICE is not the preferred way to bind ports, because passt and pasta can be abused to allow other processes to make effective usage of it. Add a note about the recommended sysctl instead - simplify nl_sock_init_do() now that it's called once for each case Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	40abd447c8	Rename pasta_setup_ns() to pasta_spawn_cmd() pasta_setup_ns() no longer has much to do with setting up a namespace. Instead it's really about starting the shell or other command we want to run with pasta connectivity. Rename it and its argument structure to be less misleading. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	eb3d03a588	isolation: Only configure UID/GID mappings in userns when spawning shell When in passt mode, or pasta mode spawning a command, we create a userns for ourselves. This is used both to isolate the pasta/passt process itself and to run the spawned command, if any. Since `eed17a47` "Handle userns isolation and dropping root at the same time" we've handled both cases the same, configuring the UID and GID mappings in the new userns to map whichever UID we're running as to root within the userns. This mapping is desirable when spawning a shell or other command, so that the user gets a root shell with reasonably clear abilities within the userns and netns. It's not necessarily essential, though. When not spawning a shell, it doesn't really have any purpose: passt itself doesn't need to be root and can operate fine with an unmapped user (using some of the capabilities we get when entering the userns instead). Configuring the uid_map can cause problems if passt is running with any capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to allow it to forward low ports. In this case the kernel makes files in /proc/pid owned by root rather than the starting user to prevent the user from interfering with the operation of the capability-enhanced process. This includes uid_map meaning we are not able to write to it. Whether this behaviour is correct in the kernel is debatable, but in any case we might as well avoid problems by only initializing the user mappings when we really want them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	fb449b16bd	isolation: Prevent any child processes gaining capabilities We drop our own capabilities, but it's possible that processes we exec() could gain extra privilege via file capabilities. It shouldn't be possible for us to exec() anyway due to seccomp() and our filesystem isolation. But just in case, zero the bounding and inheritable capability sets to prevent any such child from gaining privilege. Note that we do this after spawning the pasta shell/command (if any), because we do want the user to be able to give that privilege if they want. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	c22ebccba8	isolation: Replace drop_caps() with a version that actually does something The current implementation of drop_caps() doesn't really work because it attempts to drop capabilities from the bounding set. That's not the set that really matters, it's about limiting the abilities of things we might later exec() rather than our own capabilities. It also requires CAP_SETPCAP which we won't usually have. Replace it with a new version which uses setcap(2) to drop capabilities from the effective and permitted sets. For now we leave the inheritable set as is, since we don't want to preclude the user from passing inheritable capabilities to the command spawned by pasta. Correctly dropping caps reveals that we were relying on some capabilities we'd supposedly dropped. Re-divide the dropping of capabilities between isolate_initial(), isolate_user() and isolate_prefork() to make this work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	ceb2061587	isolation: Refactor isolate_user() to allow for a common exit path Currently, isolate_user() exits early if the --netns-only option is given. That works for now, but shortly we're going to want to add some logic to go at the end of isolate_user() that needs to run in all cases: joining a given userns, creating a new userns, or staying in our original userns (--netns-only). To avoid muddying those changes, here we reorganize isolate_user() to have a common exit path for all cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	ea5936dd3f	Replace FWRITE with a function In a few places we use the FWRITE() macro to open a file, replace its contents with a given string and close it again. There's no real reason this needs to be a macro rather than just a function though. Turn it into a function 'write_file()' and make some ancillary cleanups while we're there: - Add a return code so the caller can handle giving a useful error message - Handle the case of short write()s (unlikely, but possible) - Add O_TRUNC, to make sure we replace the existing contents entirely Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	096e48669b	isolation: Clarify various self-isolation steps We have a number of steps of self-isolation scattered across our code. Improve function names and add comments to make it clearer what the self isolation model is, what the steps do, and why they happen at the points they happen. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	6909a8e339	Remove unhelpful drop_caps() call in pasta_start_ns() drop_caps() has a number of bugs which mean it doesn't do what you'd expect. However, even if we fixed those, the call in pasta_start_ns() doesn't do anything useful: * In the common case, we're UID 0 at this point. In this case drop_caps() doesn't accomplish anything, because even with capabilities dropped, we are still privileged. * When attaching to an existing namespace with --userns or --netns-only we might not be UID 0. In this case it's too early to drop all capabilities: we need at least CAP_NET_ADMIN to configure the tap device in the namespace. Remove this call - we will still drop capabilities a little later in sandbox(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	01b4e71f7a	pasta_start_ns() always ends in parent context The end of pasta_start_ns() has a test against pasta_child_pid, testing if we're in the parent or the child. However we started the child running the pasta_setup_ns function which always exec()s or exit()s, so if we return from the clone() we are always in the parent, making that test unnecessary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	672a8cd80e	pasta: More general way of starting spawned shell as a login shell When invoked so as to spawn a shell, pasta checks explicitly for the shell being bash and if so, adds a "-l" option to make it a login shell. This is not ideal, since this is a bash specific option and requires pasta to know about specific shell variants. There's a general convention for starting a login shell, which is to prepend a "-" to argv[0]. Use this approach instead, so we don't need bash specific logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
David Gibson	f99e9a3338	test: Move slower tests to end of test run The distro and performance tests are by far the slowest part of the passt testsuite. Move them to the end of the testsuite run, so that it's easier to do a quick test during development by letting the other tests run then interrupting the test runner. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
Stefano Brivio	7f2a7396e2	log.h: Avoid unnecessary GNU extension for token pasting clang says: ./log.h:23:18: warning: token pasting of ',' and __VA_ARGS__ is a GNU extension [-Wgnu-zero-variadic-macro-arguments] We need token pasting here just because of the 'format' in trace(): drop it. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
Stefano Brivio	55cdcc159b	util.h: Add missing gcc pragma push before pragma pop While building with clang: ./util.h:176:24: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas] Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	abbe01af59	icmp: Set sin6_scope_id for outbound ICMPv6 echo requests If we ping a link-local address, we need to pass this to sendto(), as it will obviously fail with -EINVAL otherwise. If we ping other addresses, it's probably a good idea anyway to specify the configured outbound interface here. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	57e2c066e9	conf: Drop excess colons in usage for DHCP and DNS options Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	6acf89638b	netlink: Disable duplicate address detection for configured IPv6 address With default options, when we pass --config-net, the IPv6 address is actually going to be recycled from the init namespace, so it is in fact duplicated, but duplicate address detection has no way to find out. With a different configured address, that's not the case, but anyway duplicate address detection will be unable to see this. In both cases, we're wasting time for nothing. Pass the IFA_F_NODAD flag as we configure globally scoped IPv6 addresses via netlink. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	6f3e38cac5	Don't create 'tap' socket for ports that are bound to loopback only If the user specifies an explicit loopback address for a port binding, we're going to use that address for the 'tap' socket, and the same exact address for the 'spliced' socket (because those are, by definition, only bound to loopback addresses). This means that the second binding will fail, and, unexpectedly, the port is forwarded, but via tap device, which means the source address in the namespace won't be a loopback address. Make it explicit under which conditions we're creating which kind of socket, by refactoring tcp_sock_init() into two separate functions for IPv4 and IPv6 and gathering those conditions at the beginning. Also, don't create spliced sockets if the user specifies explicitly a non-loopback address, those are harmless but not desired either. Fixes: `3c6ae62510` ("conf, tcp, udp: Allow address specification for forwarded ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-10-15 02:10:36 +02:00
Stefano Brivio	d0dd0242a6	tcp, tcp_splice: Fix port remapping for inbound, spliced connections In pasta mode, when we receive a new inbound connection, we need to select a socket that was created in the namespace to proceed and connect() it to its final destination. The existing condition might pick a wrong socket, though, if the destination port is remapped, because we'll check the bitmap of inbound ports using the remapped port (stored in the epoll reference) as index, and not the original port. Instead of using the port bitmap for this purpose, store this information in the epoll reference itself, by adding a new 'outbound' bit, that's set if the listening socket was created in the namespace, and unset otherwise. Then, use this bit to pick a socket on the right side. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Fixes: `33482d5bf2` ("passt: Add PASTA mode, major rework") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	eab9d8d5d6	tcp, tcp_splice: Adjust comments to current meaning of inbound and outbound For tcp_sock_init_ns(), "inbound" connections used to be the ones being established toward any listening socket we create, as opposed to sockets we connect(). Similarly, tcp_splice_new() used to handle "inbound" connections in the sense that they originated from listening sockets, and they would in turn cause a connect() on an "outbound" socket. Since commit `1128fa03fe` ("Improve types and names for port forwarding configuration"), though, inbound connections are more broadly defined as the ones directed to guest or namespace, and outbound the ones originating from there. Update comments for those two functions. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
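
Commit e67039f712 in the list above notes that the ICMP checksum field has to be zeroed before the checksum is calculated. The following is a minimal sketch of that rule in C, not the code from passt: the function names inet_csum() and icmp4_csum_fill() are hypothetical, the packet is treated as a plain byte buffer with the checksum at offsets 2 and 3 (as in the ICMP header layout), and the sum is the usual 16-bit one's complement Internet checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from passt): 16-bit one's complement Internet
 * checksum over a byte buffer. Words are assembled big-endian from single
 * bytes, so the result does not depend on host byte order. Intended for
 * packet-sized buffers only: the 32-bit accumulator is not overflow-safe
 * for very large inputs. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];

	if (len & 1)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[len - 1] << 8;

	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Hypothetical helper (not from passt): fill in the checksum of an ICMP
 * message starting at pkt (ICMP header followed by payload, len bytes). */
static void icmp4_csum_fill(uint8_t *pkt, size_t len)
{
	uint16_t csum;

	assert(len >= 8);		/* type, code, checksum, id, sequence */

	/* The step the fix is about: clear any stale checksum first,
	 * otherwise the old value is summed into the new one. */
	pkt[2] = 0;
	pkt[3] = 0;

	csum = inet_csum(pkt, len);
	pkt[2] = csum >> 8;		/* store in network byte order */
	pkt[3] = csum & 0xff;
}
```

A receiver can check the result by running inet_csum() over the whole message, checksum field included: a correctly filled message yields 0.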


900 commits