passt

Author	SHA1	Message	Date
David Gibson	d9394eb9b7	udp: Split splice field in udp_epoll_ref into (mostly) independent bits The @splice field in union udp_epoll_ref can have a number of values for different types of "spliced" packet flows. Split it into several single bit fields with more or less independent meanings. The new @splice field is just a boolean indicating whether the socket is associated with a spliced flow, making it identical to the @splice fiend in tcp_epoll_ref. The new bit @orig, indicates whether this is a socket which can originate new udp packet flows (created with -u or -U) or a socket created on the fly to handle reply socket. @ns indicates whether the socket lives in the init namespace or the pasta namespace. Making these bits more orthogonal to each other will simplify some future cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-12-06 07:41:31 +01:00
David Gibson	8517239243	udp: Remove the @bound field from union udp_epoll_ref We set this field, but nothing ever checked it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-12-06 07:41:28 +01:00
David Gibson	1cd684b09b	udp: Don't connect "forward" sockets for spliced flows Currently we connect() the socket we use to forward spliced UDP flows. However, we now only ever use sendto() rather than send() on this socket so there's not actually any need to connect it. Don't do so. Rename a number of things that referred to "connect" or "conn" since that would now be misleading. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-12-06 07:41:25 +01:00
David Gibson	9ef31b7619	udp: Always use sendto() rather than send() for forwarding spliced packets udp_sock_handler_splice() has two different ways of sending out packets once it has determined the correct destination socket. For the originating sockets (which are not connected) it uses sendto() to specify a specific address. For the forward socket (which is connected) we use send(). However we know the correct destination address even for the forward socket we do also know the correct destination address. We can use this to use sendto() instead of send(), removing the need for two different paths and some staging data structures. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-12-06 07:41:22 +01:00
David Gibson	729edc241d	udp: Separate tracking of inbound and outbound packet flows Each entry udp_splice_map[v6][N] keeps information about two essentially unrelated packet flows. @ns_conn_sock, @ns_conn_ts and @init_bound_sock track a packet flow from port N in the host init namespace to some other port in the pasta namespace (the one @ns_conn_sock is connected to). @init_conn_sock, @init_conn_ts and @ns_bound_sock track packet flow from port N in the pasta namespace to some other port in the host init namespace (the one @init_conn_sock is connected to). Split udp_splice_map[][] into two separate tables for the two directions. Each entry in each table is a 'struct udp_splice_flow' with @orig_sock (previously the bound socket), @target_sock (previously the connected socket) and @ts (the timeout for the target socket). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-12-06 07:41:18 +01:00
David Gibson	4ebb4905e9	udp: Also bind() connected ports for "splice" forwarding pasta handles "spliced" port forwarding by resending datagrams received on a bound socket in the init namespace to a connected socket in the guest namespace. This means there are actually three ports associated with each "connection". First there's the source and destination ports of the originating datagram. That's also the destination port of the forwarded datagram, but the source port of the forwarded datagram is the kernel allocated bound address of the connected socket. However, by bind()ing as well as connect()ing the forwarding socket we can choose the source port of the forwarded datagrams. By choosing it to match the original source port we remove that surprising third port number and no longer need to store port numbers in struct udp_splice_port. As a bonus this means that the recipient of the packets will see the original source port if they call getpeername(). This rarely matters, but it can't hurt. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-12-06 07:40:56 +01:00
Stefano Brivio	3a2afde87d	conf, udp: Drop mostly duplicated dns_send arrays, rename related fields Given that we use just the first valid DNS resolver address configured, or read from resolv.conf(5) on the host, to forward DNS queries to, in case --dns-forward is used, we don't need to duplicate dns[] to dns_send[]: - rename dns_send[] back to dns[]: those are the resolvers we advertise to the guest/container - for forwarding purposes, instead of dns[], use a single field (for each protocol version): dns_host - and rename dns_fwd to dns_match, so that it's clear this is the address we are matching DNS queries against, to decide if they need to be forwarded Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-11-16 15:09:31 +01:00
Stefano Brivio	817eedc28a	tcp, udp: Don't initialise IPv6/IPv4 sockets if IPv4/IPv6 are not enabled If we disable a given IP version automatically (no corresponding default route on host) or administratively (--ipv4-only or --ipv6-only options), we don't initialise related buffers and services (DHCP for IPv4, NDP and DHCPv6 for IPv6). The "tap" handlers will also ignore packets with a disabled IP version. However, in commit `3c6ae62510` ("conf, tcp, udp: Allow address specification for forwarded ports") I happily changed socket initialisation functions to take AF_UNSPEC meaning "any enabled IP version", but I forgot to add checks back for the "enabled" part. Reported by Paul: on a host without default IPv6 route, but IPv6 enabled, connect, using IPv6, to a port handled by pasta, which tries to send data to a tap device without initialised buffers for that IP version and exits because the resulting write() fails. Simpler way to reproduce: pasta -6 and inbound IPv4 connection, or pasta -4 and inbound IPv6 connection. Reported-by: Paul Holzinger <pholzing@redhat.com> Fixes: `3c6ae62510` ("conf, tcp, udp: Allow address specification for forwarded ports") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-11-10 11:17:50 +01:00
Stefano Brivio	db74679f98	udp: Check for answers to forwarded DNS queries before handling local redirects Now that we allow loopback DNS addresses to be used as targets for forwarding, we need to check if DNS answers come from those targets, before deciding to eventually remap traffic for local redirects. Otherwise, the source address won't match the one configured as forwarder, which means that the guest or the container will refuse those responses. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:32 +01:00
David Gibson	7c7b68dbe0	Use typing to reduce chances of IPv4 endianness errors We recently corrected some errors handling the endianness of IPv4 addresses. These are very easy errors to make since although we mostly store them in network endianness, we sometimes need to manipulate them in host endianness. To reduce the chances of making such mistakes again, change to always using a (struct in_addr) instead of a bare in_addr_t or uint32_t to store network endian addresses. This makes it harder to accidentally do arithmetic or comparisons on such addresses as if they were host endian. We introduce a number of IN4_IS_ADDR_*() helpers to make it easier to directly work with struct in_addr values. This has the additional benefit of making the IPv4 and IPv6 paths more visually similar. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:24 +01:00
David Gibson	dd3470d9a9	Use IPV4_IS_LOOPBACK more widely This macro checks if an IPv4 address is in the loopback network (127.0.0.0/8). There are two places where we open code an identical check, use the macro instead. There are also a number of places we specifically exclude the loopback address (127.0.0.1), but we should actually be excluding anything in the loopback network. Change those sites to use the macro as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-11-04 12:04:21 +01:00
Stefano Brivio	346da48fe6	udp: Fix port and address checks for DNS forwarder First off, as we swap endianness for source ports in udp_fill_data_v{4,6}(), we want host endianness, not network endianness. It doesn't actually matter if we use htons() or ntohs() here, but the current version is confusing. In the IPv4 path, when we remap DNS answers, we already swapped the endianness as needed for the source port: don't swap it again, otherwise we'll not map DNS answers for IPv4. In the IPv6 path, when we remap DNS answers, we want to check that they came from our upstream DNS server, not the one configured via --dns-forward (which doesn't even need to exist for this functionality to work). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	c1eff9a3c6	conf, tcp, udp: Allow specification of interface to bind to Since kernel version 5.7, commit c427bfec18f2 ("net: core: enable SO_BINDTODEVICE for non-root users"), we can bind sockets to interfaces, if they haven't been bound yet (as in bind()). Introduce an optional interface specification for forwarded ports, prefixed by %, that can be passed together with an address. Reported use case: running local services that use ports we want to have externally forwarded: https://github.com/containers/podman/issues/14425 Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-15 02:10:36 +02:00
Stefano Brivio	da152331cf	Move logging functions to a new file, log.c Logging to file is going to add some further complexity that we don't want to squeeze into util.c. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>	2022-10-14 17:38:25 +02:00
Stefano Brivio	5290b6f13e	udp: Replace pragma to ignore bogus stringop-overread warning with workaround Commit `c318ffcb4c` ("udp: Ignore bogus -Wstringop-overread for write() from gcc 12.1") uses a gcc pragma to ignore a bogus warning, which started appearing on gcc 12.1 (aarch64 and x86_64) due to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483 ...but gcc 12.2 has the same issue. Not just that: if LTO is enabled, the pragma itself is ignored (this wasn't the case with gcc 12.1), because of: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922 Drop the pragma, and assign a frame (in the networking sense) pointer from the offset of the Ethernet header in the buffer, then pass it to write() and pcap(), so that gcc doesn't obsess anymore with the fact that an Ethernet header is 14 bytes and we're sending more than that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-09-29 12:23:10 +02:00
David Gibson	d5b80ccc72	Fix widespread off-by-one error dealing with port numbers Port numbers (for both TCP and UDP) are 16-bit, and so fit exactly into a 'short'. USHRT_MAX is therefore the maximum port number and this is widely used in the code. Unfortunately, a lot of those places don't actually want the maximum port number (USHRT_MAX == 65535), they want the total number of ports (65536). This leads to a number of potentially nasty consequences: * We have buffer overruns on the port_fwd::delta array if we try to use port 65535 * We have similar potential overruns for the tcp_sock_* arrays * Interestingly udp_act had the correct size, but we can calculate it in a more direct manner * We have a logical overrun of the ports bitmap as well, although it will just use an unused bit in the last byte so isnt harmful * Many loops don't consider port 65535 (which does mitigate some but not all of the buffer overruns above) * In udp_invert_portmap() we incorrectly compute the reverse port translation for return packets Correct all these by using a new NUM_PORTS defined explicitly for this purpose. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	3ede07aac9	Treat port numbers as unsigned Port numbers are unsigned values, but we're storing them in (signed) int variables in some places. This isn't actually harmful, because int is large enough to hold the entire range of ports. However in places we don't want to use an in_port_t (usually to avoid overflow on the last iteration of a loop) it makes more conceptual sense to use an unsigned int. This will also avoid some problems with later cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	f5a31ee94c	Don't use indirect remap functions for conf_ports() Now that we've delayed initialization of the UDP specific "reverse" map until udp_init(), the only difference between the various 'remap' functions used in conf_ports() is which array they target. So, simplify by open coding the logic into conf_ports() with a pointer to the correct mapping array. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	1467a35b5a	udp: Delay initialization of UDP reversed port mapping table Because it's connectionless, when mapping UDP ports we need, in addition to the table of deltas for destination ports needed by TCP, we need an inverted table to translate the source ports on return packets. Currently we fill out the inverted table at the same time we construct the main table in udp_remap_to_tap() and udp_remap_to_init(). However, we don't use either table until after we've initialized UDP, so we can delay the construction of the reverse table to udp_init(). This makes the configuration more symmetric between TCP and UDP which will enable further cleanups. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	163dc5f188	Consolidate port forwarding configuration into a common structure The configuration for how to forward ports in and out of the guest/ns is divided between several different variables. For each connect direction and protocol we have a mode in the udp/tcp context structure, a bitmap of which ports to forward also in the context structure and an array of deltas to apply if the outward facing and inward facing port numbers are different. This last is a separate global variable, rather than being in the context structure, for no particular reason. UDP also requires an additional array which has the reverse mapping used for return packets. Consolidate these into a re-used substructure in the context structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-24 14:48:35 +02:00
David Gibson	af9c98af5f	udp: Don't drop zero-length outbound UDP packets udp_tap_handler() currently skips outbound packets if they have a payload length of zero. This is not correct, since in a datagram protocol zero length packets still have meaning. Adjust this to correctly forward the zero-length packets by using a msghdr with msg_iovlen == 0. Bugzilla: https://bugs.passt.top/show_bug.cgi?id=19 Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 11:14:29 +02:00
David Gibson	484652c632	udp: Don't pre-initialize msghdr array In udp_tap_handler() the array of msghdr structures, mm[], is initialized to zero. Since UIO_MAXIOV is 1024, this can be quite a large zero, which is expensive if we only end up using a few of its entries. It also makes it less obvious how we're setting all the control fields at the point we actually invoke sendmmsg(). Rather than pre-initializing it, just initialize each element as we use it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-09-13 11:14:29 +02:00
David Gibson	16f5586bb8	Make substructures for IPv4 and IPv6 specific context information The context structure contains a batch of fields specific to IPv4 and to IPv6 connectivity. Split those out into a sub-structure. This allows the conf_ip4() and conf_ip6() functions, which take the entire context but touch very little of it, to be given more specific parameters, making it clearer what it affects without stepping through the code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2022-07-30 22:14:07 +02:00
David Gibson	5e12d23acb	Separate IPv4 and IPv6 configuration After recent changes, conf_ip() now has essentially entirely disjoint paths for IPv4 and IPv6 configuration. So, it's cleaner to split them out into different functions conf_ip4() and conf_ip6(). Splitting these out also lets us make the interface a bit nicer, having them return success or failure directly, rather than manipulating c->v4 and c->v6 to indicate success/failure of the two versions. Since these functions may also initialize the interface index for each protocol, it turns out we can then drop c->v4 and c->v6 entirely, replacing tests on those with tests on whether c->ifi4 or c->ifi6 is non-zero (since a 0 interface index is never valid). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [sbrivio: Whitespace fixes] Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-07-30 22:12:50 +02:00
Stefano Brivio	c318ffcb4c	udp: Ignore bogus -Wstringop-overread for write() from gcc 12.1 With current OpenSUSE Tumbleweed on aarch64 (gcc-12-1.3.aarch64) and on x86_64 (gcc-12-1.4.x86_64), but curiously not on armv7hl (gcc-12-1.3.armv7hl), gcc warns about using the _pointer_ to the 802.3 header to write the whole frame to the tap descriptor: reading between 62 and 4294967357 bytes from a region of size 14 which is bogus: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483 Probably declaring udp_sock_fill_data_v{4,6}() as noinline would "fix" this, but that's on the data path, so I'd rather not. Use a gcc pragma instead. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-05-19 16:27:20 +02:00
Stefano Brivio	3c6ae62510	conf, tcp, udp: Allow address specification for forwarded ports This feature is available in slirp4netns but was missing in passt and pasta. Given that we don't do dynamic memory allocation, we need to bind sockets while parsing port configuration. This means we need to process all other options first, as they might affect addressing and IP version support. It also implies a minor rework of how TCP and UDP implementations bind sockets. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-05-01 07:19:05 +02:00
Stefano Brivio	2b1fbf4631	udp: Out-of-bounds read, CWE-125 in udp_timer() Not an actual issue due to how it's typically stored, but udp_act can also be used for ports 65528-65535. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-04-07 11:44:35 +02:00
Stefano Brivio	22ed4467a4	treewide: Unchecked return value from library, CWE-252 All instances were harmless, but it might be useful to have some debug messages here and there. Reported by Coverity. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-04-07 11:44:35 +02:00
Stefano Brivio	37c228ada8	tap, tcp, udp, icmp: Cut down on some oversized buffers The existing sizes provide no measurable differences in throughput and packet rates at this point. They were probably needed as batched implementations were not complete, but they can be decreased quite a bit now. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	ad7f57a5b7	udp: Move flags before ts in struct udp_tap_port, avoid end padding Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	48582bf47f	treewide: Mark constant references as const Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	bb70811183	treewide: Packet abstraction with mandatory boundary checks Implement a packet abstraction providing boundary and size checks based on packet descriptors: packets stored in a buffer can be queued into a pool (without storage of its own), and data can be retrieved referring to an index in the pool, specifying offset and length. Checks ensure data is not read outside the boundaries of buffer and descriptors, and that packets added to a pool are within the buffer range with valid offset and indices. This implies a wider rework: usage of the "queueing" part of the abstraction mostly affects tap_handler_{passt,pasta}() functions and their callees, while the "fetching" part affects all the guest or tap facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6 handlers. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	3eb19cfd8a	tcp, udp, util: Enforce 24-bit limit on socket numbers This should never happen, but there are no formal guarantees: ensure socket numbers are below SOCKET_MAX. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-29 15:35:38 +02:00
Stefano Brivio	79217b7689	udp: Use flags for local, loopback, and configured unicast binds There's no value in keeping a separate timestamp for activity and for aging of local binds, given that they have the same timeout. Reduce that to a single timestamp, with a flag indicating the local bind. Also use flags instead of separate int fields for loopback and configured unicast address usage as source address. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-28 17:11:40 +02:00
Stefano Brivio	5eb7604203	udp: Split buffer queueing/writing parts of udp_sock_handler() ...it became too hard to follow: split it off to udp_sock_fill_data_v{4,6}. While at it, use IN6_ARE_ADDR_EQUAL(a, b), courtesy of netinet/in.h, instead of open-coded memcmp(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-28 17:11:40 +02:00
Stefano Brivio	0296a57242	udp: Drop _splice from recv, send, sendto static buffer names It's already implied by the fact they don't have "l2" in their names, and dropping it improves readability a bit. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-03-28 17:11:40 +02:00
Stefano Brivio	eed6933e6c	udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6} Not functionally needed, but gcc versions 7 to 9 (at least) will issue a warning otherwise. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-25 22:54:35 +01:00
Stefano Brivio	6c93111864	tcp, udp: Receive batching doesn't pay off when writing single frames to tap In pasta mode, when we get data from sockets and write it as single frames to the tap device, we batch receive operations considerably, and then (conceptually) split the data in many smaller writes. It looked like an obvious choice, but performance is actually better if we receive data in many small frame-sized recvmsg()/recvmmsg(). The syscall overhead with the previous behaviour, observed by perf, comes predominantly from write operations, but receiving data in shorter chunks probably improves cache locality by a considerable amount. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	9afd87b733	udp: Allow loopback connections from host using configured unicast address Likely for testing purposes only: allow connections from host to guest or namespace using, as connection target, the configured, possibly global unicast address. In this case, we have to map the destination address to a link-local address, and for port-based tracked responses, the source address needs to be again the unicast address: not loopback, not link-local. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	89678c5157	conf, udp: Introduce basic DNS forwarding For compatibility with libslirp/slirp4netns users: introduce a mechanism to map, in the UDP routines, an address facing guest or namespace to the first IPv4 or IPv6 address resulting from configuration as resolver. This can be enabled with the new --dns-forward option. This implies that sourcing and using DNS addresses and search lists, passed via command line or read from /etc/resolv.conf, is not bound anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just want to use addresses from /etc/resolv.conf as mapping target, while not passing DNS options via DHCP. Reflect this in all the involved code paths by differentiating DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns, --no-dhcp-search for passt. This should be the last bit to enable substantial compatibility between slirp4netns.sh and slirp4netns(1): pass the --dns-forward option from the script too. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	0515adceaa	passt, pasta: Namespace-based sandboxing, defer seccomp policy application To reach (at least) a conceptually equivalent security level as implemented by --enable-sandbox in slirp4netns, we need to create a new mount namespace and pivot_root() into a new (empty) mountpoint, so that passt and pasta can't access any filesystem resource after initialisation. While at it, also detach IPC, PID (only for passt, to prevent vulnerabilities based on the knowledge of a target PID), and UTS namespaces. With this approach, if we apply the seccomp filters right after the configuration step, the number of allowed syscalls grows further. To prevent this, defer the application of seccomp policies after the initialisation phase, before the main loop, that's where we expect bad things to happen, potentially. This way, we get back to 22 allowed syscalls for passt and 34 for pasta, on x86_64. While at it, move #syscalls notes to specific code paths wherever it conceptually makes sense. We have to open all the file handles we'll ever need before sandboxing: - the packet capture file can only be opened once, drop instance numbers from the default path and use the (pre-sandbox) PID instead - /proc/net/tcp{,v6} and /proc/net/udp{,v6}, for automatic detection of bound ports in pasta mode, are now opened only once, before sandboxing, and their handles are stored in the execution context - the UNIX domain socket for passt is also bound only once, before sandboxing: to reject clients after the first one, instead of closing the listening socket, keep it open, accept and immediately discard new connection if we already have a valid one Clarify the (unchanged) behaviour for --netns-only in the man page. To actually make passt and pasta processes run in a separate PID namespace, we need to unshare(CLONE_NEWPID) before forking to background (if configured to do so). Introduce a small daemon() implementation, __daemon(), that additionally saves the PID file before forking. While running in foreground, the process itself can't move to a new PID namespace (a process can't change the notion of its own PID): mention that in the man page. For some reason, fork() in a detached PID namespace causes SIGTERM and SIGQUIT to be ignored, even if the handler is still reported as SIG_DFL: add a signal handler that just exits. We can now drop most of the pasta_child_handler() implementation, that took care of terminating all processes running in the same namespace, if pasta started a shell: the shell itself is now the init process in that namespace, and all children will terminate once the init process exits. Issuing 'echo $$' in a detached PID namespace won't return the actual namespace PID as seen from the init namespace: adapt demo and test setup scripts to reflect that. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-02-21 13:41:13 +01:00
Stefano Brivio	caa22aa644	tcp, udp, util: Fixes for bitmap handling on big-endian, casts Bitmap manipulating functions would otherwise refer to inconsistent sets of bits on big-endian architectures. While at it, fix up a couple of casts. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-26 16:30:59 +01:00
Stefano Brivio	b93c2c1713	passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions This is the only remaining Linux-specific include -- drop it to avoid clang-tidy warnings and to make code more portable. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2022-01-26 07:57:09 +01:00
Stefano Brivio	627e18fa8a	passt: Add cppcheck target, test, and address resulting warnings ...mostly false positives, but a number of very relevant ones too, in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE(). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 09:41:13 +02:00
Stefano Brivio	27f5999677	udp: Avoid static initialiser for udp{4,6}_l2_buf With the new UDP_TAP_FRAMES value, the binary size grows considerably. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 04:49:25 +02:00
Stefano Brivio	85a80f8f63	udp: Fix maximum payload size calculation for IPv4 buffers, bump UDP_TAP_FRAMES The issue with a higher UDP_TAP_FRAMES was actually coming from a payload size the guest couldn't digest. Fix that, and bump UDP_TAP_FRAMES back to 128. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 04:42:09 +02:00
Stefano Brivio	dd942eaa48	passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers Unions and structs, you all have names now. Take the chance to enable bugprone-reserved-identifier, cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy. Provide a ffsl() weak declaration using gcc built-in. Start reordering includes, but that's not enough for the llvm-include-order checker yet. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-21 04:26:08 +02:00
Stefano Brivio	9618d24700	ndp, dhcpv6, tcp, udp: Always use link-local as source if gateway isn't This shouldn't happen on any sane configuration, but I just met an example of that: the default IPv6 gateway on the host is configured with a global unicast address, we use that as source for RA, DHCPv6 replies, and the guest ignores it. Same later on if we talk TCP or UDP and the guest has no idea where that address comes from. Use our link-local address in case the gateway address is global. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-20 11:10:23 +02:00
Stefano Brivio	12cfa6444c	passt: Add clang-tidy Makefile target and test, take care of warnings Most are just about style and form, but a few were actually serious mistakes (NDP-related). Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-20 08:34:22 +02:00
Stefano Brivio	b0b77118fe	passt: Address warnings from Clang's scan-build All false positives so far. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-10-20 08:29:30 +02:00

1 2

80 commits