passt

Author	SHA1	Message	Date
Stefano Brivio	d32edee60a	passt: Use INET{,6}_ADDRSTRLEN instead of open coded sizeof Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:43 +01:00
Stefano Brivio	f435e38927	udp: Fix typo in tcp_tap_handler() documentation Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:42 +01:00
Stefano Brivio	93977868f9	udp: Use size_t for return value of recvfrom() Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:42 +01:00
Stefano Brivio	cd14bff5ea	tcp: Add struct for TCP execution context, move hash_secret to it We don't need to keep small data as static variables, move the only small variable we have so far to the new struct. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:41 +01:00
Stefano Brivio	bb9fb9e2d1	tcp: Introduce hash table for socket lookup for packets from tap Replace the dummy, full array scan implementation, by a hash table based on SipHash, with chained hashing for collisions. This table is also statically allocated, and it's simply an array of socket numbers. Connection entries are chained by pointers in the connection entry itself, which now also contains socket number and hash bucket index to keep removal reasonably fast. New entries are inserted at the head of the chain, that is, the most recently inserted entry is directly mapped from the bucket. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:40 +01:00
Stefano Brivio	4f675d63e8	tcp: Ignore out-of-order ACKs from tap instead of resetting connection We might receive out-of-order ACK packets from the tap device, just like any other packet. I guess I've been overcautious and regarded it as a condition we can't recover from, but all that happens is that we have already seen a higher ACK sequence number, which means that data has been already received and discarded from the buffer. We have to ignore the lower sequence number we receive later, though, because that would force the buffer bookkeeping into throwing away more data than expected. Drop the ACK sequence assignment from tcp_tap_handler(), which was redundant, and let tcp_sock_consume() take exclusive care of that. Now that tcp_sock_consume() can never fail, make it a void, and drop checks from callers. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:39 +01:00
Stefano Brivio	a418946837	tcp: Add siphash implementation for initial sequence numbers Implement siphash routines for initial TCP sequence numbers (12 bytes input for IPv4, 36 bytes input for IPv6), and while at it, also functions we'll use later on for hash table indices and TCP timestamp offsets (with 8, 20, 32 bytes of input). Use these to set the initial sequence number, according to RFC 6528, for connections originating either from the tap device or from sockets. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-03-17 10:57:36 +01:00
Stefano Brivio	8bca388e8a	passt: Assorted fixes from "fresh eyes" review A bunch of fixes not worth single commits at this stage, notably: - make buffer, length parameter ordering consistent in ARP, DHCP, NDP handlers - strict checking of buffer, message and option length in DHCP handler (a malicious client could have easily crashed it) - set up forwarding for IPv4 and IPv6, and masquerading with nft for IPv4, from demo script - get rid of separate slow and fast timers, we don't save any overhead that way - stricter checking of buffer lengths as passed to tap handlers - proper dequeuing from qemu socket back-end: I accidentally trashed messages that were bundled up together in a single tap read operation -- the length header tells us what's the size of the next frame, but there's no apparent limit to the number of messages we get with one single receive - rework some bits of the TCP state machine, now passive and active connection closes appear to be robust -- introduce a new FIN_WAIT_1_SOCK_FIN state indicating a FIN_WAIT_1 with a FIN flag from socket - streamline TCP option parsing routine - track TCP state changes to stderr (this is temporary, proper debugging and syslogging support pending) - observe that multiplying a number by four might very well change its value, and this happens to be the case for the data offset from the TCP header as we check if it's the same as the total length to find out if it's a duplicated ACK segment - recent estimates suggest that the duration of a millisecond is closer to a million nanoseconds than a thousand of them, this trend is now reflected into the timespec_diff_ms() convenience routine Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-21 11:55:49 +01:00
Stefano Brivio	105b916361	passt: New design and implementation with native Layer 4 sockets This is a reimplementation, partially building on the earlier draft, that uses L4 sockets (SOCK_DGRAM, SOCK_STREAM) instead of SOCK_RAW, providing L4-L2 translation functionality without requiring any security capability. Conceptually, this follows the design presented at: https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md The most significant novelty here comes from TCP and UDP translation layers. In particular, the TCP state and translation logic follows the intent of being minimalistic, without reimplementing a full TCP stack in either direction, and synchronising as much as possible the TCP dynamic and flows between guest and host kernel. Another important introduction concerns addressing, port translation and forwarding. The Layer 4 implementations now attempt to bind on all unbound ports, in order to forward connections in a transparent way. While at it: - the qemu 'tap' back-end can't be used as-is by qrap anymore, because of explicit checks now introduced in qemu to ensure that the corresponding file descriptor is actually a tap device. For this reason, qrap now operates on a 'socket' back-end type, accounting for and building the additional header reporting frame length - provide a demo script that sets up namespaces, addresses and routes, and starts the daemon. A virtual machine started in the network namespace, wrapped by qrap, will now directly interface with passt and communicate using Layer 4 sockets provided by the host kernel. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-16 09:28:55 +01:00
Stefano Brivio	d02e059ddc	passt: Add IPv6 and NDP support, further fixes for IPv4 CT Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-16 07:58:05 +01:00
Stefano Brivio	6709ade2bd	merd: Rename to PASST Plug A Simple Socket Transport. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-16 07:58:01 +01:00
Stefano Brivio	b439984641	merd: ARP and DHCP handlers, connection tracking fixes With this, merd provides a fully functional IPv4 environment to guests, requiring a single capability, CAP_NET_RAW. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-16 07:57:57 +01:00
Stefano Brivio	fa2d20908d	merd: Switch to AF_UNIX for qemu tap, provide wrapper We can bypass a full-fledged network interface between qemu and merd by connecting the qemu tap file descriptor to a provided UNIX domain socket: this could be implemented in qemu eventually, qrap covers this meanwhile. This also avoids the need for the AF_PACKET socket towards the guest. Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-16 07:57:51 +01:00
Stefano Brivio	cefcf0bc2c	merd: Initial import Signed-off-by: Stefano Brivio <sbrivio@redhat.com>	2021-02-16 07:57:46 +01:00
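
Commit 8bca388e8a notes that several messages can arrive bundled in a single read from the qemu socket back-end, each preceded by a header giving the length of the next frame. The loop below is a minimal sketch of that dequeuing pattern, not the passt code. The function name `count_frames_sketch()` is hypothetical, and the 4-byte, network byte order header is an assumption: the log only says the header reports the frame length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walk a buffer holding length-prefixed frames and
 * return how many complete frames it contains.
 *
 * Assumption: each frame is preceded by a 4-byte length in network byte
 * order. The commit log does not state the header size or byte order.
 */
static size_t count_frames_sketch(const unsigned char *buf, size_t len)
{
	size_t count = 0;

	assert(buf || !len);

	/* Keep going as long as a full length header is available */
	while (len >= 4) {
		uint32_t flen = ((uint32_t)buf[0] << 24) |
				((uint32_t)buf[1] << 16) |
				((uint32_t)buf[2] << 8) |
				 (uint32_t)buf[3];

		buf += 4;
		len -= 4;

		/* Header announces more data than we have: stop here */
		if (flen > len)
			break;

		/* A real handler would process buf[0..flen) at this point */
		buf += flen;
		len -= flen;
		count++;
	}

	return count;
}
```

The loop has no fixed upper bound on the number of frames, matching the observation that there is no apparent limit to how many messages one receive can return. A truncated trailing frame is left unprocessed rather than discarded along with the frames before it.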

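Commit 8bca388e8a also fixes the unit conversion in the timespec_diff_ms() convenience routine: a millisecond is a million nanoseconds, not a thousand. The following is a sketch of what such a routine might look like under that correction. The name `timespec_diff_ms_sketch()`, its signature and its return type are assumptions for illustration, not the actual passt function.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: return (a - b) in milliseconds, truncated toward
 * zero for a >= b. Signature and return type are assumptions.
 */
static long timespec_diff_ms_sketch(const struct timespec *a,
				    const struct timespec *b)
{
	long sec, nsec;

	assert(a && b);

	sec = (long)(a->tv_sec - b->tv_sec);
	nsec = a->tv_nsec - b->tv_nsec;

	/* Borrow one second if the nanosecond part went negative */
	if (nsec < 0) {
		sec--;
		nsec += 1000000000L;
	}

	/* 1 ms = 1,000,000 ns: dividing by 1000 would yield microseconds */
	return sec * 1000 + nsec / 1000000L;
}
```

For example, with `a` at 2.5 s and `b` at 1.75 s, the borrow step leaves 0 s and 750,000,000 ns, which the final division converts to 750 ms.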
1764 commits