Commit graph

34 commits

Author SHA1 Message Date
Stefano Brivio
3a2afde87d conf, udp: Drop mostly duplicated dns_send arrays, rename related fields
Given that we use just the first valid DNS resolver address
configured, or read from resolv.conf(5) on the host, to forward DNS
queries to, in case --dns-forward is used, we don't need to duplicate
dns[] to dns_send[]:

- rename dns_send[] back to dns[]: those are the resolvers we
  advertise to the guest/container

- for forwarding purposes, instead of dns[], use a single field (for
  each protocol version): dns_host

- and rename dns_fwd to dns_match, so that it's clear this is the
  address we are matching DNS queries against, to decide if they need
  to be forwarded

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-11-16 15:09:31 +01:00
Stefano Brivio
73f50a76aa conf: Split the notions of read DNS addresses and offered ones
With --dns-forward, if the host has a loopback address configured as
DNS server, we should actually use it to forward queries, but, if
--no-map-gw is passed, we shouldn't offer the same address via DHCP,
NDP and DHCPv6, because it's not going to be reachable.

Problematic configuration:

* systemd-resolved configuring the usual 127.0.0.53 on the host: we
  read that from /etc/resolv.conf

* --dns-forward specified with an unrelated address, for example
  198.51.100.1

We still want to forward queries to 127.0.0.53, if we receive one
directed to 198.51.100.1, so we can't drop 127.0.0.53 from our list:
we want to use it for forwarding. At the same time, we shouldn't
offer 127.0.0.53 to the guest or container either.

With this change, I'm only covering the case of automatically
configured DNS servers from /etc/resolv.conf. We could extend this to
addresses configured with command-line options, but I don't really
see a likely use case at this point.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-11-04 12:04:32 +01:00
David Gibson
db07804d26 ndp: Use tap_icmp6_send() helper
We send ICMPv6 packets to the guest from both icmp.c and from ndp.c.  The
case in ndp() manually constructs L2 and IPv6 headers, unlike the version
in icmp.c which uses the tap_icmp6_send() helper from tap.c  Now that we've
broaded the parameters of tap_icmp6_send() we can use it in ndp() as well
saving some duplicated logic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:53 +02:00
David Gibson
cb1edae3b5 ndp: Remove unneeded eh_source parameter
ndp() takes a parameter giving the ethernet source address of the packet
it is to respond to, which it uses to determine the destination address to
send the reply packet to.

This is not necessary, because the address will always be the guest's
MAC address.  Even if the guest has just changed MAC address, then either
tap_handler_passt() or tap_handler_pasta() - which are the only call paths
leading to ndp() will have updated c->mac_guest with the new value.

So, remove the parameter, and just use c->mac_guest, making it more
consistent with other paths where we construct packets to send inwards.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:51 +02:00
David Gibson
fb5d1c5d7d tap: Remove unhelpeful vnet_pre optimization from tap_send()
Callers of tap_send() can optionally use a small optimization by adding
extra space for the 4 byte length header used on the qemu socket interface.
tap_ip_send() is currently the only user of this, but this is used only
for "slow path" ICMP and DHCP packets, so there's not a lot of value to
the optimization.

Worse, having the two paths here complicates the interface and makes future
cleanups difficult, so just remove it.  I have some plans to bring back the
optimization in a more general way in future, but for now it's just in the
way.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:43 +02:00
David Gibson
7abd2b0d72 Add csum_icmp6() helper for calculating ICMPv6 checksums
At least two places in passt calculate ICMPv6 checksums, ndp() and
tap_ip_send().  Add a helper to handle this calculation in both places.
For future flexibility, the new helper takes parameters for the fields in
the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to
be explicitly constructed.  It also allows the ICMPv6 header and payload to
be in separate buffers, although we don't use this yet.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-19 03:34:21 +02:00
Stefano Brivio
da152331cf Move logging functions to a new file, log.c
Logging to file is going to add some further complexity that we don't
want to squeeze into util.c.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2022-10-14 17:38:25 +02:00
David Gibson
16f5586bb8 Make substructures for IPv4 and IPv6 specific context information
The context structure contains a batch of fields specific to IPv4 and to
IPv6 connectivity.  Split those out into a sub-structure.

This allows the conf_ip4() and conf_ip6() functions, which take the
entire context but touch very little of it, to be given more specific
parameters, making it clearer what it affects without stepping through the
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-07-30 22:14:07 +02:00
Stefano Brivio
48582bf47f treewide: Mark constant references as const
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
bb70811183 treewide: Packet abstraction with mandatory boundary checks
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.

Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.

This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-03-29 15:35:38 +02:00
Stefano Brivio
89678c5157 conf, udp: Introduce basic DNS forwarding
For compatibility with libslirp/slirp4netns users: introduce a
mechanism to map, in the UDP routines, an address facing guest or
namespace to the first IPv4 or IPv6 address resulting from
configuration as resolver. This can be enabled with the new
--dns-forward option.

This implies that sourcing and using DNS addresses and search lists,
passed via command line or read from /etc/resolv.conf, is not bound
anymore to DHCP/DHCPv6/NDP usage: for example, pasta users might just
want to use addresses from /etc/resolv.conf as mapping target, while
not passing DNS options via DHCP.

Reflect this in all the involved code paths by differentiating
DHCP/DHCPv6/NDP usage from DNS configuration per se, and in the new
options --dhcp-dns, --dhcp-search for pasta, and --no-dhcp-dns,
--no-dhcp-search for passt.

This should be the last bit to enable substantial compatibility
between slirp4netns.sh and slirp4netns(1): pass the --dns-forward
option from the script too.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-02-21 13:41:13 +01:00
Stefano Brivio
b93c2c1713 passt: Drop <linux/ipv6.h> include, carry own ipv6hdr and opt_hdr definitions
This is the only remaining Linux-specific include -- drop it to avoid
clang-tidy warnings and to make code more portable.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-01-26 07:57:09 +01:00
Stefano Brivio
73a4a6b7cd ndp: Don't send a DNS search list if we don't have a list of DNS servers
This is not explicitly forbidden, but it confuses the ISC's DHCP client,
and doesn't make sense anyway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 17:34:42 +02:00
Stefano Brivio
af55c4e98f ndp: Don't sabotage DAD by replying to probing neighbour solicitation
If the solicitation comes from ::, it's the guest performing
duplicate address detection -- don't answer that.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 12:16:16 +02:00
Stefano Brivio
bf68270898 ndp: Set (ICMP) hop limit to 255 in router advertisement
Found while re-reading this part, zero works as well, but a
host might legitimately refuse a value that's below a given
threshold.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 12:16:16 +02:00
Stefano Brivio
685b50c3ce Makefile: cppcheck target: Suppress unmatchedSuppression, pass CFLAGS
Some of those warnings don't trigger even on systems with very
similar toolchains, suppress unmatchedSuppression warnings, they're
basically useless.

While at it, pass CFLAGS to cppcheck.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 12:16:16 +02:00
Stefano Brivio
627e18fa8a passt: Add cppcheck target, test, and address resulting warnings
...mostly false positives, but a number of very relevant ones too,
in tcp_get_sndbuf(), tcp_conn_from_tap(), and siphash PREAMBLE().

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 09:41:13 +02:00
Stefano Brivio
dd942eaa48 passt: Fix build with gcc 7, use std=c99, enable some more Clang checkers
Unions and structs, you all have names now.

Take the chance to enable bugprone-reserved-identifier,
cert-dcl37-c, and cert-dcl51-cpp checkers in clang-tidy.

Provide a ffsl() weak declaration using gcc built-in.

Start reordering includes, but that's not enough for the
llvm-include-order checker yet.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-21 04:26:08 +02:00
Stefano Brivio
9618d24700 ndp, dhcpv6, tcp, udp: Always use link-local as source if gateway isn't
This shouldn't happen on any sane configuration, but I just met an
example of that: the default IPv6 gateway on the host is configured
with a global unicast address, we use that as source for RA, DHCPv6
replies, and the guest ignores it. Same later on if we talk TCP or
UDP and the guest has no idea where that address comes from.

Use our link-local address in case the gateway address is global.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 11:10:23 +02:00
Stefano Brivio
12cfa6444c passt: Add clang-tidy Makefile target and test, take care of warnings
Most are just about style and form, but a few were actually
serious mistakes (NDP-related).

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-10-20 08:34:22 +02:00
Stefano Brivio
4b0ccb8323 ndp: Set router lifetime to 9000s instead of 3600s
Seen while testing: lifetime expires while we're flooding a tap
interface with UDP packets, the router advertisement comes too late,
and the kernel drops the default router in the namespace. This
should only affect testing, so go for the maximum allowed value,
that is, 9000 seconds.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-27 01:28:02 +02:00
Stefano Brivio
ec2b58ea4d conf, dhcp, ndp: Fix message about default MTU, make NDP consistent
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-09 15:40:04 +02:00
Stefano Brivio
1e49d194d0 passt, pasta: Introduce command-line options and port re-mapping
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-09-01 17:00:27 +02:00
Stefano Brivio
17765f8de0 checksum: Introduce AVX2 implementation, unify helpers
Provide an AVX2-based function using compiler intrinsics for
TCP/IP-style checksums. The load/unpack/add idea and implementation
is largely based on code from BESS (the Berkeley Extensible Software
Switch) licensed as 3-Clause BSD, with a number of modifications to
further decrease pipeline stalls and to minimise cache pollution.

This speeds up considerably data paths from sockets to tap
interfaces, decreasing overhead for checksum computation, with
16-64KiB packet buffers, from approximately 11% to 7%. The rest is
just syscalls at this point.

While at it, provide convenience targets in the Makefile for avx2,
avx2_debug, and debug targets -- these simply add target-specific
CFLAGS to the build.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-26 07:18:50 +02:00
Stefano Brivio
7fa3e90290 ndp: Store link-local or global address on any NDP message received
The guest might not send other types of traffic before we try to
communicate to it, so take also this chance to store its configured
addresses.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-21 10:04:17 +02:00
Stefano Brivio
a9c8d4d924 ndp: Fix calculation of length for DNS Search List option (31)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-17 17:58:03 +02:00
Stefano Brivio
33482d5bf2 passt: Add PASTA mode, major rework
PASTA (Pack A Subtle Tap Abstraction) provides quasi-native host
connectivity to an otherwise disconnected, unprivileged network
and user namespace, similarly to slirp4netns. Given that the
implementation is largely overlapping with PASST, no separate binary
is built: 'pasta' (and 'passt4netns' for clarity) both link to
'passt', and the mode of operation is selected depending on how the
binary is invoked. Usage example:

	$ unshare -rUn
	# echo $$
	1871759

	$ ./pasta 1871759	# From another terminal

	# udhcpc -i pasta0 2>/dev/null
	# ping -c1 pasta.pizza
	PING pasta.pizza (64.190.62.111) 56(84) bytes of data.
	64 bytes from 64.190.62.111 (64.190.62.111): icmp_seq=1 ttl=255 time=34.6 ms

	--- pasta.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 34.575/34.575/34.575/0.000 ms
	# ping -c1 spaghetti.pizza
	PING spaghetti.pizza(2606:4700:3034::6815:147a (2606:4700:3034::6815:147a)) 56 data bytes
	64 bytes from 2606:4700:3034::6815:147a (2606:4700:3034::6815:147a): icmp_seq=1 ttl=255 time=29.0 ms

	--- spaghetti.pizza ping statistics ---
	1 packets transmitted, 1 received, 0% packet loss, time 0ms
	rtt min/avg/max/mdev = 28.967/28.967/28.967/0.000 ms

This entails a major rework, especially with regard to the storage of
tracked connections and to the semantics of epoll(7) references.

Indexing TCP and UDP bindings merely by socket proved to be
inflexible and unsuitable to handle different connection flows: pasta
also provides Layer-2 to Layer-2 socket mapping between init and a
separate namespace for local connections, using a pair of splice()
system calls for TCP, and a recvmmsg()/sendmmsg() pair for UDP local
bindings. For instance, building on the previous example:

	# ip link set dev lo up
	# iperf3 -s

	$ iperf3 -c ::1 -Z -w 32M -l 1024k -P2 | tail -n4
	[SUM]   0.00-10.00  sec  52.3 GBytes  44.9 Gbits/sec  283             sender
	[SUM]   0.00-10.43  sec  52.3 GBytes  43.1 Gbits/sec                  receiver

	iperf Done.

epoll(7) references now include a generic part in order to
demultiplex data to the relevant protocol handler, using 24
bits for the socket number, and an opaque portion reserved for
usage by the single protocol handlers, in order to track sockets
back to corresponding connections and bindings.

A number of fixes pertaining to TCP state machine and congestion
window handling are also included here.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-07-17 11:04:22 +02:00
Stefano Brivio
5fd6db7751 ndp: Always answer neighbour solicitations with the requested target address
The guest might try to resolve hosts other than the main host
namespace (i.e. the gateway) -- just recycle the target address from
the request and resolve it to the MAC address of the gateway.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-21 11:14:52 +02:00
Stefano Brivio
9010054ea4 dhcp, ndp, dhcpv6: Support for multiple DNS servers, search list
Add support for a variable amount of DNS servers, including zero,
from /etc/resolv.conf, in DHCP, NDP and DHCPv6 implementations.

Introduce support for domain search list for DHCP (RFC 3397),
NDP (RFC 8106), and DHCPv6 (RFC 3646), also sourced from
/etc/resolv.conf.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-05-21 11:14:47 +02:00
Stefano Brivio
4aa8e54a30 passt: Introduce a DHCPv6 server
This implementation, similarly to the IPv4 DHCP one, hands out a
single address, which is the same as the upstream address for the
host.

This avoids the need for address translation as long as the client
runs a DHCPv6 client. The NDP "Managed" flag is now set in Router
Advertisements.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-04-13 22:37:40 +02:00
Stefano Brivio
48ca38c606 passt: Run in background, add message logging with severities
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-03-18 12:58:07 +01:00
Stefano Brivio
8bca388e8a passt: Assorted fixes from "fresh eyes" review
A bunch of fixes not worth single commits at this stage, notably:

- make buffer, length parameter ordering consistent in ARP, DHCP,
  NDP handlers

- strict checking of buffer, message and option length in DHCP
  handler (a malicious client could have easily crashed it)

- set up forwarding for IPv4 and IPv6, and masquerading with nft for
  IPv4, from demo script

- get rid of separate slow and fast timers, we don't save any
  overhead that way

- stricter checking of buffer lengths as passed to tap handlers

- proper dequeuing from qemu socket back-end: I accidentally trashed
  messages that were bundled up together in a single tap read
  operation -- the length header tells us what's the size of the next
  frame, but there's no apparent limit to the number of messages we
  get with one single receive

- rework some bits of the TCP state machine, now passive and active
  connection closes appear to be robust -- introduce a new
  FIN_WAIT_1_SOCK_FIN state indicating a FIN_WAIT_1 with a FIN flag
  from socket

- streamline TCP option parsing routine

- track TCP state changes to stderr (this is temporary, proper
  debugging and syslogging support pending)

- observe that multiplying a number by four might very well change
  its value, and this happens to be the case for the data offset
  from the TCP header as we check if it's the same as the total
  length to find out if it's a duplicated ACK segment

- recent estimates suggest that the duration of a millisecond is
  closer to a million nanoseconds than a thousand of them, this
  trend is now reflected into the timespec_diff_ms() convenience
  routine

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-21 11:55:49 +01:00
Stefano Brivio
105b916361 passt: New design and implementation with native Layer 4 sockets
This is a reimplementation, partially building on the earlier draft,
that uses L4 sockets (SOCK_DGRAM, SOCK_STREAM) instead of SOCK_RAW,
providing L4-L2 translation functionality without requiring any
security capability.

Conceptually, this follows the design presented at:
	https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Networking.md

The most significant novelty here comes from TCP and UDP translation
layers. In particular, the TCP state and translation logic follows
the intent of being minimalistic, without reimplementing a full TCP
stack in either direction, and synchronising as much as possible the
TCP dynamic and flows between guest and host kernel.

Another important introduction concerns addressing, port translation
and forwarding. The Layer 4 implementations now attempt to bind on
all unbound ports, in order to forward connections in a transparent
way.

While at it:
- the qemu 'tap' back-end can't be used as-is by qrap anymore,
  because of explicit checks now introduced in qemu to ensure that
  the corresponding file descriptor is actually a tap device. For
  this reason, qrap now operates on a 'socket' back-end type,
  accounting for and building the additional header reporting
  frame length

- provide a demo script that sets up namespaces, addresses and
  routes, and starts the daemon. A virtual machine started in the
  network namespace, wrapped by qrap, will now directly interface
  with passt and communicate using Layer 4 sockets provided by the
  host kernel.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 09:28:55 +01:00
Stefano Brivio
d02e059ddc passt: Add IPv6 and NDP support, further fixes for IPv4 CT
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2021-02-16 07:58:05 +01:00