Rebase the patch for Podman on top of current upstream, and:
- add support for configuration of specific addresses for forwarded
ports
- by default, disable port forwarding, and reflect this in the man
page changes
- adjust processing to a new, incompatible format for port storage,
which I couldn't actually track down to a specific commit, but
that resulted in https://github.com/containers/podman/issues/13643
and commit eedaaf33cdbf ("fix slirp4netns port forwarding with
ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This feature is available in slirp4netns but was missing in passt and
pasta.
Given that we don't do dynamic memory allocation, we need to bind
sockets while parsing port configuration. This means we need to
process all other options first, as they might affect addressing and
IP version support. It also implies a minor rework of how TCP and UDP
implementations bind sockets.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reported by Coverity: if we fail to run the AVX2 version, once
execve() fails, we had already replaced argv[0] with the new
stack-allocated path string, and that's then passed back to
main(). Use a static variable instead.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Not an actual issue due to how it's typically stored, but udp_act
can also be used for ports 65528-65535. Reported by Coverity.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reported by Coverity: it doesn't see that tcp{4,6}_l2_buf_used are
set to zero by tcp_l2_data_buf_flush(), repeat that explicitly here.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This really just needs to be an assignment before line_read() --
turn it into a for loop. Reported by Coverity.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
All instances were harmless, but it might be useful to have some
debug messages here and there. Reported by Coverity.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Field doff in struct tcp_hdr is 4 bits wide, so optlen in
tcp_tap_handler() is already bound, but make that explicit.
Reported by Coverity.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...using dh_apparmor to ship and apply AppArmor profiles. Tried on
current Debian testing (Bookworm, 12).
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
read() will return zero if we pass a zero length, which makes no
sense: instead, track explicitly that we exhausted the buffer, flush
packets to handlers and redo.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If the tun interface disappears, we'll call tap_ns_tun() after the
seccomp profile is applied: add ioctl() and openat() to it.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The existing sizes provide no measurable differences in throughput
and packet rates at this point. They were probably needed as batched
implementations were not complete, but they can be decreased quite a
bit now.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...we don't really need two extra bits, but it's easier to organise
things differently than to silence this.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
It's actually quite easy to make it fail depending on the
environment, accurately report errors here.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Implement a packet abstraction providing boundary and size checks
based on packet descriptors: packets stored in a buffer can be queued
into a pool (without storage of its own), and data can be retrieved
referring to an index in the pool, specifying offset and length.
Checks ensure data is not read outside the boundaries of buffer and
descriptors, and that packets added to a pool are within the buffer
range with valid offset and indices.
This implies a wider rework: usage of the "queueing" part of the
abstraction mostly affects tap_handler_{passt,pasta}() functions and
their callees, while the "fetching" part affects all the guest or tap
facing implementations: TCP, UDP, ICMP, ARP, NDP, DHCP and DHCPv6
handlers.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
...by:
- storing the chained-hash next connection pointer as numeric
reference rather than as pointer
- storing the MSS as 14-bit value, and rounding it
- using only the effective amount of bits needed to store the hash
bucket number
- explicitly limiting window scaling factors to 4-bit values
(maximum factor is 14, from RFC 7323)
- scaling SO_SNDBUF values, and using a 8-bit representation for
the duplicate ACK sequence
- keeping window values unscaled, as received and sent
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
I didn't have time to investigate the root cause for the virtio_net
TX hang yet. Add a quick work-around for the moment being.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We can't take for granted that the hard limit for open files is
big enough as to allow to delay closing sockets to a timer.
Store the value of RTLIMIT_NOFILE we set at start, and use it to
understand if we're approaching the limit with pending, spliced
TCP connections. If that's the case, close sockets right away as
soon as they're not needed, instead of deferring this task to a
timer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>