Update function comment headers in iov.c to a consistent and
standardized format.
This change ensures:
- Comment blocks for functions consistently start with /**.
- Function names in the comment summary line include parentheses ().
This improves overall comment clarity and uniformity within the file.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Standardize and fix issues in `virtio.c` and `virtio.h` comment headers.
Improvements include:
- Added `()` to function names in comment summaries.
- Added colons after parameter and enum member tags.
- Changed `/*` to `/**` for `virtq_avail_event()` comment.
- Fixed typos (e.g., "file"->"fill", "virqueue"->"virtqueue").
- Added missing `Return:` tag for `vu_queue_rewind()`.
- Corrected parameter names in `virtio.h` comments to match code.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This commit cleans up function comment headers in vhost_user.c to ensure
accuracy and consistency with the code. Changes include correcting
parameter names in comments and signatures (e.g., standardizing on vmsg
for vhost messages, fixing dev to vdev), updating function names in
comment descriptions, and removing/rectifying erroneous parameter
documentation.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This commit addresses several spelling errors identified by the `codespell`
tool. The corrections apply to:
- Code comments in `fwd.c`, `ip.h`, `isolation.c`, and `log.c`.
- An error message string in `vhost_user.c`.
Specifically, the following misspellings were corrected:
- "adddress" to "address"
- "capabilites" to "capabilities"
- "Musn't" to "Mustn't"
- "calculatd" to "calculated"
- "Invalide" to "Invalid"
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This commit enhances test reporting by tracking and displaying the
number of skipped tests.
The skipped test count is now visible in the tmux status bar during
execution and included in the final test summary log. This provides
a more complete overview of test suite results.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Fixes the following clang-analyzer warning:
flow_table.h:96:25: note: Subtraction of two pointers that do not point into the same array is undefined behavior
96 | return (union flow *)f - flowtab;
The `flow_idx()` function is called via `FLOW_IDX()` from
`flow_foreach_slot()`, where `f` is set to `&flowtab[idx].f`.
Therefore, `f` and `flowtab` do point to the same array.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Addresses Clang warning: "Subtraction of two pointers that do not
point into the same array is undefined behavior" for the line:
`ndp_send(c, dst, &ra, ptr - (unsigned char *)&ra);`
Here, `ptr` is `&ra.var[0]`. The subtraction calculates the offset
of `var[0]` within the `struct ra_options ra`. Since `ptr` points
inside `ra`, this pointer arithmetic is well-defined for
calculating the size of the data to send, even if `ptr` and `&ra`
are not strictly considered part of the same "array" by the analyzer.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In `virtqueue_read_indirect_desc()`, the pointer arithmetic involving
`desc` is intentional. We add the length in bytes (`read_len`)
divided by the size of `struct vring_desc` to `desc`, which is
an array of `struct vring_desc`. This correctly calculates the
offset in terms of the number of `struct vring_desc` elements.
Clang issues the following warning due to this explicit scaling:
virtio.c:238:8: error: suspicious usage of 'sizeof(...)' in pointer
arithmetic; this scaled value will be scaled again by the '+='
operator [bugprone-sizeof-expression,cert-arr39-c,-Werror]
238 | desc += read_len / sizeof(struct vring_desc);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
virtio.c:238:8: note: '+=' in pointer arithmetic internally scales
with 'sizeof(struct vring_desc)' == 16
This behavior is intended, so the warning can be considered a
false positive in this context. The code correctly advances the
pointer by the desired number of descriptor entries.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The string STR_NOTONLINK is intentionally not NUL-terminated.
Ignore the GCC error using __attribute__((nonstring)).
This error is reported by GCC 15.1.1 on Fedora 42. However,
Clang 20.1.3 does not support __attribute__((nonstring)).
Therefore, NOLINTNEXTLINE(clang-diagnostic-unknown-attributes)
is also added to suppress Clang's unknown attribute warning.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In eea8a76caf ("flow: fix podman issue #26073"), we unregister
the fd from epoll_ctl() in case of error, but we also need to close it.
As flowside_sock_l4() also calls sock_l4_sa() via flowside_sock_splice()
we can do it unconditionally.
Fixes: eea8a76caf ("flow: fix podman issue #26073")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The maximum number of flow macro name is FLOW_MAX, not MAX_FLOW.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While running pasta, we trigger the following assert:
ASSERTION FAILED in udp_at_sidx (udp_flow.c:35): flow->f.type == FLOW_UDP
in udp_at_sidx() in the following path:
902 void udp_sock_handler(const struct ctx *c, union epoll_ref ref,
903 uint32_t events, const struct timespec *now)
904 {
905 struct udp_flow *uflow = udp_at_sidx(ref.flowside);
The invalid sidx is comming from the epoll_ref provided by epoll_wait().
This assert follows the following error:
Couldn't connect flow socket: Permission denied
It appears that an error happens in udp_flow_sock() and the recently
created fd is not removed from the epoll_ctl() pool:
71 static int udp_flow_sock(const struct ctx *c,
72 struct udp_flow *uflow, unsigned sidei)
73 {
...
82 s = flowside_sock_l4(c, EPOLL_TYPE_UDP, pif, side, fref.data);
83 if (s < 0) {
84 flow_dbg_perror(uflow, "Couldn't open flow specific socket");
85 return s;
86 }
87
88 if (flowside_connect(c, s, pif, side) < 0) {
89 int rc = -errno;
90 flow_dbg_perror(uflow, "Couldn't connect flow socket");
91 return rc;
92 }
...
flowside_sock_l4() calls sock_l4_sa() that adds 's' to the epoll_ctl()
pool.
So to cleanly manage the error of flowside_connect() we need to remove
's' from the epoll_ctl() pool using epoll_del().
Link: https://github.com/containers/podman/issues/26073
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
While running piHole using podman, traffic can trigger the following
assert:
ASSSERTION FAILED in flow_alloc (flow.c:521): flow->f.state == FLOW_STATE_FREE
Backtrace shows that this happens in flow_defer_handler():
#4 0x00005610d6f5b481 flow_alloc (passt + 0xb481)
#5 0x00005610d6f74f86 udp_flow_from_sock (passt + 0x24f86)
#6 0x00005610d6f737c3 udp_sock_fwd (passt + 0x237c3)
#7 0x00005610d6f74c07 udp_flush_flow (passt + 0x24c07)
#8 0x00005610d6f752c2 udp_flow_defer (passt + 0x252c2)
#9 0x00005610d6f5bce1 flow_defer_handler (passt + 0xbce1)
We are trying to allocate a new flow inside the loop freeing them.
Inside the loop free_head points to the first free flow entry in the
current cluster. But if we allocate a new entry during the loop,
free_head is not updated and can point now to the entry we have just
allocated.
We can fix the problem by spliting the loop in two parts:
- first part where we can close some of them and allocate some new
flow entries,
- second part where we free the entries closed in the previous loop
and we aggregate the free entries to merge consecutive the clusters.
Reported-by: Martin Rijntjes <bugs@air-global.nl>
Link: https://github.com/containers/podman/issues/25959
Fixes: 9725e79888 ("udp_flow: Don't discard packets that arrive between bind() and connect()")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When building with gcc 13 and -Og, we get:
passt-repair.c: In function ‘main’:
passt-repair.c:161:23: warning: ‘ev’ may be used uninitialized [-Wmaybe-uninitialized]
161 | if (ev->len > NAME_MAX + 1 || ev->name[ev->len - 1] != '\0') {
| ~~^~~~~
but that can't actually happen, because we only exit the preceding
while loop if 'found' is true, and that only happens, in turn, as we
assign 'ev'.
Get rid of the warning by (redundantly) initialising ev to NULL.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
inetd-style socket passing traditionally starts a service with a
connected socket on file descriptors 0 and 1. passt disallowing
obtaining its socket from either of these descriptors made it
difficult to use with super-servers providing this interface — in my
case I wanted to use passt with s6-ipcserver[1]. Since (as far as I
can tell) passt does not use standard input for anything else (unlike
standard output), it should be safe to relax the restrictions on --fd
to allow setting it to 0, enabling this use case.
Link: https://skarnet.org/software/s6/s6-ipcserver.html [1]
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We've recently added support for propagating ICMP errors related to a UDP
flow from the host to the guest, by handling the extended UDP error on the
socket and synthesizing a suitable ICMP on the tap interface.
Currently we create that ICMP with a source address of the "offender" from
the extended error information - the source of the ICMP error received on
the host. However, we don't translate this address for cases where we NAT
between host and guest. This means (amongst other things) that we won't
get a "Connection refused" error as expected if send data from the guest to
the --map-host-loopback address. The error comes from 127.0.0.1 on the
host, which doesn't make sense on the tap interface and will be discarded
by the guest.
Because ICMP errors can be sent by an intermediate host, not just by the
endpoints of the flow, we can't handle this translation purely with the
information in the flow table entry. We need to explicitly translate this
address by our NAT rules, which we can do with the nat_inbound() helper.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Make a number of changes to udp_sock_recverr() to improve the robustness
of how we handle addresses.
* Get the "offender" address (source of the ICMP packet) using the
SO_EE_OFFENDER() macro, reducing assumptions about structure layout.
* Parse the offender sockaddr using inany_from_sockaddr()
* Check explicitly that the source and destination pifs are what we
expect. Previously we checked something that was probably equivalent
in practice, but isn't strictly speaking what we require for the rest
of the code.
* Verify that for an ICMPv4 error we also have an IPv4 source/offender
and destination/endpoint address
* Verify that for an ICMPv6 error we have an IPv6 endpoint
* Improve debug reporting of any failures
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
inany_from_sockaddr() expects a socket address of family AF_INET or
AF_INET6 and ASSERT()s if it gets anything else. In many of the callers we
can handle an unexpected family more gracefully, though, e.g. by failing
a single flow rather than killing passt.
Change inany_from_sockaddr() to return an error instead of ASSERT()ing,
and handle those errors in the callers. Improve the reporting of any such
errors while we're at it.
With this greater robustness, allow inany_from_sockaddr() to take a void *
rather than specifically a union sockaddr_inany *.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently the functions fwd_nat_from_*() make some address translations
based on both the IP address and protocol port numbers, and others based
only on the address. We have some upcoming cases where it's useful to use
the IP-address-only translations separately, so split them out into helper
functions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_sock_recverr() processes errors on UDP sockets and attempts to
propagate them as ICMP packets on the tap interface. To do this it
currently requires the flow with which the error is associated as a
parameter. If that's missing it will clear the error condition, but not
propagate it.
That means that we largely ignore errors on "listening" sockets. It also
means we may discard some errors on flow specific sockets if they occur
very shortly after the socket is created. In udp_flush_flow() we need to
clear any datagrams received between bind() and connect() which might not
be associated with the "final" flow for the socket. If we get errors
before that point we'll ignore them in the same way because we don't know
the flow they're associated with in advance.
This can happen in practice if we have errors which occur almost
immediately after connect(), such as ECONNREFUSED when we connect() to a
local address where nothing is listening.
Between the extended error message itself and the PKTINFO information we
do actually have enough information to find the correct flow. So, rather
than ignoring errors where we don't have a flow "hint", determine the flow
the hard way in udp_sock_recverr().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Change warn() to debug() in udp_sock_recverr()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Usually we work with the "exit early" flow style, where we return early
on "error" conditions in functions. We don't currently do this in
udp_sock_recverr() for the case where we don't have a flow to associate
the error with.
Reorganise to use the "exit early" style, which will make some subsequent
changes less awkward.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we open code parsing the control message for IP_PKTINFO in
udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO
in another place, so split this out into a helper function.
While we're there, make the parsing a bit more robust: scan all cmsgs to
look for the one we want, rather than assuming there's only one.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: udp_pktinfo(): Fix typo in comment and change err() to debug()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When we get an epoll event on a listening socket, we first deal with any
errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()).
However, it's theoretically possible that new errors could get flagged on
the socket after we call udp_sock_errs(), in which case we could get errors
returned in in udp_sock_fwd() -> udp_peek_addr() -> recvmsg().
In fact, we do deal with this correctly, although the path is somewhat
non-obvious. The recvmsg() error will cause us to bail out of
udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come
back here next epoll loop and call udp_sock_errs().
Except.. we call udp_sock_fwd() from udp_flush_flow() as well as from
epoll events. This is to deal with any packets that arrived between bind()
and connect(), and so might not be associated with the socket's intended
flow. This expects udp_sock_fwd() to flush _all_ queued datagrams, so that
anything received later must be for the correct flow.
At the moment, udp_sock_errs() might fail to flush all datagrams if errors
occur. In particular this can happen in practice for locally reported
errors which occur immediately after connect() (e.g. connecting to a local
port with nothing listening).
We can deal with the problem case, and also make the flow a little more
natural for the common case by having udp_sock_fwd() call udp_sock_errs()
to handle errors as the occur, rather than trying to deal with all errors
in advance.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_sock_recverr() and udp_sock_errs() take an epoll reference from which
they obtain both the socket fd to receive errors from, and - for flow
specific sockets - the flow and side the socket is associated with.
We have some upcoming cases where we want to clear errors when we're not
directly associated with receiving an epoll event, so it's not natural to
have an epoll reference. Therefore, make these functions take the socket
and flow from explicit parameters.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If we get an error on UDP receive, either in udp_peek_addr() or
udp_sock_recv(), we'll print an error message. However, this could be
a perfectly routine UDP error triggered by an ICMP, which need not go to
the error log.
This doesn't usually happen, because before receiving we typically clear
the error queue from udp_sock_errs(). However, it's possible an error
could be flagged after udp_sock_errs() but before we receive. So it's
better to handle this error "silently" (trace level only). We'll bail out
of the receive, return to the epoll loop, and get an EPOLLERR where we'll
handle and report the error properly.
In particular there's one situation that can trigger this case much more
easily. If we start a new outbound UDP flow to a local destination with
nothing listening, we'll get a more or less immediate connection refused
error. So, we'll get that error on the very first receive after the
connect(). That will occur in udp_flow_defer() -> udp_flush_flow() ->
udp_sock_fwd() -> udp_peek_addr() -> recvmsg(). This path doesn't call
udp_sock_errs() first, so isn't (imperfectly) protected the way we are
most of the time.
Fixes: 84ab1305fa ("udp: Polish udp_vu_sock_info() and remove from vu specific code")
Fixes: 69e5393c37 ("udp: Move some more of sock_handler tasks into sub-functions")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
UDP sockets. This lets us obtain and properly handle the specific local
address used when we're "listening" with a socket on 0.0.0.0 or ::.
However, the PKTINFO cmsgs this option generates appear on error queue
messages as well as regular datagrams. udp_sock_recverr() doesn't expect
this and so flags an unrecoverable error when it can't parse the control
message.
Correct this by adding space in udp_sock_recverr()s control buffer for the
additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
than only looking at the first one.
Link: https://bugs.passt.top/show_bug.cgi?id=99
Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address for received datagrams")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If the first resolver listed in the host's /etc/resolv.conf is a
loopback address, and --no-map-gw is given, we automatically conclude
that the resolver is not reachable, discard it, and, if it's the only
nameserver listed in /etc/resolv.conf, we'll warn that we:
Couldn't get any nameserver address
However, this isn't true in a general case: the user might have passed
--dns-forward, and in that case, while we won't map the address of the
default gateway to the host, we're still supposed to map that
particular address. Otherwise, in this common Podman usage:
pasta --config-net --dns-forward 169.254.1.1 -t none -u none -T none -U none --no-map-gw --netns /run/user/1000/netns/netns-c02a8d8f-6ee3-902e-33c5-317e0f24e0af --map-guest-addr 169.254.1.2
and with a loopback address in /etc/resolv.conf, we'll unexpectedly
refuse to forward DNS queries:
# nslookup passt.top 169.254.1.1
;; connection timed out; no servers could be reached
To fix this, make an exception for --dns-forward: if &c->ip4.dns_match
or &c->ip6.dns_match are set in add_dns_resolv4() / add_dns_resolv6(),
use that address as guest-facing resolver.
We already set 'dns_host' to the address we found in /etc/resolv.conf,
that's correct in this case and it makes us forward queries as
expected.
I'm not changing the man page as the current description of
--dns-forward is already consistent with the new behaviour: there's no
described way in which --no-map-gw should affect it.
Reported-by: Andrew Sayers <andrew-bugs.passt.top@pileofstuff.org>
Link: https://bugs.passt.top/show_bug.cgi?id=111
Suggested-by: Paul Holzinger <pholzing@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Not really valuable by itself, but dropping one level of nested blocks
makes the next change more convenient.
No functional changes intended.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Paul Holzinger <pholzing@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
So far for UDP flows (like TCP connections) we didn't record our address
(oaddr) in the flow table entry for socket based pifs. That's because we
didn't have that information when a flow was initiated by a datagram coming
to a "listening" socket with 0.0.0.0 or :: address. Even when we did have
the information, we didn't record it, to simplify address matching on
lookups.
This meant that in some circumstances we could send replies on a UDP flow
from a different address than the originating request came to, which is
surprising and breaks certain setups.
We now have code in udp_peek_addr() which does determine our address for
incoming UDP datagrams. We can use that information to properly populate
oaddr in the flow table for flow initiated from a socket.
In order to be able to consistently match datagrams to flows, we must
*always* have a specific oaddr, not an unspecified address (that's how the
flow hash table works). So, we also need to fill in oaddr correctly for
flows we initiate *to* sockets. Our forwarding logic doesn't specify
oaddr here, letting the kernel decide based on the routing table. In this
case we need to call getsockname() after connect()ing the socket to find
which local address the kernel picked.
This adds getsockname() to our seccomp profile for all variants.
Link: https://bugs.passt.top/show_bug.cgi?id=99
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
inany_from_sockaddr() can only handle sockaddrs of family AF_INET or
AF_INET6 and asserts if given something else. I hit this assertion while
debugging something else, and wanted to see what the bad sockaddr family
was. Now that we have ASSERT_WITH_MSG() its easy to add this information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently we get the source address for received datagrams from recvmsg(),
but we don't get the local destination address. Sometimes we implicitly
know this because the receiving socket is bound to a specific address, but
when listening on 0.0.0.0 or ::, we don't.
We need this information to properly direct replies to flows which come in
to a non-default local address. So, enable the IP_PKTINFO and IPV6_PKTINFO
control messages to obtain this information in udp_peek_addr(). For now
we log a trace messages but don't do anything more with the information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Like many places, tcp_splice_sock_handler() needs to handle EAGAIN
specially, in this case for both of its splice() calls. Unfortunately it
tests for EAGAIN some time after those calls. In between there has been
at least a flow_trace() which could have clobbered errno. Move the test on
errno closer to the relevant system calls to avoid this problem.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In tcp_splice_sock_handler(), if we get an EINTR on our second splice()
(pipe to output socket) we - as we should - go back and retry it. However,
we do so *after* we've already updated our byte counters. That does no
harm for the conn->written[] counter - since the second splice() returned
an error it will be advanced by 0. However we also advance the
conn->read[] counter, and then do so again when the splice() succeeds.
This results in the counters being out of sync, and us thinking we have
remaining data in the pipe when we don't, which can leave us in an
infinite loop once the stream finishes.
Fix this by moving the EINTR handling to directly next to the splice()
call (which is what we usually do for EINTR). As a bonus this removes one
mildly confusing goto.
For symmetry, also rework the EINTR handling on the first splice() the same
way, although that doesn't (as far as I can tell) have buggy side effects.
Link: https://github.com/containers/podman/issues/23686#issuecomment-2779347687
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
As reported by somebody on IRC:
$ pasta --map-guest-addr none
Invalid address to remap to host: none
that's because once we parsed "none", we try to parse it as an address
as well. But we already handled it, so stop once we're done.
Fixes: e813a4df7d ("conf: Allow address remapped to host to be configured")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
When we establish a new UDP flow we create connect()ed sockets that will
only handle datagrams for this flow. However, there is a race between
bind() and connect() where they might get some packets queued for a
different flow. Currently we handle this by simply discarding any
queued datagrams after the connect. UDP protocols should be able to handle
such packet loss, but it's not ideal.
We now have the tools we need to handle this better, by redirecting any
datagrams received during that race to the appropriate flow. We need to
use a deferred handler for this to avoid unexpectedly re-ordering datagrams
in some edge cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Update comment to udp_flow_defer()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_splice() prepare and udp_splice_send() are both quite simple functions
that now have only one caller: udp_sock_to_sock(). Fold them both into
that caller.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_listen_sock_data() forwards datagrams from a "listening" socket until
there are no more (for now). We have an upcoming use case where we want
to do that for a socket that's not a "listening" socket, and uses a
different epoll reference. So, adjust the function to take the pieces it
needs from the reference as direct parameters and rename to udp_sock_fwd().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently udp_flow_from_sock() is only used when receiving a datagram
from a "listening" socket. It takes the listening socket's epoll
reference to get the interface and port on which the datagram arrived.
We have some upcoming cases where we want to use this in different
contexts, so make it take the pif and port as direct parameters instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Drop @ref from comment to udp_flow_from_sock()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Recent changes mean that this define is no longer used anywhere except in
udp.c. Move it back into udp.c from udp_internal.h.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_buf_listen_sock_data() and udp_vu_listen_sock_data() now have
effectively identical structure. The forwarding functions used for flow
specific sockets (udp_buf_sock_to_tap(), udp_vu_sock_to_tap() and
udp_sock_to_sock()) also now take a number of datagrams. This means we
can re-use them for the listening socket path, just passing '1' so they
handle a single datagram at a time.
This allows us to merge both the vhost-user and flow specific paths into
a single, simpler udp_listen_sock_data().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_buf_reply_sock_data() can handle forwarding data either from socket
to socket ("splicing") or from socket to tap. It has a test on each
datagram for which case we're in, but that will be the same for everything
in the batch.
Split out the spliced path into a separate udp_sock_to_sock() function.
This leaves udp_{buf,vu}_reply_sock_data() handling only forwards from
socket to tap, so rename and simplify them accordingly.
This makes the code slightly longer for now, but will allow future cleanups
to shrink it back down again.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Fix typos in comments to udp_sock_recv() and
udp_vu_listen_sock_data()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Both udp_buf_reply_sock_data() and udp_vu_reply_sock_data() internally
decide what the maximum number of datagrams they will forward is. We have
some upcoming reasons to allow the caller to decide that instead, so make
the maximum number of datagrams a parameter for both of them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
A "listening" UDP socket can receive datagrams from multiple flows. So,
we currently have some quite subtle and complex code in
udp_buf_listen_sock_data() to group contiguously received packets for the
same flow into batches for forwarding.
However, since we are now always using flow specific connect()ed sockets
once a flow is established, handling of datagrams on listening sockets is
essentially a slow path. Given that, it's not worth the complexity.
Substantially simplify the code by using an approach more like vhost-user,
and "peeking" at the address of the next datagram, one at a time to
determine the correct flow before we actually receive the data,
This removes all meaningful use of the s_in and tosidx fields in
udp_meta_t, so they too can be removed, along with setting of msg_name and
msg_namelen in the msghdr arrays which referenced them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_vu_sock_info() uses MSG_PEEK to look ahead at the next datagram to be
received and gets its source address. Currently we only use it in the
vhost-user path, but there's nothing inherently vhost-user specific about
it. We have upcoming uses for it elsewhere so rename and move to udp.c.
While we're there, polish its error reporting a litle.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Drop excess newline before udp_sock_recv()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Currently udp_sock_recv() decides the maximum number of frames it is
willing to receive based on the mode. However, we have upcoming use cases
where we will have different criteria for how many frames we want with
information that's not naturally available here but is in the caller. So
make the maximum number of frames a parameter.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Fix typo in comment in udp_buf_reply_sock_data()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>