udp_buf_listen_sock_data() and udp_vu_listen_sock_data() now have
effectively identical structure. The forwarding functions used for flow
specific sockets (udp_buf_sock_to_tap(), udp_vu_sock_to_tap() and
udp_sock_to_sock()) also now take a number of datagrams. This means we
can re-use them for the listening socket path, just passing '1' so they
handle a single datagram at a time.
This allows us to merge both the vhost-user and flow specific paths into
a single, simpler udp_listen_sock_data().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_buf_reply_sock_data() can handle forwarding data either from socket
to socket ("splicing") or from socket to tap. It has a test on each
datagram for which case we're in, but that will be the same for everything
in the batch.
Split out the spliced path into a separate udp_sock_to_sock() function.
This leaves udp_{buf,vu}_reply_sock_data() handling only forwards from
socket to tap, so rename and simplify them accordingly.
This makes the code slightly longer for now, but will allow future cleanups
to shrink it back down again.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Fix typos in comments to udp_sock_recv() and
udp_vu_listen_sock_data()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Both udp_buf_reply_sock_data() and udp_vu_reply_sock_data() internally
decide what the maximum number of datagrams they will forward is. We have
some upcoming reasons to allow the caller to decide that instead, so make
the maximum number of datagrams a parameter for both of them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_vu_sock_info() uses MSG_PEEK to look ahead at the next datagram to be
received and gets its source address. Currently we only use it in the
vhost-user path, but there's nothing inherently vhost-user specific about
it. We have upcoming uses for it elsewhere so rename and move to udp.c.
While we're there, polish its error reporting a little.
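As a rough illustration of the MSG_PEEK technique described above, here is
a minimal sketch. peek_src_addr() is a hypothetical name used only for this
example; it is not the function in udp.c and its error handling is
deliberately bare.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (illustration only): look at the next datagram queued
 * on socket s and store its source address in *src, without dequeuing it.
 * Returns 0 on success, -1 on error, with errno set by recvmsg().
 */
static int peek_src_addr(int s, struct sockaddr_storage *src)
{
	struct msghdr msg;

	assert(src);

	memset(&msg, 0, sizeof(msg));
	msg.msg_name = src;
	msg.msg_namelen = sizeof(*src);
	/* No iovec: we want the address only, the payload stays queued */

	if (recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT) < 0)
		return -1;

	return 0;
}
```

Because of MSG_PEEK, a later recv() or recvmmsg() on the same socket still
sees the same datagram, so the caller can pick a forwarding path from the
source address before actually receiving anything.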
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Drop excess newline before udp_sock_recv()]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Since UDP has no built-in knowledge of connections, the only way we
know when we're done with a UDP flow is a timeout with no activity.
To keep track of this, struct udp_flow includes a timestamp to record
the last time we saw traffic on the flow.
For data from listening sockets and from tap, this is done implicitly via
udp_flow_from_{sock,tap}() but for reply sockets it's done explicitly.
However, that logic is duplicated between the vhost-user and "buf" paths.
Make it common in udp_reply_sock_handler() instead.
Technically this is a behavioural change: previously, if we got an EPOLLIN
event but there wasn't actually any data, we wouldn't update the timestamp;
now we will. This should be harmless: if there's an EPOLLIN we expect
there to be data, and even if there isn't the worst we can do is mildly
delay the cleanup of a stale flow.
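The timestamp logic can be sketched roughly as follows. The structure, the
function names and the 180 second timeout are assumptions made for the
example, not the definitions used in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define FLOW_TIMEOUT_SKETCH	180	/* s, illustrative value (assumption) */

/* Hypothetical stand-in for the timestamp kept in struct udp_flow */
struct flow_sketch {
	time_t ts;	/* Last time we saw traffic on the flow */
};

/* Record activity: called once per event in the common handler */
static void flow_touch(struct flow_sketch *f, const struct timespec *now)
{
	assert(f && now);
	f->ts = now->tv_sec;
}

/* true if the flow saw no traffic for longer than the timeout */
static bool flow_expired(const struct flow_sketch *f,
			 const struct timespec *now)
{
	assert(f && now);
	return now->tv_sec - f->ts > FLOW_TIMEOUT_SKETCH;
}
```

Touching the timestamp on an EPOLLIN event that turns out to carry no data
only pushes the expiry check back by one timeout period, which is the mild
delay mentioned above.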
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Share some additional miscellaneous logic between the vhost-user and "buf"
paths for data on udp reply sockets. The biggest piece is error handling
of cases where we can't forward between the two pifs of the flow. We also
make common some more simple logic locating the correct flow and its
parameters.
This adds some lines of code due to extra comment lines, but nonetheless
reduces logic duplication.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
At the start of every cycle of the loop in udp_vu_reply_sock_data() we:
- ASSERT that uflow is not NULL
- Check if the target pif is PIF_TAP
- Initialize the v6 boolean
However, all of these depend only on the flow, which doesn't change across
the loop. This is probably a duplication from udp_vu_listen_sock_data(),
where the flow can be different for each packet. For the reply socket
case, however, factor that logic out of the loop.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
udp_{listen,reply}_sock_handler() can accept both EPOLLERR and EPOLLIN
events. However, unlike most epoll event handlers we don't check the
event bits right there. EPOLLERR is checked within udp_sock_errs() which
we call unconditionally. Checking EPOLLIN is still more buried: it is
checked within both udp_sock_recv() and udp_vu_sock_recv().
We can simplify the logic and pass fewer extraneous parameters around by
moving the checking of the event bits to the top level event handlers.
This makes udp_{buf,vu}_{listen,reply}_sock_handler() no longer general
event handlers, but specific to EPOLLIN events, meaning new data. So,
rename those functions to udp_{buf,vu}_{listen,reply}_sock_data() to better
reflect their function.
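The resulting shape of a top level handler might look something like the
sketch below. All names here are hypothetical, and the two stubs merely
count calls so that the dispatch can be observed; the real handlers take
more context than a socket and an event mask.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

static int n_errs, n_data;	/* Call counters, for the example only */

/* Stub standing in for error queue handling: negative means give up */
static int errs_stub(int s)
{
	(void)s;
	n_errs++;
	return 0;
}

/* Stub standing in for an EPOLLIN-only "_sock_data()" function */
static void data_stub(int s)
{
	(void)s;
	n_data++;
}

/* Hypothetical top level handler: all event bits are checked here, so
 * the functions below it don't need to see the event mask at all.
 */
static void sock_handler_sketch(int s, uint32_t events)
{
	if (events & EPOLLERR) {
		if (errs_stub(s) < 0)
			return;
	}

	if (events & EPOLLIN)
		data_stub(s);
}
```

With this arrangement the data functions can assume there is something to
read, which is what makes the "_sock_data()" naming appropriate.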
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
The vhost-user and non-vhost-user paths for both udp_listen_sock_handler()
and udp_reply_sock_handler() are more or less completely separate. Both,
however, start with essentially the same invocation of udp_sock_errs(), so
that can be made common.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
When a local peer sends a UDP message to a non-existing port on an
existing remote host, that host will return an ICMP message containing
the error code ICMP_PORT_UNREACH, plus the header and the first eight
bytes of the original message. If the sender socket has been connected,
it uses this message to issue a "Connection Refused" event to the user.
Until now, we have only read such events from the externally facing
socket, but we don't forward them back to the local sender because
we cannot read the ICMP message directly to user space. Because of
this, the local peer will hang and wait for a response that never
arrives.
We now fix this for IPv4 by recreating and forwarding a correct ICMP
message back to the internal sender. We synthesize the message based
on the information in the extended error structure, plus the returned
part of the original message body.
Note that for the sake of completeness, we even produce ICMP messages
for other error codes. We have noticed that at least ICMP_PROT_UNREACH
is propagated as an error event back to the user.
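For illustration, building such an ICMPv4 message could look roughly like
this. It is a sketch under assumptions, not the code added by this patch:
icmp4_build_sketch() and inet_csum() are hypothetical helpers, and the
quoted part (IP header plus the first eight bytes of the original datagram)
is assumed to be already assembled by the caller. In practice, type and
code would come from the ee_type and ee_code fields of the extended error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/ip_icmp.h>

/* Internet checksum (RFC 1071) over len bytes, result in host order */
static uint16_t inet_csum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;

	for (; len > 1; p += 2, len -= 2)
		sum += (uint32_t)p[0] << 8 | p[1];
	if (len)
		sum += (uint32_t)p[0] << 8;	/* Pad odd byte with zero */

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Hypothetical builder: write an ICMPv4 error of the given type and code
 * to out, followed by quoted_len bytes of the offending datagram.
 * Returns the message length, or 0 if it doesn't fit in out_len.
 */
static size_t icmp4_build_sketch(uint8_t *out, size_t out_len,
				 uint8_t type, uint8_t code,
				 const void *quoted, size_t quoted_len)
{
	size_t len = 8 + quoted_len;	/* 8-byte ICMP header */
	uint16_t csum;

	assert(out && quoted);

	if (len > out_len)
		return 0;

	out[0] = type;
	out[1] = code;
	memset(out + 2, 0, 6);		/* Checksum, then unused field */
	memcpy(out + 8, quoted, quoted_len);

	csum = inet_csum(out, len);
	out[2] = csum >> 8;		/* Network byte order */
	out[3] = csum & 0xff;

	return len;
}
```

For the case described above, the caller would pass ICMP_DEST_UNREACH as
type and ICMP_PORT_UNREACH as code. A receiver can validate the result by
checking that the checksum over the whole message is zero.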
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
[sbrivio: fix cppcheck warning: udp_send_conn_fail_icmp4() doesn't
modify 'in', it can be declared as const]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
In udp_vu_sock_recv(), collect a segment whose size is defined as
IP_MAX_MTU + ETH_HLEN + sizeof(struct virtio_net_hdr_mrg_rxbuf).
The original version double counted the IP header: IP_MAX_MTU includes
the IP header, and so did hdrlen.
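The arithmetic can be spelled out with a small sketch. The value used for
IP_MAX_MTU here (65535) is an assumption for the example, as the UAPI
headers don't provide that constant, and both function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_ether.h>
#include <linux/virtio_net.h>

/* Assumption: largest IPv4 packet, IP header included */
#define IP_MAX_MTU_SKETCH	65535U

/* Segment size as described above: the IP header is counted once,
 * as part of IP_MAX_MTU_SKETCH.
 */
static size_t seg_size_sketch(void)
{
	return IP_MAX_MTU_SKETCH + ETH_HLEN +
	       sizeof(struct virtio_net_hdr_mrg_rxbuf);
}

/* What double counting amounts to: the IP header length is added again
 * on top of a size that already includes it.
 */
static size_t seg_size_double_counted_sketch(size_t ip_hdr_len)
{
	assert(ip_hdr_len >= 20);	/* Minimum IPv4 header length */
	return seg_size_sketch() + ip_hdr_len;
}
```

With a 14-byte Ethernet header and a 12-byte virtio-net header, the first
function gives 65561 bytes; the double counted variant is larger by the
length of the IP header.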
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We usually want to checksum only the tail part of a frame, excluding at
least some headers. csum_iov() does that for a frame represented as an
IO vector, not actually summing the entire IO vector. We now have struct
iov_tail to explicitly represent this construct, so replace csum_iov()
with csum_iov_tail() taking that representation rather than 3 parameters.
We propagate the same change to csum_udp4() and csum_udp6() which take
similar parameters. This slightly simplifies the code, and will allow some
further simplifications as struct iov_tail is more widely used.
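To illustrate the idea of checksumming only the tail of an IO vector, here
is a simplified sketch. struct iov_tail_sketch and csum_iov_tail_sketch()
are hypothetical stand-ins: the field layout is an assumption, and the
byte-at-a-time loop is chosen for clarity, not speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct iov_tail: an IO vector plus the byte
 * offset at which the part we care about starts.
 */
struct iov_tail_sketch {
	const struct iovec *iov;
	size_t cnt;	/* Number of entries in iov */
	size_t off;	/* Bytes to skip from the start of the vector */
};

/* Internet checksum over everything past tail->off, in host order */
static uint16_t csum_iov_tail_sketch(const struct iov_tail_sketch *tail)
{
	size_t skip, i, j;
	uint64_t sum = 0;
	int odd = 0;	/* Position parity within the summed region */

	assert(tail && (tail->iov || !tail->cnt));

	skip = tail->off;
	for (i = 0; i < tail->cnt; i++) {
		const uint8_t *p = tail->iov[i].iov_base;
		size_t len = tail->iov[i].iov_len;

		if (skip >= len) {	/* Entry entirely before the tail */
			skip -= len;
			continue;
		}

		for (j = skip; j < len; j++) {
			sum += odd ? p[j] : (uint64_t)p[j] << 8;
			odd = !odd;
		}
		skip = 0;
	}

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Passing one structure instead of a vector, a count and an offset keeps the
three values together, and the 16-bit word boundaries follow the tail even
when it starts or ends in the middle of a vector entry.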
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Add virtio and vhost-user functions to connect with QEMU.
  $ ./passt --vhost-user
and
  # qemu-system-x86_64 ... -m 4G \
        -object memory-backend-memfd,id=memfd0,share=on,size=4G \
        -numa node,memdev=memfd0 \
        -chardev socket,id=chr0,path=/tmp/passt_1.socket \
        -netdev vhost-user,id=netdev0,chardev=chr0 \
        -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \
        ...
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: as suggested by lvivier, include <netinet/if_ether.h>
before including <linux/if_ether.h>, as C libraries such as musl
define __UAPI_DEF_ETHHDR in <netinet/if_ether.h> if they already have
a definition of struct ethhdr]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>