passt: Relicense to GPL 2.0, or any later version

In practical terms, passt doesn't benefit from the additional
protection offered by the AGPL over the GPL, because it's not
suitable to be executed over a computer network.

Further, restricting the distribution under the version 3 of the GPL
wouldn't provide any practical advantage either, as long as the passt
codebase is concerned, and might cause unnecessary compatibility
dilemmas.

Change licensing terms to the GNU General Public License Version 2,
or any later version, with written permission from all current and
past contributors, namely: myself, David Gibson, Laine Stump, Andrea
Bolognani, Paul Holzinger, Richard W.M. Jones, Chris Kuhn, Florian
Weimer, Giuseppe Scrivano, Stefan Hajnoczi, and Vasiliy Ulyanov.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-04-05 20:11:44 +02:00
// SPDX-License-Identifier: GPL-2.0-or-later

/* PASST - Plug A Simple Socket Transport
 *  for qemu/UNIX domain socket mode
 *
 * PASTA - Pack A Subtle Tap Abstraction
 *  for network namespace/tap device mode
 *
 * netlink.c - rtnetlink routines: interfaces, addresses, routes
 *
 * Copyright (c) 2020-2021 Red Hat GmbH
 * Author: Stefano Brivio <sbrivio@redhat.com>
 */

#include <sched.h>
#include <string.h>
#include <stddef.h>
#include <errno.h>
#include <sys/types.h>
#include <limits.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
conf: Bind inbound ports with CAP_NET_BIND_SERVICE before isolate_user()

Even if CAP_NET_BIND_SERVICE is granted, we'll lose the capability in
the target user namespace as we isolate the process, which means
we're unable to bind to low ports at that point.

Bind inbound ports, and only those, before isolate_user(). Keep the
handling of outbound ports (for pasta mode only) after the setup of
the namespace, because that's where we'll bind them.

To this end, initialise the netlink socket for the init namespace
before isolate_user() as well, as we actually need to know the
addresses of the upstream interface before binding ports, in case
they're not explicitly passed by the user.

As we now call nl_sock_init() twice, checking its return code from
conf() twice looks a bit heavy: make it exit(), instead, as we
can't do much if we don't have netlink sockets.

While at it:

- move the v4_only && v6_only options check just after the first
  option processing loop, as this is more strictly related to
  option parsing proper

- update the man page, explaining that CAP_NET_BIND_SERVICE is
  *not* the preferred way to bind ports, because passt and pasta
  can be abused to allow other processes to make effective usage
  of it. Add a note about the recommended sysctl instead

- simplify nl_sock_init_do() now that it's called once for each
  case

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2022-10-13 18:21:27 +02:00
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/if_ether.h>

#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#include "util.h"
#include "passt.h"
#include "log.h"
#include "ip.h"
#include "netlink.h"

/* Netlink expects a buffer of at least 8kiB or the system page size,
 * whichever is larger. 32kiB is recommended for more efficient handling of
 * dumps. Since the largest page size on any remotely common Linux setup is
 * 64kiB (ppc64), that should cover it.
 *
 * https://www.kernel.org/doc/html/next/userspace-api/netlink/intro.html#buffer-sizing
 */
#define NLBUFSIZ 65536
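
The sizing rule in the comment above can be sketched as a small run-time check. This is an illustration only: nl_min_bufsiz() and nl_bufsiz_check() are hypothetical names that do not exist in passt, which simply relies on the fixed NLBUFSIZ value.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define NLBUFSIZ	65536	/* same value as in the source above */

/* Hypothetical helper, not part of passt: smallest receive buffer netlink
 * expects, that is, 8KiB or the system page size, whichever is larger
 */
static size_t nl_min_bufsiz(void)
{
	long page = sysconf(_SC_PAGESIZE);

	/* Also covers sysconf() failing and returning -1 */
	if (page < 8192)
		return 8192;

	return (size_t)page;
}

/* Hypothetical start-up check that the fixed buffer size is large enough */
static void nl_bufsiz_check(void)
{
	assert((size_t)NLBUFSIZ >= nl_min_bufsiz());
}
```

A fixed 64KiB buffer keeps the allocation on the stack and avoids a sysconf() call, at the cost of being larger than needed on systems with 4KiB pages.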

/* Socket in init, in target namespace, sequence (just needs to be monotonic) */
int nl_sock = -1;
int nl_sock_ns = -1;
static int nl_seq = 1;

/**
 * nl_sock_init_do() - Set up netlink sockets in init or target namespace
 * @arg:	Execution context, if running from namespace, NULL otherwise
 *
 * Return: 0
 */
static int nl_sock_init_do(void *arg)
{
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK, };
	int *s = arg ? &nl_sock_ns : &nl_sock;
#ifdef NETLINK_GET_STRICT_CHK
	int y = 1;
#endif

	if (arg)
		ns_enter((struct ctx *)arg);

	*s = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
	if (*s < 0 || bind(*s, (struct sockaddr *)&addr, sizeof(addr))) {
		*s = -1;
		return 0;
	}

#ifdef NETLINK_GET_STRICT_CHK
	if (setsockopt(*s, SOL_NETLINK, NETLINK_GET_STRICT_CHK, &y, sizeof(y)))
		debug("netlink: cannot set NETLINK_GET_STRICT_CHK on %i", *s);
#endif
	return 0;
}

/**
 * nl_sock_init() - Call nl_sock_init_do(), won't return on failure
 * @c:		Execution context
 * @ns:		Get socket in namespace, not in init
 */
void nl_sock_init(const struct ctx *c, bool ns)
{
	if (ns) {
		NS_CALL(nl_sock_init_do, c);
		if (nl_sock_ns == -1)
			goto fail;
	} else {
		nl_sock_init_do(NULL);
	}

	if (nl_sock == -1)
		goto fail;

	return;

fail:
	die("Failed to get netlink socket");
}

/**
 * nl_send() - Prepare and send netlink request
 * @s:		Netlink socket
 * @req:	Request (will fill netlink header)
 * @type:	Request type
 * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
 * @len:	Request length
 *
 * Return: sequence number of request on success, terminates on error
 */
static uint32_t nl_send(int s, void *req, uint16_t type,
			uint16_t flags, ssize_t len)
{
	struct nlmsghdr *nh;
	ssize_t n;

	nh = (struct nlmsghdr *)req;
	nh->nlmsg_type = type;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
	nh->nlmsg_len = len;
	nh->nlmsg_seq = nl_seq++;
	nh->nlmsg_pid = 0;

	n = send(s, req, len, 0);
	if (n < 0)
		die("netlink: Failed to send(): %s", strerror(errno));
	else if (n < len)
		die("netlink: Short send (%zd of %zd bytes)", n, len);

	return nh->nlmsg_seq;
}
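
The header setup done by nl_send() can be tried out without a socket. The sketch below is not passt code: struct route_req and nl_fill_hdr() are hypothetical names, and the sequence counter is passed in explicitly instead of using a file-scope variable. It only shows which header fields a caller can leave unset before handing a request over.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout: netlink header followed by the rtnetlink payload */
struct route_req {
	struct nlmsghdr nlh;
	struct rtmsg rtm;
};

/* Hypothetical stand-in for the header setup in nl_send(): fill @nh for a
 * request of @len bytes, return the sequence number used and advance @seq
 */
static uint32_t nl_fill_hdr(struct nlmsghdr *nh, uint16_t type,
			    uint16_t flags, uint32_t len, uint32_t *seq)
{
	/* A request can't be shorter than its own header */
	assert(len >= sizeof(*nh));

	nh->nlmsg_type = type;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
	nh->nlmsg_len = len;
	nh->nlmsg_seq = (*seq)++;
	nh->nlmsg_pid = 0;	/* same value nl_send() uses */

	return nh->nlmsg_seq;
}
```

With this layout, a route dump request would be prepared with something like nl_fill_hdr(&req.nlh, RTM_GETROUTE, NLM_F_DUMP, sizeof(req), &seq), where only the rtmsg fields of req are set by the caller.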

/**
 * nl_status() - Check status given by a netlink response
 * @nh:		Netlink response header
 * @n:		Remaining space in response buffer from @nh
 * @seq:	Request sequence number we expect a response to
 *
 * Return: 0 if @nh indicated successful completion,
 *         < 0, negative error code if @nh indicated failure
 *         > 0 @n if there are more responses to request @seq
 *     terminates if sequence numbers are out of sync
 */
static int nl_status(const struct nlmsghdr *nh, ssize_t n, uint32_t seq)
{
	ASSERT(NLMSG_OK(nh, n));

	if (nh->nlmsg_seq != seq)
		die("netlink: Unexpected sequence number (%u != %u)",
		    nh->nlmsg_seq, seq);

	if (nh->nlmsg_type == NLMSG_DONE) {
		return 0;
	}
	if (nh->nlmsg_type == NLMSG_ERROR) {
		struct nlmsgerr *errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
		return errmsg->error;
	}

	return n;
}
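
The three return cases of nl_status() are easier to see with replies built by hand. In netlink, an acknowledgement is an NLMSG_ERROR message whose error field is 0, which is why the NLMSG_ERROR branch also covers success. The sketch below is not passt code: mk_error_reply() and reply_status() are hypothetical names, and the sequence number check is left out because it terminates the process in the original.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: build in @buf the NLMSG_ERROR reply the kernel would
 * send for request @seq. @err is 0 for a plain ACK, or a negative errno.
 * @buf must be suitably aligned for struct nlmsghdr. Returns message length.
 */
static uint32_t mk_error_reply(void *buf, uint32_t seq, int err)
{
	struct nlmsghdr *nh = buf;
	struct nlmsgerr *e = NLMSG_DATA(nh);

	memset(buf, 0, NLMSG_SPACE(sizeof(*e)));
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(*e));
	nh->nlmsg_type = NLMSG_ERROR;
	nh->nlmsg_seq = seq;
	e->error = err;

	return nh->nlmsg_len;
}

/* Same classification as nl_status() above, without the sequence check */
static int reply_status(const struct nlmsghdr *nh, ssize_t n)
{
	assert(NLMSG_OK(nh, n));

	if (nh->nlmsg_type == NLMSG_DONE)
		return 0;		/* end of a multi-part response */

	if (nh->nlmsg_type == NLMSG_ERROR)	/* 0 here means ACK */
		return ((const struct nlmsgerr *)NLMSG_DATA(nh))->error;

	return n;			/* data message, more to come */
}
```

Any other message type is treated as payload: the positive return value tells the caller to keep reading.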

/**
 * nl_next() - Get next netlink response message, recv()ing if necessary
 * @s:		Netlink socket
 * @buf:	Buffer for responses (at least NLBUFSIZ long)
 * @nh:		Previous message, or NULL if there are none
 * @n:		Variable with remaining unread bytes in buffer (updated)
 *
 * Return: pointer to next unread netlink response message (may block)
 */
static struct nlmsghdr *nl_next(int s, char *buf, struct nlmsghdr *nh, ssize_t *n)
{
	if (nh) {
		nh = NLMSG_NEXT(nh, *n);
		if (NLMSG_OK(nh, *n))
			return nh;
	}

	*n = recv(s, buf, NLBUFSIZ, 0);
	if (*n < 0)
		die("netlink: Failed to recv(): %s", strerror(errno));

	nh = (struct nlmsghdr *)buf;
	if (!NLMSG_OK(nh, *n))
		die("netlink: Response datagram with no message");

	return nh;
}
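
One datagram can carry several netlink messages, each padded to a 4-byte boundary, and nl_next() steps through them with NLMSG_NEXT() until NLMSG_OK() fails, at which point it needs another recv(). The sketch below reproduces only that stepping on a buffer filled in by hand. It is not passt code, and put_msg() and count_msgs() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: append a message of type @type, with @payload bytes
 * of zeroed data, at offset @off in @buf. @buf must be aligned for struct
 * nlmsghdr. Returns the (aligned) offset just past the new message.
 */
static size_t put_msg(char *buf, size_t off, uint16_t type, size_t payload)
{
	struct nlmsghdr *nh = (struct nlmsghdr *)(buf + off);

	memset(nh, 0, NLMSG_SPACE(payload));
	nh->nlmsg_len = NLMSG_LENGTH(payload);	/* header + payload, unpadded */
	nh->nlmsg_type = type;

	return off + NLMSG_SPACE(payload);	/* padded to NLMSG_ALIGNTO */
}

/* Count the complete messages in one datagram of @n bytes */
static unsigned count_msgs(char *buf, ssize_t n)
{
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	unsigned count = 0;

	/* recv() errors are expected to be handled before this point */
	assert(n >= 0);

	/* NLMSG_NEXT() also subtracts the aligned message length from n */
	for (; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n))
		count++;

	return count;
}
```

A message cut short by the end of the buffer fails NLMSG_OK() and is not counted, which is the same condition that makes nl_next() fall through to recv().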

/**
 * nl_foreach - 'for' type macro to step through netlink response messages
 * nl_foreach_oftype - as above, but only messages of expected type
 * @nh:		Steps through each response header (struct nlmsghdr *)
 * @status:	When loop exits indicates if there was an error (ssize_t)
 * @s:		Netlink socket
 * @buf:	Buffer for responses (at least NLBUFSIZ long)
 * @seq:	Sequence number of request we're getting responses for
 * @type:	Type of netlink message to process
 */
#define nl_foreach(nh, status, s, buf, seq)				\
	for ((nh) = nl_next((s), (buf), NULL, &(status));		\
	     ((status) = nl_status((nh), (status), (seq))) > 0;		\
	     (nh) = nl_next((s), (buf), (nh), &(status)))

#define nl_foreach_oftype(nh, status, s, buf, seq, type)		\
	nl_foreach((nh), (status), (s), (buf), (seq))			\
		if ((nh)->nlmsg_type != (type)) {			\
			warn("netlink: Unexpected message type");	\
		} else
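
nl_foreach_oftype ends in a dangling 'else', so the statement or block written after the macro becomes the else branch and only runs for messages of the expected type. The same construction can be shown on a plain array. The macros and function below are hypothetical examples, not part of passt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro, not from passt: step @i through [0, @n) */
#define foreach_index(i, n)						\
	for ((i) = 0; (i) < (n); (i)++)

/* As above, but the statement following the macro only runs where @cond
 * holds: it becomes the 'else' branch of the trailing 'if'
 */
#define foreach_index_if(i, n, cond)					\
	foreach_index((i), (n))						\
		if (!(cond)) {						\
		} else

/* Sum the even elements of @v, which has @n entries */
static int sum_even(const int *v, size_t n)
{
	int sum = 0;
	size_t i;

	assert(v);

	foreach_index_if(i, n, v[i] % 2 == 0)
		sum += v[i];

	return sum;
}
```

The loop still advances over the skipped elements, which matters for the netlink case: every response has to be consumed even if it is not of the expected type.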

/**
 * nl_do() - Send netlink "do" request, and wait for acknowledgement
 * @s:		Netlink socket
 * @req:	Request (will fill netlink header)
 * @type:	Request type
 * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
 * @len:	Request length
 *
 * Return: 0 on success, negative error code on error
 */
static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
{
	struct nlmsghdr *nh;
	char buf[NLBUFSIZ];
	ssize_t status;
	uint32_t seq;

	seq = nl_send(s, req, type, flags, len);
	nl_foreach(nh, status, s, buf, seq)
		warn("netlink: Unexpected response message");

	return status;
}

/**
 * nl_get_ext_if() - Get interface index supporting IP version being probed
 * @s:		Netlink socket
 * @af:		Address family (AF_INET or AF_INET6) to look for connectivity
 *		for.
 *
 * Return: interface index, 0 if not found
 */
unsigned int nl_get_ext_if(int s, sa_family_t af)
{
	struct { struct nlmsghdr nlh; struct rtmsg rtm; } req = {
		.rtm.rtm_table = RT_TABLE_MAIN,
		.rtm.rtm_scope = RT_SCOPE_UNIVERSE,
		.rtm.rtm_type = RTN_UNICAST,
		.rtm.rtm_family = af,
	};
netlink: Fix selection of template interface

Since f919dc7a4b1c ("conf, netlink: Don't require a default route to
start"), if there is only one host interface with routes, we will pick that
as the template interface, even if there are no default routes for an IP
version. Unfortunately this selection had a serious flaw: in some cases
it would 'return' in the middle of an nl_foreach() loop, meaning we
wouldn't consume all the netlink responses for our query. This could cause
later netlink operations to fail as we read leftover responses from the
aborted query.

Rewrite the interface detection to avoid this problem. While we're there:

* Perform detection of both default and non-default routes in a single
  pass, avoiding an ugly goto
* Give more detail on error and working but unusual paths about the
  situation (no suitable interface, multiple possible candidates, etc.).

Fixes: f919dc7a4b1c ("conf, netlink: Don't require a default route to start")
Link: https://bugs.passt.top/show_bug.cgi?id=83
Link: https://github.com/containers/podman/issues/22052
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[sbrivio: Use info(), not warn() for somewhat expected cases where one
 IP version has no default routes, or no routes at all]
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-20 06:33:39 +01:00
|
|
|
unsigned defifi = 0, anyifi = 0;
|
|
|
|
unsigned ndef = 0, nany = 0;
|
2021-10-11 12:01:31 +02:00
|
|
|
struct nlmsghdr *nh;
|
|
|
|
struct rtattr *rta;
|
2023-03-08 03:43:25 +01:00
|
|
|
char buf[NLBUFSIZ];
|
2023-08-03 09:19:51 +02:00
|
|
|
ssize_t status;
|
2023-11-07 11:13:05 +01:00
|
|
|
uint32_t seq;
|
2022-04-05 07:10:30 +02:00
|
|
|
size_t na;
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2024-03-15 13:25:44 +01:00
|
|
|
/* Look for an interface with a default route first, failing that, look
|
|
|
|
* for any interface with a route, and pick it only if it's the only
|
|
|
|
* interface with a route.
|
|
|
|
*/
|
2023-08-03 09:19:51 +02:00
|
|
|
seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
|
2023-08-03 09:19:52 +02:00
|
|
|
nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWROUTE) {
|
2022-09-28 06:33:19 +02:00
|
|
|
struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
|
2024-03-21 05:04:49 +01:00
|
|
|
const void *dst = NULL;
|
2024-03-20 06:33:39 +01:00
|
|
|
unsigned thisifi = 0;
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2024-03-20 06:33:39 +01:00
|
|
|
if (rtm->rtm_family != af)
|
|
|
|
continue;
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2021-10-20 00:05:11 +02:00
|
|
|
for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
|
|
|
|
rta = RTA_NEXT(rta, na)) {
|
2024-02-02 00:09:37 +01:00
|
|
|
if (rta->rta_type == RTA_OIF) {
|
2024-03-20 06:33:39 +01:00
|
|
|
thisifi = *(unsigned int *)RTA_DATA(rta);
|
2024-02-02 00:09:37 +01:00
|
|
|
} else if (rta->rta_type == RTA_MULTIPATH) {
|
2024-02-12 05:05:28 +01:00
|
|
|
const struct rtnexthop *rtnh;
|
2024-02-02 00:09:37 +01:00
|
|
|
|
|
|
|
rtnh = (struct rtnexthop *)RTA_DATA(rta);
|
2024-03-20 06:33:39 +01:00
|
|
|
thisifi = rtnh->rtnh_ifindex;
|
2024-03-21 05:04:49 +01:00
|
|
|
} else if (rta->rta_type == RTA_DST) {
|
|
|
|
dst = RTA_DATA(rta);
|
2024-03-20 06:33:39 +01:00
|
|
|
}
|
|
|
|
}
|
2024-03-15 13:25:44 +01:00
|
|
|
|
2024-03-20 06:33:39 +01:00
|
|
|
if (!thisifi)
|
|
|
|
continue; /* No interface for this route */
|
2024-03-15 13:25:44 +01:00
|
|
|
|
2024-03-21 05:04:49 +01:00
|
|
|
/* Skip routes to link-local addresses */
|
|
|
|
if (af == AF_INET && dst &&
|
|
|
|
IN4_IS_PREFIX_LINKLOCAL(dst, rtm->rtm_dst_len))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (af == AF_INET6 && dst &&
|
|
|
|
IN6_IS_PREFIX_LINKLOCAL(dst, rtm->rtm_dst_len))
|
|
|
|
continue;
|
|
|
|
|
2024-03-20 06:33:39 +01:00
|
|
|
if (rtm->rtm_dst_len == 0) {
|
|
|
|
/* Default route */
|
|
|
|
ndef++;
|
|
|
|
if (!defifi)
|
|
|
|
defifi = thisifi;
|
|
|
|
} else {
|
|
|
|
/* Non-default route */
|
|
|
|
nany++;
|
|
|
|
if (!anyifi)
|
|
|
|
anyifi = thisifi;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
|
|
|
}
|
2024-02-02 00:09:37 +01:00
|
|
|
|
2023-08-03 09:19:55 +02:00
|
|
|
if (status < 0)
|
|
|
|
warn("netlink: RTM_GETROUTE failed: %s", strerror(-status));
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2024-03-20 06:33:39 +01:00
|
|
|
if (defifi) {
|
|
|
|
if (ndef > 1)
|
|
|
|
info("Multiple default %s routes, picked first",
|
2024-03-21 05:04:48 +01:00
|
|
|
af_name(af));
|
2024-03-20 06:33:39 +01:00
|
|
|
return defifi;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (anyifi) {
|
|
|
|
if (nany == 1)
|
|
|
|
return anyifi;
|
|
|
|
|
|
|
|
info("Multiple interfaces with %s routes, use -i to select one",
|
2024-03-21 05:04:48 +01:00
|
|
|
af_name(af));
|
2024-03-15 13:25:44 +01:00
|
|
|
}
|
|
|
|
|
2024-03-20 06:33:39 +01:00
|
|
|
if (!nany)
|
2024-03-21 05:04:49 +01:00
|
|
|
info("No interfaces with usable %s routes", af_name(af));
|
2024-03-20 06:33:39 +01:00
|
|
|
|
|
|
|
return 0;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
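The selection policy of nl_get_ext_if() can be sketched in isolation, without any netlink I/O. This is a minimal sketch under stated assumptions: `struct route_sketch` and `pick_ext_if()` are hypothetical names, not part of passt, and the route list is assumed to be fully read before a decision is made. As in the function above, the non-default counter counts routes, so a fallback interface is only picked when exactly one non-default route exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one route (not a passt structure) */
struct route_sketch {
	unsigned ifi;		/* Output interface index, 0 if none */
	unsigned dst_len;	/* Prefix length, 0 for a default route */
};

/* Pick a template interface after scanning every route in the list */
unsigned pick_ext_if(const struct route_sketch *r, size_t n)
{
	unsigned defifi = 0, anyifi = 0, nany = 0;
	size_t i;

	assert(r || !n);

	for (i = 0; i < n; i++) {
		if (!r[i].ifi)
			continue;	/* No interface for this route */

		if (r[i].dst_len == 0) {
			if (!defifi)	/* First default route wins */
				defifi = r[i].ifi;
		} else {
			nany++;
			if (!anyifi)
				anyifi = r[i].ifi;
		}
	}

	if (defifi)
		return defifi;

	if (nany == 1)		/* Unambiguous fallback only */
		return anyifi;

	return 0;
}
```

Note that the decision is taken only after the loop has finished, which is the property the rewrite relies on: no early return while responses are still pending.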
|
|
|
|
|
2024-02-02 00:09:37 +01:00
|
|
|
/**
|
|
|
|
* nl_route_get_def_multipath() - Get highest-weight route from nexthop list
|
|
|
|
* @rta: Routing netlink attribute with type RTA_MULTIPATH
|
|
|
|
* @gw: Default gateway to fill
|
|
|
|
*
|
|
|
|
* Return: true if a gateway was found, false otherwise
|
|
|
|
*/
|
|
|
|
bool nl_route_get_def_multipath(struct rtattr *rta, void *gw)
|
|
|
|
{
|
|
|
|
struct rtnexthop *rtnh;
|
|
|
|
bool found = false;
|
|
|
|
int hops = -1;
|
|
|
|
|
|
|
|
for (rtnh = (struct rtnexthop *)RTA_DATA(rta);
|
|
|
|
RTNH_OK(rtnh, (char *)RTA_DATA(rta) + RTA_PAYLOAD(rta) - (char *)rtnh);
rtnh = RTNH_NEXT(rtnh)) {
|
|
|
|
size_t len = rtnh->rtnh_len - sizeof(*rtnh);
|
|
|
|
struct rtattr *rta_inner;
|
|
|
|
|
|
|
|
if (rtnh->rtnh_hops < hops)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
hops = rtnh->rtnh_hops;
|
|
|
|
|
|
|
|
for (rta_inner = RTNH_DATA(rtnh); RTA_OK(rta_inner, len);
|
|
|
|
rta_inner = RTA_NEXT(rta_inner, len)) {
|
|
|
|
|
|
|
|
if (rta_inner->rta_type != RTA_GATEWAY)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
memcpy(gw, RTA_DATA(rta_inner), RTA_PAYLOAD(rta_inner));
|
|
|
|
found = true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return found;
|
|
|
|
}
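Walking a nexthop list is safest when the length used for the bounds check shrinks as entries are consumed. The following is a minimal sketch of that bookkeeping, assuming Linux UAPI headers are available; `count_nexthops()` is a hypothetical helper for illustration, not a passt function.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/rtnetlink.h>

/* Count nexthop entries in an RTA_MULTIPATH payload of @len bytes */
unsigned count_nexthops(const void *payload, size_t len)
{
	const struct rtnexthop *rtnh = payload;
	unsigned n = 0;

	assert(payload || !len);

	/* Each entry must fit in the bytes still left, not in the total */
	while (len >= sizeof(*rtnh) &&
	       rtnh->rtnh_len >= sizeof(*rtnh) && rtnh->rtnh_len <= len) {
		size_t step = RTNH_ALIGN(rtnh->rtnh_len);

		n++;

		if (step >= len)
			break;	/* Last entry consumed the remainder */

		len -= step;
		rtnh = (const struct rtnexthop *)((const char *)rtnh + step);
	}

	return n;
}
```

With a shrinking length, whatever follows the attribute in the message is never interpreted as another nexthop.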
|
|
|
|
|
2021-10-11 12:01:31 +02:00
|
|
|
/**
|
2023-08-03 09:19:42 +02:00
|
|
|
* nl_route_get_def() - Get default route for given interface and address family
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2023-08-03 09:19:42 +02:00
|
|
|
* @ifi: Interface index
|
2021-10-11 12:01:31 +02:00
|
|
|
* @af: Address family
|
2023-08-03 09:19:42 +02:00
|
|
|
* @gw: Default gateway to fill
|
2023-08-03 09:19:55 +02:00
|
|
|
*
|
2024-03-15 13:25:44 +01:00
|
|
|
* Return: negative error code on netlink failure, or 0 (gw unset if default route not found)
|
2021-10-11 12:01:31 +02:00
|
|
|
*/
|
2023-08-03 09:19:55 +02:00
|
|
|
int nl_route_get_def(int s, unsigned int ifi, sa_family_t af, void *gw)
|
2023-08-03 09:19:42 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct rtmsg rtm;
|
|
|
|
struct rtattr rta;
|
|
|
|
unsigned int ifi;
|
|
|
|
} req = {
|
|
|
|
.rtm.rtm_family = af,
|
|
|
|
.rtm.rtm_table = RT_TABLE_MAIN,
|
|
|
|
.rtm.rtm_scope = RT_SCOPE_UNIVERSE,
|
|
|
|
.rtm.rtm_type = RTN_UNICAST,
|
|
|
|
|
|
|
|
.rta.rta_type = RTA_OIF,
|
|
|
|
.rta.rta_len = RTA_LENGTH(sizeof(unsigned int)),
|
|
|
|
.ifi = ifi,
|
|
|
|
};
|
|
|
|
struct nlmsghdr *nh;
|
2023-08-03 09:19:54 +02:00
|
|
|
bool found = false;
|
2023-08-03 09:19:42 +02:00
|
|
|
char buf[NLBUFSIZ];
|
2023-08-03 09:19:51 +02:00
|
|
|
ssize_t status;
|
2023-11-07 11:13:05 +01:00
|
|
|
uint32_t seq;
|
2023-08-03 09:19:42 +02:00
|
|
|
|
2023-08-03 09:19:51 +02:00
|
|
|
seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
|
2023-08-03 09:19:52 +02:00
|
|
|
nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWROUTE) {
|
2023-08-03 09:19:42 +02:00
|
|
|
struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
|
|
|
|
struct rtattr *rta;
|
|
|
|
size_t na;
|
|
|
|
|
2023-08-03 09:19:54 +02:00
|
|
|
if (found || rtm->rtm_dst_len)
|
2023-08-03 09:19:42 +02:00
|
|
|
continue;
|
|
|
|
|
|
|
|
for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
|
|
|
|
rta = RTA_NEXT(rta, na)) {
|
2024-02-02 00:09:37 +01:00
|
|
|
if (rta->rta_type == RTA_MULTIPATH)
|
|
|
|
found = nl_route_get_def_multipath(rta, gw);
|
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
if (rta->rta_type != RTA_GATEWAY)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
memcpy(gw, RTA_DATA(rta), RTA_PAYLOAD(rta));
|
2023-08-03 09:19:54 +02:00
|
|
|
found = true;
|
2023-08-03 09:19:42 +02:00
|
|
|
}
|
|
|
|
}
|
2023-08-03 09:19:55 +02:00
|
|
|
return status;
|
2023-08-03 09:19:42 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* nl_route_set_def() - Set default route for given interface and address family
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2023-08-03 09:19:42 +02:00
|
|
|
* @ifi: Interface index in target namespace
|
|
|
|
* @af: Address family
|
|
|
|
* @gw: Default gateway to set
|
2023-08-03 09:19:53 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2023-08-03 09:19:42 +02:00
|
|
|
*/
|
2023-09-29 07:50:19 +02:00
|
|
|
int nl_route_set_def(int s, unsigned int ifi, sa_family_t af, const void *gw)
|
2021-10-11 12:01:31 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct rtmsg rtm;
|
|
|
|
struct rtattr rta;
|
|
|
|
unsigned int ifi;
|
|
|
|
union {
|
|
|
|
struct {
|
|
|
|
struct rtattr rta_dst;
|
|
|
|
struct in6_addr d;
|
|
|
|
struct rtattr rta_gw;
|
|
|
|
struct in6_addr a;
|
|
|
|
} r6;
|
|
|
|
struct {
|
|
|
|
struct rtattr rta_dst;
|
2023-08-03 09:19:43 +02:00
|
|
|
struct in_addr d;
|
2021-10-11 12:01:31 +02:00
|
|
|
struct rtattr rta_gw;
|
2023-08-03 09:19:43 +02:00
|
|
|
struct in_addr a;
|
2021-10-11 12:01:31 +02:00
|
|
|
} r4;
|
2021-10-21 04:26:08 +02:00
|
|
|
} set;
|
2021-10-11 12:01:31 +02:00
|
|
|
} req = {
|
|
|
|
.rtm.rtm_family = af,
|
|
|
|
.rtm.rtm_table = RT_TABLE_MAIN,
|
|
|
|
.rtm.rtm_scope = RT_SCOPE_UNIVERSE,
|
|
|
|
.rtm.rtm_type = RTN_UNICAST,
|
2023-08-03 09:19:42 +02:00
|
|
|
.rtm.rtm_protocol = RTPROT_BOOT,
|
2021-10-11 12:01:31 +02:00
|
|
|
|
|
|
|
.rta.rta_type = RTA_OIF,
|
|
|
|
.rta.rta_len = RTA_LENGTH(sizeof(unsigned int)),
|
2023-08-03 09:19:42 +02:00
|
|
|
.ifi = ifi,
|
2021-10-11 12:01:31 +02:00
|
|
|
};
|
2023-08-03 09:19:48 +02:00
|
|
|
ssize_t len;
|
2021-10-21 04:26:08 +02:00
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
if (af == AF_INET6) {
|
|
|
|
size_t rta_len = RTA_LENGTH(sizeof(req.set.r6.d));
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:48 +02:00
|
|
|
len = offsetof(struct req_t, set.r6) + sizeof(req.set.r6);
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
req.set.r6.rta_dst.rta_type = RTA_DST;
|
|
|
|
req.set.r6.rta_dst.rta_len = rta_len;
|
2021-10-21 04:26:08 +02:00
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
memcpy(&req.set.r6.a, gw, sizeof(req.set.r6.a));
|
|
|
|
req.set.r6.rta_gw.rta_type = RTA_GATEWAY;
|
|
|
|
req.set.r6.rta_gw.rta_len = rta_len;
|
|
|
|
} else {
|
|
|
|
size_t rta_len = RTA_LENGTH(sizeof(req.set.r4.d));
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:48 +02:00
|
|
|
len = offsetof(struct req_t, set.r4) + sizeof(req.set.r4);
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
req.set.r4.rta_dst.rta_type = RTA_DST;
|
|
|
|
req.set.r4.rta_dst.rta_len = rta_len;
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:43 +02:00
|
|
|
memcpy(&req.set.r4.a, gw, sizeof(req.set.r4.a));
|
2023-08-03 09:19:42 +02:00
|
|
|
req.set.r4.rta_gw.rta_type = RTA_GATEWAY;
|
|
|
|
req.set.r4.rta_gw.rta_len = rta_len;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
|
|
|
|
2023-08-03 09:19:53 +02:00
|
|
|
return nl_do(s, &req, RTM_NEWROUTE, NLM_F_CREATE | NLM_F_EXCL, len);
|
2023-08-03 09:19:42 +02:00
|
|
|
}
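nl_route_set_def() sends only the part of the request that matters for the chosen address family, computing the length from the offset of the union member plus that member's size. A minimal sketch of the same arithmetic follows; `struct hdr_sketch`, `struct req_sketch` and `req_sketch_len()` are hypothetical stand-ins with fixed-size fields, not the real netlink types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed part of a request */
struct hdr_sketch {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
};

/* Fixed header followed by a family-dependent payload */
struct req_sketch {
	struct hdr_sketch hdr;
	union {
		struct { uint8_t dst[16]; uint8_t gw[16]; } r6;
		struct { uint8_t dst[4];  uint8_t gw[4];  } r4;
	} set;
};

/* Number of bytes to send for the given family */
size_t req_sketch_len(bool v6)
{
	struct req_sketch req;
	size_t len;

	if (v6)
		len = offsetof(struct req_sketch, set.r6) + sizeof(req.set.r6);
	else
		len = offsetof(struct req_sketch, set.r4) + sizeof(req.set.r4);

	/* The partial length can never exceed the whole structure */
	assert(len <= sizeof(req));

	return len;
}
```

For the shorter family, the result is smaller than the size of the whole structure, so the unused tail of the union is not sent.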
|
|
|
|
|
|
|
|
/**
|
|
|
|
* nl_route_dup() - Copy routes for given interface and address family
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s_src: Netlink socket in source namespace
|
|
|
|
* @ifi_src: Source interface index
|
|
|
|
* @s_dst: Netlink socket in destination namespace
|
|
|
|
* @ifi_dst: Interface index in destination namespace
|
2023-08-03 09:19:42 +02:00
|
|
|
* @af: Address family
|
2023-08-03 09:19:56 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2023-08-03 09:19:42 +02:00
|
|
|
*/
|
2023-08-03 09:19:56 +02:00
|
|
|
int nl_route_dup(int s_src, unsigned int ifi_src,
|
|
|
|
int s_dst, unsigned int ifi_dst, sa_family_t af)
|
2023-08-03 09:19:42 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct rtmsg rtm;
|
|
|
|
struct rtattr rta;
|
|
|
|
unsigned int ifi;
|
|
|
|
} req = {
|
|
|
|
.rtm.rtm_family = af,
|
|
|
|
.rtm.rtm_table = RT_TABLE_MAIN,
|
|
|
|
.rtm.rtm_scope = RT_SCOPE_UNIVERSE,
|
|
|
|
.rtm.rtm_type = RTN_UNICAST,
|
|
|
|
|
|
|
|
.rta.rta_type = RTA_OIF,
|
|
|
|
.rta.rta_len = RTA_LENGTH(sizeof(unsigned int)),
|
2023-08-03 09:19:44 +02:00
|
|
|
.ifi = ifi_src,
|
2023-08-03 09:19:42 +02:00
|
|
|
};
|
netlink: Fix handling of NLMSG_DONE in nl_route_dup()
A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.
Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro. However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.
The problem arises because nl_route_dup(), unlike other cases, doesn't
just make a single pass through all the responses to a netlink
request. It needs to get all the routes, then make multiple passes
through them. We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.
After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them. It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).
What we do after the loop depends on which exit case we had. If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.
We attempt to check for that second case by re-checking NLMSG_OK(nh,
status). However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check. To fix this we need separate variables for the number of bytes
left and the final status code.
We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.
Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-19 05:53:41 +01:00
|
|
|
ssize_t nlmsgs_size, left, status;
|
2023-08-03 09:19:42 +02:00
|
|
|
unsigned dup_routes = 0;
|
|
|
|
struct nlmsghdr *nh;
|
2023-08-03 09:19:45 +02:00
|
|
|
char buf[NLBUFSIZ];
|
2023-11-07 11:13:05 +01:00
|
|
|
uint32_t seq;
|
2023-08-03 09:19:42 +02:00
|
|
|
unsigned i;
|
netlink: Add functionality to copy routes from outer namespace
Instead of just fetching the default gateway and configuring a single
equivalent route in the target namespace, on 'pasta --config-net', it
might be desirable in some cases to copy the whole set of routes
corresponding to a given output interface.
For instance, in:
https://github.com/containers/podman/issues/18539
IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
configuring the default gateway won't work without a gateway-less
route (specifying the output interface only), because the default
gateway is, somewhat dubiously, not on the same subnet as the
container.
This is a similar case to the one covered by commit 7656a6f88882
("conf: Adjust netmask on mismatch between IPv4 address/netmask and
gateway"), and I'm not exactly proud of that workaround.
We also have:
https://bugs.passt.top/show_bug.cgi?id=49
pasta does not work with tap-style interface
for which, eventually, we should be able to configure a gateway-less
route in the target namespace.
Introduce different operation modes for nl_route(), including a new
NL_DUP one, not exposed yet, which simply parrots back to the kernel
the route dump for a given interface from the outer namespace, fixing
up flags and interface indices on the way, and requesting to add the
same routes in the target namespace, on the interface we manage.
For n routes we want to duplicate, send n identical netlink requests
including the full dump: routes might depend on each other and the
kernel processes RTM_NEWROUTE messages sequentially, not atomically,
and repeating the full dump naturally resolves dependencies without
the need to actually calculate them.
I'm not kidding, it actually works pretty well.
Link: https://github.com/containers/podman/issues/18539
Link: https://bugs.passt.top/show_bug.cgi?id=49
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-05-14 13:49:43 +02:00
|
|
|
|
2023-08-03 09:19:51 +02:00
|
|
|
seq = nl_send(s_src, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req));
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:51 +02:00
|
|
|
/* nl_foreach() will step through multiple response datagrams,
|
|
|
|
* which we don't want here because we need to have all the
|
|
|
|
* routes in the buffer at once.
|
|
|
|
*/
|
|
|
|
nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
|
netlink: Fix handling of NLMSG_DONE in nl_route_dup()
A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.
Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro. However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.
The problem arises because nl_route_dup(), unlike other cases, doesn't
just make a single pass through all the responses to a netlink
request. It needs to get all the routes, then make multiple passes
through them. We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.
After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them. It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).
What we do after the loop depends on which exit case we had. If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.
We attempt to check for that second case by re-checking NLMSG_OK(nh,
status). However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check. To fix this we need separate variables for the number of bytes
left and the final status code.
We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.
Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2024-03-19 05:53:41 +01:00
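A minimal sketch of the two loop exit cases described above, assuming Linux netlink headers are available. toy_put() and toy_need_more() are hypothetical helpers, and the check against NLMSG_DONE is a simplified stand-in for nl_status(). The point it tries to show is that the byte count ("left") and the result code ("status") live in separate variables, so NLMSG_OK() can be evaluated again after the loop.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append a header-only message of the given type */
static size_t toy_put(char *buf, size_t off, unsigned short type)
{
	struct nlmsghdr nh;

	memset(&nh, 0, sizeof(nh));
	nh.nlmsg_len = NLMSG_LENGTH(0);
	nh.nlmsg_type = type;
	memcpy(buf + off, &nh, sizeof(nh));

	return off + NLMSG_SPACE(0);
}

/* Build one toy datagram with two routes, scan it, and return 1 if further
 * datagrams would have to be read, 0 if NLMSG_DONE was found in this one.
 */
static int toy_need_more(int done_in_same_datagram, unsigned *routes)
{
	_Alignas(struct nlmsghdr) char buf[64];
	struct nlmsghdr *nh = (struct nlmsghdr *)buf;
	size_t size = 0;
	ssize_t left;		/* Bytes remaining in the datagram */
	int status = 1;		/* > 0 while more responses are expected */

	assert(routes);

	size = toy_put(buf, size, RTM_NEWROUTE);
	size = toy_put(buf, size, RTM_NEWROUTE);
	if (done_in_same_datagram)
		size = toy_put(buf, size, NLMSG_DONE);

	*routes = 0;
	for (left = size;
	     NLMSG_OK(nh, left) && (status = (nh->nlmsg_type != NLMSG_DONE)) > 0;
	     nh = NLMSG_NEXT(nh, left))
		(*routes)++;

	/* 'left' still counts bytes, so this check is meaningful in both
	 * exit cases: either we stopped on a valid final message, or we
	 * ran out of messages and the rest is in later datagrams.
	 */
	return NLMSG_OK(nh, left) ? status : 1;
}
```

Had "status" been reused as the length argument, the check after the loop would see 0 bytes in the got-last-response case and wrongly conclude that more datagrams are pending.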
|
|
|
for (left = nlmsgs_size;
|
|
|
|
NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
|
|
|
|
nh = NLMSG_NEXT(nh, left)) {
|
2023-08-03 09:19:42 +02:00
|
|
|
struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
|
|
|
|
struct rtattr *rta;
|
|
|
|
size_t na;
|
netlink: Add functionality to copy routes from outer namespace
Instead of just fetching the default gateway and configuring a single
equivalent route in the target namespace, on 'pasta --config-net', it
might be desirable in some cases to copy the whole set of routes
corresponding to a given output interface.
For instance, in:
https://github.com/containers/podman/issues/18539
IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
configuring the default gateway won't work without a gateway-less
route (specifying the output interface only), because the default
gateway is, somewhat dubiously, not on the same subnet as the
container.
This is a similar case to the one covered by commit 7656a6f88882
("conf: Adjust netmask on mismatch between IPv4 address/netmask and
gateway"), and I'm not exactly proud of that workaround.
We also have:
https://bugs.passt.top/show_bug.cgi?id=49
pasta does not work with tap-style interface
for which, eventually, we should be able to configure a gateway-less
route in the target namespace.
Introduce different operation modes for nl_route(), including a new
NL_DUP one, not exposed yet, which simply parrots back to the kernel
the route dump for a given interface from the outer namespace, fixing
up flags and interface indices on the way, and requesting to add the
same routes in the target namespace, on the interface we manage.
For n routes we want to duplicate, send n identical netlink requests
including the full dump: routes might depend on each other and the
kernel processes RTM_NEWROUTE messages sequentially, not atomically,
and repeating the full dump naturally resolves dependencies without
the need to actually calculate them.
I'm not kidding, it actually works pretty well.
Link: https://github.com/containers/podman/issues/18539
Link: https://bugs.passt.top/show_bug.cgi?id=49
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2023-05-14 13:49:43 +02:00
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
if (nh->nlmsg_type != RTM_NEWROUTE)
|
2021-10-11 12:01:31 +02:00
|
|
|
continue;
|
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
dup_routes++;
|
|
|
|
|
2021-10-20 00:05:11 +02:00
|
|
|
for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na);
|
|
|
|
rta = RTA_NEXT(rta, na)) {
|
2024-04-04 17:04:37 +02:00
|
|
|
/* RTA_OIF and RTA_MULTIPATH attributes carry the
|
|
|
|
* identifier of a host interface. Change them to match
|
|
|
|
* the corresponding identifier in the target namespace.
|
|
|
|
*/
|
2023-08-23 09:03:38 +02:00
|
|
|
if (rta->rta_type == RTA_OIF) {
|
2023-08-03 09:19:44 +02:00
|
|
|
*(unsigned int *)RTA_DATA(rta) = ifi_dst;
|
2024-04-04 17:04:37 +02:00
|
|
|
} else if (rta->rta_type == RTA_MULTIPATH) {
|
|
|
|
struct rtnexthop *rtnh;
|
|
|
|
|
|
|
|
for (rtnh = (struct rtnexthop *)RTA_DATA(rta);
|
|
|
|
RTNH_OK(rtnh, RTA_PAYLOAD(rta));
|
|
|
|
rtnh = RTNH_NEXT(rtnh))
|
|
|
|
rtnh->rtnh_ifindex = ifi_dst;
|
2023-08-23 09:03:38 +02:00
|
|
|
} else if (rta->rta_type == RTA_PREFSRC) {
|
|
|
|
/* Host routes might include a preferred source
|
|
|
|
* address, which must be one of the host's
|
|
|
|
* addresses. However, with -a pasta will use a
|
|
|
|
* different namespace address, making such a
|
|
|
|
* route invalid in the namespace. Strip off
|
|
|
|
* RTA_PREFSRC attributes to avoid that. */
|
|
|
|
rta->rta_type = RTA_UNSPEC;
|
|
|
|
}
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
|
|
|
}
|
2023-05-14 13:49:43 +02:00
|
|
|
|
2024-03-19 05:53:41 +01:00
|
|
|
if (!NLMSG_OK(nh, left)) {
|
2023-08-03 09:19:54 +02:00
|
|
|
/* Process any remaining datagrams in a different
|
|
|
|
* buffer so we don't overwrite the first one.
|
|
|
|
*/
|
|
|
|
char tail[NLBUFSIZ];
|
|
|
|
unsigned extra = 0;
|
|
|
|
|
|
|
|
nl_foreach_oftype(nh, status, s_src, tail, seq, RTM_NEWROUTE)
|
|
|
|
extra++;
|
|
|
|
|
|
|
|
if (extra) {
|
|
|
|
err("netlink: Too many routes to duplicate");
|
2023-08-03 09:19:56 +02:00
|
|
|
return -E2BIG;
|
2023-08-03 09:19:54 +02:00
|
|
|
}
|
|
|
|
}
|
2023-08-03 09:19:56 +02:00
|
|
|
if (status < 0)
|
|
|
|
return status;
|
2023-08-03 09:19:54 +02:00
|
|
|
|
2023-08-03 09:19:42 +02:00
|
|
|
/* Routes might have dependencies between each other, and the kernel
|
2023-08-03 09:19:45 +02:00
|
|
|
* processes RTM_NEWROUTE messages sequentially. For n routes, we might
|
|
|
|
* need to send the requests up to n times to get all of them inserted.
|
|
|
|
* Routes that have been already inserted will return -EEXIST, but we
|
|
|
|
* can safely ignore that and repeat the requests. This avoids the need
|
|
|
|
* to calculate dependencies: let the kernel do that.
|
2023-08-03 09:19:42 +02:00
|
|
|
*/
|
2023-08-03 09:19:45 +02:00
|
|
|
for (i = 0; i < dup_routes; i++) {
|
2024-03-19 05:53:41 +01:00
|
|
|
for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
|
|
|
|
NLMSG_OK(nh, left);
|
|
|
|
nh = NLMSG_NEXT(nh, left)) {
|
2023-08-03 09:19:48 +02:00
|
|
|
uint16_t flags = nh->nlmsg_flags;
|
2023-08-03 09:19:56 +02:00
|
|
|
int rc;
|
2023-08-03 09:19:45 +02:00
|
|
|
|
|
|
|
if (nh->nlmsg_type != RTM_NEWROUTE)
|
|
|
|
continue;
|
|
|
|
|
2023-08-03 09:19:56 +02:00
|
|
|
rc = nl_do(s_dst, nh, RTM_NEWROUTE,
|
|
|
|
(flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
|
|
|
|
nh->nlmsg_len);
|
|
|
|
if (rc < 0 && rc != -ENETUNREACH && rc != -EEXIST)
|
|
|
|
return rc;
|
2023-08-03 09:19:45 +02:00
|
|
|
}
|
|
|
|
}
|
2023-08-03 09:19:56 +02:00
|
|
|
|
|
|
|
return 0;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
netlink: Fetch most specific (longest prefix) address in nl_addr_get()
This happened in most cases implicitly before commit eff3bcb24547
("netlink: Split nl_addr() into separate operation functions"): while
going through results from netlink, we would only copy an address
into the provided return buffer if no address had been picked yet.
Because of the insertion logic in the kernel (ipv6_link_dev_addr()),
the first returned address would also be the one added last, and, in
case of a Linux guest using a DHCPv6 client as well as SLAAC, that
would be the address assigned via DHCPv6, because SLAAC happens
before the DHCPv6 exchange.
The effect of, instead, picking the last returned address (first
assigned) is visible when passt or pasta runs nested, given that, by
default, they advertise a prefix for SLAAC usage, plus an address via
DHCPv6.
The first level (L1 guest) would get a /64 address by means of SLAAC,
and a /128 address via DHCPv6, the latter matching the address on the
host.
The second level (L2 guest) would also get two addresses: a /64 via
SLAAC (same prefix as the host), and a /128 via DHCPv6, matching the
L1 SLAAC-assigned address, not the one obtained via DHCPv6. That
is, none of the L2 addresses would match the address on the host. The
whole point of having a DHCPv6 server is to avoid (implicit) NAT when
possible, though.
Fix this in a more explicit way than the behaviour we initially had:
pick the first address among the set of most specific ones, by
comparing prefix lengths. Do this for IPv4 and for link-local
addresses, too, to match in any case the implementation of the
default source address selection.
Reported-by: Yalan Zhang <yalzhang@redhat.com>
Fixes: eff3bcb24547 ("netlink: Split nl_addr() into separate operation functions")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-12-27 14:46:39 +01:00
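The selection rule described above ("first address among the set of most specific ones") can be sketched on a plain array. struct toy_addr and toy_pick() are hypothetical names used for illustration only, and the list stands in for the addresses returned by the netlink dump, in the order they are returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one address from the dump */
struct toy_addr {
	const char *name;
	uint8_t prefix_len;
};

/* Return the first entry with the longest prefix, NULL if none qualifies */
static const struct toy_addr *toy_pick(const struct toy_addr *a, size_t n)
{
	const struct toy_addr *best = NULL;
	uint8_t prefix_max = 0;
	size_t i;

	assert(a || !n);

	for (i = 0; i < n; i++) {
		/* Strict comparison: on a tie, the earlier entry is kept.
		 * As prefix_max starts at 0, a zero-length prefix is never
		 * picked in this sketch.
		 */
		if (a[i].prefix_len > prefix_max) {
			prefix_max = a[i].prefix_len;
			best = &a[i];
		}
	}

	return best;
}
```

Given a /64 followed by two /128 entries, the first /128 is returned: the longer prefix wins over the /64, and the strict comparison keeps the earlier of the two equally specific entries.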
|
|
|
* nl_addr_get() - Get most specific global address, given interface and family
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2023-05-14 18:44:53 +02:00
|
|
|
* @ifi: Interface index in outer network namespace
|
2021-10-11 12:01:31 +02:00
|
|
|
* @af: Address family
|
2023-08-03 09:19:41 +02:00
|
|
|
* @addr: Global address to fill
|
|
|
|
* @prefix_len: Mask or prefix length, to fill (for IPv4)
|
|
|
|
* @addr_l: Link-scoped address to fill (for IPv6)
|
2023-08-03 09:19:55 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2023-08-03 09:19:41 +02:00
|
|
|
*/
|
2023-08-03 09:19:55 +02:00
|
|
|
int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
|
|
|
|
void *addr, int *prefix_len, void *addr_l)
|
2023-08-03 09:19:41 +02:00
|
|
|
{
|
2023-12-27 14:46:39 +01:00
|
|
|
uint8_t prefix_max = 0, prefix_max_ll = 0;
|
2023-08-03 09:19:41 +02:00
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct ifaddrmsg ifa;
|
|
|
|
} req = {
|
|
|
|
.ifa.ifa_family = af,
|
|
|
|
.ifa.ifa_index = ifi,
|
|
|
|
};
|
|
|
|
struct nlmsghdr *nh;
|
|
|
|
char buf[NLBUFSIZ];
|
2023-08-03 09:19:51 +02:00
|
|
|
ssize_t status;
|
2023-11-07 11:13:05 +01:00
|
|
|
uint32_t seq;
|
2023-08-03 09:19:41 +02:00
|
|
|
|
2023-08-03 09:19:51 +02:00
|
|
|
seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
|
2023-08-03 09:19:52 +02:00
|
|
|
nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) {
|
2023-08-03 09:19:41 +02:00
|
|
|
struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
|
|
|
|
struct rtattr *rta;
|
|
|
|
size_t na;
|
|
|
|
|
|
|
|
if (ifa->ifa_index != ifi)
|
|
|
|
continue;
|
|
|
|
|
2023-08-15 05:51:28 +02:00
|
|
|
for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
|
2023-08-03 09:19:41 +02:00
|
|
|
rta = RTA_NEXT(rta, na)) {
|
netlink: For IPv4, IFA_LOCAL is the interface address, not IFA_ADDRESS
See the comment to the unnamed enum in linux/if_addr.h, which
currently states:
/*
* Important comment:
* IFA_ADDRESS is prefix address, rather than local interface address.
* It makes no difference for normally configured broadcast interfaces,
* but for point-to-point IFA_ADDRESS is DESTINATION address,
* local address is supplied in IFA_LOCAL attribute.
*
* [...]
*/
if we fetch IFA_ADDRESS, and we have a point-to-point link with a peer
address configured, we'll source the peer address as "our" address,
and refuse to resolve it in arp().
This was reported with pasta and a tun upstream interface configured
by OpenVPN in "p2p" topology: the target namespace will have similar
addresses and routes as the host, which is fine, and will try to
resolve the point-to-point peer address (because it's the default
gateway).
Given that we configure it as our address (only internally, not
visibly in the namespace), we'll fail to resolve that and traffic
doesn't go anywhere.
Note that this is not the case for IPv6: there, IFA_ADDRESS is the
actual, local address of the interface, and IFA_LOCAL is not
necessarily present, so the comment in linux/if_addr.h doesn't apply
either.
Link: https://github.com/containers/podman/issues/22320
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2024-04-25 07:11:55 +02:00
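The per-family choice described above boils down to a single attribute type to look for while walking the attributes of an address message. A small sketch, where toy_addr_attr() is a hypothetical helper rather than an existing function:

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/if_addr.h>

/* Hypothetical helper: attribute type carrying the interface's own address.
 * For IPv4, IFA_ADDRESS might be the peer on point-to-point links, so
 * IFA_LOCAL is used. For IPv6, IFA_LOCAL is not necessarily present.
 */
static unsigned short toy_addr_attr(sa_family_t af)
{
	assert(af == AF_INET || af == AF_INET6);

	return af == AF_INET ? IFA_LOCAL : IFA_ADDRESS;
}
```

A caller would skip any attribute whose rta_type differs from the returned value, which is equivalent to the two-armed condition in the loop below.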
|
|
|
if ((af == AF_INET && rta->rta_type != IFA_LOCAL) ||
|
|
|
|
(af == AF_INET6 && rta->rta_type != IFA_ADDRESS))
|
2023-08-03 09:19:41 +02:00
|
|
|
continue;
|
|
|
|
|
2023-12-27 14:46:39 +01:00
|
|
|
if (af == AF_INET && ifa->ifa_prefixlen > prefix_max) {
|
2023-08-03 09:19:41 +02:00
|
|
|
memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
|
2023-12-27 14:46:39 +01:00
|
|
|
|
|
|
|
prefix_max = *prefix_len = ifa->ifa_prefixlen;
|
2023-08-03 09:19:41 +02:00
|
|
|
} else if (af == AF_INET6 && addr &&
|
2023-12-27 14:46:39 +01:00
|
|
|
ifa->ifa_scope == RT_SCOPE_UNIVERSE &&
|
|
|
|
ifa->ifa_prefixlen > prefix_max) {
|
2023-08-03 09:19:41 +02:00
|
|
|
memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
|
2023-12-27 14:46:39 +01:00
|
|
|
|
|
|
|
prefix_max = ifa->ifa_prefixlen;
|
2023-08-03 09:19:41 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
if (addr_l &&
|
2023-12-27 14:46:39 +01:00
|
|
|
af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK &&
|
|
|
|
ifa->ifa_prefixlen > prefix_max_ll) {
|
2023-08-03 09:19:41 +02:00
|
|
|
memcpy(addr_l, RTA_DATA(rta), RTA_PAYLOAD(rta));
|
2023-12-27 14:46:39 +01:00
|
|
|
|
|
|
|
prefix_max_ll = ifa->ifa_prefixlen;
|
|
|
|
}
|
2023-08-03 09:19:41 +02:00
|
|
|
}
|
|
|
|
}
|
2023-08-03 09:19:55 +02:00
|
|
|
return status;
|
2023-08-03 09:19:41 +02:00
|
|
|
}
|
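The selection rule described in the commit message above (first address among the most specific ones) can be sketched in isolation. This is a minimal illustration and not the code from netlink.c: struct addr_entry and pick_most_specific() are hypothetical names, and the array stands in for the RTM_NEWADDR messages returned by a dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one address returned by a netlink dump */
struct addr_entry {
	unsigned char prefix_len;	/* plays the role of ifa_prefixlen */
	int id;				/* stand-in for the address payload */
};

/* Return the index of the first entry with the longest prefix, or -1 if the
 * list is empty. The strict '>' comparison keeps the first of several equally
 * specific entries, and starting from -1 lets even a /0 entry be picked.
 */
int pick_most_specific(const struct addr_entry *e, size_t n)
{
	int best = -1, prefix_max = -1;
	size_t i;

	assert(e || !n);

	for (i = 0; i < n; i++) {
		if ((int)e[i].prefix_len > prefix_max) {
			prefix_max = e[i].prefix_len;
			best = (int)i;
		}
	}

	return best;
}
```

With a /64 SLAAC address followed by a /128 DHCPv6 address, this picks the /128 regardless of the order in which the kernel returns them, which is the point of comparing prefix lengths rather than relying on dump order.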
|
|
|
|
|
|
|
/**
|
|
|
|
* nl_addr_set() - Set IP addresses for given interface and address family
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2023-08-03 09:19:41 +02:00
|
|
|
* @ifi: Interface index
|
|
|
|
* @af: Address family
|
|
|
|
* @addr: Global address to set
|
|
|
|
* @prefix_len: Mask or prefix length to set
|
2023-08-03 09:19:53 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2021-10-11 12:01:31 +02:00
|
|
|
*/
|
2023-08-03 09:19:53 +02:00
|
|
|
int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
|
2023-09-29 07:50:19 +02:00
|
|
|
const void *addr, int prefix_len)
|
2021-10-11 12:01:31 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct ifaddrmsg ifa;
|
|
|
|
union {
|
|
|
|
struct {
|
|
|
|
struct rtattr rta_l;
|
2023-08-03 09:19:43 +02:00
|
|
|
struct in_addr l;
|
2021-10-11 12:01:31 +02:00
|
|
|
struct rtattr rta_a;
|
2023-08-03 09:19:43 +02:00
|
|
|
struct in_addr a;
|
2021-10-11 12:01:31 +02:00
|
|
|
} a4;
|
|
|
|
struct {
|
|
|
|
struct rtattr rta_l;
|
|
|
|
struct in6_addr l;
|
|
|
|
struct rtattr rta_a;
|
|
|
|
struct in6_addr a;
|
|
|
|
} a6;
|
2021-10-21 04:26:08 +02:00
|
|
|
} set;
|
2021-10-11 12:01:31 +02:00
|
|
|
} req = {
|
|
|
|
.ifa.ifa_family = af,
|
2023-08-03 09:19:41 +02:00
|
|
|
.ifa.ifa_index = ifi,
|
|
|
|
.ifa.ifa_prefixlen = prefix_len,
|
|
|
|
.ifa.ifa_scope = RT_SCOPE_UNIVERSE,
|
2021-10-11 12:01:31 +02:00
|
|
|
};
|
2023-08-03 09:19:48 +02:00
|
|
|
ssize_t len;
|
2021-10-21 04:26:08 +02:00
|
|
|
|
2023-08-03 09:19:41 +02:00
|
|
|
if (af == AF_INET6) {
|
|
|
|
size_t rta_len = RTA_LENGTH(sizeof(req.set.a6.l));
|
2022-10-11 00:36:30 +02:00
|
|
|
|
2023-08-03 09:19:41 +02:00
|
|
|
/* By default, strictly speaking, it's duplicated */
|
|
|
|
req.ifa.ifa_flags = IFA_F_NODAD;
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:48 +02:00
|
|
|
len = offsetof(struct req_t, set.a6) + sizeof(req.set.a6);
|
2021-10-21 04:26:08 +02:00
|
|
|
|
2023-08-03 09:19:41 +02:00
|
|
|
memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
|
|
|
|
req.set.a6.rta_l.rta_len = rta_len;
|
|
|
|
req.set.a6.rta_l.rta_type = IFA_LOCAL;
|
|
|
|
memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
|
|
|
|
req.set.a6.rta_a.rta_len = rta_len;
|
|
|
|
req.set.a6.rta_a.rta_type = IFA_ADDRESS;
|
|
|
|
} else {
|
|
|
|
size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:48 +02:00
|
|
|
len = offsetof(struct req_t, set.a4) + sizeof(req.set.a4);
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:43 +02:00
|
|
|
memcpy(&req.set.a4.l, addr, sizeof(req.set.a4.l));
|
2023-08-03 09:19:41 +02:00
|
|
|
req.set.a4.rta_l.rta_len = rta_len;
|
|
|
|
req.set.a4.rta_l.rta_type = IFA_LOCAL;
|
2023-08-23 09:34:44 +02:00
|
|
|
memcpy(&req.set.a4.a, addr, sizeof(req.set.a4.a));
|
2023-08-03 09:19:41 +02:00
|
|
|
req.set.a4.rta_a.rta_len = rta_len;
|
|
|
|
req.set.a4.rta_a.rta_type = IFA_ADDRESS;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
|
|
|
|
2023-08-03 09:19:53 +02:00
|
|
|
return nl_do(s, &req, RTM_NEWADDR, NLM_F_CREATE | NLM_F_EXCL, len);
|
2023-08-03 09:19:41 +02:00
|
|
|
}
|
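The length computation in nl_addr_set() relies on the request layout: fixed headers first, then a union sized for the larger (IPv6) attribute pair, of which only the relevant part is sent. The sketch below reproduces that layout under a hypothetical name (struct addr_req) with a hypothetical helper (addr_req_len()), assuming Linux UAPI headers are available; it only computes lengths and sends nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Same shape as the request in nl_addr_set(): headers, then IFA_LOCAL and
 * IFA_ADDRESS attributes for either address family (hypothetical name)
 */
struct addr_req {
	struct nlmsghdr nlh;
	struct ifaddrmsg ifa;
	union {
		struct {
			struct rtattr rta_l;
			struct in_addr l;
			struct rtattr rta_a;
			struct in_addr a;
		} a4;
		struct {
			struct rtattr rta_l;
			struct in6_addr l;
			struct rtattr rta_a;
			struct in6_addr a;
		} a6;
	} set;
};

/* Number of bytes to send for the given family: everything up to the union,
 * plus only the union member that is actually filled in
 */
size_t addr_req_len(sa_family_t af)
{
	assert(af == AF_INET || af == AF_INET6);

	if (af == AF_INET6)
		return offsetof(struct addr_req, set.a6) +
		       sizeof(((struct addr_req *)0)->set.a6);

	return offsetof(struct addr_req, set.a4) +
	       sizeof(((struct addr_req *)0)->set.a4);
}
```

Sending sizeof(struct addr_req) for IPv4 would append trailing bytes that the kernel would try to parse as further attributes, so the shorter length matters.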
2023-05-14 18:44:53 +02:00
|
|
|
|
2023-08-03 09:19:41 +02:00
|
|
|
/**
|
|
|
|
* nl_addr_dup() - Copy IP addresses for given interface and address family
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s_src: Netlink socket in source network namespace
|
|
|
|
* @ifi_src: Interface index in source network namespace
|
|
|
|
* @s_dst: Netlink socket in destination network namespace
|
|
|
|
* @ifi_dst: Interface index in destination namespace
|
2023-08-03 09:19:41 +02:00
|
|
|
* @af: Address family
|
2023-08-03 09:19:56 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2023-08-03 09:19:41 +02:00
|
|
|
*/
|
2023-08-03 09:19:56 +02:00
|
|
|
int nl_addr_dup(int s_src, unsigned int ifi_src,
|
|
|
|
int s_dst, unsigned int ifi_dst, sa_family_t af)
|
2023-08-03 09:19:41 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct ifaddrmsg ifa;
|
|
|
|
} req = {
|
|
|
|
.ifa.ifa_family = af,
|
2023-08-03 09:19:44 +02:00
|
|
|
.ifa.ifa_index = ifi_src,
|
2023-08-03 09:19:41 +02:00
|
|
|
.ifa.ifa_prefixlen = 0,
|
|
|
|
};
|
2023-08-03 09:19:45 +02:00
|
|
|
char buf[NLBUFSIZ];
|
2023-08-03 09:19:41 +02:00
|
|
|
struct nlmsghdr *nh;
|
2023-08-03 09:19:51 +02:00
|
|
|
ssize_t status;
|
2023-11-07 11:13:05 +01:00
|
|
|
uint32_t seq;
|
2023-08-03 09:19:56 +02:00
|
|
|
int rc = 0;
|
2021-10-11 12:01:31 +02:00
|
|
|
|
2023-08-03 09:19:51 +02:00
|
|
|
seq = nl_send(s_src, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
|
2023-08-03 09:19:52 +02:00
|
|
|
nl_foreach_oftype(nh, status, s_src, buf, seq, RTM_NEWADDR) {
|
2023-08-03 09:19:41 +02:00
|
|
|
struct ifaddrmsg *ifa;
|
|
|
|
struct rtattr *rta;
|
|
|
|
size_t na;
|
|
|
|
|
2021-10-11 12:01:31 +02:00
|
|
|
ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
|
2023-05-14 18:44:53 +02:00
|
|
|
|
2023-08-03 09:19:56 +02:00
|
|
|
if (rc < 0 || ifa->ifa_scope == RT_SCOPE_LINK ||
|
2023-08-03 09:19:45 +02:00
|
|
|
ifa->ifa_index != ifi_src)
|
2023-08-03 09:19:41 +02:00
|
|
|
continue;
|
2023-05-14 18:44:53 +02:00
|
|
|
|
2023-08-03 09:19:44 +02:00
|
|
|
ifa->ifa_index = ifi_dst;
|
2024-04-26 00:04:53 +02:00
|
|
|
/* Same as nl_addr_set(), but here it's more than a default */
|
|
|
|
ifa->ifa_flags |= IFA_F_NODAD;
|
2023-05-14 18:44:53 +02:00
|
|
|
|
2023-08-15 05:51:28 +02:00
|
|
|
for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
|
2021-10-20 00:05:11 +02:00
|
|
|
rta = RTA_NEXT(rta, na)) {
|
netlink: Don't propagate host address expiry to the container
When we copy addresses from the host to the container in nl_addr_dup(), we
copy all the address's attributes, including IFA_CACHEINFO, which controls
the address's lifetime. If the host address is managed by, for example,
DHCP, it will typically have a finite lifetime.
When we copy that lifetime to the pasta container, that lifetime will
remain, meaning the kernel will eventually remove the address, typically
some hours later. The container, however, won't have the DHCP client or
whatever was managing and maintaining the address in the host, so it will
just lose connectivity.
Long term, we may want to monitor host address changes and reflect them to
the guest. But for now, we just want to take a snapshot of the host's
address and set those in the container permanently. We can accomplish that
by stripping off the IFA_CACHEINFO attribute as we copy addresses.
Link: https://github.com/containers/podman/issues/19405
Link: https://bugs.passt.top/show_bug.cgi?id=70
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-15 05:51:29 +02:00
|
|
|
/* Strip label and expiry (cacheinfo) information */
|
|
|
|
if (rta->rta_type == IFA_LABEL ||
|
|
|
|
rta->rta_type == IFA_CACHEINFO)
|
2023-05-14 18:44:53 +02:00
|
|
|
rta->rta_type = IFA_UNSPEC;
|
2024-04-26 00:04:53 +02:00
|
|
|
|
|
|
|
/* If 32-bit flags are used, add IFA_F_NODAD there */
|
|
|
|
if (rta->rta_type == IFA_FLAGS)
|
|
|
|
*(uint32_t *)RTA_DATA(rta) |= IFA_F_NODAD;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
2023-05-14 18:44:53 +02:00
|
|
|
|
2023-08-03 09:19:56 +02:00
|
|
|
rc = nl_do(s_dst, nh, RTM_NEWADDR,
|
|
|
|
(nh->nlmsg_flags & ~NLM_F_DUMP_FILTERED) | NLM_F_CREATE,
|
|
|
|
nh->nlmsg_len);
|
2023-08-03 09:19:45 +02:00
|
|
|
}
|
2023-08-03 09:19:56 +02:00
|
|
|
if (status < 0)
|
|
|
|
return status;
|
|
|
|
|
|
|
|
return rc;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
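The attribute stripping in nl_addr_dup() works by retyping unwanted attributes to IFA_UNSPEC in place, so that the message can be forwarded without being rebuilt. A minimal sketch of that walk follows, assuming Linux UAPI headers; strip_label_cacheinfo() is a hypothetical helper and not part of netlink.c. It takes the remaining length as an int, so that a truncated trailing attribute ends the loop instead of wrapping around.

```c
#include <assert.h>
#include <linux/if_addr.h>
#include <linux/rtnetlink.h>

/* Retype IFA_LABEL and IFA_CACHEINFO attributes to IFA_UNSPEC, in place.
 * 'attrs' points to the first attribute, 'len' is the size of the whole
 * attribute area in bytes. Returns the number of attributes neutralised.
 */
int strip_label_cacheinfo(void *attrs, int len)
{
	struct rtattr *rta;
	int stripped = 0;

	assert(attrs);

	for (rta = attrs; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFA_LABEL ||
		    rta->rta_type == IFA_CACHEINFO) {
			rta->rta_type = IFA_UNSPEC;
			stripped++;
		}
	}

	return stripped;
}
```

Payloads and lengths are left untouched: only the type changes, so the offsets of all the following attributes remain valid.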
|
|
|
|
|
|
|
/**
|
netlink: Split up functionality of nl_link()
nl_link() performs a number of functions: it can bring links up, set MAC
address and MTU and also retrieve the existing MAC. This makes for a small
number of lines of code, but high conceptual complexity: it's quite hard
to follow what's going on in nl_link() itself, and it's also not very
obvious which function its callers are intending to use.
Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(),
and nl_link_get_mac(). The first brings up a link, optionally setting the
MTU, the others get or set the MAC address.
This fixes an arguable bug in pasta_ns_conf(): it looks as though that was
intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set.
However, it only actually does so in the !c->pasta_conf_ns case: the fact
that we set up==1 means we would only ever set, never get, the MAC in the
nl_link() call in the other path. We get away with this because the MAC
will quickly be discovered once we receive packets on the tap interface.
Still, it's neater to always get the MAC address here.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
2023-08-03 09:19:40 +02:00
|
|
|
* nl_link_get_mac() - Get link MAC address
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2021-10-11 12:01:31 +02:00
|
|
|
* @ifi: Interface index
|
2023-08-03 09:19:40 +02:00
|
|
|
* @mac: Fill with current MAC address
|
2023-08-03 09:19:55 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2021-10-11 12:01:31 +02:00
|
|
|
*/
|
2023-08-03 09:19:55 +02:00
|
|
|
int nl_link_get_mac(int s, unsigned int ifi, void *mac)
|
2021-10-11 12:01:31 +02:00
|
|
|
{
|
2022-02-23 10:50:09 +01:00
|
|
|
struct req_t {
|
2021-10-11 12:01:31 +02:00
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct ifinfomsg ifm;
|
|
|
|
} req = {
|
2021-10-14 13:05:56 +02:00
|
|
|
.ifm.ifi_family = AF_UNSPEC,
|
|
|
|
.ifm.ifi_index = ifi,
|
2021-10-11 12:01:31 +02:00
|
|
|
};
|
|
|
|
struct nlmsghdr *nh;
|
2023-03-08 03:43:25 +01:00
|
|
|
char buf[NLBUFSIZ];
|
2023-08-03 09:19:51 +02:00
|
|
|
ssize_t status;
|
2023-11-07 11:13:05 +01:00
|
|
|
uint32_t seq;
|
2022-04-05 07:10:30 +02:00
|
|
|
|
2023-08-03 09:19:51 +02:00
|
|
|
seq = nl_send(s, &req, RTM_GETLINK, 0, sizeof(req));
|
2023-08-03 09:19:52 +02:00
|
|
|
nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWLINK) {
|
2023-08-03 09:19:40 +02:00
|
|
|
struct ifinfomsg *ifm = (struct ifinfomsg *)NLMSG_DATA(nh);
|
|
|
|
struct rtattr *rta;
|
|
|
|
size_t na;
|
2021-10-14 13:05:56 +02:00
|
|
|
|
2023-08-03 09:19:40 +02:00
|
|
|
for (rta = IFLA_RTA(ifm), na = RTM_PAYLOAD(nh);
|
|
|
|
RTA_OK(rta, na);
|
2021-10-20 00:05:11 +02:00
|
|
|
rta = RTA_NEXT(rta, na)) {
|
2021-10-11 12:01:31 +02:00
|
|
|
if (rta->rta_type != IFLA_ADDRESS)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
memcpy(mac, RTA_DATA(rta), ETH_ALEN);
|
|
|
|
}
|
|
|
|
}
|
2023-08-03 09:19:55 +02:00
|
|
|
return status;
|
2021-10-11 12:01:31 +02:00
|
|
|
}
|
2023-08-03 09:19:40 +02:00
|
|
|
|
|
|
|
/**
|
|
|
|
* nl_link_set_mac() - Set link MAC address
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2023-08-03 09:19:40 +02:00
|
|
|
* @ifi: Interface index
|
|
|
|
* @mac: MAC address to set
|
2023-08-03 09:19:53 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2023-08-03 09:19:40 +02:00
|
|
|
*/
|
2023-09-29 07:50:19 +02:00
|
|
|
int nl_link_set_mac(int s, unsigned int ifi, const void *mac)
|
2023-08-03 09:19:40 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct ifinfomsg ifm;
|
|
|
|
struct rtattr rta;
|
|
|
|
unsigned char mac[ETH_ALEN];
|
|
|
|
} req = {
|
|
|
|
.ifm.ifi_family = AF_UNSPEC,
|
|
|
|
.ifm.ifi_index = ifi,
|
|
|
|
.rta.rta_type = IFLA_ADDRESS,
|
|
|
|
.rta.rta_len = RTA_LENGTH(ETH_ALEN),
|
|
|
|
};
|
|
|
|
|
|
|
|
memcpy(req.mac, mac, ETH_ALEN);
|
|
|
|
|
2023-08-03 09:19:53 +02:00
|
|
|
return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
|
2023-08-03 09:19:40 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* nl_link_up() - Bring link up
|
2023-08-03 09:19:44 +02:00
|
|
|
* @s: Netlink socket
|
2023-08-03 09:19:40 +02:00
|
|
|
* @ifi: Interface index
|
|
|
|
* @mtu: If non-zero, set interface MTU
|
2023-08-03 09:19:53 +02:00
|
|
|
*
|
|
|
|
* Return: 0 on success, negative error code on failure
|
2023-08-03 09:19:40 +02:00
|
|
|
*/
|
2023-08-03 09:19:53 +02:00
|
|
|
int nl_link_up(int s, unsigned int ifi, int mtu)
|
2023-08-03 09:19:40 +02:00
|
|
|
{
|
|
|
|
struct req_t {
|
|
|
|
struct nlmsghdr nlh;
|
|
|
|
struct ifinfomsg ifm;
|
|
|
|
struct rtattr rta;
|
|
|
|
unsigned int mtu;
|
|
|
|
} req = {
|
|
|
|
.ifm.ifi_family = AF_UNSPEC,
|
|
|
|
.ifm.ifi_index = ifi,
|
|
|
|
.ifm.ifi_flags = IFF_UP,
|
|
|
|
.ifm.ifi_change = IFF_UP,
|
|
|
|
.rta.rta_type = IFLA_MTU,
|
|
|
|
.rta.rta_len = RTA_LENGTH(sizeof(unsigned int)),
|
|
|
|
.mtu = mtu,
|
|
|
|
};
|
2023-08-03 09:19:48 +02:00
|
|
|
ssize_t len = sizeof(req);
|
2023-08-03 09:19:40 +02:00
|
|
|
|
|
|
|
if (!mtu)
|
|
|
|
/* Shorten request to drop MTU attribute */
|
2023-08-03 09:19:48 +02:00
|
|
|
len = offsetof(struct req_t, rta);
|
2023-08-03 09:19:40 +02:00
|
|
|
|
2023-08-03 09:19:53 +02:00
|
|
|
return nl_do(s, &req, RTM_NEWLINK, 0, len);
|
2023-08-03 09:19:40 +02:00
|
|
|
}
|
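The optional MTU in nl_link_up() is handled by truncating the request just before the trailing attribute. The sketch below isolates that length choice, assuming Linux UAPI headers; struct link_up_req and link_up_req_len() are hypothetical names, and the assertion only checks that no padding separates the fixed headers from the attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Same shape as the request in nl_link_up() (hypothetical name) */
struct link_up_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
	struct rtattr rta;
	unsigned int mtu;
};

/* Length to send: the full request if an MTU is given, otherwise only the
 * part preceding the IFLA_MTU attribute
 */
size_t link_up_req_len(unsigned int mtu)
{
	/* The attribute must start right after the aligned fixed headers */
	assert(offsetof(struct link_up_req, rta) ==
	       NLMSG_LENGTH(sizeof(struct ifinfomsg)));

	if (!mtu)
		return offsetof(struct link_up_req, rta);

	return sizeof(struct link_up_req);
}
```

This works because the optional attribute is the last member of the request; an optional attribute in the middle would need a different layout.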