netlink: Fix handling of NLMSG_DONE in nl_route_dup()

A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into
same read() as families") changed netlink behaviour so that the
NLMSG_DONE terminating a bunch of responses can go in the same
datagram as those responses, rather than in a separate one.

Our netlink code is supposed to handle that behaviour, and indeed does
so for most cases, using the nl_foreach() macro.  However, there was a
subtle error in nl_route_dup() which doesn't work with this change.
f00b1534 ("netlink: Don't try to get further datagrams in
nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own
subtle error.

The problem arises because nl_route_dup(), unlike other cases doesn't
just make a single pass through all the responses to a netlink
request.  It needs to get all the routes, then make multiple passes
through them.  We don't really have anywhere to buffer multiple
datagrams, so we only support the case where all the routes fit in a
single datagram - but we need to fail gracefully when that's not the
case.

After receiving the first datagram of responses (with nl_next()) we
have a first loop scanning them.  It needs to exit when either we run
out of messages in the datagram (!NLMSG_OK()) or when we get a message
indicating the last response (nl_status() <= 0).

What we do after the loop depends on which exit case we had.  If we
saw the last response, we're done, but otherwise we need to receive
more datagrams to discard the rest of the responses.

We attempt to check for that second case by re-checking NLMSG_OK(nh,
status).  However in the got-last-response case, we've altered status
from the number of remaining bytes to the error code (usually 0). That
means NLMSG_OK() now returns false even if it didn't during the loop
check.  To fix this we need separate variables for the number of bytes
left and the final status code.

We also checked status after the loop, but this was redundant: we can
only exit the loop with NLMSG_OK() == true if status <= 0.

Reported-by: Martin Pitt <mpitt@redhat.com>
Fixes: f00b153414 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE")
Fixes: 4d6e9d0816 ("netlink: Always process all responses to a netlink request")
Link: https://github.com/containers/podman/issues/22052
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
This commit is contained in:
David Gibson 2024-03-19 15:53:41 +11:00 committed by Stefano Brivio
parent 615d370ca2
commit d35bcbee90

View file

@ -504,7 +504,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
.rta.rta_len = RTA_LENGTH(sizeof(unsigned int)),
.ifi = ifi_src,
};
ssize_t nlmsgs_size, status;
ssize_t nlmsgs_size, left, status;
unsigned dup_routes = 0;
struct nlmsghdr *nh;
char buf[NLBUFSIZ];
@ -518,9 +518,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
* routes in the buffer at once.
*/
nh = nl_next(s_src, buf, NULL, &nlmsgs_size);
for (status = nlmsgs_size;
NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0;
nh = NLMSG_NEXT(nh, status)) {
for (left = nlmsgs_size;
NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0;
nh = NLMSG_NEXT(nh, left)) {
struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh);
struct rtattr *rta;
size_t na;
@ -550,8 +550,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
}
}
if (nh->nlmsg_type != NLMSG_DONE &&
(!NLMSG_OK(nh, status) || status > 0)) {
if (!NLMSG_OK(nh, left)) {
/* Process any remaining datagrams in a different
* buffer so we don't overwrite the first one.
*/
@ -577,9 +576,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
* to calculate dependencies: let the kernel do that.
*/
for (i = 0; i < dup_routes; i++) {
for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size;
NLMSG_OK(nh, status);
nh = NLMSG_NEXT(nh, status)) {
for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size;
NLMSG_OK(nh, left);
nh = NLMSG_NEXT(nh, left)) {
uint16_t flags = nh->nlmsg_flags;
int rc;