From 9054b41c4e1b5725e573c13166cee56bf7034bbd Mon Sep 17 00:00:00 2001 From: Miaoqian Lin Date: Sun, 18 Dec 2022 17:09:54 +0400 Subject: net: Fix documentation for unregister_netdevice_notifier_net unregister_netdevice_notifier_net() is used for unregister a notifier registered by register_netdevice_notifier_net(). Also s/into/from/. Signed-off-by: Miaoqian Lin Signed-off-by: David S. Miller --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/core/dev.c b/net/core/dev.c index b76fb37b381e..cf78f35bc0b9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1840,7 +1840,7 @@ EXPORT_SYMBOL(register_netdevice_notifier_net); * @nb: notifier * * Unregister a notifier previously registered by - * register_netdevice_notifier(). The notifier is unlinked into the + * register_netdevice_notifier_net(). The notifier is unlinked from the * kernel structures and may then be reused. A negative errno code * is returned on a failure. * -- cgit v1.2.3 From e26aa600ba6a62fe84659f1df497a381bab6d07e Mon Sep 17 00:00:00 2001 From: Christian Ehrig Date: Sun, 18 Dec 2022 06:17:31 +0100 Subject: bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key() This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY flag. On egress, the resulting tunnel header will not contain a tunnel key if the protocol and implementation supports it. At the moment bpf_tunnel_key wants a user to specify a numeric tunnel key. This will wrap the inner packet into a tunnel header with the key bit and value set accordingly. This is problematic when using a tunnel protocol that supports optional tunnel keys and a receiving tunnel device that is not expecting packets with the key bit set. The receiver won't decapsulate and drop the packet. RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is useful. It allows for generating packets, that can be decapsulated by a GRE tunnel device not operating in collect metadata mode or not expecting the key bit set. Signed-off-by: Christian Ehrig Signed-off-by: Daniel Borkmann Reviewed-by: Jakub Sitnicki Acked-by: Stanislav Fomichev Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com --- include/uapi/linux/bpf.h | 4 ++++ net/core/filter.c | 5 ++++- tools/include/uapi/linux/bpf.h | 4 ++++ 3 files changed, 12 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 464ca3f01fe7..bc1a3d232ae4 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2001,6 +2001,9 @@ union bpf_attr { * sending the packet. This flag was added for GRE * encapsulation, but might be used with other protocols * as well in the future. + * **BPF_F_NO_TUNNEL_KEY** + * Add a flag to tunnel metadata indicating that no tunnel + * key should be set in the resulting tunnel header. * * Here is a typical usage on the transmit path: * @@ -5764,6 +5767,7 @@ enum { BPF_F_ZERO_CSUM_TX = (1ULL << 1), BPF_F_DONT_FRAGMENT = (1ULL << 2), BPF_F_SEQ_NUMBER = (1ULL << 3), + BPF_F_NO_TUNNEL_KEY = (1ULL << 4), }; /* BPF_FUNC_skb_get_tunnel_key flags. */ diff --git a/net/core/filter.c b/net/core/filter.c index 929358677183..c746e4d77214 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4615,7 +4615,8 @@ BPF_CALL_4(bpf_skb_set_tunnel_key, struct sk_buff *, skb, struct ip_tunnel_info *info; if (unlikely(flags & ~(BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX | - BPF_F_DONT_FRAGMENT | BPF_F_SEQ_NUMBER))) + BPF_F_DONT_FRAGMENT | BPF_F_SEQ_NUMBER | + BPF_F_NO_TUNNEL_KEY))) return -EINVAL; if (unlikely(size != sizeof(struct bpf_tunnel_key))) { switch (size) { @@ -4653,6 +4654,8 @@ BPF_CALL_4(bpf_skb_set_tunnel_key, struct sk_buff *, skb, info->key.tun_flags &= ~TUNNEL_CSUM; if (flags & BPF_F_SEQ_NUMBER) info->key.tun_flags |= TUNNEL_SEQ; + if (flags & BPF_F_NO_TUNNEL_KEY) + info->key.tun_flags &= ~TUNNEL_KEY; info->key.tun_id = cpu_to_be64(from->tunnel_id); info->key.tos = from->tunnel_tos; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 464ca3f01fe7..bc1a3d232ae4 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2001,6 +2001,9 @@ union bpf_attr { * sending the packet. This flag was added for GRE * encapsulation, but might be used with other protocols * as well in the future. + * **BPF_F_NO_TUNNEL_KEY** + * Add a flag to tunnel metadata indicating that no tunnel + * key should be set in the resulting tunnel header. * * Here is a typical usage on the transmit path: * @@ -5764,6 +5767,7 @@ enum { BPF_F_ZERO_CSUM_TX = (1ULL << 1), BPF_F_DONT_FRAGMENT = (1ULL << 2), BPF_F_SEQ_NUMBER = (1ULL << 3), + BPF_F_NO_TUNNEL_KEY = (1ULL << 4), }; /* BPF_FUNC_skb_get_tunnel_key flags. */ -- cgit v1.2.3 From ed3557c947e1d4164d370cc2d69dd7eb92706f0a Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 3 Jan 2023 17:56:39 +0100 Subject: ieee802154: Add support for user scanning requests The ieee802154 layer should be able to scan a set of channels in order to look for beacons advertizing PANs. Supporting this involves adding two user commands: triggering scans and aborting scans. The user should also be notified when a new beacon is received and also upon scan termination. A scan request structure is created to list the requirements and to be accessed asynchronously when changing channels or receiving beacons. Mac layers may now implement the ->trigger_scan() and ->abort_scan() hooks. Co-developed-by: David Girault Signed-off-by: David Girault Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230103165644.432209-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/linux/ieee802154.h | 3 + include/net/cfg802154.h | 25 ++++++ include/net/nl802154.h | 58 ++++++++++++ net/ieee802154/nl802154.c | 220 +++++++++++++++++++++++++++++++++++++++++++++ net/ieee802154/nl802154.h | 3 + net/ieee802154/rdev-ops.h | 28 ++++++ net/ieee802154/trace.h | 40 +++++++++ 7 files changed, 377 insertions(+) (limited to 'net') diff --git a/include/linux/ieee802154.h b/include/linux/ieee802154.h index 0303eb84d596..b22e4147d334 100644 --- a/include/linux/ieee802154.h +++ b/include/linux/ieee802154.h @@ -44,6 +44,9 @@ #define IEEE802154_SHORT_ADDR_LEN 2 #define IEEE802154_PAN_ID_LEN 2 +/* Duration in superframe order */ +#define IEEE802154_MAX_SCAN_DURATION 14 +#define IEEE802154_ACTIVE_SCAN_DURATION 15 #define IEEE802154_LIFS_PERIOD 40 #define IEEE802154_SIFS_PERIOD 12 #define IEEE802154_MAX_SIFS_FRAME_SIZE 18 diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index d09c393d229f..76d4f95e9974 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -18,6 +18,7 @@ struct wpan_phy; struct wpan_phy_cca; +struct cfg802154_scan_request; #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL struct ieee802154_llsec_device_key; @@ -67,6 +68,10 @@ struct cfg802154_ops { struct wpan_dev *wpan_dev, bool mode); int (*set_ackreq_default)(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, bool ackreq); + int (*trigger_scan)(struct wpan_phy *wpan_phy, + struct cfg802154_scan_request *request); + int (*abort_scan)(struct wpan_phy *wpan_phy, + struct wpan_dev *wpan_dev); #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL void (*get_llsec_table)(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, @@ -278,6 +283,26 @@ struct ieee802154_coord_desc { bool gts_permit; }; +/** + * struct cfg802154_scan_request - Scan request + * + * @type: type of scan to be performed + * @page: page on which to perform the scan + * @channels: channels in te %page to be scanned + * @duration: time spent on each channel, calculated with: + * aBaseSuperframeDuration * (2 ^ duration + 1) + * @wpan_dev: the wpan device on which to perform the scan + * @wpan_phy: the wpan phy on which to perform the scan + */ +struct cfg802154_scan_request { + enum nl802154_scan_types type; + u8 page; + u32 channels; + u8 duration; + struct wpan_dev *wpan_dev; + struct wpan_phy *wpan_phy; +}; + struct ieee802154_llsec_key_id { u8 mode; u8 id; diff --git a/include/net/nl802154.h b/include/net/nl802154.h index b79a89d5207c..c267fa1c5aac 100644 --- a/include/net/nl802154.h +++ b/include/net/nl802154.h @@ -73,6 +73,9 @@ enum nl802154_commands { NL802154_CMD_DEL_SEC_LEVEL, NL802154_CMD_SCAN_EVENT, + NL802154_CMD_TRIGGER_SCAN, + NL802154_CMD_ABORT_SCAN, + NL802154_CMD_SCAN_DONE, /* add new commands above here */ @@ -134,6 +137,13 @@ enum nl802154_attrs { NL802154_ATTR_NETNS_FD, NL802154_ATTR_COORDINATOR, + NL802154_ATTR_SCAN_TYPE, + NL802154_ATTR_SCAN_FLAGS, + NL802154_ATTR_SCAN_CHANNELS, + NL802154_ATTR_SCAN_PREAMBLE_CODES, + NL802154_ATTR_SCAN_MEAN_PRF, + NL802154_ATTR_SCAN_DURATION, + NL802154_ATTR_SCAN_DONE_REASON, /* add attributes here, update the policy in nl802154.c */ @@ -259,6 +269,54 @@ enum nl802154_coord { NL802154_COORD_MAX, }; +/** + * enum nl802154_scan_types - Scan types + * + * @__NL802154_SCAN_INVALID: scan type number 0 is reserved + * @NL802154_SCAN_ED: An ED scan allows a device to obtain a measure of the peak + * energy in each requested channel + * @NL802154_SCAN_ACTIVE: Locate any coordinator transmitting Beacon frames using + * a Beacon Request command + * @NL802154_SCAN_PASSIVE: Locate any coordinator transmitting Beacon frames + * @NL802154_SCAN_ORPHAN: Relocate coordinator following a loss of synchronisation + * @NL802154_SCAN_ENHANCED_ACTIVE: Same as Active using Enhanced Beacon Request + * command instead of Beacon Request command + * @NL802154_SCAN_RIT_PASSIVE: Passive scan for RIT Data Request command frames + * instead of Beacon frames + * @NL802154_SCAN_ATTR_MAX: Maximum SCAN attribute number + */ +enum nl802154_scan_types { + __NL802154_SCAN_INVALID, + NL802154_SCAN_ED, + NL802154_SCAN_ACTIVE, + NL802154_SCAN_PASSIVE, + NL802154_SCAN_ORPHAN, + NL802154_SCAN_ENHANCED_ACTIVE, + NL802154_SCAN_RIT_PASSIVE, + + /* keep last */ + NL802154_SCAN_ATTR_MAX, +}; + +/** + * enum nl802154_scan_done_reasons - End of scan reasons + * + * @__NL802154_SCAN_DONE_REASON_INVALID: scan done reason number 0 is reserved. + * @NL802154_SCAN_DONE_REASON_FINISHED: The scan just finished naturally after + * going through all the requested and possible (complex) channels. + * @NL802154_SCAN_DONE_REASON_ABORTED: The scan was aborted upon user request. + * a Beacon Request command + * @NL802154_SCAN_DONE_REASON_MAX: Maximum scan done reason attribute number. + */ +enum nl802154_scan_done_reasons { + __NL802154_SCAN_DONE_REASON_INVALID, + NL802154_SCAN_DONE_REASON_FINISHED, + NL802154_SCAN_DONE_REASON_ABORTED, + + /* keep last */ + NL802154_SCAN_DONE_REASON_MAX, +}; + /** * enum nl802154_cca_modes - cca modes * diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index 248ad5e46969..11aa693af449 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -221,6 +221,13 @@ static const struct nla_policy nl802154_policy[NL802154_ATTR_MAX+1] = { [NL802154_ATTR_COORDINATOR] = { .type = NLA_NESTED }, + [NL802154_ATTR_SCAN_TYPE] = { .type = NLA_U8 }, + [NL802154_ATTR_SCAN_CHANNELS] = { .type = NLA_U32 }, + [NL802154_ATTR_SCAN_PREAMBLE_CODES] = { .type = NLA_U64 }, + [NL802154_ATTR_SCAN_MEAN_PRF] = { .type = NLA_U8 }, + [NL802154_ATTR_SCAN_DURATION] = { .type = NLA_U8 }, + [NL802154_ATTR_SCAN_DONE_REASON] = { .type = NLA_U8 }, + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL [NL802154_ATTR_SEC_ENABLED] = { .type = NLA_U8, }, [NL802154_ATTR_SEC_OUT_LEVEL] = { .type = NLA_U32, }, @@ -1384,6 +1391,203 @@ int nl802154_scan_event(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, } EXPORT_SYMBOL_GPL(nl802154_scan_event); +static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info) +{ + struct cfg802154_registered_device *rdev = info->user_ptr[0]; + struct net_device *dev = info->user_ptr[1]; + struct wpan_dev *wpan_dev = dev->ieee802154_ptr; + struct wpan_phy *wpan_phy = &rdev->wpan_phy; + struct cfg802154_scan_request *request; + u8 type; + int err; + + /* Monitors are not allowed to perform scans */ + if (wpan_dev->iftype == NL802154_IFTYPE_MONITOR) + return -EPERM; + + request = kzalloc(sizeof(*request), GFP_KERNEL); + if (!request) + return -ENOMEM; + + request->wpan_dev = wpan_dev; + request->wpan_phy = wpan_phy; + + type = nla_get_u8(info->attrs[NL802154_ATTR_SCAN_TYPE]); + switch (type) { + case NL802154_SCAN_PASSIVE: + request->type = type; + break; + default: + pr_err("Unsupported scan type: %d\n", type); + err = -EINVAL; + goto free_request; + } + + if (info->attrs[NL802154_ATTR_PAGE]) { + request->page = nla_get_u8(info->attrs[NL802154_ATTR_PAGE]); + if (request->page > IEEE802154_MAX_PAGE) { + pr_err("Invalid page %d > %d\n", + request->page, IEEE802154_MAX_PAGE); + err = -EINVAL; + goto free_request; + } + } else { + /* Use current page by default */ + request->page = wpan_phy->current_page; + } + + if (info->attrs[NL802154_ATTR_SCAN_CHANNELS]) { + request->channels = nla_get_u32(info->attrs[NL802154_ATTR_SCAN_CHANNELS]); + if (request->channels >= BIT(IEEE802154_MAX_CHANNEL + 1)) { + pr_err("Invalid channels bitfield %x ≥ %lx\n", + request->channels, + BIT(IEEE802154_MAX_CHANNEL + 1)); + err = -EINVAL; + goto free_request; + } + } else { + /* Scan all supported channels by default */ + request->channels = wpan_phy->supported.channels[request->page]; + } + + if (info->attrs[NL802154_ATTR_SCAN_PREAMBLE_CODES] || + info->attrs[NL802154_ATTR_SCAN_MEAN_PRF]) { + pr_err("Preamble codes and mean PRF not supported yet\n"); + err = -EINVAL; + goto free_request; + } + + if (info->attrs[NL802154_ATTR_SCAN_DURATION]) { + request->duration = nla_get_u8(info->attrs[NL802154_ATTR_SCAN_DURATION]); + if (request->duration > IEEE802154_MAX_SCAN_DURATION) { + pr_err("Duration is out of range\n"); + err = -EINVAL; + goto free_request; + } + } else { + /* Use maximum duration order by default */ + request->duration = IEEE802154_MAX_SCAN_DURATION; + } + + if (wpan_dev->netdev) + dev_hold(wpan_dev->netdev); + + err = rdev_trigger_scan(rdev, request); + if (err) { + pr_err("Failure starting scanning (%d)\n", err); + goto free_device; + } + + return 0; + +free_device: + if (wpan_dev->netdev) + dev_put(wpan_dev->netdev); +free_request: + kfree(request); + + return err; +} + +static int nl802154_prep_scan_msg(struct sk_buff *msg, + struct cfg802154_registered_device *rdev, + struct wpan_dev *wpan_dev, u32 portid, + u32 seq, int flags, u8 cmd, u8 arg) +{ + void *hdr; + + hdr = nl802154hdr_put(msg, portid, seq, flags, cmd); + if (!hdr) + return -ENOBUFS; + + if (nla_put_u32(msg, NL802154_ATTR_WPAN_PHY, rdev->wpan_phy_idx)) + goto nla_put_failure; + + if (wpan_dev->netdev && + nla_put_u32(msg, NL802154_ATTR_IFINDEX, wpan_dev->netdev->ifindex)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, NL802154_ATTR_WPAN_DEV, + wpan_dev_id(wpan_dev), NL802154_ATTR_PAD)) + goto nla_put_failure; + + if (cmd == NL802154_CMD_SCAN_DONE && + nla_put_u8(msg, NL802154_ATTR_SCAN_DONE_REASON, arg)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + + return -EMSGSIZE; +} + +static int nl802154_send_scan_msg(struct cfg802154_registered_device *rdev, + struct wpan_dev *wpan_dev, u8 cmd, u8 arg) +{ + struct sk_buff *msg; + int ret; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + ret = nl802154_prep_scan_msg(msg, rdev, wpan_dev, 0, 0, 0, cmd, arg); + if (ret < 0) { + nlmsg_free(msg); + return ret; + } + + return genlmsg_multicast_netns(&nl802154_fam, + wpan_phy_net(&rdev->wpan_phy), msg, 0, + NL802154_MCGRP_SCAN, GFP_KERNEL); +} + +int nl802154_scan_started(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev) +{ + struct cfg802154_registered_device *rdev = wpan_phy_to_rdev(wpan_phy); + int err; + + /* Ignore errors when there are no listeners */ + err = nl802154_send_scan_msg(rdev, wpan_dev, NL802154_CMD_TRIGGER_SCAN, 0); + if (err == -ESRCH) + err = 0; + + return err; +} +EXPORT_SYMBOL_GPL(nl802154_scan_started); + +int nl802154_scan_done(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, + enum nl802154_scan_done_reasons reason) +{ + struct cfg802154_registered_device *rdev = wpan_phy_to_rdev(wpan_phy); + int err; + + /* Ignore errors when there are no listeners */ + err = nl802154_send_scan_msg(rdev, wpan_dev, NL802154_CMD_SCAN_DONE, reason); + if (err == -ESRCH) + err = 0; + + if (wpan_dev->netdev) + dev_put(wpan_dev->netdev); + + return err; +} +EXPORT_SYMBOL_GPL(nl802154_scan_done); + +static int nl802154_abort_scan(struct sk_buff *skb, struct genl_info *info) +{ + struct cfg802154_registered_device *rdev = info->user_ptr[0]; + struct net_device *dev = info->user_ptr[1]; + struct wpan_dev *wpan_dev = dev->ieee802154_ptr; + + /* Resources are released in the notification helper above */ + return rdev_abort_scan(rdev, wpan_dev); +} + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL static const struct nla_policy nl802154_dev_addr_policy[NL802154_DEV_ADDR_ATTR_MAX + 1] = { [NL802154_DEV_ADDR_ATTR_PAN_ID] = { .type = NLA_U16 }, @@ -2474,6 +2678,22 @@ static const struct genl_ops nl802154_ops[] = { .internal_flags = NL802154_FLAG_NEED_NETDEV | NL802154_FLAG_NEED_RTNL, }, + { + .cmd = NL802154_CMD_TRIGGER_SCAN, + .doit = nl802154_trigger_scan, + .flags = GENL_ADMIN_PERM, + .internal_flags = NL802154_FLAG_NEED_NETDEV | + NL802154_FLAG_CHECK_NETDEV_UP | + NL802154_FLAG_NEED_RTNL, + }, + { + .cmd = NL802154_CMD_ABORT_SCAN, + .doit = nl802154_abort_scan, + .flags = GENL_ADMIN_PERM, + .internal_flags = NL802154_FLAG_NEED_NETDEV | + NL802154_FLAG_CHECK_NETDEV_UP | + NL802154_FLAG_NEED_RTNL, + }, #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL { .cmd = NL802154_CMD_SET_SEC_PARAMS, diff --git a/net/ieee802154/nl802154.h b/net/ieee802154/nl802154.h index 89b805500032..cfa7134be747 100644 --- a/net/ieee802154/nl802154.h +++ b/net/ieee802154/nl802154.h @@ -6,5 +6,8 @@ int nl802154_init(void); void nl802154_exit(void); int nl802154_scan_event(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, struct ieee802154_coord_desc *desc); +int nl802154_scan_started(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev); +int nl802154_scan_done(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, + enum nl802154_scan_done_reasons reason); #endif /* __IEEE802154_NL802154_H */ diff --git a/net/ieee802154/rdev-ops.h b/net/ieee802154/rdev-ops.h index 598f5af49775..e171d74c3251 100644 --- a/net/ieee802154/rdev-ops.h +++ b/net/ieee802154/rdev-ops.h @@ -209,6 +209,34 @@ rdev_set_ackreq_default(struct cfg802154_registered_device *rdev, return ret; } +static inline int rdev_trigger_scan(struct cfg802154_registered_device *rdev, + struct cfg802154_scan_request *request) +{ + int ret; + + if (!rdev->ops->trigger_scan) + return -EOPNOTSUPP; + + trace_802154_rdev_trigger_scan(&rdev->wpan_phy, request); + ret = rdev->ops->trigger_scan(&rdev->wpan_phy, request); + trace_802154_rdev_return_int(&rdev->wpan_phy, ret); + return ret; +} + +static inline int rdev_abort_scan(struct cfg802154_registered_device *rdev, + struct wpan_dev *wpan_dev) +{ + int ret; + + if (!rdev->ops->abort_scan) + return -EOPNOTSUPP; + + trace_802154_rdev_abort_scan(&rdev->wpan_phy, wpan_dev); + ret = rdev->ops->abort_scan(&rdev->wpan_phy, wpan_dev); + trace_802154_rdev_return_int(&rdev->wpan_phy, ret); + return ret; +} + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL /* TODO this is already a nl802154, so move into ieee802154 */ static inline void diff --git a/net/ieee802154/trace.h b/net/ieee802154/trace.h index 19c2e5d60e76..e5405f737ded 100644 --- a/net/ieee802154/trace.h +++ b/net/ieee802154/trace.h @@ -295,6 +295,46 @@ TRACE_EVENT(802154_rdev_set_ackreq_default, WPAN_DEV_PR_ARG, BOOL_TO_STR(__entry->ackreq)) ); +TRACE_EVENT(802154_rdev_trigger_scan, + TP_PROTO(struct wpan_phy *wpan_phy, + struct cfg802154_scan_request *request), + TP_ARGS(wpan_phy, request), + TP_STRUCT__entry( + WPAN_PHY_ENTRY + __field(u8, page) + __field(u32, channels) + __field(u8, duration) + ), + TP_fast_assign( + WPAN_PHY_ASSIGN; + __entry->page = request->page; + __entry->channels = request->channels; + __entry->duration = request->duration; + ), + TP_printk(WPAN_PHY_PR_FMT ", scan, page: %d, channels: %x, duration %d", + WPAN_PHY_PR_ARG, __entry->page, __entry->channels, __entry->duration) +); + +DECLARE_EVENT_CLASS(802154_wdev_template, + TP_PROTO(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev), + TP_ARGS(wpan_phy, wpan_dev), + TP_STRUCT__entry( + WPAN_PHY_ENTRY + WPAN_DEV_ENTRY + ), + TP_fast_assign( + WPAN_PHY_ASSIGN; + WPAN_DEV_ASSIGN; + ), + TP_printk(WPAN_PHY_PR_FMT ", " WPAN_DEV_PR_FMT, + WPAN_PHY_PR_ARG, WPAN_DEV_PR_ARG) +); + +DEFINE_EVENT(802154_wdev_template, 802154_rdev_abort_scan, + TP_PROTO(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev), + TP_ARGS(wpan_phy, wpan_dev) +); + TRACE_EVENT(802154_rdev_return_int, TP_PROTO(struct wpan_phy *wpan_phy, int ret), TP_ARGS(wpan_phy, ret), -- cgit v1.2.3 From d2aaf2a01792ccf214f933d0b1ca2d41788c7b16 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 3 Jan 2023 17:56:41 +0100 Subject: ieee802154: Introduce a helper to validate a channel This helper for now only checks if the page member and channel member are valid (in the specification range) and supported (by checking the device capabilities). Soon two new parameters will be introduced and having this helper will let us only modify its content rather than modifying the logic everywhere else in the subsystem. There is not functional change. Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230103165644.432209-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/net/cfg802154.h | 11 +++++++++++ net/ieee802154/nl802154.c | 3 +-- 2 files changed, 12 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index 76d4f95e9974..1184b543fba7 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -246,6 +246,17 @@ static inline void wpan_phy_net_set(struct wpan_phy *wpan_phy, struct net *net) write_pnet(&wpan_phy->_net, net); } +static inline bool ieee802154_chan_is_valid(struct wpan_phy *phy, + u8 page, u8 channel) +{ + if (page > IEEE802154_MAX_PAGE || + channel > IEEE802154_MAX_CHANNEL || + !(phy->supported.channels[page] & BIT(channel))) + return false; + + return true; +} + /** * struct ieee802154_addr - IEEE802.15.4 device address * @mode: Address mode from frame header. Can be one of: diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index 11aa693af449..0b7a9f16b3b6 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -976,8 +976,7 @@ static int nl802154_set_channel(struct sk_buff *skb, struct genl_info *info) channel = nla_get_u8(info->attrs[NL802154_ATTR_CHANNEL]); /* check 802.15.4 constraints */ - if (page > IEEE802154_MAX_PAGE || channel > IEEE802154_MAX_CHANNEL || - !(rdev->wpan_phy.supported.channels[page] & BIT(channel))) + if (!ieee802154_chan_is_valid(&rdev->wpan_phy, page, channel)) return -EINVAL; return rdev_set_channel(rdev, page, channel); -- cgit v1.2.3 From 5755cd4d9432779027771e43e51d81a2994ed795 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 3 Jan 2023 17:56:42 +0100 Subject: mac802154: Prepare forcing specific symbol duration The scan logic will bypass the whole ->set_channel() logic from the top by calling the driver hook to just switch between channels when required. We can no longer rely on the "current" page/channel settings to set the right symbol duration. Let's add these as new parameters to allow providing the page/channel couple that we want. There is no functional change. Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230103165644.432209-5-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/net/cfg802154.h | 3 ++- net/mac802154/cfg.c | 2 +- net/mac802154/main.c | 20 +++++++++++--------- 3 files changed, 14 insertions(+), 11 deletions(-) (limited to 'net') diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index 1184b543fba7..c16ae5d2dc86 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -483,6 +483,7 @@ static inline const char *wpan_phy_name(struct wpan_phy *phy) return dev_name(&phy->dev); } -void ieee802154_configure_durations(struct wpan_phy *phy); +void ieee802154_configure_durations(struct wpan_phy *phy, + unsigned int page, unsigned int channel); #endif /* __NET_CFG802154_H */ diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c index dc2d918fac68..469d6e8dd2dd 100644 --- a/net/mac802154/cfg.c +++ b/net/mac802154/cfg.c @@ -118,7 +118,7 @@ ieee802154_set_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel) if (!ret) { wpan_phy->current_page = page; wpan_phy->current_channel = channel; - ieee802154_configure_durations(wpan_phy); + ieee802154_configure_durations(wpan_phy, page, channel); } return ret; diff --git a/net/mac802154/main.c b/net/mac802154/main.c index 3ed31daf7b9c..12a13a850fdf 100644 --- a/net/mac802154/main.c +++ b/net/mac802154/main.c @@ -113,32 +113,33 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) } EXPORT_SYMBOL(ieee802154_alloc_hw); -void ieee802154_configure_durations(struct wpan_phy *phy) +void ieee802154_configure_durations(struct wpan_phy *phy, + unsigned int page, unsigned int channel) { u32 duration = 0; - switch (phy->current_page) { + switch (page) { case 0: - if (BIT(phy->current_channel) & 0x1) + if (BIT(channel) & 0x1) /* 868 MHz BPSK 802.15.4-2003: 20 ksym/s */ duration = 50 * NSEC_PER_USEC; - else if (BIT(phy->current_channel) & 0x7FE) + else if (BIT(channel) & 0x7FE) /* 915 MHz BPSK 802.15.4-2003: 40 ksym/s */ duration = 25 * NSEC_PER_USEC; - else if (BIT(phy->current_channel) & 0x7FFF800) + else if (BIT(channel) & 0x7FFF800) /* 2400 MHz O-QPSK 802.15.4-2006: 62.5 ksym/s */ duration = 16 * NSEC_PER_USEC; break; case 2: - if (BIT(phy->current_channel) & 0x1) + if (BIT(channel) & 0x1) /* 868 MHz O-QPSK 802.15.4-2006: 25 ksym/s */ duration = 40 * NSEC_PER_USEC; - else if (BIT(phy->current_channel) & 0x7FE) + else if (BIT(channel) & 0x7FE) /* 915 MHz O-QPSK 802.15.4-2006: 62.5 ksym/s */ duration = 16 * NSEC_PER_USEC; break; case 3: - if (BIT(phy->current_channel) & 0x3FFF) + if (BIT(channel) & 0x3FFF) /* 2.4 GHz CSS 802.15.4a-2007: 1/6 Msym/s */ duration = 6 * NSEC_PER_USEC; break; @@ -201,7 +202,8 @@ int ieee802154_register_hw(struct ieee802154_hw *hw) ieee802154_setup_wpan_phy_pib(local->phy); - ieee802154_configure_durations(local->phy); + ieee802154_configure_durations(local->phy, local->phy->current_page, + local->phy->current_channel); if (!(hw->flags & IEEE802154_HW_CSMA_PARAMS)) { local->phy->supported.min_csma_backoffs = 4; -- cgit v1.2.3 From dd18096256c89612e3eb858de748c29d10e8f3e7 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 3 Jan 2023 17:56:43 +0100 Subject: mac802154: Add MLME Tx locked helpers These have the exact same behavior as before, except they expect the rtnl to be already taken (and will complain otherwise). This allows performing MLME transmissions from different contexts. Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230103165644.432209-6-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/mac802154/ieee802154_i.h | 6 ++++++ net/mac802154/tx.c | 42 +++++++++++++++++++++++++++++------------- 2 files changed, 35 insertions(+), 13 deletions(-) (limited to 'net') diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h index 509e0172fe82..aeadee543a9c 100644 --- a/net/mac802154/ieee802154_i.h +++ b/net/mac802154/ieee802154_i.h @@ -141,10 +141,16 @@ int ieee802154_mlme_op_pre(struct ieee802154_local *local); int ieee802154_mlme_tx(struct ieee802154_local *local, struct ieee802154_sub_if_data *sdata, struct sk_buff *skb); +int ieee802154_mlme_tx_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata, + struct sk_buff *skb); void ieee802154_mlme_op_post(struct ieee802154_local *local); int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct ieee802154_sub_if_data *sdata, struct sk_buff *skb); +int ieee802154_mlme_tx_one_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata, + struct sk_buff *skb); netdev_tx_t ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev); netdev_tx_t diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c index 9d8d43cf1e64..2a6f1ed763c9 100644 --- a/net/mac802154/tx.c +++ b/net/mac802154/tx.c @@ -137,34 +137,37 @@ int ieee802154_mlme_op_pre(struct ieee802154_local *local) return ieee802154_sync_and_hold_queue(local); } -int ieee802154_mlme_tx(struct ieee802154_local *local, - struct ieee802154_sub_if_data *sdata, - struct sk_buff *skb) +int ieee802154_mlme_tx_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata, + struct sk_buff *skb) { - int ret; - /* Avoid possible calls to ->ndo_stop() when we asynchronously perform * MLME transmissions. */ - rtnl_lock(); + ASSERT_RTNL(); /* Ensure the device was not stopped, otherwise error out */ - if (!local->open_count) { - rtnl_unlock(); + if (!local->open_count) return -ENETDOWN; - } /* Warn if the ieee802154 core thinks MLME frames can be sent while the * net interface expects this cannot happen. */ - if (WARN_ON_ONCE(!netif_running(sdata->dev))) { - rtnl_unlock(); + if (WARN_ON_ONCE(!netif_running(sdata->dev))) return -ENETDOWN; - } ieee802154_tx(local, skb); - ret = ieee802154_sync_queue(local); + return ieee802154_sync_queue(local); +} + +int ieee802154_mlme_tx(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata, + struct sk_buff *skb) +{ + int ret; + rtnl_lock(); + ret = ieee802154_mlme_tx_locked(local, sdata, skb); rtnl_unlock(); return ret; @@ -188,6 +191,19 @@ int ieee802154_mlme_tx_one(struct ieee802154_local *local, return ret; } +int ieee802154_mlme_tx_one_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata, + struct sk_buff *skb) +{ + int ret; + + ieee802154_mlme_op_pre(local); + ret = ieee802154_mlme_tx_locked(local, sdata, skb); + ieee802154_mlme_op_post(local); + + return ret; +} + static bool ieee802154_queue_is_stopped(struct ieee802154_local *local) { return test_bit(WPAN_PHY_FLAG_STATE_QUEUE_STOPPED, &local->phy->flags); -- cgit v1.2.3 From 57588c71177f0bfc08509c2c3a9bfe32850c0786 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 3 Jan 2023 17:56:44 +0100 Subject: mac802154: Handle passive scanning Implement the core hooks in order to provide the softMAC layer support for passive scans. Scans are requested by the user and can be aborted. Changing channels manually is prohibited during scans. The implementation uses a workqueue triggered at a certain interval depending on the symbol duration for the current channel and the duration order provided. More advanced drivers with internal scheduling capabilities might require additional care but there is none mainline yet. Received beacons during a passive scan are processed in a work queue and their result forwarded to the upper layer. Active scanning is not supported yet. Co-developed-by: David Girault Signed-off-by: David Girault Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230103165644.432209-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/linux/ieee802154.h | 4 + include/net/cfg802154.h | 16 +++ net/mac802154/Makefile | 2 +- net/mac802154/cfg.c | 31 +++++ net/mac802154/ieee802154_i.h | 37 +++++- net/mac802154/iface.c | 3 + net/mac802154/main.c | 16 ++- net/mac802154/rx.c | 36 +++++- net/mac802154/scan.c | 288 +++++++++++++++++++++++++++++++++++++++++++ 9 files changed, 427 insertions(+), 6 deletions(-) create mode 100644 net/mac802154/scan.c (limited to 'net') diff --git a/include/linux/ieee802154.h b/include/linux/ieee802154.h index b22e4147d334..140f61ec0f5f 100644 --- a/include/linux/ieee802154.h +++ b/include/linux/ieee802154.h @@ -47,6 +47,10 @@ /* Duration in superframe order */ #define IEEE802154_MAX_SCAN_DURATION 14 #define IEEE802154_ACTIVE_SCAN_DURATION 15 +/* Superframe duration in slots */ +#define IEEE802154_SUPERFRAME_PERIOD 16 +/* Various periods expressed in symbols */ +#define IEEE802154_SLOT_PERIOD 60 #define IEEE802154_LIFS_PERIOD 40 #define IEEE802154_SIFS_PERIOD 12 #define IEEE802154_MAX_SIFS_FRAME_SIZE 18 diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index c16ae5d2dc86..0b0f81a945b6 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -314,6 +314,22 @@ struct cfg802154_scan_request { struct wpan_phy *wpan_phy; }; +/** + * struct cfg802154_mac_pkt - MAC packet descriptor (beacon/command) + * @node: MAC packets to process list member + * @skb: the received sk_buff + * @sdata: the interface on which @skb was received + * @page: page configuration when @skb was received + * @channel: channel configuration when @skb was received + */ +struct cfg802154_mac_pkt { + struct list_head node; + struct sk_buff *skb; + struct ieee802154_sub_if_data *sdata; + u8 page; + u8 channel; +}; + struct ieee802154_llsec_key_id { u8 mode; u8 id; diff --git a/net/mac802154/Makefile b/net/mac802154/Makefile index 4059295fdbf8..43d1347b37ee 100644 --- a/net/mac802154/Makefile +++ b/net/mac802154/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_MAC802154) += mac802154.o mac802154-objs := main.o rx.o tx.o mac_cmd.o mib.o \ - iface.o llsec.o util.o cfg.o trace.o + iface.o llsec.o util.o cfg.o scan.o trace.o CFLAGS_trace.o := -I$(src) diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c index 469d6e8dd2dd..187cebcaf233 100644 --- a/net/mac802154/cfg.c +++ b/net/mac802154/cfg.c @@ -114,6 +114,10 @@ ieee802154_set_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel) wpan_phy->current_channel == channel) return 0; + /* Refuse to change channels during a scanning operation */ + if (mac802154_is_scanning(local)) + return -EBUSY; + ret = drv_set_channel(local, page, channel); if (!ret) { wpan_phy->current_page = page; @@ -261,6 +265,31 @@ ieee802154_set_ackreq_default(struct wpan_phy *wpan_phy, return 0; } +static int mac802154_trigger_scan(struct wpan_phy *wpan_phy, + struct cfg802154_scan_request *request) +{ + struct ieee802154_sub_if_data *sdata; + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(request->wpan_dev); + + ASSERT_RTNL(); + + return mac802154_trigger_scan_locked(sdata, request); +} + +static int mac802154_abort_scan(struct wpan_phy *wpan_phy, + struct wpan_dev *wpan_dev) +{ + struct ieee802154_local *local = wpan_phy_priv(wpan_phy); + struct ieee802154_sub_if_data *sdata; + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(wpan_dev); + + ASSERT_RTNL(); + + return mac802154_abort_scan_locked(local, sdata); +} + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL static void ieee802154_get_llsec_table(struct wpan_phy *wpan_phy, @@ -468,6 +497,8 @@ const struct cfg802154_ops mac802154_config_ops = { .set_max_frame_retries = ieee802154_set_max_frame_retries, .set_lbt_mode = ieee802154_set_lbt_mode, .set_ackreq_default = ieee802154_set_ackreq_default, + .trigger_scan = mac802154_trigger_scan, + .abort_scan = mac802154_abort_scan, #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL .get_llsec_table = ieee802154_get_llsec_table, .lock_llsec_table = ieee802154_lock_llsec_table, diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h index aeadee543a9c..0e4db967bd1d 100644 --- a/net/mac802154/ieee802154_i.h +++ b/net/mac802154/ieee802154_i.h @@ -21,6 +21,10 @@ #include "llsec.h" +enum ieee802154_ongoing { + IEEE802154_IS_SCANNING = BIT(0), +}; + /* mac802154 device private data */ struct ieee802154_local { struct ieee802154_hw hw; @@ -43,15 +47,26 @@ struct ieee802154_local { struct list_head interfaces; struct mutex iflist_mtx; - /* This one is used for scanning and other jobs not to be interfered - * with serial driver. - */ + /* Data related workqueue */ struct workqueue_struct *workqueue; + /* MAC commands related workqueue */ + struct workqueue_struct *mac_wq; struct hrtimer ifs_timer; + /* Scanning */ + u8 scan_page; + u8 scan_channel; + struct cfg802154_scan_request __rcu *scan_req; + struct delayed_work scan_work; + + /* Asynchronous tasks */ + struct list_head rx_beacon_list; + struct work_struct rx_beacon_work; + bool started; bool suspended; + unsigned long ongoing; struct tasklet_struct tasklet; struct sk_buff_head skb_queue; @@ -226,6 +241,22 @@ void mac802154_unlock_table(struct net_device *dev); int mac802154_wpan_update_llsec(struct net_device *dev); +/* PAN management handling */ +void mac802154_scan_worker(struct work_struct *work); +int mac802154_trigger_scan_locked(struct ieee802154_sub_if_data *sdata, + struct cfg802154_scan_request *request); +int mac802154_abort_scan_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata); +int mac802154_process_beacon(struct ieee802154_local *local, + struct sk_buff *skb, + u8 page, u8 channel); +void mac802154_rx_beacon_worker(struct work_struct *work); + +static inline bool mac802154_is_scanning(struct ieee802154_local *local) +{ + return test_bit(IEEE802154_IS_SCANNING, &local->ongoing); +} + /* interface handling */ int ieee802154_iface_init(void); void ieee802154_iface_exit(void); diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c index 7de2f843379c..a5958d323ea3 100644 --- a/net/mac802154/iface.c +++ b/net/mac802154/iface.c @@ -302,6 +302,9 @@ static int mac802154_slave_close(struct net_device *dev) ASSERT_RTNL(); + if (mac802154_is_scanning(local)) + mac802154_abort_scan_locked(local, sdata); + netif_stop_queue(dev); local->open_count--; diff --git a/net/mac802154/main.c b/net/mac802154/main.c index 12a13a850fdf..b1111279e06d 100644 --- a/net/mac802154/main.c +++ b/net/mac802154/main.c @@ -89,6 +89,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) local->ops = ops; INIT_LIST_HEAD(&local->interfaces); + INIT_LIST_HEAD(&local->rx_beacon_list); mutex_init(&local->iflist_mtx); tasklet_setup(&local->tasklet, ieee802154_tasklet_handler); @@ -96,6 +97,8 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) skb_queue_head_init(&local->skb_queue); INIT_WORK(&local->sync_tx_work, ieee802154_xmit_sync_worker); + INIT_DELAYED_WORK(&local->scan_work, mac802154_scan_worker); + INIT_WORK(&local->rx_beacon_work, mac802154_rx_beacon_worker); /* init supported flags with 802.15.4 default ranges */ phy->supported.max_minbe = 8; @@ -185,6 +188,7 @@ static void ieee802154_setup_wpan_phy_pib(struct wpan_phy *wpan_phy) int ieee802154_register_hw(struct ieee802154_hw *hw) { struct ieee802154_local *local = hw_to_local(hw); + char mac_wq_name[IFNAMSIZ + 10] = {}; struct net_device *dev; int rc = -ENOSYS; @@ -195,6 +199,13 @@ int ieee802154_register_hw(struct ieee802154_hw *hw) goto out; } + snprintf(mac_wq_name, IFNAMSIZ + 10, "%s-mac-cmds", wpan_phy_name(local->phy)); + local->mac_wq = create_singlethread_workqueue(mac_wq_name); + if (!local->mac_wq) { + rc = -ENOMEM; + goto out_wq; + } + hrtimer_init(&local->ifs_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); local->ifs_timer.function = ieee802154_xmit_ifs_timer; @@ -224,7 +235,7 @@ int ieee802154_register_hw(struct ieee802154_hw *hw) rc = wpan_phy_register(local->phy); if (rc < 0) - goto out_wq; + goto out_mac_wq; rtnl_lock(); @@ -243,6 +254,8 @@ int ieee802154_register_hw(struct ieee802154_hw *hw) out_phy: wpan_phy_unregister(local->phy); +out_mac_wq: + destroy_workqueue(local->mac_wq); out_wq: destroy_workqueue(local->workqueue); out: @@ -263,6 +276,7 @@ void ieee802154_unregister_hw(struct ieee802154_hw *hw) rtnl_unlock(); + destroy_workqueue(local->mac_wq); destroy_workqueue(local->workqueue); wpan_phy_unregister(local->phy); } diff --git a/net/mac802154/rx.c b/net/mac802154/rx.c index c2aae2a6d6a6..2b0a80571097 100644 --- a/net/mac802154/rx.c +++ b/net/mac802154/rx.c @@ -29,12 +29,31 @@ static int ieee802154_deliver_skb(struct sk_buff *skb) return netif_receive_skb(skb); } +void mac802154_rx_beacon_worker(struct work_struct *work) +{ + struct ieee802154_local *local = + container_of(work, struct ieee802154_local, rx_beacon_work); + struct cfg802154_mac_pkt *mac_pkt; + + mac_pkt = list_first_entry_or_null(&local->rx_beacon_list, + struct cfg802154_mac_pkt, node); + if (!mac_pkt) + return; + + mac802154_process_beacon(local, mac_pkt->skb, mac_pkt->page, mac_pkt->channel); + + list_del(&mac_pkt->node); + kfree_skb(mac_pkt->skb); + kfree(mac_pkt); +} + static int ieee802154_subif_frame(struct ieee802154_sub_if_data *sdata, struct sk_buff *skb, const struct ieee802154_hdr *hdr) { - struct wpan_dev *wpan_dev = &sdata->wpan_dev; struct wpan_phy *wpan_phy = sdata->local->hw.phy; + struct wpan_dev *wpan_dev = &sdata->wpan_dev; + struct cfg802154_mac_pkt *mac_pkt; __le16 span, sshort; int rc; @@ -106,6 +125,21 @@ ieee802154_subif_frame(struct ieee802154_sub_if_data *sdata, switch (mac_cb(skb)->type) { case IEEE802154_FC_TYPE_BEACON: + dev_dbg(&sdata->dev->dev, "BEACON received\n"); + if (!mac802154_is_scanning(sdata->local)) + goto fail; + + mac_pkt = kzalloc(sizeof(*mac_pkt), GFP_ATOMIC); + if (!mac_pkt) + goto fail; + + mac_pkt->skb = skb_get(skb); + mac_pkt->sdata = sdata; + mac_pkt->page = sdata->local->scan_page; + mac_pkt->channel = sdata->local->scan_channel; + list_add_tail(&mac_pkt->node, &sdata->local->rx_beacon_list); + queue_work(sdata->local->mac_wq, &sdata->local->rx_beacon_work); + return NET_RX_SUCCESS; case IEEE802154_FC_TYPE_ACK: case IEEE802154_FC_TYPE_MAC_CMD: goto fail; diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c new file mode 100644 index 000000000000..56056b9c93c1 --- /dev/null +++ b/net/mac802154/scan.c @@ -0,0 +1,288 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * IEEE 802.15.4 scanning management + * + * Copyright (C) 2021 Qorvo US, Inc + * Authors: + * - David Girault + * - Miquel Raynal + */ + +#include +#include +#include + +#include "ieee802154_i.h" +#include "driver-ops.h" +#include "../ieee802154/nl802154.h" + +/* mac802154_scan_cleanup_locked() must be called upon scan completion or abort. + * - Completions are asynchronous, not locked by the rtnl and decided by the + * scan worker. + * - Aborts are decided by userspace, and locked by the rtnl. + * + * Concurrent modifications to the PHY, the interfaces or the hardware is in + * general prevented by the rtnl. So in most cases we don't need additional + * protection. + * + * However, the scan worker get's triggered without anybody noticing and thus we + * must ensure the presence of the devices as well as data consistency: + * - The sub-interface and device driver module get both their reference + * counters incremented whenever we start a scan, so they cannot disappear + * during operation. + * - Data consistency is achieved by the use of rcu protected pointers. + */ +static int mac802154_scan_cleanup_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata, + bool aborted) +{ + struct wpan_dev *wpan_dev = &sdata->wpan_dev; + struct wpan_phy *wpan_phy = local->phy; + struct cfg802154_scan_request *request; + u8 arg; + + /* Prevent any further use of the scan request */ + clear_bit(IEEE802154_IS_SCANNING, &local->ongoing); + cancel_delayed_work(&local->scan_work); + request = rcu_replace_pointer(local->scan_req, NULL, 1); + if (!request) + return 0; + kfree_rcu(request); + + /* Advertize first, while we know the devices cannot be removed */ + if (aborted) + arg = NL802154_SCAN_DONE_REASON_ABORTED; + else + arg = NL802154_SCAN_DONE_REASON_FINISHED; + nl802154_scan_done(wpan_phy, wpan_dev, arg); + + /* Cleanup software stack */ + ieee802154_mlme_op_post(local); + + /* Set the hardware back in its original state */ + drv_set_channel(local, wpan_phy->current_page, + wpan_phy->current_channel); + ieee802154_configure_durations(wpan_phy, wpan_phy->current_page, + wpan_phy->current_channel); + drv_stop(local); + synchronize_net(); + sdata->required_filtering = sdata->iface_default_filtering; + drv_start(local, sdata->required_filtering, &local->addr_filt); + + return 0; +} + +int mac802154_abort_scan_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata) +{ + ASSERT_RTNL(); + + if (!mac802154_is_scanning(local)) + return -ESRCH; + + return mac802154_scan_cleanup_locked(local, sdata, true); +} + +static unsigned int mac802154_scan_get_channel_time(u8 duration_order, + u8 symbol_duration) +{ + u64 base_super_frame_duration = (u64)symbol_duration * + IEEE802154_SUPERFRAME_PERIOD * IEEE802154_SLOT_PERIOD; + + return usecs_to_jiffies(base_super_frame_duration * + (BIT(duration_order) + 1)); +} + +static void mac802154_flush_queued_beacons(struct ieee802154_local *local) +{ + struct cfg802154_mac_pkt *mac_pkt, *tmp; + + list_for_each_entry_safe(mac_pkt, tmp, &local->rx_beacon_list, node) { + list_del(&mac_pkt->node); + kfree_skb(mac_pkt->skb); + kfree(mac_pkt); + } +} + +static void +mac802154_scan_get_next_channel(struct ieee802154_local *local, + struct cfg802154_scan_request *scan_req, + u8 *channel) +{ + (*channel)++; + *channel = find_next_bit((const unsigned long *)&scan_req->channels, + IEEE802154_MAX_CHANNEL + 1, + *channel); +} + +static int mac802154_scan_find_next_chan(struct ieee802154_local *local, + struct cfg802154_scan_request *scan_req, + u8 page, u8 *channel) +{ + mac802154_scan_get_next_channel(local, scan_req, channel); + if (*channel > IEEE802154_MAX_CHANNEL) + return -EINVAL; + + return 0; +} + +void mac802154_scan_worker(struct work_struct *work) +{ + struct ieee802154_local *local = + container_of(work, struct ieee802154_local, scan_work.work); + struct cfg802154_scan_request *scan_req; + struct ieee802154_sub_if_data *sdata; + unsigned int scan_duration = 0; + struct wpan_phy *wpan_phy; + u8 scan_req_duration; + u8 page, channel; + int ret; + + /* Ensure the device receiver is turned off when changing channels + * because there is no atomic way to change the channel and know on + * which one a beacon might have been received. + */ + drv_stop(local); + synchronize_net(); + mac802154_flush_queued_beacons(local); + + rcu_read_lock(); + scan_req = rcu_dereference(local->scan_req); + if (unlikely(!scan_req)) { + rcu_read_unlock(); + return; + } + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(scan_req->wpan_dev); + + /* Wait an arbitrary amount of time in case we cannot use the device */ + if (local->suspended || !ieee802154_sdata_running(sdata)) { + rcu_read_unlock(); + queue_delayed_work(local->mac_wq, &local->scan_work, + msecs_to_jiffies(1000)); + return; + } + + wpan_phy = scan_req->wpan_phy; + scan_req_duration = scan_req->duration; + + /* Look for the next valid chan */ + page = local->scan_page; + channel = local->scan_channel; + do { + ret = mac802154_scan_find_next_chan(local, scan_req, page, &channel); + if (ret) { + rcu_read_unlock(); + goto end_scan; + } + } while (!ieee802154_chan_is_valid(scan_req->wpan_phy, page, channel)); + + rcu_read_unlock(); + + /* Bypass the stack on purpose when changing the channel */ + rtnl_lock(); + ret = drv_set_channel(local, page, channel); + rtnl_unlock(); + if (ret) { + dev_err(&sdata->dev->dev, + "Channel change failure during scan, aborting (%d)\n", ret); + goto end_scan; + } + + local->scan_page = page; + local->scan_channel = channel; + + rtnl_lock(); + ret = drv_start(local, IEEE802154_FILTERING_3_SCAN, &local->addr_filt); + rtnl_unlock(); + if (ret) { + dev_err(&sdata->dev->dev, + "Restarting failure after channel change, aborting (%d)\n", ret); + goto end_scan; + } + + ieee802154_configure_durations(wpan_phy, page, channel); + scan_duration = mac802154_scan_get_channel_time(scan_req_duration, + wpan_phy->symbol_duration); + dev_dbg(&sdata->dev->dev, + "Scan page %u channel %u for %ums\n", + page, channel, jiffies_to_msecs(scan_duration)); + queue_delayed_work(local->mac_wq, &local->scan_work, scan_duration); + return; + +end_scan: + rtnl_lock(); + mac802154_scan_cleanup_locked(local, sdata, false); + rtnl_unlock(); +} + +int mac802154_trigger_scan_locked(struct ieee802154_sub_if_data *sdata, + struct cfg802154_scan_request *request) +{ + struct ieee802154_local *local = sdata->local; + + ASSERT_RTNL(); + + if (mac802154_is_scanning(local)) + return -EBUSY; + + /* TODO: support other scanning type */ + if (request->type != NL802154_SCAN_PASSIVE) + return -EOPNOTSUPP; + + /* Store scanning parameters */ + rcu_assign_pointer(local->scan_req, request); + + /* Software scanning requires to set promiscuous mode, so we need to + * pause the Tx queue during the entire operation. + */ + ieee802154_mlme_op_pre(local); + + sdata->required_filtering = IEEE802154_FILTERING_3_SCAN; + local->scan_page = request->page; + local->scan_channel = -1; + set_bit(IEEE802154_IS_SCANNING, &local->ongoing); + + nl802154_scan_started(request->wpan_phy, request->wpan_dev); + + queue_delayed_work(local->mac_wq, &local->scan_work, 0); + + return 0; +} + +int mac802154_process_beacon(struct ieee802154_local *local, + struct sk_buff *skb, + u8 page, u8 channel) +{ + struct ieee802154_beacon_hdr *bh = (void *)skb->data; + struct ieee802154_addr *src = &mac_cb(skb)->source; + struct cfg802154_scan_request *scan_req; + struct ieee802154_coord_desc desc; + + if (skb->len != sizeof(*bh)) + return -EINVAL; + + if (unlikely(src->mode == IEEE802154_ADDR_NONE)) + return -EINVAL; + + dev_dbg(&skb->dev->dev, + "BEACON received on page %u channel %u\n", + page, channel); + + memcpy(&desc.addr, src, sizeof(desc.addr)); + desc.page = page; + desc.channel = channel; + desc.link_quality = mac_cb(skb)->lqi; + desc.superframe_spec = get_unaligned_le16(skb->data); + desc.gts_permit = bh->gts_permit; + + trace_802154_scan_event(&desc); + + rcu_read_lock(); + scan_req = rcu_dereference(local->scan_req); + if (likely(scan_req)) + nl802154_scan_event(scan_req->wpan_phy, scan_req->wpan_dev, &desc); + rcu_read_unlock(); + + return 0; +} -- cgit v1.2.3 From f05bd8ebeb69c803efd6d8a76d96b7fcd7011094 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:17 -0800 Subject: devlink: move code to a dedicated directory The devlink code is hard to navigate with 13kLoC in one file. I really like the way Michal split the ethtool into per-command files and core. It'd probably be too much to split it all up, but we can at least separate the core parts out of the per-cmd implementations and put it in a directory so that new commands can be separate files. Move the code, subsequent commit will do a partial split. Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- MAINTAINERS | 2 +- net/Makefile | 1 + net/core/Makefile | 1 - net/core/devlink.c | 13029 ----------------------------------------------- net/devlink/Makefile | 3 + net/devlink/leftover.c | 13029 +++++++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 13034 insertions(+), 13031 deletions(-) delete mode 100644 net/core/devlink.c create mode 100644 net/devlink/Makefile create mode 100644 net/devlink/leftover.c (limited to 'net') diff --git a/MAINTAINERS b/MAINTAINERS index ea941dc469fa..0f61b5d64716 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6099,7 +6099,7 @@ S: Supported F: Documentation/networking/devlink F: include/net/devlink.h F: include/uapi/linux/devlink.h -F: net/core/devlink.c +F: net/devlink/ DH ELECTRONICS IMX6 DHCOM/DHCOR BOARD SUPPORT M: Christoph Niedermaier diff --git a/net/Makefile b/net/Makefile index 6a62e5b27378..0914bea9c335 100644 --- a/net/Makefile +++ b/net/Makefile @@ -23,6 +23,7 @@ obj-$(CONFIG_BPFILTER) += bpfilter/ obj-$(CONFIG_PACKET) += packet/ obj-$(CONFIG_NET_KEY) += key/ obj-$(CONFIG_BRIDGE) += bridge/ +obj-$(CONFIG_NET_DEVLINK) += devlink/ obj-$(CONFIG_NET_DSA) += dsa/ obj-$(CONFIG_ATALK) += appletalk/ obj-$(CONFIG_X25) += x25/ diff --git a/net/core/Makefile b/net/core/Makefile index 5857cec87b83..10edd66a8a37 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -33,7 +33,6 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o obj-$(CONFIG_DST_CACHE) += dst_cache.o obj-$(CONFIG_HWBM) += hwbm.o -obj-$(CONFIG_NET_DEVLINK) += devlink.o obj-$(CONFIG_GRO_CELLS) += gro_cells.o obj-$(CONFIG_FAILOVER) += failover.o obj-$(CONFIG_NET_SOCK_MSG) += skmsg.o diff --git a/net/core/devlink.c b/net/core/devlink.c deleted file mode 100644 index 032d6d0a5ce6..000000000000 --- a/net/core/devlink.c +++ /dev/null @@ -1,13029 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * net/core/devlink.c - Network physical/parent device Netlink interface - * - * Heavily inspired by net/wireless/ - * Copyright (c) 2016 Mellanox Technologies. All rights reserved. - * Copyright (c) 2016 Jiri Pirko - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#define CREATE_TRACE_POINTS -#include - -#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \ - (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX) - -struct devlink_dev_stats { - u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; - u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; -}; - -struct devlink { - u32 index; - struct xarray ports; - struct list_head rate_list; - struct list_head sb_list; - struct list_head dpipe_table_list; - struct list_head resource_list; - struct list_head param_list; - struct list_head region_list; - struct list_head reporter_list; - struct mutex reporters_lock; /* protects reporter_list */ - struct devlink_dpipe_headers *dpipe_headers; - struct list_head trap_list; - struct list_head trap_group_list; - struct list_head trap_policer_list; - struct list_head linecard_list; - struct mutex linecards_lock; /* protects linecard_list */ - const struct devlink_ops *ops; - u64 features; - struct xarray snapshot_ids; - struct devlink_dev_stats stats; - struct device *dev; - possible_net_t _net; - /* Serializes access to devlink instance specific objects such as - * port, sb, dpipe, resource, params, region, traps and more. - */ - struct mutex lock; - struct lock_class_key lock_key; - u8 reload_failed:1; - refcount_t refcount; - struct completion comp; - struct rcu_head rcu; - struct notifier_block netdevice_nb; - char priv[] __aligned(NETDEV_ALIGN); -}; - -struct devlink_linecard_ops; -struct devlink_linecard_type; - -struct devlink_linecard { - struct list_head list; - struct devlink *devlink; - unsigned int index; - refcount_t refcount; - const struct devlink_linecard_ops *ops; - void *priv; - enum devlink_linecard_state state; - struct mutex state_lock; /* Protects state */ - const char *type; - struct devlink_linecard_type *types; - unsigned int types_count; - struct devlink *nested_devlink; -}; - -/** - * struct devlink_resource - devlink resource - * @name: name of the resource - * @id: id, per devlink instance - * @size: size of the resource - * @size_new: updated size of the resource, reload is needed - * @size_valid: valid in case the total size of the resource is valid - * including its children - * @parent: parent resource - * @size_params: size parameters - * @list: parent list - * @resource_list: list of child resources - * @occ_get: occupancy getter callback - * @occ_get_priv: occupancy getter callback priv - */ -struct devlink_resource { - const char *name; - u64 id; - u64 size; - u64 size_new; - bool size_valid; - struct devlink_resource *parent; - struct devlink_resource_size_params size_params; - struct list_head list; - struct list_head resource_list; - devlink_resource_occ_get_t *occ_get; - void *occ_get_priv; -}; - -void *devlink_priv(struct devlink *devlink) -{ - return &devlink->priv; -} -EXPORT_SYMBOL_GPL(devlink_priv); - -struct devlink *priv_to_devlink(void *priv) -{ - return container_of(priv, struct devlink, priv); -} -EXPORT_SYMBOL_GPL(priv_to_devlink); - -struct device *devlink_to_dev(const struct devlink *devlink) -{ - return devlink->dev; -} -EXPORT_SYMBOL_GPL(devlink_to_dev); - -static struct devlink_dpipe_field devlink_dpipe_fields_ethernet[] = { - { - .name = "destination mac", - .id = DEVLINK_DPIPE_FIELD_ETHERNET_DST_MAC, - .bitwidth = 48, - }, -}; - -struct devlink_dpipe_header devlink_dpipe_header_ethernet = { - .name = "ethernet", - .id = DEVLINK_DPIPE_HEADER_ETHERNET, - .fields = devlink_dpipe_fields_ethernet, - .fields_count = ARRAY_SIZE(devlink_dpipe_fields_ethernet), - .global = true, -}; -EXPORT_SYMBOL_GPL(devlink_dpipe_header_ethernet); - -static struct devlink_dpipe_field devlink_dpipe_fields_ipv4[] = { - { - .name = "destination ip", - .id = DEVLINK_DPIPE_FIELD_IPV4_DST_IP, - .bitwidth = 32, - }, -}; - -struct devlink_dpipe_header devlink_dpipe_header_ipv4 = { - .name = "ipv4", - .id = DEVLINK_DPIPE_HEADER_IPV4, - .fields = devlink_dpipe_fields_ipv4, - .fields_count = ARRAY_SIZE(devlink_dpipe_fields_ipv4), - .global = true, -}; -EXPORT_SYMBOL_GPL(devlink_dpipe_header_ipv4); - -static struct devlink_dpipe_field devlink_dpipe_fields_ipv6[] = { - { - .name = "destination ip", - .id = DEVLINK_DPIPE_FIELD_IPV6_DST_IP, - .bitwidth = 128, - }, -}; - -struct devlink_dpipe_header devlink_dpipe_header_ipv6 = { - .name = "ipv6", - .id = DEVLINK_DPIPE_HEADER_IPV6, - .fields = devlink_dpipe_fields_ipv6, - .fields_count = ARRAY_SIZE(devlink_dpipe_fields_ipv6), - .global = true, -}; -EXPORT_SYMBOL_GPL(devlink_dpipe_header_ipv6); - -EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwmsg); -EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwerr); -EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report); - -#define DEVLINK_PORT_FN_CAPS_VALID_MASK \ - (_BITUL(__DEVLINK_PORT_FN_ATTR_CAPS_MAX) - 1) - -static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = { - [DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY }, - [DEVLINK_PORT_FN_ATTR_STATE] = - NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FN_STATE_INACTIVE, - DEVLINK_PORT_FN_STATE_ACTIVE), - [DEVLINK_PORT_FN_ATTR_CAPS] = - NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK), -}; - -static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ID_MAX + 1] = { - [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG }, -}; - -static DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC); -#define DEVLINK_REGISTERED XA_MARK_1 -#define DEVLINK_UNREGISTERING XA_MARK_2 - -/* devlink instances are open to the access from the user space after - * devlink_register() call. Such logical barrier allows us to have certain - * expectations related to locking. - * - * Before *_register() - we are in initialization stage and no parallel - * access possible to the devlink instance. All drivers perform that phase - * by implicitly holding device_lock. - * - * After *_register() - users and driver can access devlink instance at - * the same time. - */ -#define ASSERT_DEVLINK_REGISTERED(d) \ - WARN_ON_ONCE(!xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) -#define ASSERT_DEVLINK_NOT_REGISTERED(d) \ - WARN_ON_ONCE(xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) - -struct net *devlink_net(const struct devlink *devlink) -{ - return read_pnet(&devlink->_net); -} -EXPORT_SYMBOL_GPL(devlink_net); - -static void __devlink_put_rcu(struct rcu_head *head) -{ - struct devlink *devlink = container_of(head, struct devlink, rcu); - - complete(&devlink->comp); -} - -void devlink_put(struct devlink *devlink) -{ - if (refcount_dec_and_test(&devlink->refcount)) - /* Make sure unregister operation that may await the completion - * is unblocked only after all users are after the end of - * RCU grace period. - */ - call_rcu(&devlink->rcu, __devlink_put_rcu); -} - -struct devlink *__must_check devlink_try_get(struct devlink *devlink) -{ - if (refcount_inc_not_zero(&devlink->refcount)) - return devlink; - return NULL; -} - -void devl_assert_locked(struct devlink *devlink) -{ - lockdep_assert_held(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_assert_locked); - -#ifdef CONFIG_LOCKDEP -/* For use in conjunction with LOCKDEP only e.g. rcu_dereference_protected() */ -bool devl_lock_is_held(struct devlink *devlink) -{ - return lockdep_is_held(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_lock_is_held); -#endif - -void devl_lock(struct devlink *devlink) -{ - mutex_lock(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_lock); - -int devl_trylock(struct devlink *devlink) -{ - return mutex_trylock(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_trylock); - -void devl_unlock(struct devlink *devlink) -{ - mutex_unlock(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_unlock); - -static struct devlink * -devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter, - void * (*xa_find_fn)(struct xarray *, unsigned long *, - unsigned long, xa_mark_t)) -{ - struct devlink *devlink; - - rcu_read_lock(); -retry: - devlink = xa_find_fn(&devlinks, indexp, ULONG_MAX, DEVLINK_REGISTERED); - if (!devlink) - goto unlock; - - /* In case devlink_unregister() was already called and "unregistering" - * mark was set, do not allow to get a devlink reference here. - * This prevents live-lock of devlink_unregister() wait for completion. - */ - if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING)) - goto retry; - - /* For a possible retry, the xa_find_after() should be always used */ - xa_find_fn = xa_find_after; - if (!devlink_try_get(devlink)) - goto retry; - if (!net_eq(devlink_net(devlink), net)) { - devlink_put(devlink); - goto retry; - } -unlock: - rcu_read_unlock(); - return devlink; -} - -static struct devlink *devlinks_xa_find_get_first(struct net *net, - unsigned long *indexp, - xa_mark_t filter) -{ - return devlinks_xa_find_get(net, indexp, filter, xa_find); -} - -static struct devlink *devlinks_xa_find_get_next(struct net *net, - unsigned long *indexp, - xa_mark_t filter) -{ - return devlinks_xa_find_get(net, indexp, filter, xa_find_after); -} - -/* Iterate over devlink pointers which were possible to get reference to. - * devlink_put() needs to be called for each iterated devlink pointer - * in loop body in order to release the reference. - */ -#define devlinks_xa_for_each_get(net, index, devlink, filter) \ - for (index = 0, \ - devlink = devlinks_xa_find_get_first(net, &index, filter); \ - devlink; devlink = devlinks_xa_find_get_next(net, &index, filter)) - -#define devlinks_xa_for_each_registered_get(net, index, devlink) \ - devlinks_xa_for_each_get(net, index, devlink, DEVLINK_REGISTERED) - -static struct devlink *devlink_get_from_attrs(struct net *net, - struct nlattr **attrs) -{ - struct devlink *devlink; - unsigned long index; - char *busname; - char *devname; - - if (!attrs[DEVLINK_ATTR_BUS_NAME] || !attrs[DEVLINK_ATTR_DEV_NAME]) - return ERR_PTR(-EINVAL); - - busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]); - devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]); - - devlinks_xa_for_each_registered_get(net, index, devlink) { - if (strcmp(devlink->dev->bus->name, busname) == 0 && - strcmp(dev_name(devlink->dev), devname) == 0) - return devlink; - devlink_put(devlink); - } - - return ERR_PTR(-ENODEV); -} - -#define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \ - WARN_ON_ONCE(!(devlink_port)->registered) -#define ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port) \ - WARN_ON_ONCE((devlink_port)->registered) -#define ASSERT_DEVLINK_PORT_INITIALIZED(devlink_port) \ - WARN_ON_ONCE(!(devlink_port)->initialized) - -static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink, - unsigned int port_index) -{ - return xa_load(&devlink->ports, port_index); -} - -static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, - struct nlattr **attrs) -{ - if (attrs[DEVLINK_ATTR_PORT_INDEX]) { - u32 port_index = nla_get_u32(attrs[DEVLINK_ATTR_PORT_INDEX]); - struct devlink_port *devlink_port; - - devlink_port = devlink_port_get_by_index(devlink, port_index); - if (!devlink_port) - return ERR_PTR(-ENODEV); - return devlink_port; - } - return ERR_PTR(-EINVAL); -} - -static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - return devlink_port_get_from_attrs(devlink, info->attrs); -} - -static inline bool -devlink_rate_is_leaf(struct devlink_rate *devlink_rate) -{ - return devlink_rate->type == DEVLINK_RATE_TYPE_LEAF; -} - -static inline bool -devlink_rate_is_node(struct devlink_rate *devlink_rate) -{ - return devlink_rate->type == DEVLINK_RATE_TYPE_NODE; -} - -static struct devlink_rate * -devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info) -{ - struct devlink_rate *devlink_rate; - struct devlink_port *devlink_port; - - devlink_port = devlink_port_get_from_attrs(devlink, info->attrs); - if (IS_ERR(devlink_port)) - return ERR_CAST(devlink_port); - devlink_rate = devlink_port->devlink_rate; - return devlink_rate ?: ERR_PTR(-ENODEV); -} - -static struct devlink_rate * -devlink_rate_node_get_by_name(struct devlink *devlink, const char *node_name) -{ - static struct devlink_rate *devlink_rate; - - list_for_each_entry(devlink_rate, &devlink->rate_list, list) { - if (devlink_rate_is_node(devlink_rate) && - !strcmp(node_name, devlink_rate->name)) - return devlink_rate; - } - return ERR_PTR(-ENODEV); -} - -static struct devlink_rate * -devlink_rate_node_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) -{ - const char *rate_node_name; - size_t len; - - if (!attrs[DEVLINK_ATTR_RATE_NODE_NAME]) - return ERR_PTR(-EINVAL); - rate_node_name = nla_data(attrs[DEVLINK_ATTR_RATE_NODE_NAME]); - len = strlen(rate_node_name); - /* Name cannot be empty or decimal number */ - if (!len || strspn(rate_node_name, "0123456789") == len) - return ERR_PTR(-EINVAL); - - return devlink_rate_node_get_by_name(devlink, rate_node_name); -} - -static struct devlink_rate * -devlink_rate_node_get_from_info(struct devlink *devlink, struct genl_info *info) -{ - return devlink_rate_node_get_from_attrs(devlink, info->attrs); -} - -static struct devlink_rate * -devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info) -{ - struct nlattr **attrs = info->attrs; - - if (attrs[DEVLINK_ATTR_PORT_INDEX]) - return devlink_rate_leaf_get_from_info(devlink, info); - else if (attrs[DEVLINK_ATTR_RATE_NODE_NAME]) - return devlink_rate_node_get_from_info(devlink, info); - else - return ERR_PTR(-EINVAL); -} - -static struct devlink_linecard * -devlink_linecard_get_by_index(struct devlink *devlink, - unsigned int linecard_index) -{ - struct devlink_linecard *devlink_linecard; - - list_for_each_entry(devlink_linecard, &devlink->linecard_list, list) { - if (devlink_linecard->index == linecard_index) - return devlink_linecard; - } - return NULL; -} - -static bool devlink_linecard_index_exists(struct devlink *devlink, - unsigned int linecard_index) -{ - return devlink_linecard_get_by_index(devlink, linecard_index); -} - -static struct devlink_linecard * -devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) -{ - if (attrs[DEVLINK_ATTR_LINECARD_INDEX]) { - u32 linecard_index = nla_get_u32(attrs[DEVLINK_ATTR_LINECARD_INDEX]); - struct devlink_linecard *linecard; - - mutex_lock(&devlink->linecards_lock); - linecard = devlink_linecard_get_by_index(devlink, linecard_index); - if (linecard) - refcount_inc(&linecard->refcount); - mutex_unlock(&devlink->linecards_lock); - if (!linecard) - return ERR_PTR(-ENODEV); - return linecard; - } - return ERR_PTR(-EINVAL); -} - -static struct devlink_linecard * -devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info) -{ - return devlink_linecard_get_from_attrs(devlink, info->attrs); -} - -static void devlink_linecard_put(struct devlink_linecard *linecard) -{ - if (refcount_dec_and_test(&linecard->refcount)) { - mutex_destroy(&linecard->state_lock); - kfree(linecard); - } -} - -struct devlink_sb { - struct list_head list; - unsigned int index; - u32 size; - u16 ingress_pools_count; - u16 egress_pools_count; - u16 ingress_tc_count; - u16 egress_tc_count; -}; - -static u16 devlink_sb_pool_count(struct devlink_sb *devlink_sb) -{ - return devlink_sb->ingress_pools_count + devlink_sb->egress_pools_count; -} - -static struct devlink_sb *devlink_sb_get_by_index(struct devlink *devlink, - unsigned int sb_index) -{ - struct devlink_sb *devlink_sb; - - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - if (devlink_sb->index == sb_index) - return devlink_sb; - } - return NULL; -} - -static bool devlink_sb_index_exists(struct devlink *devlink, - unsigned int sb_index) -{ - return devlink_sb_get_by_index(devlink, sb_index); -} - -static struct devlink_sb *devlink_sb_get_from_attrs(struct devlink *devlink, - struct nlattr **attrs) -{ - if (attrs[DEVLINK_ATTR_SB_INDEX]) { - u32 sb_index = nla_get_u32(attrs[DEVLINK_ATTR_SB_INDEX]); - struct devlink_sb *devlink_sb; - - devlink_sb = devlink_sb_get_by_index(devlink, sb_index); - if (!devlink_sb) - return ERR_PTR(-ENODEV); - return devlink_sb; - } - return ERR_PTR(-EINVAL); -} - -static struct devlink_sb *devlink_sb_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - return devlink_sb_get_from_attrs(devlink, info->attrs); -} - -static int devlink_sb_pool_index_get_from_attrs(struct devlink_sb *devlink_sb, - struct nlattr **attrs, - u16 *p_pool_index) -{ - u16 val; - - if (!attrs[DEVLINK_ATTR_SB_POOL_INDEX]) - return -EINVAL; - - val = nla_get_u16(attrs[DEVLINK_ATTR_SB_POOL_INDEX]); - if (val >= devlink_sb_pool_count(devlink_sb)) - return -EINVAL; - *p_pool_index = val; - return 0; -} - -static int devlink_sb_pool_index_get_from_info(struct devlink_sb *devlink_sb, - struct genl_info *info, - u16 *p_pool_index) -{ - return devlink_sb_pool_index_get_from_attrs(devlink_sb, info->attrs, - p_pool_index); -} - -static int -devlink_sb_pool_type_get_from_attrs(struct nlattr **attrs, - enum devlink_sb_pool_type *p_pool_type) -{ - u8 val; - - if (!attrs[DEVLINK_ATTR_SB_POOL_TYPE]) - return -EINVAL; - - val = nla_get_u8(attrs[DEVLINK_ATTR_SB_POOL_TYPE]); - if (val != DEVLINK_SB_POOL_TYPE_INGRESS && - val != DEVLINK_SB_POOL_TYPE_EGRESS) - return -EINVAL; - *p_pool_type = val; - return 0; -} - -static int -devlink_sb_pool_type_get_from_info(struct genl_info *info, - enum devlink_sb_pool_type *p_pool_type) -{ - return devlink_sb_pool_type_get_from_attrs(info->attrs, p_pool_type); -} - -static int -devlink_sb_th_type_get_from_attrs(struct nlattr **attrs, - enum devlink_sb_threshold_type *p_th_type) -{ - u8 val; - - if (!attrs[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE]) - return -EINVAL; - - val = nla_get_u8(attrs[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE]); - if (val != DEVLINK_SB_THRESHOLD_TYPE_STATIC && - val != DEVLINK_SB_THRESHOLD_TYPE_DYNAMIC) - return -EINVAL; - *p_th_type = val; - return 0; -} - -static int -devlink_sb_th_type_get_from_info(struct genl_info *info, - enum devlink_sb_threshold_type *p_th_type) -{ - return devlink_sb_th_type_get_from_attrs(info->attrs, p_th_type); -} - -static int -devlink_sb_tc_index_get_from_attrs(struct devlink_sb *devlink_sb, - struct nlattr **attrs, - enum devlink_sb_pool_type pool_type, - u16 *p_tc_index) -{ - u16 val; - - if (!attrs[DEVLINK_ATTR_SB_TC_INDEX]) - return -EINVAL; - - val = nla_get_u16(attrs[DEVLINK_ATTR_SB_TC_INDEX]); - if (pool_type == DEVLINK_SB_POOL_TYPE_INGRESS && - val >= devlink_sb->ingress_tc_count) - return -EINVAL; - if (pool_type == DEVLINK_SB_POOL_TYPE_EGRESS && - val >= devlink_sb->egress_tc_count) - return -EINVAL; - *p_tc_index = val; - return 0; -} - -static void devlink_port_fn_cap_fill(struct nla_bitfield32 *caps, - u32 cap, bool is_enable) -{ - caps->selector |= cap; - if (is_enable) - caps->value |= cap; -} - -static int devlink_port_fn_roce_fill(const struct devlink_ops *ops, - struct devlink_port *devlink_port, - struct nla_bitfield32 *caps, - struct netlink_ext_ack *extack) -{ - bool is_enable; - int err; - - if (!ops->port_fn_roce_get) - return 0; - - err = ops->port_fn_roce_get(devlink_port, &is_enable, extack); - if (err) { - if (err == -EOPNOTSUPP) - return 0; - return err; - } - - devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_ROCE, is_enable); - return 0; -} - -static int devlink_port_fn_migratable_fill(const struct devlink_ops *ops, - struct devlink_port *devlink_port, - struct nla_bitfield32 *caps, - struct netlink_ext_ack *extack) -{ - bool is_enable; - int err; - - if (!ops->port_fn_migratable_get || - devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) - return 0; - - err = ops->port_fn_migratable_get(devlink_port, &is_enable, extack); - if (err) { - if (err == -EOPNOTSUPP) - return 0; - return err; - } - - devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_MIGRATABLE, is_enable); - return 0; -} - -static int devlink_port_fn_caps_fill(const struct devlink_ops *ops, - struct devlink_port *devlink_port, - struct sk_buff *msg, - struct netlink_ext_ack *extack, - bool *msg_updated) -{ - struct nla_bitfield32 caps = {}; - int err; - - err = devlink_port_fn_roce_fill(ops, devlink_port, &caps, extack); - if (err) - return err; - - err = devlink_port_fn_migratable_fill(ops, devlink_port, &caps, extack); - if (err) - return err; - - if (!caps.selector) - return 0; - err = nla_put_bitfield32(msg, DEVLINK_PORT_FN_ATTR_CAPS, caps.value, - caps.selector); - if (err) - return err; - - *msg_updated = true; - return 0; -} - -static int -devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb, - struct genl_info *info, - enum devlink_sb_pool_type pool_type, - u16 *p_tc_index) -{ - return devlink_sb_tc_index_get_from_attrs(devlink_sb, info->attrs, - pool_type, p_tc_index); -} - -struct devlink_region { - struct devlink *devlink; - struct devlink_port *port; - struct list_head list; - union { - const struct devlink_region_ops *ops; - const struct devlink_port_region_ops *port_ops; - }; - struct mutex snapshot_lock; /* protects snapshot_list, - * max_snapshots and cur_snapshots - * consistency. - */ - struct list_head snapshot_list; - u32 max_snapshots; - u32 cur_snapshots; - u64 size; -}; - -struct devlink_snapshot { - struct list_head list; - struct devlink_region *region; - u8 *data; - u32 id; -}; - -static struct devlink_region * -devlink_region_get_by_name(struct devlink *devlink, const char *region_name) -{ - struct devlink_region *region; - - list_for_each_entry(region, &devlink->region_list, list) - if (!strcmp(region->ops->name, region_name)) - return region; - - return NULL; -} - -static struct devlink_region * -devlink_port_region_get_by_name(struct devlink_port *port, - const char *region_name) -{ - struct devlink_region *region; - - list_for_each_entry(region, &port->region_list, list) - if (!strcmp(region->ops->name, region_name)) - return region; - - return NULL; -} - -static struct devlink_snapshot * -devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id) -{ - struct devlink_snapshot *snapshot; - - list_for_each_entry(snapshot, ®ion->snapshot_list, list) - if (snapshot->id == id) - return snapshot; - - return NULL; -} - -#define DEVLINK_NL_FLAG_NEED_PORT BIT(0) -#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) -#define DEVLINK_NL_FLAG_NEED_RATE BIT(2) -#define DEVLINK_NL_FLAG_NEED_RATE_NODE BIT(3) -#define DEVLINK_NL_FLAG_NEED_LINECARD BIT(4) - -static int devlink_nl_pre_doit(const struct genl_split_ops *ops, - struct sk_buff *skb, struct genl_info *info) -{ - struct devlink_linecard *linecard; - struct devlink_port *devlink_port; - struct devlink *devlink; - int err; - - devlink = devlink_get_from_attrs(genl_info_net(info), info->attrs); - if (IS_ERR(devlink)) - return PTR_ERR(devlink); - devl_lock(devlink); - info->user_ptr[0] = devlink; - if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) { - devlink_port = devlink_port_get_from_info(devlink, info); - if (IS_ERR(devlink_port)) { - err = PTR_ERR(devlink_port); - goto unlock; - } - info->user_ptr[1] = devlink_port; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT) { - devlink_port = devlink_port_get_from_info(devlink, info); - if (!IS_ERR(devlink_port)) - info->user_ptr[1] = devlink_port; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE) { - struct devlink_rate *devlink_rate; - - devlink_rate = devlink_rate_get_from_info(devlink, info); - if (IS_ERR(devlink_rate)) { - err = PTR_ERR(devlink_rate); - goto unlock; - } - info->user_ptr[1] = devlink_rate; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE_NODE) { - struct devlink_rate *rate_node; - - rate_node = devlink_rate_node_get_from_info(devlink, info); - if (IS_ERR(rate_node)) { - err = PTR_ERR(rate_node); - goto unlock; - } - info->user_ptr[1] = rate_node; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { - linecard = devlink_linecard_get_from_info(devlink, info); - if (IS_ERR(linecard)) { - err = PTR_ERR(linecard); - goto unlock; - } - info->user_ptr[1] = linecard; - } - return 0; - -unlock: - devl_unlock(devlink); - devlink_put(devlink); - return err; -} - -static void devlink_nl_post_doit(const struct genl_split_ops *ops, - struct sk_buff *skb, struct genl_info *info) -{ - struct devlink_linecard *linecard; - struct devlink *devlink; - - devlink = info->user_ptr[0]; - if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { - linecard = info->user_ptr[1]; - devlink_linecard_put(linecard); - } - devl_unlock(devlink); - devlink_put(devlink); -} - -static struct genl_family devlink_nl_family; - -enum devlink_multicast_groups { - DEVLINK_MCGRP_CONFIG, -}; - -static const struct genl_multicast_group devlink_nl_mcgrps[] = { - [DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME }, -}; - -static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink) -{ - if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name)) - return -EMSGSIZE; - if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(devlink->dev))) - return -EMSGSIZE; - return 0; -} - -static int devlink_nl_put_nested_handle(struct sk_buff *msg, struct devlink *devlink) -{ - struct nlattr *nested_attr; - - nested_attr = nla_nest_start(msg, DEVLINK_ATTR_NESTED_DEVLINK); - if (!nested_attr) - return -EMSGSIZE; - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - nla_nest_end(msg, nested_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(msg, nested_attr); - return -EMSGSIZE; -} - -int devlink_nl_port_handle_fill(struct sk_buff *msg, struct devlink_port *devlink_port) -{ - if (devlink_nl_put_handle(msg, devlink_port->devlink)) - return -EMSGSIZE; - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) - return -EMSGSIZE; - return 0; -} - -size_t devlink_nl_port_handle_size(struct devlink_port *devlink_port) -{ - struct devlink *devlink = devlink_port->devlink; - - return nla_total_size(strlen(devlink->dev->bus->name) + 1) /* DEVLINK_ATTR_BUS_NAME */ - + nla_total_size(strlen(dev_name(devlink->dev)) + 1) /* DEVLINK_ATTR_DEV_NAME */ - + nla_total_size(4); /* DEVLINK_ATTR_PORT_INDEX */ -} - -struct devlink_reload_combination { - enum devlink_reload_action action; - enum devlink_reload_limit limit; -}; - -static const struct devlink_reload_combination devlink_reload_invalid_combinations[] = { - { - /* can't reinitialize driver with no down time */ - .action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - .limit = DEVLINK_RELOAD_LIMIT_NO_RESET, - }, -}; - -static bool -devlink_reload_combination_is_invalid(enum devlink_reload_action action, - enum devlink_reload_limit limit) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) - if (devlink_reload_invalid_combinations[i].action == action && - devlink_reload_invalid_combinations[i].limit == limit) - return true; - return false; -} - -static bool -devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action) -{ - return test_bit(action, &devlink->ops->reload_actions); -} - -static bool -devlink_reload_limit_is_supported(struct devlink *devlink, enum devlink_reload_limit limit) -{ - return test_bit(limit, &devlink->ops->reload_limits); -} - -static int devlink_reload_stat_put(struct sk_buff *msg, - enum devlink_reload_limit limit, u32 value) -{ - struct nlattr *reload_stats_entry; - - reload_stats_entry = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS_ENTRY); - if (!reload_stats_entry) - return -EMSGSIZE; - - if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_STATS_LIMIT, limit) || - nla_put_u32(msg, DEVLINK_ATTR_RELOAD_STATS_VALUE, value)) - goto nla_put_failure; - nla_nest_end(msg, reload_stats_entry); - return 0; - -nla_put_failure: - nla_nest_cancel(msg, reload_stats_entry); - return -EMSGSIZE; -} - -static int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote) -{ - struct nlattr *reload_stats_attr, *act_info, *act_stats; - int i, j, stat_idx; - u32 value; - - if (!is_remote) - reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS); - else - reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_REMOTE_RELOAD_STATS); - - if (!reload_stats_attr) - return -EMSGSIZE; - - for (i = 0; i <= DEVLINK_RELOAD_ACTION_MAX; i++) { - if ((!is_remote && - !devlink_reload_action_is_supported(devlink, i)) || - i == DEVLINK_RELOAD_ACTION_UNSPEC) - continue; - act_info = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_INFO); - if (!act_info) - goto nla_put_failure; - - if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_ACTION, i)) - goto action_info_nest_cancel; - act_stats = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_STATS); - if (!act_stats) - goto action_info_nest_cancel; - - for (j = 0; j <= DEVLINK_RELOAD_LIMIT_MAX; j++) { - /* Remote stats are shown even if not locally supported. - * Stats of actions with unspecified limit are shown - * though drivers don't need to register unspecified - * limit. - */ - if ((!is_remote && j != DEVLINK_RELOAD_LIMIT_UNSPEC && - !devlink_reload_limit_is_supported(devlink, j)) || - devlink_reload_combination_is_invalid(i, j)) - continue; - - stat_idx = j * __DEVLINK_RELOAD_ACTION_MAX + i; - if (!is_remote) - value = devlink->stats.reload_stats[stat_idx]; - else - value = devlink->stats.remote_reload_stats[stat_idx]; - if (devlink_reload_stat_put(msg, j, value)) - goto action_stats_nest_cancel; - } - nla_nest_end(msg, act_stats); - nla_nest_end(msg, act_info); - } - nla_nest_end(msg, reload_stats_attr); - return 0; - -action_stats_nest_cancel: - nla_nest_cancel(msg, act_stats); -action_info_nest_cancel: - nla_nest_cancel(msg, act_info); -nla_put_failure: - nla_nest_cancel(msg, reload_stats_attr); - return -EMSGSIZE; -} - -static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - struct nlattr *dev_stats; - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed)) - goto nla_put_failure; - - dev_stats = nla_nest_start(msg, DEVLINK_ATTR_DEV_STATS); - if (!dev_stats) - goto nla_put_failure; - - if (devlink_reload_stats_put(msg, devlink, false)) - goto dev_stats_nest_cancel; - if (devlink_reload_stats_put(msg, devlink, true)) - goto dev_stats_nest_cancel; - - nla_nest_end(msg, dev_stats); - genlmsg_end(msg, hdr); - return 0; - -dev_stats_nest_cancel: - nla_nest_cancel(msg, dev_stats); -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void devlink_notify(struct devlink *devlink, enum devlink_command cmd) -{ - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_NEW && cmd != DEVLINK_CMD_DEL); - WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_fill(msg, devlink, cmd, 0, 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int devlink_nl_port_attrs_put(struct sk_buff *msg, - struct devlink_port *devlink_port) -{ - struct devlink_port_attrs *attrs = &devlink_port->attrs; - - if (!devlink_port->attrs_set) - return 0; - if (attrs->lanes) { - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_LANES, attrs->lanes)) - return -EMSGSIZE; - } - if (nla_put_u8(msg, DEVLINK_ATTR_PORT_SPLITTABLE, attrs->splittable)) - return -EMSGSIZE; - if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour)) - return -EMSGSIZE; - switch (devlink_port->attrs.flavour) { - case DEVLINK_PORT_FLAVOUR_PCI_PF: - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, - attrs->pci_pf.controller) || - nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_pf.pf)) - return -EMSGSIZE; - if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external)) - return -EMSGSIZE; - break; - case DEVLINK_PORT_FLAVOUR_PCI_VF: - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, - attrs->pci_vf.controller) || - nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_vf.pf) || - nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER, attrs->pci_vf.vf)) - return -EMSGSIZE; - if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external)) - return -EMSGSIZE; - break; - case DEVLINK_PORT_FLAVOUR_PCI_SF: - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, - attrs->pci_sf.controller) || - nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, - attrs->pci_sf.pf) || - nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER, - attrs->pci_sf.sf)) - return -EMSGSIZE; - break; - case DEVLINK_PORT_FLAVOUR_PHYSICAL: - case DEVLINK_PORT_FLAVOUR_CPU: - case DEVLINK_PORT_FLAVOUR_DSA: - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, - attrs->phys.port_number)) - return -EMSGSIZE; - if (!attrs->split) - return 0; - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, - attrs->phys.port_number)) - return -EMSGSIZE; - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER, - attrs->phys.split_subport_number)) - return -EMSGSIZE; - break; - default: - break; - } - return 0; -} - -static int devlink_port_fn_hw_addr_fill(const struct devlink_ops *ops, - struct devlink_port *port, - struct sk_buff *msg, - struct netlink_ext_ack *extack, - bool *msg_updated) -{ - u8 hw_addr[MAX_ADDR_LEN]; - int hw_addr_len; - int err; - - if (!ops->port_function_hw_addr_get) - return 0; - - err = ops->port_function_hw_addr_get(port, hw_addr, &hw_addr_len, - extack); - if (err) { - if (err == -EOPNOTSUPP) - return 0; - return err; - } - err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr); - if (err) - return err; - *msg_updated = true; - return 0; -} - -static int devlink_nl_rate_fill(struct sk_buff *msg, - struct devlink_rate *devlink_rate, - enum devlink_command cmd, u32 portid, u32 seq, - int flags, struct netlink_ext_ack *extack) -{ - struct devlink *devlink = devlink_rate->devlink; - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TYPE, devlink_rate->type)) - goto nla_put_failure; - - if (devlink_rate_is_leaf(devlink_rate)) { - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, - devlink_rate->devlink_port->index)) - goto nla_put_failure; - } else if (devlink_rate_is_node(devlink_rate)) { - if (nla_put_string(msg, DEVLINK_ATTR_RATE_NODE_NAME, - devlink_rate->name)) - goto nla_put_failure; - } - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE, - devlink_rate->tx_share, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX, - devlink_rate->tx_max, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - if (nla_put_u32(msg, DEVLINK_ATTR_RATE_TX_PRIORITY, - devlink_rate->tx_priority)) - goto nla_put_failure; - - if (nla_put_u32(msg, DEVLINK_ATTR_RATE_TX_WEIGHT, - devlink_rate->tx_weight)) - goto nla_put_failure; - - if (devlink_rate->parent) - if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME, - devlink_rate->parent->name)) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static bool -devlink_port_fn_state_valid(enum devlink_port_fn_state state) -{ - return state == DEVLINK_PORT_FN_STATE_INACTIVE || - state == DEVLINK_PORT_FN_STATE_ACTIVE; -} - -static bool -devlink_port_fn_opstate_valid(enum devlink_port_fn_opstate opstate) -{ - return opstate == DEVLINK_PORT_FN_OPSTATE_DETACHED || - opstate == DEVLINK_PORT_FN_OPSTATE_ATTACHED; -} - -static int devlink_port_fn_state_fill(const struct devlink_ops *ops, - struct devlink_port *port, - struct sk_buff *msg, - struct netlink_ext_ack *extack, - bool *msg_updated) -{ - enum devlink_port_fn_opstate opstate; - enum devlink_port_fn_state state; - int err; - - if (!ops->port_fn_state_get) - return 0; - - err = ops->port_fn_state_get(port, &state, &opstate, extack); - if (err) { - if (err == -EOPNOTSUPP) - return 0; - return err; - } - if (!devlink_port_fn_state_valid(state)) { - WARN_ON_ONCE(1); - NL_SET_ERR_MSG_MOD(extack, "Invalid state read from driver"); - return -EINVAL; - } - if (!devlink_port_fn_opstate_valid(opstate)) { - WARN_ON_ONCE(1); - NL_SET_ERR_MSG_MOD(extack, - "Invalid operational state read from driver"); - return -EINVAL; - } - if (nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_STATE, state) || - nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_OPSTATE, opstate)) - return -EMSGSIZE; - *msg_updated = true; - return 0; -} - -static int -devlink_port_fn_mig_set(struct devlink_port *devlink_port, bool enable, - struct netlink_ext_ack *extack) -{ - const struct devlink_ops *ops = devlink_port->devlink->ops; - - return ops->port_fn_migratable_set(devlink_port, enable, extack); -} - -static int -devlink_port_fn_roce_set(struct devlink_port *devlink_port, bool enable, - struct netlink_ext_ack *extack) -{ - const struct devlink_ops *ops = devlink_port->devlink->ops; - - return ops->port_fn_roce_set(devlink_port, enable, extack); -} - -static int devlink_port_fn_caps_set(struct devlink_port *devlink_port, - const struct nlattr *attr, - struct netlink_ext_ack *extack) -{ - struct nla_bitfield32 caps; - u32 caps_value; - int err; - - caps = nla_get_bitfield32(attr); - caps_value = caps.value & caps.selector; - if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE) { - err = devlink_port_fn_roce_set(devlink_port, - caps_value & DEVLINK_PORT_FN_CAP_ROCE, - extack); - if (err) - return err; - } - if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) { - err = devlink_port_fn_mig_set(devlink_port, caps_value & - DEVLINK_PORT_FN_CAP_MIGRATABLE, - extack); - if (err) - return err; - } - return 0; -} - -static int -devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port, - struct netlink_ext_ack *extack) -{ - const struct devlink_ops *ops; - struct nlattr *function_attr; - bool msg_updated = false; - int err; - - function_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PORT_FUNCTION); - if (!function_attr) - return -EMSGSIZE; - - ops = port->devlink->ops; - err = devlink_port_fn_hw_addr_fill(ops, port, msg, extack, - &msg_updated); - if (err) - goto out; - err = devlink_port_fn_caps_fill(ops, port, msg, extack, - &msg_updated); - if (err) - goto out; - err = devlink_port_fn_state_fill(ops, port, msg, extack, &msg_updated); -out: - if (err || !msg_updated) - nla_nest_cancel(msg, function_attr); - else - nla_nest_end(msg, function_attr); - return err; -} - -static int devlink_nl_port_fill(struct sk_buff *msg, - struct devlink_port *devlink_port, - enum devlink_command cmd, u32 portid, u32 seq, - int flags, struct netlink_ext_ack *extack) -{ - struct devlink *devlink = devlink_port->devlink; - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) - goto nla_put_failure; - - spin_lock_bh(&devlink_port->type_lock); - if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type)) - goto nla_put_failure_type_locked; - if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET && - nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE, - devlink_port->desired_type)) - goto nla_put_failure_type_locked; - if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) { - if (devlink_port->type_eth.netdev && - (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX, - devlink_port->type_eth.ifindex) || - nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME, - devlink_port->type_eth.ifname))) - goto nla_put_failure_type_locked; - } - if (devlink_port->type == DEVLINK_PORT_TYPE_IB) { - struct ib_device *ibdev = devlink_port->type_ib.ibdev; - - if (ibdev && - nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME, - ibdev->name)) - goto nla_put_failure_type_locked; - } - spin_unlock_bh(&devlink_port->type_lock); - if (devlink_nl_port_attrs_put(msg, devlink_port)) - goto nla_put_failure; - if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack)) - goto nla_put_failure; - if (devlink_port->linecard && - nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, - devlink_port->linecard->index)) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure_type_locked: - spin_unlock_bh(&devlink_port->type_lock); -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void devlink_port_notify(struct devlink_port *devlink_port, - enum devlink_command cmd) -{ - struct devlink *devlink = devlink_port->devlink; - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd != DEVLINK_CMD_PORT_DEL); - - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_port_fill(msg, devlink_port, cmd, 0, 0, 0, NULL); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, - 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static void devlink_rate_notify(struct devlink_rate *devlink_rate, - enum devlink_command cmd) -{ - struct devlink *devlink = devlink_rate->devlink; - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_RATE_NEW && cmd != DEVLINK_CMD_RATE_DEL); - - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_rate_fill(msg, devlink_rate, cmd, 0, 0, 0, NULL); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, - 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink_rate *devlink_rate; - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - list_for_each_entry(devlink_rate, &devlink->rate_list, list) { - enum devlink_command cmd = DEVLINK_CMD_RATE_NEW; - u32 id = NETLINK_CB(cb->skb).portid; - - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_rate_fill(msg, devlink_rate, cmd, id, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, NULL); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -static int devlink_nl_cmd_rate_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_rate *devlink_rate = info->user_ptr[1]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_rate_fill(msg, devlink_rate, DEVLINK_CMD_RATE_NEW, - info->snd_portid, info->snd_seq, 0, - info->extack); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static bool -devlink_rate_is_parent_node(struct devlink_rate *devlink_rate, - struct devlink_rate *parent) -{ - while (parent) { - if (parent == devlink_rate) - return true; - parent = parent->parent; - } - return false; -} - -static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < start) { - idx++; - devlink_put(devlink); - continue; - } - - devl_lock(devlink); - err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI); - devl_unlock(devlink); - devlink_put(devlink); - - if (err) - goto out; - idx++; - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW, - info->snd_portid, info->snd_seq, 0, - info->extack); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - struct devlink_port *devlink_port; - unsigned long index, port_index; - int start = cb->args[0]; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - xa_for_each(&devlink->ports, port_index, devlink_port) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_port_fill(msg, devlink_port, - DEVLINK_CMD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, cb->extack); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int devlink_port_type_set(struct devlink_port *devlink_port, - enum devlink_port_type port_type) - -{ - int err; - - if (!devlink_port->devlink->ops->port_type_set) - return -EOPNOTSUPP; - - if (port_type == devlink_port->type) - return 0; - - err = devlink_port->devlink->ops->port_type_set(devlink_port, - port_type); - if (err) - return err; - - devlink_port->desired_type = port_type; - devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); - return 0; -} - -static int devlink_port_function_hw_addr_set(struct devlink_port *port, - const struct nlattr *attr, - struct netlink_ext_ack *extack) -{ - const struct devlink_ops *ops = port->devlink->ops; - const u8 *hw_addr; - int hw_addr_len; - - hw_addr = nla_data(attr); - hw_addr_len = nla_len(attr); - if (hw_addr_len > MAX_ADDR_LEN) { - NL_SET_ERR_MSG_MOD(extack, "Port function hardware address too long"); - return -EINVAL; - } - if (port->type == DEVLINK_PORT_TYPE_ETH) { - if (hw_addr_len != ETH_ALEN) { - NL_SET_ERR_MSG_MOD(extack, "Address must be 6 bytes for Ethernet device"); - return -EINVAL; - } - if (!is_unicast_ether_addr(hw_addr)) { - NL_SET_ERR_MSG_MOD(extack, "Non-unicast hardware address unsupported"); - return -EINVAL; - } - } - - return ops->port_function_hw_addr_set(port, hw_addr, hw_addr_len, - extack); -} - -static int devlink_port_fn_state_set(struct devlink_port *port, - const struct nlattr *attr, - struct netlink_ext_ack *extack) -{ - enum devlink_port_fn_state state; - const struct devlink_ops *ops; - - state = nla_get_u8(attr); - ops = port->devlink->ops; - return ops->port_fn_state_set(port, state, extack); -} - -static int devlink_port_function_validate(struct devlink_port *devlink_port, - struct nlattr **tb, - struct netlink_ext_ack *extack) -{ - const struct devlink_ops *ops = devlink_port->devlink->ops; - struct nlattr *attr; - - if (tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] && - !ops->port_function_hw_addr_set) { - NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR], - "Port doesn't support function attributes"); - return -EOPNOTSUPP; - } - if (tb[DEVLINK_PORT_FN_ATTR_STATE] && !ops->port_fn_state_set) { - NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR], - "Function does not support state setting"); - return -EOPNOTSUPP; - } - attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; - if (attr) { - struct nla_bitfield32 caps; - - caps = nla_get_bitfield32(attr); - if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE && - !ops->port_fn_roce_set) { - NL_SET_ERR_MSG_ATTR(extack, attr, - "Port doesn't support RoCE function attribute"); - return -EOPNOTSUPP; - } - if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) { - if (!ops->port_fn_migratable_set) { - NL_SET_ERR_MSG_ATTR(extack, attr, - "Port doesn't support migratable function attribute"); - return -EOPNOTSUPP; - } - if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) { - NL_SET_ERR_MSG_ATTR(extack, attr, - "migratable function attribute supported for VFs only"); - return -EOPNOTSUPP; - } - } - } - return 0; -} - -static int devlink_port_function_set(struct devlink_port *port, - const struct nlattr *attr, - struct netlink_ext_ack *extack) -{ - struct nlattr *tb[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1]; - int err; - - err = nla_parse_nested(tb, DEVLINK_PORT_FUNCTION_ATTR_MAX, attr, - devlink_function_nl_policy, extack); - if (err < 0) { - NL_SET_ERR_MSG_MOD(extack, "Fail to parse port function attributes"); - return err; - } - - err = devlink_port_function_validate(port, tb, extack); - if (err) - return err; - - attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR]; - if (attr) { - err = devlink_port_function_hw_addr_set(port, attr, extack); - if (err) - return err; - } - - attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; - if (attr) { - err = devlink_port_fn_caps_set(port, attr, extack); - if (err) - return err; - } - - /* Keep this as the last function attribute set, so that when - * multiple port function attributes are set along with state, - * Those can be applied first before activating the state. - */ - attr = tb[DEVLINK_PORT_FN_ATTR_STATE]; - if (attr) - err = devlink_port_fn_state_set(port, attr, extack); - - if (!err) - devlink_port_notify(port, DEVLINK_CMD_PORT_NEW); - return err; -} - -static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - int err; - - if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) { - enum devlink_port_type port_type; - - port_type = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_TYPE]); - err = devlink_port_type_set(devlink_port, port_type); - if (err) - return err; - } - - if (info->attrs[DEVLINK_ATTR_PORT_FUNCTION]) { - struct nlattr *attr = info->attrs[DEVLINK_ATTR_PORT_FUNCTION]; - struct netlink_ext_ack *extack = info->extack; - - err = devlink_port_function_set(devlink_port, attr, extack); - if (err) - return err; - } - - return 0; -} - -static int devlink_nl_cmd_port_split_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct devlink *devlink = info->user_ptr[0]; - u32 count; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PORT_SPLIT_COUNT)) - return -EINVAL; - if (!devlink->ops->port_split) - return -EOPNOTSUPP; - - count = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_SPLIT_COUNT]); - - if (!devlink_port->attrs.splittable) { - /* Split ports cannot be split. */ - if (devlink_port->attrs.split) - NL_SET_ERR_MSG_MOD(info->extack, "Port cannot be split further"); - else - NL_SET_ERR_MSG_MOD(info->extack, "Port cannot be split"); - return -EINVAL; - } - - if (count < 2 || !is_power_of_2(count) || count > devlink_port->attrs.lanes) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid split count"); - return -EINVAL; - } - - return devlink->ops->port_split(devlink, devlink_port, count, - info->extack); -} - -static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct devlink *devlink = info->user_ptr[0]; - - if (!devlink->ops->port_unsplit) - return -EOPNOTSUPP; - return devlink->ops->port_unsplit(devlink, devlink_port, info->extack); -} - -static int devlink_port_new_notify(struct devlink *devlink, - unsigned int port_index, - struct genl_info *info) -{ - struct devlink_port *devlink_port; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - lockdep_assert_held(&devlink->lock); - devlink_port = devlink_port_get_by_index(devlink, port_index); - if (!devlink_port) { - err = -ENODEV; - goto out; - } - - err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_NEW, - info->snd_portid, info->snd_seq, 0, NULL); - if (err) - goto out; - - return genlmsg_reply(msg, info); - -out: - nlmsg_free(msg); - return err; -} - -static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct devlink_port_new_attrs new_attrs = {}; - struct devlink *devlink = info->user_ptr[0]; - unsigned int new_port_index; - int err; - - if (!devlink->ops->port_new || !devlink->ops->port_del) - return -EOPNOTSUPP; - - if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] || - !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) { - NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified"); - return -EINVAL; - } - new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]); - new_attrs.pfnum = - nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]); - - if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { - /* Port index of the new port being created by driver. */ - new_attrs.port_index = - nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); - new_attrs.port_index_valid = true; - } - if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) { - new_attrs.controller = - nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]); - new_attrs.controller_valid = true; - } - if (new_attrs.flavour == DEVLINK_PORT_FLAVOUR_PCI_SF && - info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) { - new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]); - new_attrs.sfnum_valid = true; - } - - err = devlink->ops->port_new(devlink, &new_attrs, extack, - &new_port_index); - if (err) - return err; - - err = devlink_port_new_notify(devlink, new_port_index, info); - if (err && err != -ENODEV) { - /* Fail to send the response; destroy newly created port. */ - devlink->ops->port_del(devlink, new_port_index, extack); - } - return err; -} - -static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - unsigned int port_index; - - if (!devlink->ops->port_del) - return -EOPNOTSUPP; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PORT_INDEX)) { - NL_SET_ERR_MSG_MOD(extack, "Port index is not specified"); - return -EINVAL; - } - port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); - - return devlink->ops->port_del(devlink, port_index, extack); -} - -static int -devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate, - struct genl_info *info, - struct nlattr *nla_parent) -{ - struct devlink *devlink = devlink_rate->devlink; - const char *parent_name = nla_data(nla_parent); - const struct devlink_ops *ops = devlink->ops; - size_t len = strlen(parent_name); - struct devlink_rate *parent; - int err = -EOPNOTSUPP; - - parent = devlink_rate->parent; - - if (parent && !len) { - if (devlink_rate_is_leaf(devlink_rate)) - err = ops->rate_leaf_parent_set(devlink_rate, NULL, - devlink_rate->priv, NULL, - info->extack); - else if (devlink_rate_is_node(devlink_rate)) - err = ops->rate_node_parent_set(devlink_rate, NULL, - devlink_rate->priv, NULL, - info->extack); - if (err) - return err; - - refcount_dec(&parent->refcnt); - devlink_rate->parent = NULL; - } else if (len) { - parent = devlink_rate_node_get_by_name(devlink, parent_name); - if (IS_ERR(parent)) - return -ENODEV; - - if (parent == devlink_rate) { - NL_SET_ERR_MSG_MOD(info->extack, "Parent to self is not allowed"); - return -EINVAL; - } - - if (devlink_rate_is_node(devlink_rate) && - devlink_rate_is_parent_node(devlink_rate, parent->parent)) { - NL_SET_ERR_MSG_MOD(info->extack, "Node is already a parent of parent node."); - return -EEXIST; - } - - if (devlink_rate_is_leaf(devlink_rate)) - err = ops->rate_leaf_parent_set(devlink_rate, parent, - devlink_rate->priv, parent->priv, - info->extack); - else if (devlink_rate_is_node(devlink_rate)) - err = ops->rate_node_parent_set(devlink_rate, parent, - devlink_rate->priv, parent->priv, - info->extack); - if (err) - return err; - - if (devlink_rate->parent) - /* we're reassigning to other parent in this case */ - refcount_dec(&devlink_rate->parent->refcnt); - - refcount_inc(&parent->refcnt); - devlink_rate->parent = parent; - } - - return 0; -} - -static int devlink_nl_rate_set(struct devlink_rate *devlink_rate, - const struct devlink_ops *ops, - struct genl_info *info) -{ - struct nlattr *nla_parent, **attrs = info->attrs; - int err = -EOPNOTSUPP; - u32 priority; - u32 weight; - u64 rate; - - if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) { - rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_SHARE]); - if (devlink_rate_is_leaf(devlink_rate)) - err = ops->rate_leaf_tx_share_set(devlink_rate, devlink_rate->priv, - rate, info->extack); - else if (devlink_rate_is_node(devlink_rate)) - err = ops->rate_node_tx_share_set(devlink_rate, devlink_rate->priv, - rate, info->extack); - if (err) - return err; - devlink_rate->tx_share = rate; - } - - if (attrs[DEVLINK_ATTR_RATE_TX_MAX]) { - rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_MAX]); - if (devlink_rate_is_leaf(devlink_rate)) - err = ops->rate_leaf_tx_max_set(devlink_rate, devlink_rate->priv, - rate, info->extack); - else if (devlink_rate_is_node(devlink_rate)) - err = ops->rate_node_tx_max_set(devlink_rate, devlink_rate->priv, - rate, info->extack); - if (err) - return err; - devlink_rate->tx_max = rate; - } - - if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]) { - priority = nla_get_u32(attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]); - if (devlink_rate_is_leaf(devlink_rate)) - err = ops->rate_leaf_tx_priority_set(devlink_rate, devlink_rate->priv, - priority, info->extack); - else if (devlink_rate_is_node(devlink_rate)) - err = ops->rate_node_tx_priority_set(devlink_rate, devlink_rate->priv, - priority, info->extack); - - if (err) - return err; - devlink_rate->tx_priority = priority; - } - - if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]) { - weight = nla_get_u32(attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]); - if (devlink_rate_is_leaf(devlink_rate)) - err = ops->rate_leaf_tx_weight_set(devlink_rate, devlink_rate->priv, - weight, info->extack); - else if (devlink_rate_is_node(devlink_rate)) - err = ops->rate_node_tx_weight_set(devlink_rate, devlink_rate->priv, - weight, info->extack); - - if (err) - return err; - devlink_rate->tx_weight = weight; - } - - nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]; - if (nla_parent) { - err = devlink_nl_rate_parent_node_set(devlink_rate, info, - nla_parent); - if (err) - return err; - } - - return 0; -} - -static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops, - struct genl_info *info, - enum devlink_rate_type type) -{ - struct nlattr **attrs = info->attrs; - - if (type == DEVLINK_RATE_TYPE_LEAF) { - if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_leaf_tx_share_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the leafs"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_leaf_tx_max_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the leafs"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] && - !ops->rate_leaf_parent_set) { - NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) { - NL_SET_ERR_MSG_ATTR(info->extack, - attrs[DEVLINK_ATTR_RATE_TX_PRIORITY], - "TX priority set isn't supported for the leafs"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_leaf_tx_weight_set) { - NL_SET_ERR_MSG_ATTR(info->extack, - attrs[DEVLINK_ATTR_RATE_TX_WEIGHT], - "TX weight set isn't supported for the leafs"); - return false; - } - } else if (type == DEVLINK_RATE_TYPE_NODE) { - if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_node_tx_max_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the nodes"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] && - !ops->rate_node_parent_set) { - NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) { - NL_SET_ERR_MSG_ATTR(info->extack, - attrs[DEVLINK_ATTR_RATE_TX_PRIORITY], - "TX priority set isn't supported for the nodes"); - return false; - } - if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_node_tx_weight_set) { - NL_SET_ERR_MSG_ATTR(info->extack, - attrs[DEVLINK_ATTR_RATE_TX_WEIGHT], - "TX weight set isn't supported for the nodes"); - return false; - } - } else { - WARN(1, "Unknown type of rate object"); - return false; - } - - return true; -} - -static int devlink_nl_cmd_rate_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_rate *devlink_rate = info->user_ptr[1]; - struct devlink *devlink = devlink_rate->devlink; - const struct devlink_ops *ops = devlink->ops; - int err; - - if (!ops || !devlink_rate_set_ops_supported(ops, info, devlink_rate->type)) - return -EOPNOTSUPP; - - err = devlink_nl_rate_set(devlink_rate, ops, info); - - if (!err) - devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW); - return err; -} - -static int devlink_nl_cmd_rate_new_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_rate *rate_node; - const struct devlink_ops *ops; - int err; - - ops = devlink->ops; - if (!ops || !ops->rate_node_new || !ops->rate_node_del) { - NL_SET_ERR_MSG_MOD(info->extack, "Rate nodes aren't supported"); - return -EOPNOTSUPP; - } - - if (!devlink_rate_set_ops_supported(ops, info, DEVLINK_RATE_TYPE_NODE)) - return -EOPNOTSUPP; - - rate_node = devlink_rate_node_get_from_attrs(devlink, info->attrs); - if (!IS_ERR(rate_node)) - return -EEXIST; - else if (rate_node == ERR_PTR(-EINVAL)) - return -EINVAL; - - rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL); - if (!rate_node) - return -ENOMEM; - - rate_node->devlink = devlink; - rate_node->type = DEVLINK_RATE_TYPE_NODE; - rate_node->name = nla_strdup(info->attrs[DEVLINK_ATTR_RATE_NODE_NAME], GFP_KERNEL); - if (!rate_node->name) { - err = -ENOMEM; - goto err_strdup; - } - - err = ops->rate_node_new(rate_node, &rate_node->priv, info->extack); - if (err) - goto err_node_new; - - err = devlink_nl_rate_set(rate_node, ops, info); - if (err) - goto err_rate_set; - - refcount_set(&rate_node->refcnt, 1); - list_add(&rate_node->list, &devlink->rate_list); - devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); - return 0; - -err_rate_set: - ops->rate_node_del(rate_node, rate_node->priv, info->extack); -err_node_new: - kfree(rate_node->name); -err_strdup: - kfree(rate_node); - return err; -} - -static int devlink_nl_cmd_rate_del_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_rate *rate_node = info->user_ptr[1]; - struct devlink *devlink = rate_node->devlink; - const struct devlink_ops *ops = devlink->ops; - int err; - - if (refcount_read(&rate_node->refcnt) > 1) { - NL_SET_ERR_MSG_MOD(info->extack, "Node has children. Cannot delete node."); - return -EBUSY; - } - - devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL); - err = ops->rate_node_del(rate_node, rate_node->priv, info->extack); - if (rate_node->parent) - refcount_dec(&rate_node->parent->refcnt); - list_del(&rate_node->list); - kfree(rate_node->name); - kfree(rate_node); - return err; -} - -struct devlink_linecard_type { - const char *type; - const void *priv; -}; - -static int devlink_nl_linecard_fill(struct sk_buff *msg, - struct devlink *devlink, - struct devlink_linecard *linecard, - enum devlink_command cmd, u32 portid, - u32 seq, int flags, - struct netlink_ext_ack *extack) -{ - struct devlink_linecard_type *linecard_type; - struct nlattr *attr; - void *hdr; - int i; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index)) - goto nla_put_failure; - if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state)) - goto nla_put_failure; - if (linecard->type && - nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE, linecard->type)) - goto nla_put_failure; - - if (linecard->types_count) { - attr = nla_nest_start(msg, - DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES); - if (!attr) - goto nla_put_failure; - for (i = 0; i < linecard->types_count; i++) { - linecard_type = &linecard->types[i]; - if (nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE, - linecard_type->type)) { - nla_nest_cancel(msg, attr); - goto nla_put_failure; - } - } - nla_nest_end(msg, attr); - } - - if (linecard->nested_devlink && - devlink_nl_put_nested_handle(msg, linecard->nested_devlink)) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void devlink_linecard_notify(struct devlink_linecard *linecard, - enum devlink_command cmd) -{ - struct devlink *devlink = linecard->devlink; - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_LINECARD_NEW && - cmd != DEVLINK_CMD_LINECARD_DEL); - - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_linecard_fill(msg, devlink, linecard, cmd, 0, 0, 0, - NULL); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int devlink_nl_cmd_linecard_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_linecard *linecard = info->user_ptr[1]; - struct devlink *devlink = linecard->devlink; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - mutex_lock(&linecard->state_lock); - err = devlink_nl_linecard_fill(msg, devlink, linecard, - DEVLINK_CMD_LINECARD_NEW, - info->snd_portid, info->snd_seq, 0, - info->extack); - mutex_unlock(&linecard->state_lock); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink_linecard *linecard; - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - mutex_lock(&devlink->linecards_lock); - list_for_each_entry(linecard, &devlink->linecard_list, list) { - if (idx < start) { - idx++; - continue; - } - mutex_lock(&linecard->state_lock); - err = devlink_nl_linecard_fill(msg, devlink, linecard, - DEVLINK_CMD_LINECARD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, - cb->extack); - mutex_unlock(&linecard->state_lock); - if (err) { - mutex_unlock(&devlink->linecards_lock); - devlink_put(devlink); - goto out; - } - idx++; - } - mutex_unlock(&devlink->linecards_lock); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static struct devlink_linecard_type * -devlink_linecard_type_lookup(struct devlink_linecard *linecard, - const char *type) -{ - struct devlink_linecard_type *linecard_type; - int i; - - for (i = 0; i < linecard->types_count; i++) { - linecard_type = &linecard->types[i]; - if (!strcmp(type, linecard_type->type)) - return linecard_type; - } - return NULL; -} - -static int devlink_linecard_type_set(struct devlink_linecard *linecard, - const char *type, - struct netlink_ext_ack *extack) -{ - const struct devlink_linecard_ops *ops = linecard->ops; - struct devlink_linecard_type *linecard_type; - int err; - - mutex_lock(&linecard->state_lock); - if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being provisioned"); - err = -EBUSY; - goto out; - } - if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being unprovisioned"); - err = -EBUSY; - goto out; - } - - linecard_type = devlink_linecard_type_lookup(linecard, type); - if (!linecard_type) { - NL_SET_ERR_MSG_MOD(extack, "Unsupported line card type provided"); - err = -EINVAL; - goto out; - } - - if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED && - linecard->state != DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { - NL_SET_ERR_MSG_MOD(extack, "Line card already provisioned"); - err = -EBUSY; - /* Check if the line card is provisioned in the same - * way the user asks. In case it is, make the operation - * to return success. - */ - if (ops->same_provision && - ops->same_provision(linecard, linecard->priv, - linecard_type->type, - linecard_type->priv)) - err = 0; - goto out; - } - - linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING; - linecard->type = linecard_type->type; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); - err = ops->provision(linecard, linecard->priv, linecard_type->type, - linecard_type->priv, extack); - if (err) { - /* Provisioning failed. Assume the linecard is unprovisioned - * for future operations. - */ - mutex_lock(&linecard->state_lock); - linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; - linecard->type = NULL; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); - } - return err; - -out: - mutex_unlock(&linecard->state_lock); - return err; -} - -static int devlink_linecard_type_unset(struct devlink_linecard *linecard, - struct netlink_ext_ack *extack) -{ - int err; - - mutex_lock(&linecard->state_lock); - if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being provisioned"); - err = -EBUSY; - goto out; - } - if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being unprovisioned"); - err = -EBUSY; - goto out; - } - if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { - linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; - linecard->type = NULL; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - err = 0; - goto out; - } - - if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) { - NL_SET_ERR_MSG_MOD(extack, "Line card is not provisioned"); - err = 0; - goto out; - } - linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONING; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); - err = linecard->ops->unprovision(linecard, linecard->priv, - extack); - if (err) { - /* Unprovisioning failed. Assume the linecard is unprovisioned - * for future operations. - */ - mutex_lock(&linecard->state_lock); - linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; - linecard->type = NULL; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); - } - return err; - -out: - mutex_unlock(&linecard->state_lock); - return err; -} - -static int devlink_nl_cmd_linecard_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_linecard *linecard = info->user_ptr[1]; - struct netlink_ext_ack *extack = info->extack; - int err; - - if (info->attrs[DEVLINK_ATTR_LINECARD_TYPE]) { - const char *type; - - type = nla_data(info->attrs[DEVLINK_ATTR_LINECARD_TYPE]); - if (strcmp(type, "")) { - err = devlink_linecard_type_set(linecard, type, extack); - if (err) - return err; - } else { - err = devlink_linecard_type_unset(linecard, extack); - if (err) - return err; - } - } - - return 0; -} - -static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink, - struct devlink_sb *devlink_sb, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_SIZE, devlink_sb->size)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_INGRESS_POOL_COUNT, - devlink_sb->ingress_pools_count)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_EGRESS_POOL_COUNT, - devlink_sb->egress_pools_count)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_INGRESS_TC_COUNT, - devlink_sb->ingress_tc_count)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_EGRESS_TC_COUNT, - devlink_sb->egress_tc_count)) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_sb *devlink_sb; - struct sk_buff *msg; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_sb_fill(msg, devlink, devlink_sb, - DEVLINK_CMD_SB_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - struct devlink_sb *devlink_sb; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_sb_fill(msg, devlink, devlink_sb, - DEVLINK_CMD_SB_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int devlink_nl_sb_pool_fill(struct sk_buff *msg, struct devlink *devlink, - struct devlink_sb *devlink_sb, - u16 pool_index, enum devlink_command cmd, - u32 portid, u32 seq, int flags) -{ - struct devlink_sb_pool_info pool_info; - void *hdr; - int err; - - err = devlink->ops->sb_pool_get(devlink, devlink_sb->index, - pool_index, &pool_info); - if (err) - return err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_POOL_INDEX, pool_index)) - goto nla_put_failure; - if (nla_put_u8(msg, DEVLINK_ATTR_SB_POOL_TYPE, pool_info.pool_type)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_POOL_SIZE, pool_info.size)) - goto nla_put_failure; - if (nla_put_u8(msg, DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE, - pool_info.threshold_type)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_POOL_CELL_SIZE, - pool_info.cell_size)) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_sb_pool_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_sb *devlink_sb; - struct sk_buff *msg; - u16 pool_index; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - err = devlink_sb_pool_index_get_from_info(devlink_sb, info, - &pool_index); - if (err) - return err; - - if (!devlink->ops->sb_pool_get) - return -EOPNOTSUPP; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_sb_pool_fill(msg, devlink, devlink_sb, pool_index, - DEVLINK_CMD_SB_POOL_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int __sb_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, - struct devlink *devlink, - struct devlink_sb *devlink_sb, - u32 portid, u32 seq) -{ - u16 pool_count = devlink_sb_pool_count(devlink_sb); - u16 pool_index; - int err; - - for (pool_index = 0; pool_index < pool_count; pool_index++) { - if (*p_idx < start) { - (*p_idx)++; - continue; - } - err = devlink_nl_sb_pool_fill(msg, devlink, - devlink_sb, - pool_index, - DEVLINK_CMD_SB_POOL_NEW, - portid, seq, NLM_F_MULTI); - if (err) - return err; - (*p_idx)++; - } - return 0; -} - -static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - struct devlink_sb *devlink_sb; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (!devlink->ops->sb_pool_get) - goto retry; - - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_pool_get_dumpit(msg, start, &idx, devlink, - devlink_sb, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - } - devl_unlock(devlink); -retry: - devlink_put(devlink); - } -out: - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -static int devlink_sb_pool_set(struct devlink *devlink, unsigned int sb_index, - u16 pool_index, u32 size, - enum devlink_sb_threshold_type threshold_type, - struct netlink_ext_ack *extack) - -{ - const struct devlink_ops *ops = devlink->ops; - - if (ops->sb_pool_set) - return ops->sb_pool_set(devlink, sb_index, pool_index, - size, threshold_type, extack); - return -EOPNOTSUPP; -} - -static int devlink_nl_cmd_sb_pool_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - enum devlink_sb_threshold_type threshold_type; - struct devlink_sb *devlink_sb; - u16 pool_index; - u32 size; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - err = devlink_sb_pool_index_get_from_info(devlink_sb, info, - &pool_index); - if (err) - return err; - - err = devlink_sb_th_type_get_from_info(info, &threshold_type); - if (err) - return err; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SB_POOL_SIZE)) - return -EINVAL; - - size = nla_get_u32(info->attrs[DEVLINK_ATTR_SB_POOL_SIZE]); - return devlink_sb_pool_set(devlink, devlink_sb->index, - pool_index, size, threshold_type, - info->extack); -} - -static int devlink_nl_sb_port_pool_fill(struct sk_buff *msg, - struct devlink *devlink, - struct devlink_port *devlink_port, - struct devlink_sb *devlink_sb, - u16 pool_index, - enum devlink_command cmd, - u32 portid, u32 seq, int flags) -{ - const struct devlink_ops *ops = devlink->ops; - u32 threshold; - void *hdr; - int err; - - err = ops->sb_port_pool_get(devlink_port, devlink_sb->index, - pool_index, &threshold); - if (err) - return err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_POOL_INDEX, pool_index)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_THRESHOLD, threshold)) - goto nla_put_failure; - - if (ops->sb_occ_port_pool_get) { - u32 cur; - u32 max; - - err = ops->sb_occ_port_pool_get(devlink_port, devlink_sb->index, - pool_index, &cur, &max); - if (err && err != -EOPNOTSUPP) - goto sb_occ_get_failure; - if (!err) { - if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_CUR, cur)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_MAX, max)) - goto nla_put_failure; - } - } - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - err = -EMSGSIZE; -sb_occ_get_failure: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_sb_port_pool_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct devlink *devlink = devlink_port->devlink; - struct devlink_sb *devlink_sb; - struct sk_buff *msg; - u16 pool_index; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - err = devlink_sb_pool_index_get_from_info(devlink_sb, info, - &pool_index); - if (err) - return err; - - if (!devlink->ops->sb_port_pool_get) - return -EOPNOTSUPP; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_sb_port_pool_fill(msg, devlink, devlink_port, - devlink_sb, pool_index, - DEVLINK_CMD_SB_PORT_POOL_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int __sb_port_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, - struct devlink *devlink, - struct devlink_sb *devlink_sb, - u32 portid, u32 seq) -{ - struct devlink_port *devlink_port; - u16 pool_count = devlink_sb_pool_count(devlink_sb); - unsigned long port_index; - u16 pool_index; - int err; - - xa_for_each(&devlink->ports, port_index, devlink_port) { - for (pool_index = 0; pool_index < pool_count; pool_index++) { - if (*p_idx < start) { - (*p_idx)++; - continue; - } - err = devlink_nl_sb_port_pool_fill(msg, devlink, - devlink_port, - devlink_sb, - pool_index, - DEVLINK_CMD_SB_PORT_POOL_NEW, - portid, seq, - NLM_F_MULTI); - if (err) - return err; - (*p_idx)++; - } - } - return 0; -} - -static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - struct devlink_sb *devlink_sb; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (!devlink->ops->sb_port_pool_get) - goto retry; - - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_port_pool_get_dumpit(msg, start, &idx, - devlink, devlink_sb, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - } - devl_unlock(devlink); -retry: - devlink_put(devlink); - } -out: - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -static int devlink_sb_port_pool_set(struct devlink_port *devlink_port, - unsigned int sb_index, u16 pool_index, - u32 threshold, - struct netlink_ext_ack *extack) - -{ - const struct devlink_ops *ops = devlink_port->devlink->ops; - - if (ops->sb_port_pool_set) - return ops->sb_port_pool_set(devlink_port, sb_index, - pool_index, threshold, extack); - return -EOPNOTSUPP; -} - -static int devlink_nl_cmd_sb_port_pool_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct devlink *devlink = info->user_ptr[0]; - struct devlink_sb *devlink_sb; - u16 pool_index; - u32 threshold; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - err = devlink_sb_pool_index_get_from_info(devlink_sb, info, - &pool_index); - if (err) - return err; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SB_THRESHOLD)) - return -EINVAL; - - threshold = nla_get_u32(info->attrs[DEVLINK_ATTR_SB_THRESHOLD]); - return devlink_sb_port_pool_set(devlink_port, devlink_sb->index, - pool_index, threshold, info->extack); -} - -static int -devlink_nl_sb_tc_pool_bind_fill(struct sk_buff *msg, struct devlink *devlink, - struct devlink_port *devlink_port, - struct devlink_sb *devlink_sb, u16 tc_index, - enum devlink_sb_pool_type pool_type, - enum devlink_command cmd, - u32 portid, u32 seq, int flags) -{ - const struct devlink_ops *ops = devlink->ops; - u16 pool_index; - u32 threshold; - void *hdr; - int err; - - err = ops->sb_tc_pool_bind_get(devlink_port, devlink_sb->index, - tc_index, pool_type, - &pool_index, &threshold); - if (err) - return err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_TC_INDEX, tc_index)) - goto nla_put_failure; - if (nla_put_u8(msg, DEVLINK_ATTR_SB_POOL_TYPE, pool_type)) - goto nla_put_failure; - if (nla_put_u16(msg, DEVLINK_ATTR_SB_POOL_INDEX, pool_index)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_THRESHOLD, threshold)) - goto nla_put_failure; - - if (ops->sb_occ_tc_port_bind_get) { - u32 cur; - u32 max; - - err = ops->sb_occ_tc_port_bind_get(devlink_port, - devlink_sb->index, - tc_index, pool_type, - &cur, &max); - if (err && err != -EOPNOTSUPP) - return err; - if (!err) { - if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_CUR, cur)) - goto nla_put_failure; - if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_MAX, max)) - goto nla_put_failure; - } - } - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_sb_tc_pool_bind_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct devlink *devlink = devlink_port->devlink; - struct devlink_sb *devlink_sb; - struct sk_buff *msg; - enum devlink_sb_pool_type pool_type; - u16 tc_index; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - err = devlink_sb_pool_type_get_from_info(info, &pool_type); - if (err) - return err; - - err = devlink_sb_tc_index_get_from_info(devlink_sb, info, - pool_type, &tc_index); - if (err) - return err; - - if (!devlink->ops->sb_tc_pool_bind_get) - return -EOPNOTSUPP; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_sb_tc_pool_bind_fill(msg, devlink, devlink_port, - devlink_sb, tc_index, pool_type, - DEVLINK_CMD_SB_TC_POOL_BIND_NEW, - info->snd_portid, - info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int __sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, - int start, int *p_idx, - struct devlink *devlink, - struct devlink_sb *devlink_sb, - u32 portid, u32 seq) -{ - struct devlink_port *devlink_port; - unsigned long port_index; - u16 tc_index; - int err; - - xa_for_each(&devlink->ports, port_index, devlink_port) { - for (tc_index = 0; - tc_index < devlink_sb->ingress_tc_count; tc_index++) { - if (*p_idx < start) { - (*p_idx)++; - continue; - } - err = devlink_nl_sb_tc_pool_bind_fill(msg, devlink, - devlink_port, - devlink_sb, - tc_index, - DEVLINK_SB_POOL_TYPE_INGRESS, - DEVLINK_CMD_SB_TC_POOL_BIND_NEW, - portid, seq, - NLM_F_MULTI); - if (err) - return err; - (*p_idx)++; - } - for (tc_index = 0; - tc_index < devlink_sb->egress_tc_count; tc_index++) { - if (*p_idx < start) { - (*p_idx)++; - continue; - } - err = devlink_nl_sb_tc_pool_bind_fill(msg, devlink, - devlink_port, - devlink_sb, - tc_index, - DEVLINK_SB_POOL_TYPE_EGRESS, - DEVLINK_CMD_SB_TC_POOL_BIND_NEW, - portid, seq, - NLM_F_MULTI); - if (err) - return err; - (*p_idx)++; - } - } - return 0; -} - -static int -devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - struct devlink_sb *devlink_sb; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (!devlink->ops->sb_tc_pool_bind_get) - goto retry; - - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_tc_pool_bind_get_dumpit(msg, start, &idx, - devlink, - devlink_sb, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - } - devl_unlock(devlink); -retry: - devlink_put(devlink); - } -out: - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -static int devlink_sb_tc_pool_bind_set(struct devlink_port *devlink_port, - unsigned int sb_index, u16 tc_index, - enum devlink_sb_pool_type pool_type, - u16 pool_index, u32 threshold, - struct netlink_ext_ack *extack) - -{ - const struct devlink_ops *ops = devlink_port->devlink->ops; - - if (ops->sb_tc_pool_bind_set) - return ops->sb_tc_pool_bind_set(devlink_port, sb_index, - tc_index, pool_type, - pool_index, threshold, extack); - return -EOPNOTSUPP; -} - -static int devlink_nl_cmd_sb_tc_pool_bind_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_port *devlink_port = info->user_ptr[1]; - struct devlink *devlink = info->user_ptr[0]; - enum devlink_sb_pool_type pool_type; - struct devlink_sb *devlink_sb; - u16 tc_index; - u16 pool_index; - u32 threshold; - int err; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - err = devlink_sb_pool_type_get_from_info(info, &pool_type); - if (err) - return err; - - err = devlink_sb_tc_index_get_from_info(devlink_sb, info, - pool_type, &tc_index); - if (err) - return err; - - err = devlink_sb_pool_index_get_from_info(devlink_sb, info, - &pool_index); - if (err) - return err; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SB_THRESHOLD)) - return -EINVAL; - - threshold = nla_get_u32(info->attrs[DEVLINK_ATTR_SB_THRESHOLD]); - return devlink_sb_tc_pool_bind_set(devlink_port, devlink_sb->index, - tc_index, pool_type, - pool_index, threshold, info->extack); -} - -static int devlink_nl_cmd_sb_occ_snapshot_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - const struct devlink_ops *ops = devlink->ops; - struct devlink_sb *devlink_sb; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - if (ops->sb_occ_snapshot) - return ops->sb_occ_snapshot(devlink, devlink_sb->index); - return -EOPNOTSUPP; -} - -static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - const struct devlink_ops *ops = devlink->ops; - struct devlink_sb *devlink_sb; - - devlink_sb = devlink_sb_get_from_info(devlink, info); - if (IS_ERR(devlink_sb)) - return PTR_ERR(devlink_sb); - - if (ops->sb_occ_max_clear) - return ops->sb_occ_max_clear(devlink, devlink_sb->index); - return -EOPNOTSUPP; -} - -static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - const struct devlink_ops *ops = devlink->ops; - enum devlink_eswitch_encap_mode encap_mode; - u8 inline_mode; - void *hdr; - int err = 0; - u16 mode; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - err = devlink_nl_put_handle(msg, devlink); - if (err) - goto nla_put_failure; - - if (ops->eswitch_mode_get) { - err = ops->eswitch_mode_get(devlink, &mode); - if (err) - goto nla_put_failure; - err = nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode); - if (err) - goto nla_put_failure; - } - - if (ops->eswitch_inline_mode_get) { - err = ops->eswitch_inline_mode_get(devlink, &inline_mode); - if (err) - goto nla_put_failure; - err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_INLINE_MODE, - inline_mode); - if (err) - goto nla_put_failure; - } - - if (ops->eswitch_encap_mode_get) { - err = ops->eswitch_encap_mode_get(devlink, &encap_mode); - if (err) - goto nla_put_failure; - err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, encap_mode); - if (err) - goto nla_put_failure; - } - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_GET, - info->snd_portid, info->snd_seq, 0); - - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, - struct netlink_ext_ack *extack) -{ - struct devlink_rate *devlink_rate; - - list_for_each_entry(devlink_rate, &devlink->rate_list, list) - if (devlink_rate_is_node(devlink_rate)) { - NL_SET_ERR_MSG_MOD(extack, "Rate node(s) exists."); - return -EBUSY; - } - return 0; -} - -static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - const struct devlink_ops *ops = devlink->ops; - enum devlink_eswitch_encap_mode encap_mode; - u8 inline_mode; - int err = 0; - u16 mode; - - if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) { - if (!ops->eswitch_mode_set) - return -EOPNOTSUPP; - mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]); - err = devlink_rate_nodes_check(devlink, mode, info->extack); - if (err) - return err; - err = ops->eswitch_mode_set(devlink, mode, info->extack); - if (err) - return err; - } - - if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) { - if (!ops->eswitch_inline_mode_set) - return -EOPNOTSUPP; - inline_mode = nla_get_u8( - info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]); - err = ops->eswitch_inline_mode_set(devlink, inline_mode, - info->extack); - if (err) - return err; - } - - if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) { - if (!ops->eswitch_encap_mode_set) - return -EOPNOTSUPP; - encap_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]); - err = ops->eswitch_encap_mode_set(devlink, encap_mode, - info->extack); - if (err) - return err; - } - - return 0; -} - -int devlink_dpipe_match_put(struct sk_buff *skb, - struct devlink_dpipe_match *match) -{ - struct devlink_dpipe_header *header = match->header; - struct devlink_dpipe_field *field = &header->fields[match->field_id]; - struct nlattr *match_attr; - - match_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_MATCH); - if (!match_attr) - return -EMSGSIZE; - - if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_MATCH_TYPE, match->type) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_INDEX, match->header_index) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) || - nla_put_u8(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL, header->global)) - goto nla_put_failure; - - nla_nest_end(skb, match_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, match_attr); - return -EMSGSIZE; -} -EXPORT_SYMBOL_GPL(devlink_dpipe_match_put); - -static int devlink_dpipe_matches_put(struct devlink_dpipe_table *table, - struct sk_buff *skb) -{ - struct nlattr *matches_attr; - - matches_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_TABLE_MATCHES); - if (!matches_attr) - return -EMSGSIZE; - - if (table->table_ops->matches_dump(table->priv, skb)) - goto nla_put_failure; - - nla_nest_end(skb, matches_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, matches_attr); - return -EMSGSIZE; -} - -int devlink_dpipe_action_put(struct sk_buff *skb, - struct devlink_dpipe_action *action) -{ - struct devlink_dpipe_header *header = action->header; - struct devlink_dpipe_field *field = &header->fields[action->field_id]; - struct nlattr *action_attr; - - action_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_ACTION); - if (!action_attr) - return -EMSGSIZE; - - if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_ACTION_TYPE, action->type) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_INDEX, action->header_index) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) || - nla_put_u8(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL, header->global)) - goto nla_put_failure; - - nla_nest_end(skb, action_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, action_attr); - return -EMSGSIZE; -} -EXPORT_SYMBOL_GPL(devlink_dpipe_action_put); - -static int devlink_dpipe_actions_put(struct devlink_dpipe_table *table, - struct sk_buff *skb) -{ - struct nlattr *actions_attr; - - actions_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_TABLE_ACTIONS); - if (!actions_attr) - return -EMSGSIZE; - - if (table->table_ops->actions_dump(table->priv, skb)) - goto nla_put_failure; - - nla_nest_end(skb, actions_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, actions_attr); - return -EMSGSIZE; -} - -static int devlink_dpipe_table_put(struct sk_buff *skb, - struct devlink_dpipe_table *table) -{ - struct nlattr *table_attr; - u64 table_size; - - table_size = table->table_ops->size_get(table->priv); - table_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_TABLE); - if (!table_attr) - return -EMSGSIZE; - - if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_TABLE_NAME, table->name) || - nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_SIZE, table_size, - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (nla_put_u8(skb, DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED, - table->counters_enabled)) - goto nla_put_failure; - - if (table->resource_valid) { - if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID, - table->resource_id, DEVLINK_ATTR_PAD) || - nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_UNITS, - table->resource_units, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - } - if (devlink_dpipe_matches_put(table, skb)) - goto nla_put_failure; - - if (devlink_dpipe_actions_put(table, skb)) - goto nla_put_failure; - - nla_nest_end(skb, table_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, table_attr); - return -EMSGSIZE; -} - -static int devlink_dpipe_send_and_alloc_skb(struct sk_buff **pskb, - struct genl_info *info) -{ - int err; - - if (*pskb) { - err = genlmsg_reply(*pskb, info); - if (err) - return err; - } - *pskb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!*pskb) - return -ENOMEM; - return 0; -} - -static int devlink_dpipe_tables_fill(struct genl_info *info, - enum devlink_command cmd, int flags, - struct list_head *dpipe_tables, - const char *table_name) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_dpipe_table *table; - struct nlattr *tables_attr; - struct sk_buff *skb = NULL; - struct nlmsghdr *nlh; - bool incomplete; - void *hdr; - int i; - int err; - - table = list_first_entry(dpipe_tables, - struct devlink_dpipe_table, list); -start_again: - err = devlink_dpipe_send_and_alloc_skb(&skb, info); - if (err) - return err; - - hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, - &devlink_nl_family, NLM_F_MULTI, cmd); - if (!hdr) { - nlmsg_free(skb); - return -EMSGSIZE; - } - - if (devlink_nl_put_handle(skb, devlink)) - goto nla_put_failure; - tables_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_TABLES); - if (!tables_attr) - goto nla_put_failure; - - i = 0; - incomplete = false; - list_for_each_entry_from(table, dpipe_tables, list) { - if (!table_name) { - err = devlink_dpipe_table_put(skb, table); - if (err) { - if (!i) - goto err_table_put; - incomplete = true; - break; - } - } else { - if (!strcmp(table->name, table_name)) { - err = devlink_dpipe_table_put(skb, table); - if (err) - break; - } - } - i++; - } - - nla_nest_end(skb, tables_attr); - genlmsg_end(skb, hdr); - if (incomplete) - goto start_again; - -send_done: - nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, - NLMSG_DONE, 0, flags | NLM_F_MULTI); - if (!nlh) { - err = devlink_dpipe_send_and_alloc_skb(&skb, info); - if (err) - return err; - goto send_done; - } - - return genlmsg_reply(skb, info); - -nla_put_failure: - err = -EMSGSIZE; -err_table_put: - nlmsg_free(skb); - return err; -} - -static int devlink_nl_cmd_dpipe_table_get(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - const char *table_name = NULL; - - if (info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]) - table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]); - - return devlink_dpipe_tables_fill(info, DEVLINK_CMD_DPIPE_TABLE_GET, 0, - &devlink->dpipe_table_list, - table_name); -} - -static int devlink_dpipe_value_put(struct sk_buff *skb, - struct devlink_dpipe_value *value) -{ - if (nla_put(skb, DEVLINK_ATTR_DPIPE_VALUE, - value->value_size, value->value)) - return -EMSGSIZE; - if (value->mask) - if (nla_put(skb, DEVLINK_ATTR_DPIPE_VALUE_MASK, - value->value_size, value->mask)) - return -EMSGSIZE; - if (value->mapping_valid) - if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_VALUE_MAPPING, - value->mapping_value)) - return -EMSGSIZE; - return 0; -} - -static int devlink_dpipe_action_value_put(struct sk_buff *skb, - struct devlink_dpipe_value *value) -{ - if (!value->action) - return -EINVAL; - if (devlink_dpipe_action_put(skb, value->action)) - return -EMSGSIZE; - if (devlink_dpipe_value_put(skb, value)) - return -EMSGSIZE; - return 0; -} - -static int devlink_dpipe_action_values_put(struct sk_buff *skb, - struct devlink_dpipe_value *values, - unsigned int values_count) -{ - struct nlattr *action_attr; - int i; - int err; - - for (i = 0; i < values_count; i++) { - action_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_ACTION_VALUE); - if (!action_attr) - return -EMSGSIZE; - err = devlink_dpipe_action_value_put(skb, &values[i]); - if (err) - goto err_action_value_put; - nla_nest_end(skb, action_attr); - } - return 0; - -err_action_value_put: - nla_nest_cancel(skb, action_attr); - return err; -} - -static int devlink_dpipe_match_value_put(struct sk_buff *skb, - struct devlink_dpipe_value *value) -{ - if (!value->match) - return -EINVAL; - if (devlink_dpipe_match_put(skb, value->match)) - return -EMSGSIZE; - if (devlink_dpipe_value_put(skb, value)) - return -EMSGSIZE; - return 0; -} - -static int devlink_dpipe_match_values_put(struct sk_buff *skb, - struct devlink_dpipe_value *values, - unsigned int values_count) -{ - struct nlattr *match_attr; - int i; - int err; - - for (i = 0; i < values_count; i++) { - match_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_MATCH_VALUE); - if (!match_attr) - return -EMSGSIZE; - err = devlink_dpipe_match_value_put(skb, &values[i]); - if (err) - goto err_match_value_put; - nla_nest_end(skb, match_attr); - } - return 0; - -err_match_value_put: - nla_nest_cancel(skb, match_attr); - return err; -} - -static int devlink_dpipe_entry_put(struct sk_buff *skb, - struct devlink_dpipe_entry *entry) -{ - struct nlattr *entry_attr, *matches_attr, *actions_attr; - int err; - - entry_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_ENTRY); - if (!entry_attr) - return -EMSGSIZE; - - if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_ENTRY_INDEX, entry->index, - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (entry->counter_valid) - if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_ENTRY_COUNTER, - entry->counter, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - matches_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_ENTRY_MATCH_VALUES); - if (!matches_attr) - goto nla_put_failure; - - err = devlink_dpipe_match_values_put(skb, entry->match_values, - entry->match_values_count); - if (err) { - nla_nest_cancel(skb, matches_attr); - goto err_match_values_put; - } - nla_nest_end(skb, matches_attr); - - actions_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_ENTRY_ACTION_VALUES); - if (!actions_attr) - goto nla_put_failure; - - err = devlink_dpipe_action_values_put(skb, entry->action_values, - entry->action_values_count); - if (err) { - nla_nest_cancel(skb, actions_attr); - goto err_action_values_put; - } - nla_nest_end(skb, actions_attr); - - nla_nest_end(skb, entry_attr); - return 0; - -nla_put_failure: - err = -EMSGSIZE; -err_match_values_put: -err_action_values_put: - nla_nest_cancel(skb, entry_attr); - return err; -} - -static struct devlink_dpipe_table * -devlink_dpipe_table_find(struct list_head *dpipe_tables, - const char *table_name, struct devlink *devlink) -{ - struct devlink_dpipe_table *table; - list_for_each_entry_rcu(table, dpipe_tables, list, - lockdep_is_held(&devlink->lock)) { - if (!strcmp(table->name, table_name)) - return table; - } - return NULL; -} - -int devlink_dpipe_entry_ctx_prepare(struct devlink_dpipe_dump_ctx *dump_ctx) -{ - struct devlink *devlink; - int err; - - err = devlink_dpipe_send_and_alloc_skb(&dump_ctx->skb, - dump_ctx->info); - if (err) - return err; - - dump_ctx->hdr = genlmsg_put(dump_ctx->skb, - dump_ctx->info->snd_portid, - dump_ctx->info->snd_seq, - &devlink_nl_family, NLM_F_MULTI, - dump_ctx->cmd); - if (!dump_ctx->hdr) - goto nla_put_failure; - - devlink = dump_ctx->info->user_ptr[0]; - if (devlink_nl_put_handle(dump_ctx->skb, devlink)) - goto nla_put_failure; - dump_ctx->nest = nla_nest_start_noflag(dump_ctx->skb, - DEVLINK_ATTR_DPIPE_ENTRIES); - if (!dump_ctx->nest) - goto nla_put_failure; - return 0; - -nla_put_failure: - nlmsg_free(dump_ctx->skb); - return -EMSGSIZE; -} -EXPORT_SYMBOL_GPL(devlink_dpipe_entry_ctx_prepare); - -int devlink_dpipe_entry_ctx_append(struct devlink_dpipe_dump_ctx *dump_ctx, - struct devlink_dpipe_entry *entry) -{ - return devlink_dpipe_entry_put(dump_ctx->skb, entry); -} -EXPORT_SYMBOL_GPL(devlink_dpipe_entry_ctx_append); - -int devlink_dpipe_entry_ctx_close(struct devlink_dpipe_dump_ctx *dump_ctx) -{ - nla_nest_end(dump_ctx->skb, dump_ctx->nest); - genlmsg_end(dump_ctx->skb, dump_ctx->hdr); - return 0; -} -EXPORT_SYMBOL_GPL(devlink_dpipe_entry_ctx_close); - -void devlink_dpipe_entry_clear(struct devlink_dpipe_entry *entry) - -{ - unsigned int value_count, value_index; - struct devlink_dpipe_value *value; - - value = entry->action_values; - value_count = entry->action_values_count; - for (value_index = 0; value_index < value_count; value_index++) { - kfree(value[value_index].value); - kfree(value[value_index].mask); - } - - value = entry->match_values; - value_count = entry->match_values_count; - for (value_index = 0; value_index < value_count; value_index++) { - kfree(value[value_index].value); - kfree(value[value_index].mask); - } -} -EXPORT_SYMBOL_GPL(devlink_dpipe_entry_clear); - -static int devlink_dpipe_entries_fill(struct genl_info *info, - enum devlink_command cmd, int flags, - struct devlink_dpipe_table *table) -{ - struct devlink_dpipe_dump_ctx dump_ctx; - struct nlmsghdr *nlh; - int err; - - dump_ctx.skb = NULL; - dump_ctx.cmd = cmd; - dump_ctx.info = info; - - err = table->table_ops->entries_dump(table->priv, - table->counters_enabled, - &dump_ctx); - if (err) - return err; - -send_done: - nlh = nlmsg_put(dump_ctx.skb, info->snd_portid, info->snd_seq, - NLMSG_DONE, 0, flags | NLM_F_MULTI); - if (!nlh) { - err = devlink_dpipe_send_and_alloc_skb(&dump_ctx.skb, info); - if (err) - return err; - goto send_done; - } - return genlmsg_reply(dump_ctx.skb, info); -} - -static int devlink_nl_cmd_dpipe_entries_get(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_dpipe_table *table; - const char *table_name; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_DPIPE_TABLE_NAME)) - return -EINVAL; - - table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]); - table = devlink_dpipe_table_find(&devlink->dpipe_table_list, - table_name, devlink); - if (!table) - return -EINVAL; - - if (!table->table_ops->entries_dump) - return -EINVAL; - - return devlink_dpipe_entries_fill(info, DEVLINK_CMD_DPIPE_ENTRIES_GET, - 0, table); -} - -static int devlink_dpipe_fields_put(struct sk_buff *skb, - const struct devlink_dpipe_header *header) -{ - struct devlink_dpipe_field *field; - struct nlattr *field_attr; - int i; - - for (i = 0; i < header->fields_count; i++) { - field = &header->fields[i]; - field_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_FIELD); - if (!field_attr) - return -EMSGSIZE; - if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_FIELD_NAME, field->name) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH, field->bitwidth) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE, field->mapping_type)) - goto nla_put_failure; - nla_nest_end(skb, field_attr); - } - return 0; - -nla_put_failure: - nla_nest_cancel(skb, field_attr); - return -EMSGSIZE; -} - -static int devlink_dpipe_header_put(struct sk_buff *skb, - struct devlink_dpipe_header *header) -{ - struct nlattr *fields_attr, *header_attr; - int err; - - header_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_HEADER); - if (!header_attr) - return -EMSGSIZE; - - if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_HEADER_NAME, header->name) || - nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) || - nla_put_u8(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL, header->global)) - goto nla_put_failure; - - fields_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_DPIPE_HEADER_FIELDS); - if (!fields_attr) - goto nla_put_failure; - - err = devlink_dpipe_fields_put(skb, header); - if (err) { - nla_nest_cancel(skb, fields_attr); - goto nla_put_failure; - } - nla_nest_end(skb, fields_attr); - nla_nest_end(skb, header_attr); - return 0; - -nla_put_failure: - err = -EMSGSIZE; - nla_nest_cancel(skb, header_attr); - return err; -} - -static int devlink_dpipe_headers_fill(struct genl_info *info, - enum devlink_command cmd, int flags, - struct devlink_dpipe_headers * - dpipe_headers) -{ - struct devlink *devlink = info->user_ptr[0]; - struct nlattr *headers_attr; - struct sk_buff *skb = NULL; - struct nlmsghdr *nlh; - void *hdr; - int i, j; - int err; - - i = 0; -start_again: - err = devlink_dpipe_send_and_alloc_skb(&skb, info); - if (err) - return err; - - hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, - &devlink_nl_family, NLM_F_MULTI, cmd); - if (!hdr) { - nlmsg_free(skb); - return -EMSGSIZE; - } - - if (devlink_nl_put_handle(skb, devlink)) - goto nla_put_failure; - headers_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_HEADERS); - if (!headers_attr) - goto nla_put_failure; - - j = 0; - for (; i < dpipe_headers->headers_count; i++) { - err = devlink_dpipe_header_put(skb, dpipe_headers->headers[i]); - if (err) { - if (!j) - goto err_table_put; - break; - } - j++; - } - nla_nest_end(skb, headers_attr); - genlmsg_end(skb, hdr); - if (i != dpipe_headers->headers_count) - goto start_again; - -send_done: - nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, - NLMSG_DONE, 0, flags | NLM_F_MULTI); - if (!nlh) { - err = devlink_dpipe_send_and_alloc_skb(&skb, info); - if (err) - return err; - goto send_done; - } - return genlmsg_reply(skb, info); - -nla_put_failure: - err = -EMSGSIZE; -err_table_put: - nlmsg_free(skb); - return err; -} - -static int devlink_nl_cmd_dpipe_headers_get(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - - if (!devlink->dpipe_headers) - return -EOPNOTSUPP; - return devlink_dpipe_headers_fill(info, DEVLINK_CMD_DPIPE_HEADERS_GET, - 0, devlink->dpipe_headers); -} - -static int devlink_dpipe_table_counters_set(struct devlink *devlink, - const char *table_name, - bool enable) -{ - struct devlink_dpipe_table *table; - - table = devlink_dpipe_table_find(&devlink->dpipe_table_list, - table_name, devlink); - if (!table) - return -EINVAL; - - if (table->counter_control_extern) - return -EOPNOTSUPP; - - if (!(table->counters_enabled ^ enable)) - return 0; - - table->counters_enabled = enable; - if (table->table_ops->counters_set_update) - table->table_ops->counters_set_update(table->priv, enable); - return 0; -} - -static int devlink_nl_cmd_dpipe_table_counters_set(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - const char *table_name; - bool counters_enable; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_DPIPE_TABLE_NAME) || - GENL_REQ_ATTR_CHECK(info, - DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED)) - return -EINVAL; - - table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]); - counters_enable = !!nla_get_u8(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED]); - - return devlink_dpipe_table_counters_set(devlink, table_name, - counters_enable); -} - -static struct devlink_resource * -devlink_resource_find(struct devlink *devlink, - struct devlink_resource *resource, u64 resource_id) -{ - struct list_head *resource_list; - - if (resource) - resource_list = &resource->resource_list; - else - resource_list = &devlink->resource_list; - - list_for_each_entry(resource, resource_list, list) { - struct devlink_resource *child_resource; - - if (resource->id == resource_id) - return resource; - - child_resource = devlink_resource_find(devlink, resource, - resource_id); - if (child_resource) - return child_resource; - } - return NULL; -} - -static void -devlink_resource_validate_children(struct devlink_resource *resource) -{ - struct devlink_resource *child_resource; - bool size_valid = true; - u64 parts_size = 0; - - if (list_empty(&resource->resource_list)) - goto out; - - list_for_each_entry(child_resource, &resource->resource_list, list) - parts_size += child_resource->size_new; - - if (parts_size > resource->size_new) - size_valid = false; -out: - resource->size_valid = size_valid; -} - -static int -devlink_resource_validate_size(struct devlink_resource *resource, u64 size, - struct netlink_ext_ack *extack) -{ - u64 reminder; - int err = 0; - - if (size > resource->size_params.size_max) { - NL_SET_ERR_MSG_MOD(extack, "Size larger than maximum"); - err = -EINVAL; - } - - if (size < resource->size_params.size_min) { - NL_SET_ERR_MSG_MOD(extack, "Size smaller than minimum"); - err = -EINVAL; - } - - div64_u64_rem(size, resource->size_params.size_granularity, &reminder); - if (reminder) { - NL_SET_ERR_MSG_MOD(extack, "Wrong granularity"); - err = -EINVAL; - } - - return err; -} - -static int devlink_nl_cmd_resource_set(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_resource *resource; - u64 resource_id; - u64 size; - int err; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_RESOURCE_ID) || - GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_RESOURCE_SIZE)) - return -EINVAL; - resource_id = nla_get_u64(info->attrs[DEVLINK_ATTR_RESOURCE_ID]); - - resource = devlink_resource_find(devlink, NULL, resource_id); - if (!resource) - return -EINVAL; - - size = nla_get_u64(info->attrs[DEVLINK_ATTR_RESOURCE_SIZE]); - err = devlink_resource_validate_size(resource, size, info->extack); - if (err) - return err; - - resource->size_new = size; - devlink_resource_validate_children(resource); - if (resource->parent) - devlink_resource_validate_children(resource->parent); - return 0; -} - -static int -devlink_resource_size_params_put(struct devlink_resource *resource, - struct sk_buff *skb) -{ - struct devlink_resource_size_params *size_params; - - size_params = &resource->size_params; - if (nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_GRAN, - size_params->size_granularity, DEVLINK_ATTR_PAD) || - nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_MAX, - size_params->size_max, DEVLINK_ATTR_PAD) || - nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_MIN, - size_params->size_min, DEVLINK_ATTR_PAD) || - nla_put_u8(skb, DEVLINK_ATTR_RESOURCE_UNIT, size_params->unit)) - return -EMSGSIZE; - return 0; -} - -static int devlink_resource_occ_put(struct devlink_resource *resource, - struct sk_buff *skb) -{ - if (!resource->occ_get) - return 0; - return nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_OCC, - resource->occ_get(resource->occ_get_priv), - DEVLINK_ATTR_PAD); -} - -static int devlink_resource_put(struct devlink *devlink, struct sk_buff *skb, - struct devlink_resource *resource) -{ - struct devlink_resource *child_resource; - struct nlattr *child_resource_attr; - struct nlattr *resource_attr; - - resource_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_RESOURCE); - if (!resource_attr) - return -EMSGSIZE; - - if (nla_put_string(skb, DEVLINK_ATTR_RESOURCE_NAME, resource->name) || - nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE, resource->size, - DEVLINK_ATTR_PAD) || - nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_ID, resource->id, - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (resource->size != resource->size_new && - nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_NEW, - resource->size_new, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (devlink_resource_occ_put(resource, skb)) - goto nla_put_failure; - if (devlink_resource_size_params_put(resource, skb)) - goto nla_put_failure; - if (list_empty(&resource->resource_list)) - goto out; - - if (nla_put_u8(skb, DEVLINK_ATTR_RESOURCE_SIZE_VALID, - resource->size_valid)) - goto nla_put_failure; - - child_resource_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_RESOURCE_LIST); - if (!child_resource_attr) - goto nla_put_failure; - - list_for_each_entry(child_resource, &resource->resource_list, list) { - if (devlink_resource_put(devlink, skb, child_resource)) - goto resource_put_failure; - } - - nla_nest_end(skb, child_resource_attr); -out: - nla_nest_end(skb, resource_attr); - return 0; - -resource_put_failure: - nla_nest_cancel(skb, child_resource_attr); -nla_put_failure: - nla_nest_cancel(skb, resource_attr); - return -EMSGSIZE; -} - -static int devlink_resource_fill(struct genl_info *info, - enum devlink_command cmd, int flags) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_resource *resource; - struct nlattr *resources_attr; - struct sk_buff *skb = NULL; - struct nlmsghdr *nlh; - bool incomplete; - void *hdr; - int i; - int err; - - resource = list_first_entry(&devlink->resource_list, - struct devlink_resource, list); -start_again: - err = devlink_dpipe_send_and_alloc_skb(&skb, info); - if (err) - return err; - - hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, - &devlink_nl_family, NLM_F_MULTI, cmd); - if (!hdr) { - nlmsg_free(skb); - return -EMSGSIZE; - } - - if (devlink_nl_put_handle(skb, devlink)) - goto nla_put_failure; - - resources_attr = nla_nest_start_noflag(skb, - DEVLINK_ATTR_RESOURCE_LIST); - if (!resources_attr) - goto nla_put_failure; - - incomplete = false; - i = 0; - list_for_each_entry_from(resource, &devlink->resource_list, list) { - err = devlink_resource_put(devlink, skb, resource); - if (err) { - if (!i) - goto err_resource_put; - incomplete = true; - break; - } - i++; - } - nla_nest_end(skb, resources_attr); - genlmsg_end(skb, hdr); - if (incomplete) - goto start_again; -send_done: - nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, - NLMSG_DONE, 0, flags | NLM_F_MULTI); - if (!nlh) { - err = devlink_dpipe_send_and_alloc_skb(&skb, info); - if (err) - return err; - goto send_done; - } - return genlmsg_reply(skb, info); - -nla_put_failure: - err = -EMSGSIZE; -err_resource_put: - nlmsg_free(skb); - return err; -} - -static int devlink_nl_cmd_resource_dump(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - - if (list_empty(&devlink->resource_list)) - return -EOPNOTSUPP; - - return devlink_resource_fill(info, DEVLINK_CMD_RESOURCE_DUMP, 0); -} - -static int -devlink_resources_validate(struct devlink *devlink, - struct devlink_resource *resource, - struct genl_info *info) -{ - struct list_head *resource_list; - int err = 0; - - if (resource) - resource_list = &resource->resource_list; - else - resource_list = &devlink->resource_list; - - list_for_each_entry(resource, resource_list, list) { - if (!resource->size_valid) - return -EINVAL; - err = devlink_resources_validate(devlink, resource, info); - if (err) - return err; - } - return err; -} - -static struct net *devlink_netns_get(struct sk_buff *skb, - struct genl_info *info) -{ - struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID]; - struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD]; - struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID]; - struct net *net; - - if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) { - NL_SET_ERR_MSG_MOD(info->extack, "multiple netns identifying attributes specified"); - return ERR_PTR(-EINVAL); - } - - if (netns_pid_attr) { - net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr)); - } else if (netns_fd_attr) { - net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr)); - } else if (netns_id_attr) { - net = get_net_ns_by_id(sock_net(skb->sk), - nla_get_u32(netns_id_attr)); - if (!net) - net = ERR_PTR(-EINVAL); - } else { - WARN_ON(1); - net = ERR_PTR(-EINVAL); - } - if (IS_ERR(net)) { - NL_SET_ERR_MSG_MOD(info->extack, "Unknown network namespace"); - return ERR_PTR(-EINVAL); - } - if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) { - put_net(net); - return ERR_PTR(-EPERM); - } - return net; -} - -static void devlink_param_notify(struct devlink *devlink, - unsigned int port_index, - struct devlink_param_item *param_item, - enum devlink_command cmd); - -static void devlink_ns_change_notify(struct devlink *devlink, - struct net *dest_net, struct net *curr_net, - bool new) -{ - struct devlink_param_item *param_item; - enum devlink_command cmd; - - /* Userspace needs to be notified about devlink objects - * removed from original and entering new network namespace. - * The rest of the devlink objects are re-created during - * reload process so the notifications are generated separatelly. - */ - - if (!dest_net || net_eq(dest_net, curr_net)) - return; - - if (new) - devlink_notify(devlink, DEVLINK_CMD_NEW); - - cmd = new ? DEVLINK_CMD_PARAM_NEW : DEVLINK_CMD_PARAM_DEL; - list_for_each_entry(param_item, &devlink->param_list, list) - devlink_param_notify(devlink, 0, param_item, cmd); - - if (!new) - devlink_notify(devlink, DEVLINK_CMD_DEL); -} - -static bool devlink_reload_supported(const struct devlink_ops *ops) -{ - return ops->reload_down && ops->reload_up; -} - -static void devlink_reload_failed_set(struct devlink *devlink, - bool reload_failed) -{ - if (devlink->reload_failed == reload_failed) - return; - devlink->reload_failed = reload_failed; - devlink_notify(devlink, DEVLINK_CMD_NEW); -} - -bool devlink_is_reload_failed(const struct devlink *devlink) -{ - return devlink->reload_failed; -} -EXPORT_SYMBOL_GPL(devlink_is_reload_failed); - -static void -__devlink_reload_stats_update(struct devlink *devlink, u32 *reload_stats, - enum devlink_reload_limit limit, u32 actions_performed) -{ - unsigned long actions = actions_performed; - int stat_idx; - int action; - - for_each_set_bit(action, &actions, __DEVLINK_RELOAD_ACTION_MAX) { - stat_idx = limit * __DEVLINK_RELOAD_ACTION_MAX + action; - reload_stats[stat_idx]++; - } - devlink_notify(devlink, DEVLINK_CMD_NEW); -} - -static void -devlink_reload_stats_update(struct devlink *devlink, enum devlink_reload_limit limit, - u32 actions_performed) -{ - __devlink_reload_stats_update(devlink, devlink->stats.reload_stats, limit, - actions_performed); -} - -/** - * devlink_remote_reload_actions_performed - Update devlink on reload actions - * performed which are not a direct result of devlink reload call. - * - * This should be called by a driver after performing reload actions in case it was not - * a result of devlink reload call. For example fw_activate was performed as a result - * of devlink reload triggered fw_activate on another host. - * The motivation for this function is to keep data on reload actions performed on this - * function whether it was done due to direct devlink reload call or not. - * - * @devlink: devlink - * @limit: reload limit - * @actions_performed: bitmask of actions performed - */ -void devlink_remote_reload_actions_performed(struct devlink *devlink, - enum devlink_reload_limit limit, - u32 actions_performed) -{ - if (WARN_ON(!actions_performed || - actions_performed & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || - actions_performed >= BIT(__DEVLINK_RELOAD_ACTION_MAX) || - limit > DEVLINK_RELOAD_LIMIT_MAX)) - return; - - __devlink_reload_stats_update(devlink, devlink->stats.remote_reload_stats, limit, - actions_performed); -} -EXPORT_SYMBOL_GPL(devlink_remote_reload_actions_performed); - -static int devlink_reload(struct devlink *devlink, struct net *dest_net, - enum devlink_reload_action action, enum devlink_reload_limit limit, - u32 *actions_performed, struct netlink_ext_ack *extack) -{ - u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; - struct net *curr_net; - int err; - - memcpy(remote_reload_stats, devlink->stats.remote_reload_stats, - sizeof(remote_reload_stats)); - - curr_net = devlink_net(devlink); - devlink_ns_change_notify(devlink, dest_net, curr_net, false); - err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack); - if (err) - return err; - - if (dest_net && !net_eq(dest_net, curr_net)) { - move_netdevice_notifier_net(curr_net, dest_net, - &devlink->netdevice_nb); - write_pnet(&devlink->_net, dest_net); - } - - err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack); - devlink_reload_failed_set(devlink, !!err); - if (err) - return err; - - devlink_ns_change_notify(devlink, dest_net, curr_net, true); - WARN_ON(!(*actions_performed & BIT(action))); - /* Catch driver on updating the remote action within devlink reload */ - WARN_ON(memcmp(remote_reload_stats, devlink->stats.remote_reload_stats, - sizeof(remote_reload_stats))); - devlink_reload_stats_update(devlink, limit, *actions_performed); - return 0; -} - -static int -devlink_nl_reload_actions_performed_snd(struct devlink *devlink, u32 actions_performed, - enum devlink_command cmd, struct genl_info *info) -{ - struct sk_buff *msg; - void *hdr; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &devlink_nl_family, 0, cmd); - if (!hdr) - goto free_msg; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (nla_put_bitfield32(msg, DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, actions_performed, - actions_performed)) - goto nla_put_failure; - genlmsg_end(msg, hdr); - - return genlmsg_reply(msg, info); - -nla_put_failure: - genlmsg_cancel(msg, hdr); -free_msg: - nlmsg_free(msg); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - enum devlink_reload_action action; - enum devlink_reload_limit limit; - struct net *dest_net = NULL; - u32 actions_performed; - int err; - - if (!(devlink->features & DEVLINK_F_RELOAD)) - return -EOPNOTSUPP; - - err = devlink_resources_validate(devlink, NULL, info); - if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed"); - return err; - } - - if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION]) - action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]); - else - action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT; - - if (!devlink_reload_action_is_supported(devlink, action)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested reload action is not supported by the driver"); - return -EOPNOTSUPP; - } - - limit = DEVLINK_RELOAD_LIMIT_UNSPEC; - if (info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) { - struct nla_bitfield32 limits; - u32 limits_selected; - - limits = nla_get_bitfield32(info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]); - limits_selected = limits.value & limits.selector; - if (!limits_selected) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid limit selected"); - return -EINVAL; - } - for (limit = 0 ; limit <= DEVLINK_RELOAD_LIMIT_MAX ; limit++) - if (limits_selected & BIT(limit)) - break; - /* UAPI enables multiselection, but currently it is not used */ - if (limits_selected != BIT(limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Multiselection of limit is not supported"); - return -EOPNOTSUPP; - } - if (!devlink_reload_limit_is_supported(devlink, limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested limit is not supported by the driver"); - return -EOPNOTSUPP; - } - if (devlink_reload_combination_is_invalid(action, limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested limit is invalid for this action"); - return -EINVAL; - } - } - if (info->attrs[DEVLINK_ATTR_NETNS_PID] || - info->attrs[DEVLINK_ATTR_NETNS_FD] || - info->attrs[DEVLINK_ATTR_NETNS_ID]) { - dest_net = devlink_netns_get(skb, info); - if (IS_ERR(dest_net)) - return PTR_ERR(dest_net); - } - - err = devlink_reload(devlink, dest_net, action, limit, &actions_performed, info->extack); - - if (dest_net) - put_net(dest_net); - - if (err) - return err; - /* For backward compatibility generate reply only if attributes used by user */ - if (!info->attrs[DEVLINK_ATTR_RELOAD_ACTION] && !info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) - return 0; - - return devlink_nl_reload_actions_performed_snd(devlink, actions_performed, - DEVLINK_CMD_RELOAD, info); -} - -static int devlink_nl_flash_update_fill(struct sk_buff *msg, - struct devlink *devlink, - enum devlink_command cmd, - struct devlink_flash_notify *params) -{ - void *hdr; - - hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS) - goto out; - - if (params->status_msg && - nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG, - params->status_msg)) - goto nla_put_failure; - if (params->component && - nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_COMPONENT, - params->component)) - goto nla_put_failure; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE, - params->done, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL, - params->total, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, - params->timeout, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - -out: - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void __devlink_flash_update_notify(struct devlink *devlink, - enum devlink_command cmd, - struct devlink_flash_notify *params) -{ - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_FLASH_UPDATE && - cmd != DEVLINK_CMD_FLASH_UPDATE_END && - cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS); - - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_flash_update_fill(msg, devlink, cmd, params); - if (err) - goto out_free_msg; - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); - return; - -out_free_msg: - nlmsg_free(msg); -} - -static void devlink_flash_update_begin_notify(struct devlink *devlink) -{ - struct devlink_flash_notify params = {}; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE, - ¶ms); -} - -static void devlink_flash_update_end_notify(struct devlink *devlink) -{ - struct devlink_flash_notify params = {}; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE_END, - ¶ms); -} - -void devlink_flash_update_status_notify(struct devlink *devlink, - const char *status_msg, - const char *component, - unsigned long done, - unsigned long total) -{ - struct devlink_flash_notify params = { - .status_msg = status_msg, - .component = component, - .done = done, - .total = total, - }; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE_STATUS, - ¶ms); -} -EXPORT_SYMBOL_GPL(devlink_flash_update_status_notify); - -void devlink_flash_update_timeout_notify(struct devlink *devlink, - const char *status_msg, - const char *component, - unsigned long timeout) -{ - struct devlink_flash_notify params = { - .status_msg = status_msg, - .component = component, - .timeout = timeout, - }; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE_STATUS, - ¶ms); -} -EXPORT_SYMBOL_GPL(devlink_flash_update_timeout_notify); - -struct devlink_info_req { - struct sk_buff *msg; - void (*version_cb)(const char *version_name, - enum devlink_info_version_type version_type, - void *version_cb_priv); - void *version_cb_priv; -}; - -struct devlink_flash_component_lookup_ctx { - const char *lookup_name; - bool lookup_name_found; -}; - -static void -devlink_flash_component_lookup_cb(const char *version_name, - enum devlink_info_version_type version_type, - void *version_cb_priv) -{ - struct devlink_flash_component_lookup_ctx *lookup_ctx = version_cb_priv; - - if (version_type != DEVLINK_INFO_VERSION_TYPE_COMPONENT || - lookup_ctx->lookup_name_found) - return; - - lookup_ctx->lookup_name_found = - !strcmp(lookup_ctx->lookup_name, version_name); -} - -static int devlink_flash_component_get(struct devlink *devlink, - struct nlattr *nla_component, - const char **p_component, - struct netlink_ext_ack *extack) -{ - struct devlink_flash_component_lookup_ctx lookup_ctx = {}; - struct devlink_info_req req = {}; - const char *component; - int ret; - - if (!nla_component) - return 0; - - component = nla_data(nla_component); - - if (!devlink->ops->info_get) { - NL_SET_ERR_MSG_ATTR(extack, nla_component, - "component update is not supported by this device"); - return -EOPNOTSUPP; - } - - lookup_ctx.lookup_name = component; - req.version_cb = devlink_flash_component_lookup_cb; - req.version_cb_priv = &lookup_ctx; - - ret = devlink->ops->info_get(devlink, &req, NULL); - if (ret) - return ret; - - if (!lookup_ctx.lookup_name_found) { - NL_SET_ERR_MSG_ATTR(extack, nla_component, - "selected component is not supported by this device"); - return -EINVAL; - } - *p_component = component; - return 0; -} - -static int devlink_nl_cmd_flash_update(struct sk_buff *skb, - struct genl_info *info) -{ - struct nlattr *nla_overwrite_mask, *nla_file_name; - struct devlink_flash_update_params params = {}; - struct devlink *devlink = info->user_ptr[0]; - const char *file_name; - u32 supported_params; - int ret; - - if (!devlink->ops->flash_update) - return -EOPNOTSUPP; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME)) - return -EINVAL; - - ret = devlink_flash_component_get(devlink, - info->attrs[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT], - ¶ms.component, info->extack); - if (ret) - return ret; - - supported_params = devlink->ops->supported_flash_update_params; - - nla_overwrite_mask = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK]; - if (nla_overwrite_mask) { - struct nla_bitfield32 sections; - - if (!(supported_params & DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK)) { - NL_SET_ERR_MSG_ATTR(info->extack, nla_overwrite_mask, - "overwrite settings are not supported by this device"); - return -EOPNOTSUPP; - } - sections = nla_get_bitfield32(nla_overwrite_mask); - params.overwrite_mask = sections.value & sections.selector; - } - - nla_file_name = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME]; - file_name = nla_data(nla_file_name); - ret = request_firmware(¶ms.fw, file_name, devlink->dev); - if (ret) { - NL_SET_ERR_MSG_ATTR(info->extack, nla_file_name, "failed to locate the requested firmware file"); - return ret; - } - - devlink_flash_update_begin_notify(devlink); - ret = devlink->ops->flash_update(devlink, ¶ms, info->extack); - devlink_flash_update_end_notify(devlink); - - release_firmware(params.fw); - - return ret; -} - -static int -devlink_nl_selftests_fill(struct sk_buff *msg, struct devlink *devlink, - u32 portid, u32 seq, int flags, - struct netlink_ext_ack *extack) -{ - struct nlattr *selftests; - void *hdr; - int err; - int i; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, - DEVLINK_CMD_SELFTESTS_GET); - if (!hdr) - return -EMSGSIZE; - - err = -EMSGSIZE; - if (devlink_nl_put_handle(msg, devlink)) - goto err_cancel_msg; - - selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); - if (!selftests) - goto err_cancel_msg; - - for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; - i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { - if (devlink->ops->selftest_check(devlink, i, extack)) { - err = nla_put_flag(msg, i); - if (err) - goto err_cancel_msg; - } - } - - nla_nest_end(msg, selftests); - genlmsg_end(msg, hdr); - return 0; - -err_cancel_msg: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - if (!devlink->ops->selftest_check) - return -EOPNOTSUPP; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_selftests_fill(msg, devlink, info->snd_portid, - info->snd_seq, 0, info->extack); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < start || !devlink->ops->selftest_check) - goto inc; - - devl_lock(devlink); - err = devlink_nl_selftests_fill(msg, devlink, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->extack); - devl_unlock(devlink); - if (err) { - devlink_put(devlink); - break; - } -inc: - idx++; - devlink_put(devlink); - } - - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -static int devlink_selftest_result_put(struct sk_buff *skb, unsigned int id, - enum devlink_selftest_status test_status) -{ - struct nlattr *result_attr; - - result_attr = nla_nest_start(skb, DEVLINK_ATTR_SELFTEST_RESULT); - if (!result_attr) - return -EMSGSIZE; - - if (nla_put_u32(skb, DEVLINK_ATTR_SELFTEST_RESULT_ID, id) || - nla_put_u8(skb, DEVLINK_ATTR_SELFTEST_RESULT_STATUS, - test_status)) - goto nla_put_failure; - - nla_nest_end(skb, result_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, result_attr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_selftests_run(struct sk_buff *skb, - struct genl_info *info) -{ - struct nlattr *tb[DEVLINK_ATTR_SELFTEST_ID_MAX + 1]; - struct devlink *devlink = info->user_ptr[0]; - struct nlattr *attrs, *selftests; - struct sk_buff *msg; - void *hdr; - int err; - int i; - - if (!devlink->ops->selftest_run || !devlink->ops->selftest_check) - return -EOPNOTSUPP; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SELFTESTS)) - return -EINVAL; - - attrs = info->attrs[DEVLINK_ATTR_SELFTESTS]; - - err = nla_parse_nested(tb, DEVLINK_ATTR_SELFTEST_ID_MAX, attrs, - devlink_selftest_nl_policy, info->extack); - if (err < 0) - return err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = -EMSGSIZE; - hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, - &devlink_nl_family, 0, DEVLINK_CMD_SELFTESTS_RUN); - if (!hdr) - goto free_msg; - - if (devlink_nl_put_handle(msg, devlink)) - goto genlmsg_cancel; - - selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); - if (!selftests) - goto genlmsg_cancel; - - for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; - i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { - enum devlink_selftest_status test_status; - - if (nla_get_flag(tb[i])) { - if (!devlink->ops->selftest_check(devlink, i, - info->extack)) { - if (devlink_selftest_result_put(msg, i, - DEVLINK_SELFTEST_STATUS_SKIP)) - goto selftests_nest_cancel; - continue; - } - - test_status = devlink->ops->selftest_run(devlink, i, - info->extack); - if (devlink_selftest_result_put(msg, i, test_status)) - goto selftests_nest_cancel; - } - } - - nla_nest_end(msg, selftests); - genlmsg_end(msg, hdr); - return genlmsg_reply(msg, info); - -selftests_nest_cancel: - nla_nest_cancel(msg, selftests); -genlmsg_cancel: - genlmsg_cancel(msg, hdr); -free_msg: - nlmsg_free(msg); - return err; -} - -static const struct devlink_param devlink_param_generic[] = { - { - .id = DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, - .name = DEVLINK_PARAM_GENERIC_INT_ERR_RESET_NAME, - .type = DEVLINK_PARAM_GENERIC_INT_ERR_RESET_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - .name = DEVLINK_PARAM_GENERIC_MAX_MACS_NAME, - .type = DEVLINK_PARAM_GENERIC_MAX_MACS_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV, - .name = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, - .name = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME, - .type = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI, - .name = DEVLINK_PARAM_GENERIC_IGNORE_ARI_NAME, - .type = DEVLINK_PARAM_GENERIC_IGNORE_ARI_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX, - .name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_NAME, - .type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN, - .name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_NAME, - .type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, - .name = DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_NAME, - .type = DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE, - .name = DEVLINK_PARAM_GENERIC_RESET_DEV_ON_DRV_PROBE_NAME, - .type = DEVLINK_PARAM_GENERIC_RESET_DEV_ON_DRV_PROBE_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, - .name = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET, - .name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, - .name = DEVLINK_PARAM_GENERIC_ENABLE_ETH_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_ETH_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA, - .name = DEVLINK_PARAM_GENERIC_ENABLE_RDMA_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_RDMA_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET, - .name = DEVLINK_PARAM_GENERIC_ENABLE_VNET_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_VNET_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_IWARP, - .name = DEVLINK_PARAM_GENERIC_ENABLE_IWARP_NAME, - .type = DEVLINK_PARAM_GENERIC_ENABLE_IWARP_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE, - .name = DEVLINK_PARAM_GENERIC_IO_EQ_SIZE_NAME, - .type = DEVLINK_PARAM_GENERIC_IO_EQ_SIZE_TYPE, - }, - { - .id = DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE, - .name = DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_NAME, - .type = DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_TYPE, - }, -}; - -static int devlink_param_generic_verify(const struct devlink_param *param) -{ - /* verify it match generic parameter by id and name */ - if (param->id > DEVLINK_PARAM_GENERIC_ID_MAX) - return -EINVAL; - if (strcmp(param->name, devlink_param_generic[param->id].name)) - return -ENOENT; - - WARN_ON(param->type != devlink_param_generic[param->id].type); - - return 0; -} - -static int devlink_param_driver_verify(const struct devlink_param *param) -{ - int i; - - if (param->id <= DEVLINK_PARAM_GENERIC_ID_MAX) - return -EINVAL; - /* verify no such name in generic params */ - for (i = 0; i <= DEVLINK_PARAM_GENERIC_ID_MAX; i++) - if (!strcmp(param->name, devlink_param_generic[i].name)) - return -EEXIST; - - return 0; -} - -static struct devlink_param_item * -devlink_param_find_by_name(struct list_head *param_list, - const char *param_name) -{ - struct devlink_param_item *param_item; - - list_for_each_entry(param_item, param_list, list) - if (!strcmp(param_item->param->name, param_name)) - return param_item; - return NULL; -} - -static struct devlink_param_item * -devlink_param_find_by_id(struct list_head *param_list, u32 param_id) -{ - struct devlink_param_item *param_item; - - list_for_each_entry(param_item, param_list, list) - if (param_item->param->id == param_id) - return param_item; - return NULL; -} - -static bool -devlink_param_cmode_is_supported(const struct devlink_param *param, - enum devlink_param_cmode cmode) -{ - return test_bit(cmode, ¶m->supported_cmodes); -} - -static int devlink_param_get(struct devlink *devlink, - const struct devlink_param *param, - struct devlink_param_gset_ctx *ctx) -{ - if (!param->get || devlink->reload_failed) - return -EOPNOTSUPP; - return param->get(devlink, param->id, ctx); -} - -static int devlink_param_set(struct devlink *devlink, - const struct devlink_param *param, - struct devlink_param_gset_ctx *ctx) -{ - if (!param->set || devlink->reload_failed) - return -EOPNOTSUPP; - return param->set(devlink, param->id, ctx); -} - -static int -devlink_param_type_to_nla_type(enum devlink_param_type param_type) -{ - switch (param_type) { - case DEVLINK_PARAM_TYPE_U8: - return NLA_U8; - case DEVLINK_PARAM_TYPE_U16: - return NLA_U16; - case DEVLINK_PARAM_TYPE_U32: - return NLA_U32; - case DEVLINK_PARAM_TYPE_STRING: - return NLA_STRING; - case DEVLINK_PARAM_TYPE_BOOL: - return NLA_FLAG; - default: - return -EINVAL; - } -} - -static int -devlink_nl_param_value_fill_one(struct sk_buff *msg, - enum devlink_param_type type, - enum devlink_param_cmode cmode, - union devlink_param_value val) -{ - struct nlattr *param_value_attr; - - param_value_attr = nla_nest_start_noflag(msg, - DEVLINK_ATTR_PARAM_VALUE); - if (!param_value_attr) - goto nla_put_failure; - - if (nla_put_u8(msg, DEVLINK_ATTR_PARAM_VALUE_CMODE, cmode)) - goto value_nest_cancel; - - switch (type) { - case DEVLINK_PARAM_TYPE_U8: - if (nla_put_u8(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, val.vu8)) - goto value_nest_cancel; - break; - case DEVLINK_PARAM_TYPE_U16: - if (nla_put_u16(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, val.vu16)) - goto value_nest_cancel; - break; - case DEVLINK_PARAM_TYPE_U32: - if (nla_put_u32(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, val.vu32)) - goto value_nest_cancel; - break; - case DEVLINK_PARAM_TYPE_STRING: - if (nla_put_string(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, - val.vstr)) - goto value_nest_cancel; - break; - case DEVLINK_PARAM_TYPE_BOOL: - if (val.vbool && - nla_put_flag(msg, DEVLINK_ATTR_PARAM_VALUE_DATA)) - goto value_nest_cancel; - break; - } - - nla_nest_end(msg, param_value_attr); - return 0; - -value_nest_cancel: - nla_nest_cancel(msg, param_value_attr); -nla_put_failure: - return -EMSGSIZE; -} - -static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink, - unsigned int port_index, - struct devlink_param_item *param_item, - enum devlink_command cmd, - u32 portid, u32 seq, int flags) -{ - union devlink_param_value param_value[DEVLINK_PARAM_CMODE_MAX + 1]; - bool param_value_set[DEVLINK_PARAM_CMODE_MAX + 1] = {}; - const struct devlink_param *param = param_item->param; - struct devlink_param_gset_ctx ctx; - struct nlattr *param_values_list; - struct nlattr *param_attr; - int nla_type; - void *hdr; - int err; - int i; - - /* Get value from driver part to driverinit configuration mode */ - for (i = 0; i <= DEVLINK_PARAM_CMODE_MAX; i++) { - if (!devlink_param_cmode_is_supported(param, i)) - continue; - if (i == DEVLINK_PARAM_CMODE_DRIVERINIT) { - if (!param_item->driverinit_value_valid) - return -EOPNOTSUPP; - param_value[i] = param_item->driverinit_value; - } else { - ctx.cmode = i; - err = devlink_param_get(devlink, param, &ctx); - if (err) - return err; - param_value[i] = ctx.val; - } - param_value_set[i] = true; - } - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto genlmsg_cancel; - - if (cmd == DEVLINK_CMD_PORT_PARAM_GET || - cmd == DEVLINK_CMD_PORT_PARAM_NEW || - cmd == DEVLINK_CMD_PORT_PARAM_DEL) - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, port_index)) - goto genlmsg_cancel; - - param_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PARAM); - if (!param_attr) - goto genlmsg_cancel; - if (nla_put_string(msg, DEVLINK_ATTR_PARAM_NAME, param->name)) - goto param_nest_cancel; - if (param->generic && nla_put_flag(msg, DEVLINK_ATTR_PARAM_GENERIC)) - goto param_nest_cancel; - - nla_type = devlink_param_type_to_nla_type(param->type); - if (nla_type < 0) - goto param_nest_cancel; - if (nla_put_u8(msg, DEVLINK_ATTR_PARAM_TYPE, nla_type)) - goto param_nest_cancel; - - param_values_list = nla_nest_start_noflag(msg, - DEVLINK_ATTR_PARAM_VALUES_LIST); - if (!param_values_list) - goto param_nest_cancel; - - for (i = 0; i <= DEVLINK_PARAM_CMODE_MAX; i++) { - if (!param_value_set[i]) - continue; - err = devlink_nl_param_value_fill_one(msg, param->type, - i, param_value[i]); - if (err) - goto values_list_nest_cancel; - } - - nla_nest_end(msg, param_values_list); - nla_nest_end(msg, param_attr); - genlmsg_end(msg, hdr); - return 0; - -values_list_nest_cancel: - nla_nest_end(msg, param_values_list); -param_nest_cancel: - nla_nest_cancel(msg, param_attr); -genlmsg_cancel: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void devlink_param_notify(struct devlink *devlink, - unsigned int port_index, - struct devlink_param_item *param_item, - enum devlink_command cmd) -{ - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_PARAM_NEW && cmd != DEVLINK_CMD_PARAM_DEL && - cmd != DEVLINK_CMD_PORT_PARAM_NEW && - cmd != DEVLINK_CMD_PORT_PARAM_DEL); - ASSERT_DEVLINK_REGISTERED(devlink); - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - err = devlink_nl_param_fill(msg, devlink, port_index, param_item, cmd, - 0, 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink_param_item *param_item; - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - list_for_each_entry(param_item, &devlink->param_list, list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_param_fill(msg, devlink, 0, param_item, - DEVLINK_CMD_PARAM_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -static int -devlink_param_type_get_from_info(struct genl_info *info, - enum devlink_param_type *param_type) -{ - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PARAM_TYPE)) - return -EINVAL; - - switch (nla_get_u8(info->attrs[DEVLINK_ATTR_PARAM_TYPE])) { - case NLA_U8: - *param_type = DEVLINK_PARAM_TYPE_U8; - break; - case NLA_U16: - *param_type = DEVLINK_PARAM_TYPE_U16; - break; - case NLA_U32: - *param_type = DEVLINK_PARAM_TYPE_U32; - break; - case NLA_STRING: - *param_type = DEVLINK_PARAM_TYPE_STRING; - break; - case NLA_FLAG: - *param_type = DEVLINK_PARAM_TYPE_BOOL; - break; - default: - return -EINVAL; - } - - return 0; -} - -static int -devlink_param_value_get_from_info(const struct devlink_param *param, - struct genl_info *info, - union devlink_param_value *value) -{ - struct nlattr *param_data; - int len; - - param_data = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]; - - if (param->type != DEVLINK_PARAM_TYPE_BOOL && !param_data) - return -EINVAL; - - switch (param->type) { - case DEVLINK_PARAM_TYPE_U8: - if (nla_len(param_data) != sizeof(u8)) - return -EINVAL; - value->vu8 = nla_get_u8(param_data); - break; - case DEVLINK_PARAM_TYPE_U16: - if (nla_len(param_data) != sizeof(u16)) - return -EINVAL; - value->vu16 = nla_get_u16(param_data); - break; - case DEVLINK_PARAM_TYPE_U32: - if (nla_len(param_data) != sizeof(u32)) - return -EINVAL; - value->vu32 = nla_get_u32(param_data); - break; - case DEVLINK_PARAM_TYPE_STRING: - len = strnlen(nla_data(param_data), nla_len(param_data)); - if (len == nla_len(param_data) || - len >= __DEVLINK_PARAM_MAX_STRING_VALUE) - return -EINVAL; - strcpy(value->vstr, nla_data(param_data)); - break; - case DEVLINK_PARAM_TYPE_BOOL: - if (param_data && nla_len(param_data)) - return -EINVAL; - value->vbool = nla_get_flag(param_data); - break; - } - return 0; -} - -static struct devlink_param_item * -devlink_param_get_from_info(struct list_head *param_list, - struct genl_info *info) -{ - char *param_name; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PARAM_NAME)) - return NULL; - - param_name = nla_data(info->attrs[DEVLINK_ATTR_PARAM_NAME]); - return devlink_param_find_by_name(param_list, param_name); -} - -static int devlink_nl_cmd_param_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_param_item *param_item; - struct sk_buff *msg; - int err; - - param_item = devlink_param_get_from_info(&devlink->param_list, info); - if (!param_item) - return -EINVAL; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_param_fill(msg, devlink, 0, param_item, - DEVLINK_CMD_PARAM_GET, - info->snd_portid, info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink, - unsigned int port_index, - struct list_head *param_list, - struct genl_info *info, - enum devlink_command cmd) -{ - enum devlink_param_type param_type; - struct devlink_param_gset_ctx ctx; - enum devlink_param_cmode cmode; - struct devlink_param_item *param_item; - const struct devlink_param *param; - union devlink_param_value value; - int err = 0; - - param_item = devlink_param_get_from_info(param_list, info); - if (!param_item) - return -EINVAL; - param = param_item->param; - err = devlink_param_type_get_from_info(info, ¶m_type); - if (err) - return err; - if (param_type != param->type) - return -EINVAL; - err = devlink_param_value_get_from_info(param, info, &value); - if (err) - return err; - if (param->validate) { - err = param->validate(devlink, param->id, value, info->extack); - if (err) - return err; - } - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PARAM_VALUE_CMODE)) - return -EINVAL; - cmode = nla_get_u8(info->attrs[DEVLINK_ATTR_PARAM_VALUE_CMODE]); - if (!devlink_param_cmode_is_supported(param, cmode)) - return -EOPNOTSUPP; - - if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) { - if (param->type == DEVLINK_PARAM_TYPE_STRING) - strcpy(param_item->driverinit_value.vstr, value.vstr); - else - param_item->driverinit_value = value; - param_item->driverinit_value_valid = true; - } else { - if (!param->set) - return -EOPNOTSUPP; - ctx.val = value; - ctx.cmode = cmode; - err = devlink_param_set(devlink, param, &ctx); - if (err) - return err; - } - - devlink_param_notify(devlink, port_index, param_item, cmd); - return 0; -} - -static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - - return __devlink_nl_cmd_param_set_doit(devlink, 0, &devlink->param_list, - info, DEVLINK_CMD_PARAM_NEW); -} - -static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - NL_SET_ERR_MSG_MOD(cb->extack, "Port params are not supported"); - return msg->len; -} - -static int devlink_nl_cmd_port_param_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - NL_SET_ERR_MSG_MOD(info->extack, "Port params are not supported"); - return -EINVAL; -} - -static int devlink_nl_cmd_port_param_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - NL_SET_ERR_MSG_MOD(info->extack, "Port params are not supported"); - return -EINVAL; -} - -static int devlink_nl_region_snapshot_id_put(struct sk_buff *msg, - struct devlink *devlink, - struct devlink_snapshot *snapshot) -{ - struct nlattr *snap_attr; - int err; - - snap_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_REGION_SNAPSHOT); - if (!snap_attr) - return -EINVAL; - - err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, snapshot->id); - if (err) - goto nla_put_failure; - - nla_nest_end(msg, snap_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(msg, snap_attr); - return err; -} - -static int devlink_nl_region_snapshots_id_put(struct sk_buff *msg, - struct devlink *devlink, - struct devlink_region *region) -{ - struct devlink_snapshot *snapshot; - struct nlattr *snapshots_attr; - int err; - - snapshots_attr = nla_nest_start_noflag(msg, - DEVLINK_ATTR_REGION_SNAPSHOTS); - if (!snapshots_attr) - return -EINVAL; - - list_for_each_entry(snapshot, ®ion->snapshot_list, list) { - err = devlink_nl_region_snapshot_id_put(msg, devlink, snapshot); - if (err) - goto nla_put_failure; - } - - nla_nest_end(msg, snapshots_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(msg, snapshots_attr); - return err; -} - -static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags, - struct devlink_region *region) -{ - void *hdr; - int err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - err = devlink_nl_put_handle(msg, devlink); - if (err) - goto nla_put_failure; - - if (region->port) { - err = nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, - region->port->index); - if (err) - goto nla_put_failure; - } - - err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->ops->name); - if (err) - goto nla_put_failure; - - err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE, - region->size, - DEVLINK_ATTR_PAD); - if (err) - goto nla_put_failure; - - err = nla_put_u32(msg, DEVLINK_ATTR_REGION_MAX_SNAPSHOTS, - region->max_snapshots); - if (err) - goto nla_put_failure; - - err = devlink_nl_region_snapshots_id_put(msg, devlink, region); - if (err) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return err; -} - -static struct sk_buff * -devlink_nl_region_notify_build(struct devlink_region *region, - struct devlink_snapshot *snapshot, - enum devlink_command cmd, u32 portid, u32 seq) -{ - struct devlink *devlink = region->devlink; - struct sk_buff *msg; - void *hdr; - int err; - - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return ERR_PTR(-ENOMEM); - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, 0, cmd); - if (!hdr) { - err = -EMSGSIZE; - goto out_free_msg; - } - - err = devlink_nl_put_handle(msg, devlink); - if (err) - goto out_cancel_msg; - - if (region->port) { - err = nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, - region->port->index); - if (err) - goto out_cancel_msg; - } - - err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, - region->ops->name); - if (err) - goto out_cancel_msg; - - if (snapshot) { - err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, - snapshot->id); - if (err) - goto out_cancel_msg; - } else { - err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE, - region->size, DEVLINK_ATTR_PAD); - if (err) - goto out_cancel_msg; - } - genlmsg_end(msg, hdr); - - return msg; - -out_cancel_msg: - genlmsg_cancel(msg, hdr); -out_free_msg: - nlmsg_free(msg); - return ERR_PTR(err); -} - -static void devlink_nl_region_notify(struct devlink_region *region, - struct devlink_snapshot *snapshot, - enum devlink_command cmd) -{ - struct devlink *devlink = region->devlink; - struct sk_buff *msg; - - WARN_ON(cmd != DEVLINK_CMD_REGION_NEW && cmd != DEVLINK_CMD_REGION_DEL); - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = devlink_nl_region_notify_build(region, snapshot, cmd, 0, 0); - if (IS_ERR(msg)) - return; - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, - 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -/** - * __devlink_snapshot_id_increment - Increment number of snapshots using an id - * @devlink: devlink instance - * @id: the snapshot id - * - * Track when a new snapshot begins using an id. Load the count for the - * given id from the snapshot xarray, increment it, and store it back. - * - * Called when a new snapshot is created with the given id. - * - * The id *must* have been previously allocated by - * devlink_region_snapshot_id_get(). - * - * Returns 0 on success, or an error on failure. - */ -static int __devlink_snapshot_id_increment(struct devlink *devlink, u32 id) -{ - unsigned long count; - void *p; - int err; - - xa_lock(&devlink->snapshot_ids); - p = xa_load(&devlink->snapshot_ids, id); - if (WARN_ON(!p)) { - err = -EINVAL; - goto unlock; - } - - if (WARN_ON(!xa_is_value(p))) { - err = -EINVAL; - goto unlock; - } - - count = xa_to_value(p); - count++; - - err = xa_err(__xa_store(&devlink->snapshot_ids, id, xa_mk_value(count), - GFP_ATOMIC)); -unlock: - xa_unlock(&devlink->snapshot_ids); - return err; -} - -/** - * __devlink_snapshot_id_decrement - Decrease number of snapshots using an id - * @devlink: devlink instance - * @id: the snapshot id - * - * Track when a snapshot is deleted and stops using an id. Load the count - * for the given id from the snapshot xarray, decrement it, and store it - * back. - * - * If the count reaches zero, erase this id from the xarray, freeing it - * up for future re-use by devlink_region_snapshot_id_get(). - * - * Called when a snapshot using the given id is deleted, and when the - * initial allocator of the id is finished using it. - */ -static void __devlink_snapshot_id_decrement(struct devlink *devlink, u32 id) -{ - unsigned long count; - void *p; - - xa_lock(&devlink->snapshot_ids); - p = xa_load(&devlink->snapshot_ids, id); - if (WARN_ON(!p)) - goto unlock; - - if (WARN_ON(!xa_is_value(p))) - goto unlock; - - count = xa_to_value(p); - - if (count > 1) { - count--; - __xa_store(&devlink->snapshot_ids, id, xa_mk_value(count), - GFP_ATOMIC); - } else { - /* If this was the last user, we can erase this id */ - __xa_erase(&devlink->snapshot_ids, id); - } -unlock: - xa_unlock(&devlink->snapshot_ids); -} - -/** - * __devlink_snapshot_id_insert - Insert a specific snapshot ID - * @devlink: devlink instance - * @id: the snapshot id - * - * Mark the given snapshot id as used by inserting a zero value into the - * snapshot xarray. - * - * This must be called while holding the devlink instance lock. Unlike - * devlink_snapshot_id_get, the initial reference count is zero, not one. - * It is expected that the id will immediately be used before - * releasing the devlink instance lock. - * - * Returns zero on success, or an error code if the snapshot id could not - * be inserted. - */ -static int __devlink_snapshot_id_insert(struct devlink *devlink, u32 id) -{ - int err; - - xa_lock(&devlink->snapshot_ids); - if (xa_load(&devlink->snapshot_ids, id)) { - xa_unlock(&devlink->snapshot_ids); - return -EEXIST; - } - err = xa_err(__xa_store(&devlink->snapshot_ids, id, xa_mk_value(0), - GFP_ATOMIC)); - xa_unlock(&devlink->snapshot_ids); - return err; -} - -/** - * __devlink_region_snapshot_id_get - get snapshot ID - * @devlink: devlink instance - * @id: storage to return snapshot id - * - * Allocates a new snapshot id. Returns zero on success, or a negative - * error on failure. Must be called while holding the devlink instance - * lock. - * - * Snapshot IDs are tracked using an xarray which stores the number of - * users of the snapshot id. - * - * Note that the caller of this function counts as a 'user', in order to - * avoid race conditions. The caller must release its hold on the - * snapshot by using devlink_region_snapshot_id_put. - */ -static int __devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id) -{ - return xa_alloc(&devlink->snapshot_ids, id, xa_mk_value(1), - xa_limit_32b, GFP_KERNEL); -} - -/** - * __devlink_region_snapshot_create - create a new snapshot - * This will add a new snapshot of a region. The snapshot - * will be stored on the region struct and can be accessed - * from devlink. This is useful for future analyses of snapshots. - * Multiple snapshots can be created on a region. - * The @snapshot_id should be obtained using the getter function. - * - * Must be called only while holding the region snapshot lock. - * - * @region: devlink region of the snapshot - * @data: snapshot data - * @snapshot_id: snapshot id to be created - */ -static int -__devlink_region_snapshot_create(struct devlink_region *region, - u8 *data, u32 snapshot_id) -{ - struct devlink *devlink = region->devlink; - struct devlink_snapshot *snapshot; - int err; - - lockdep_assert_held(®ion->snapshot_lock); - - /* check if region can hold one more snapshot */ - if (region->cur_snapshots == region->max_snapshots) - return -ENOSPC; - - if (devlink_region_snapshot_get_by_id(region, snapshot_id)) - return -EEXIST; - - snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL); - if (!snapshot) - return -ENOMEM; - - err = __devlink_snapshot_id_increment(devlink, snapshot_id); - if (err) - goto err_snapshot_id_increment; - - snapshot->id = snapshot_id; - snapshot->region = region; - snapshot->data = data; - - list_add_tail(&snapshot->list, ®ion->snapshot_list); - - region->cur_snapshots++; - - devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW); - return 0; - -err_snapshot_id_increment: - kfree(snapshot); - return err; -} - -static void devlink_region_snapshot_del(struct devlink_region *region, - struct devlink_snapshot *snapshot) -{ - struct devlink *devlink = region->devlink; - - lockdep_assert_held(®ion->snapshot_lock); - - devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL); - region->cur_snapshots--; - list_del(&snapshot->list); - region->ops->destructor(snapshot->data); - __devlink_snapshot_id_decrement(devlink, snapshot->id); - kfree(snapshot); -} - -static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_port *port = NULL; - struct devlink_region *region; - const char *region_name; - struct sk_buff *msg; - unsigned int index; - int err; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME)) - return -EINVAL; - - if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { - index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); - - port = devlink_port_get_by_index(devlink, index); - if (!port) - return -ENODEV; - } - - region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]); - if (port) - region = devlink_port_region_get_by_name(port, region_name); - else - region = devlink_region_get_by_name(devlink, region_name); - - if (!region) - return -EINVAL; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_region_fill(msg, devlink, DEVLINK_CMD_REGION_GET, - info->snd_portid, info->snd_seq, 0, - region); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_region_get_port_dumpit(struct sk_buff *msg, - struct netlink_callback *cb, - struct devlink_port *port, - int *idx, - int start) -{ - struct devlink_region *region; - int err = 0; - - list_for_each_entry(region, &port->region_list, list) { - if (*idx < start) { - (*idx)++; - continue; - } - err = devlink_nl_region_fill(msg, port->devlink, - DEVLINK_CMD_REGION_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, region); - if (err) - goto out; - (*idx)++; - } - -out: - return err; -} - -static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, - struct netlink_callback *cb, - struct devlink *devlink, - int *idx, - int start) -{ - struct devlink_region *region; - struct devlink_port *port; - unsigned long port_index; - int err = 0; - - devl_lock(devlink); - list_for_each_entry(region, &devlink->region_list, list) { - if (*idx < start) { - (*idx)++; - continue; - } - err = devlink_nl_region_fill(msg, devlink, - DEVLINK_CMD_REGION_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, region); - if (err) - goto out; - (*idx)++; - } - - xa_for_each(&devlink->ports, port_index, port) { - err = devlink_nl_cmd_region_get_port_dumpit(msg, cb, port, idx, - start); - if (err) - goto out; - } - -out: - devl_unlock(devlink); - return err; -} - -static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink, - &idx, start); - devlink_put(devlink); - if (err) - goto out; - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int devlink_nl_cmd_region_del(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_snapshot *snapshot; - struct devlink_port *port = NULL; - struct devlink_region *region; - const char *region_name; - unsigned int index; - u32 snapshot_id; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME) || - GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_SNAPSHOT_ID)) - return -EINVAL; - - region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]); - snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]); - - if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { - index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); - - port = devlink_port_get_by_index(devlink, index); - if (!port) - return -ENODEV; - } - - if (port) - region = devlink_port_region_get_by_name(port, region_name); - else - region = devlink_region_get_by_name(devlink, region_name); - - if (!region) - return -EINVAL; - - mutex_lock(®ion->snapshot_lock); - snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); - if (!snapshot) { - mutex_unlock(®ion->snapshot_lock); - return -EINVAL; - } - - devlink_region_snapshot_del(region, snapshot); - mutex_unlock(®ion->snapshot_lock); - return 0; -} - -static int -devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_snapshot *snapshot; - struct devlink_port *port = NULL; - struct nlattr *snapshot_id_attr; - struct devlink_region *region; - const char *region_name; - unsigned int index; - u32 snapshot_id; - u8 *data; - int err; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME)) { - NL_SET_ERR_MSG_MOD(info->extack, "No region name provided"); - return -EINVAL; - } - - region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]); - - if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { - index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); - - port = devlink_port_get_by_index(devlink, index); - if (!port) - return -ENODEV; - } - - if (port) - region = devlink_port_region_get_by_name(port, region_name); - else - region = devlink_region_get_by_name(devlink, region_name); - - if (!region) { - NL_SET_ERR_MSG_MOD(info->extack, "The requested region does not exist"); - return -EINVAL; - } - - if (!region->ops->snapshot) { - NL_SET_ERR_MSG_MOD(info->extack, "The requested region does not support taking an immediate snapshot"); - return -EOPNOTSUPP; - } - - mutex_lock(®ion->snapshot_lock); - - if (region->cur_snapshots == region->max_snapshots) { - NL_SET_ERR_MSG_MOD(info->extack, "The region has reached the maximum number of stored snapshots"); - err = -ENOSPC; - goto unlock; - } - - snapshot_id_attr = info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]; - if (snapshot_id_attr) { - snapshot_id = nla_get_u32(snapshot_id_attr); - - if (devlink_region_snapshot_get_by_id(region, snapshot_id)) { - NL_SET_ERR_MSG_MOD(info->extack, "The requested snapshot id is already in use"); - err = -EEXIST; - goto unlock; - } - - err = __devlink_snapshot_id_insert(devlink, snapshot_id); - if (err) - goto unlock; - } else { - err = __devlink_region_snapshot_id_get(devlink, &snapshot_id); - if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "Failed to allocate a new snapshot id"); - goto unlock; - } - } - - if (port) - err = region->port_ops->snapshot(port, region->port_ops, - info->extack, &data); - else - err = region->ops->snapshot(devlink, region->ops, - info->extack, &data); - if (err) - goto err_snapshot_capture; - - err = __devlink_region_snapshot_create(region, data, snapshot_id); - if (err) - goto err_snapshot_create; - - if (!snapshot_id_attr) { - struct sk_buff *msg; - - snapshot = devlink_region_snapshot_get_by_id(region, - snapshot_id); - if (WARN_ON(!snapshot)) { - err = -EINVAL; - goto unlock; - } - - msg = devlink_nl_region_notify_build(region, snapshot, - DEVLINK_CMD_REGION_NEW, - info->snd_portid, - info->snd_seq); - err = PTR_ERR_OR_ZERO(msg); - if (err) - goto err_notify; - - err = genlmsg_reply(msg, info); - if (err) - goto err_notify; - } - - mutex_unlock(®ion->snapshot_lock); - return 0; - -err_snapshot_create: - region->ops->destructor(data); -err_snapshot_capture: - __devlink_snapshot_id_decrement(devlink, snapshot_id); - mutex_unlock(®ion->snapshot_lock); - return err; - -err_notify: - devlink_region_snapshot_del(region, snapshot); -unlock: - mutex_unlock(®ion->snapshot_lock); - return err; -} - -static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg, - u8 *chunk, u32 chunk_size, - u64 addr) -{ - struct nlattr *chunk_attr; - int err; - - chunk_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_REGION_CHUNK); - if (!chunk_attr) - return -EINVAL; - - err = nla_put(msg, DEVLINK_ATTR_REGION_CHUNK_DATA, chunk_size, chunk); - if (err) - goto nla_put_failure; - - err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_CHUNK_ADDR, addr, - DEVLINK_ATTR_PAD); - if (err) - goto nla_put_failure; - - nla_nest_end(msg, chunk_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(msg, chunk_attr); - return err; -} - -#define DEVLINK_REGION_READ_CHUNK_SIZE 256 - -typedef int devlink_chunk_fill_t(void *cb_priv, u8 *chunk, u32 chunk_size, - u64 curr_offset, - struct netlink_ext_ack *extack); - -static int -devlink_nl_region_read_fill(struct sk_buff *skb, devlink_chunk_fill_t *cb, - void *cb_priv, u64 start_offset, u64 end_offset, - u64 *new_offset, struct netlink_ext_ack *extack) -{ - u64 curr_offset = start_offset; - int err = 0; - u8 *data; - - /* Allocate and re-use a single buffer */ - data = kmalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL); - if (!data) - return -ENOMEM; - - *new_offset = start_offset; - - while (curr_offset < end_offset) { - u32 data_size; - - data_size = min_t(u32, end_offset - curr_offset, - DEVLINK_REGION_READ_CHUNK_SIZE); - - err = cb(cb_priv, data, data_size, curr_offset, extack); - if (err) - break; - - err = devlink_nl_cmd_region_read_chunk_fill(skb, data, data_size, curr_offset); - if (err) - break; - - curr_offset += data_size; - } - *new_offset = curr_offset; - - kfree(data); - - return err; -} - -static int -devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size, - u64 curr_offset, - struct netlink_ext_ack __always_unused *extack) -{ - struct devlink_snapshot *snapshot = cb_priv; - - memcpy(chunk, &snapshot->data[curr_offset], chunk_size); - - return 0; -} - -static int -devlink_region_port_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, - u64 curr_offset, struct netlink_ext_ack *extack) -{ - struct devlink_region *region = cb_priv; - - return region->port_ops->read(region->port, region->port_ops, extack, - curr_offset, chunk_size, chunk); -} - -static int -devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, - u64 curr_offset, struct netlink_ext_ack *extack) -{ - struct devlink_region *region = cb_priv; - - return region->ops->read(region->devlink, region->ops, extack, - curr_offset, chunk_size, chunk); -} - -static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, - struct netlink_callback *cb) -{ - const struct genl_dumpit_info *info = genl_dumpit_info(cb); - struct nlattr *chunks_attr, *region_attr, *snapshot_attr; - u64 ret_offset, start_offset, end_offset = U64_MAX; - struct nlattr **attrs = info->attrs; - struct devlink_port *port = NULL; - devlink_chunk_fill_t *region_cb; - struct devlink_region *region; - const char *region_name; - struct devlink *devlink; - unsigned int index; - void *region_cb_priv; - void *hdr; - int err; - - start_offset = *((u64 *)&cb->args[0]); - - devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); - if (IS_ERR(devlink)) - return PTR_ERR(devlink); - - devl_lock(devlink); - - if (!attrs[DEVLINK_ATTR_REGION_NAME]) { - NL_SET_ERR_MSG(cb->extack, "No region name provided"); - err = -EINVAL; - goto out_unlock; - } - - if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { - index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); - - port = devlink_port_get_by_index(devlink, index); - if (!port) { - err = -ENODEV; - goto out_unlock; - } - } - - region_attr = attrs[DEVLINK_ATTR_REGION_NAME]; - region_name = nla_data(region_attr); - - if (port) - region = devlink_port_region_get_by_name(port, region_name); - else - region = devlink_region_get_by_name(devlink, region_name); - - if (!region) { - NL_SET_ERR_MSG_ATTR(cb->extack, region_attr, "Requested region does not exist"); - err = -EINVAL; - goto out_unlock; - } - - snapshot_attr = attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]; - if (!snapshot_attr) { - if (!nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) { - NL_SET_ERR_MSG(cb->extack, "No snapshot id provided"); - err = -EINVAL; - goto out_unlock; - } - - if (!region->ops->read) { - NL_SET_ERR_MSG(cb->extack, "Requested region does not support direct read"); - err = -EOPNOTSUPP; - goto out_unlock; - } - - if (port) - region_cb = &devlink_region_port_direct_fill; - else - region_cb = &devlink_region_direct_fill; - region_cb_priv = region; - } else { - struct devlink_snapshot *snapshot; - u32 snapshot_id; - - if (nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) { - NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Direct region read does not use snapshot"); - err = -EINVAL; - goto out_unlock; - } - - snapshot_id = nla_get_u32(snapshot_attr); - snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); - if (!snapshot) { - NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Requested snapshot does not exist"); - err = -EINVAL; - goto out_unlock; - } - region_cb = &devlink_region_snapshot_fill; - region_cb_priv = snapshot; - } - - if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] && - attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]) { - if (!start_offset) - start_offset = - nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR]); - - end_offset = nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR]); - end_offset += nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]); - } - - if (end_offset > region->size) - end_offset = region->size; - - /* return 0 if there is no further data to read */ - if (start_offset == end_offset) { - err = 0; - goto out_unlock; - } - - hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, - &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI, - DEVLINK_CMD_REGION_READ); - if (!hdr) { - err = -EMSGSIZE; - goto out_unlock; - } - - err = devlink_nl_put_handle(skb, devlink); - if (err) - goto nla_put_failure; - - if (region->port) { - err = nla_put_u32(skb, DEVLINK_ATTR_PORT_INDEX, - region->port->index); - if (err) - goto nla_put_failure; - } - - err = nla_put_string(skb, DEVLINK_ATTR_REGION_NAME, region_name); - if (err) - goto nla_put_failure; - - chunks_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_REGION_CHUNKS); - if (!chunks_attr) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv, - start_offset, end_offset, &ret_offset, - cb->extack); - - if (err && err != -EMSGSIZE) - goto nla_put_failure; - - /* Check if there was any progress done to prevent infinite loop */ - if (ret_offset == start_offset) { - err = -EINVAL; - goto nla_put_failure; - } - - *((u64 *)&cb->args[0]) = ret_offset; - - nla_nest_end(skb, chunks_attr); - genlmsg_end(skb, hdr); - devl_unlock(devlink); - devlink_put(devlink); - return skb->len; - -nla_put_failure: - genlmsg_cancel(skb, hdr); -out_unlock: - devl_unlock(devlink); - devlink_put(devlink); - return err; -} - -int devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn) -{ - if (!req->msg) - return 0; - return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn); -} -EXPORT_SYMBOL_GPL(devlink_info_serial_number_put); - -int devlink_info_board_serial_number_put(struct devlink_info_req *req, - const char *bsn) -{ - if (!req->msg) - return 0; - return nla_put_string(req->msg, DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, - bsn); -} -EXPORT_SYMBOL_GPL(devlink_info_board_serial_number_put); - -static int devlink_info_version_put(struct devlink_info_req *req, int attr, - const char *version_name, - const char *version_value, - enum devlink_info_version_type version_type) -{ - struct nlattr *nest; - int err; - - if (req->version_cb) - req->version_cb(version_name, version_type, - req->version_cb_priv); - - if (!req->msg) - return 0; - - nest = nla_nest_start_noflag(req->msg, attr); - if (!nest) - return -EMSGSIZE; - - err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_NAME, - version_name); - if (err) - goto nla_put_failure; - - err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_VALUE, - version_value); - if (err) - goto nla_put_failure; - - nla_nest_end(req->msg, nest); - - return 0; - -nla_put_failure: - nla_nest_cancel(req->msg, nest); - return err; -} - -int devlink_info_version_fixed_put(struct devlink_info_req *req, - const char *version_name, - const char *version_value) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_FIXED, - version_name, version_value, - DEVLINK_INFO_VERSION_TYPE_NONE); -} -EXPORT_SYMBOL_GPL(devlink_info_version_fixed_put); - -int devlink_info_version_stored_put(struct devlink_info_req *req, - const char *version_name, - const char *version_value) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, - version_name, version_value, - DEVLINK_INFO_VERSION_TYPE_NONE); -} -EXPORT_SYMBOL_GPL(devlink_info_version_stored_put); - -int devlink_info_version_stored_put_ext(struct devlink_info_req *req, - const char *version_name, - const char *version_value, - enum devlink_info_version_type version_type) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, - version_name, version_value, - version_type); -} -EXPORT_SYMBOL_GPL(devlink_info_version_stored_put_ext); - -int devlink_info_version_running_put(struct devlink_info_req *req, - const char *version_name, - const char *version_value) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, - version_name, version_value, - DEVLINK_INFO_VERSION_TYPE_NONE); -} -EXPORT_SYMBOL_GPL(devlink_info_version_running_put); - -int devlink_info_version_running_put_ext(struct devlink_info_req *req, - const char *version_name, - const char *version_value, - enum devlink_info_version_type version_type) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, - version_name, version_value, - version_type); -} -EXPORT_SYMBOL_GPL(devlink_info_version_running_put_ext); - -static int devlink_nl_driver_info_get(struct device_driver *drv, - struct devlink_info_req *req) -{ - if (!drv) - return 0; - - if (drv->name[0]) - return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRIVER_NAME, - drv->name); - - return 0; -} - -static int -devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags, struct netlink_ext_ack *extack) -{ - struct device *dev = devlink_to_dev(devlink); - struct devlink_info_req req = {}; - void *hdr; - int err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - err = -EMSGSIZE; - if (devlink_nl_put_handle(msg, devlink)) - goto err_cancel_msg; - - req.msg = msg; - if (devlink->ops->info_get) { - err = devlink->ops->info_get(devlink, &req, extack); - if (err) - goto err_cancel_msg; - } - - err = devlink_nl_driver_info_get(dev->driver, &req); - if (err) - goto err_cancel_msg; - - genlmsg_end(msg, hdr); - return 0; - -err_cancel_msg: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, - info->snd_portid, info->snd_seq, 0, - info->extack); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err = 0; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < start) - goto inc; - - devl_lock(devlink); - err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->extack); - devl_unlock(devlink); - if (err == -EOPNOTSUPP) - err = 0; - else if (err) { - devlink_put(devlink); - break; - } -inc: - idx++; - devlink_put(devlink); - } - - if (err != -EMSGSIZE) - return err; - - cb->args[0] = idx; - return msg->len; -} - -struct devlink_fmsg_item { - struct list_head list; - int attrtype; - u8 nla_type; - u16 len; - int value[]; -}; - -struct devlink_fmsg { - struct list_head item_list; - bool putting_binary; /* This flag forces enclosing of binary data - * in an array brackets. It forces using - * of designated API: - * devlink_fmsg_binary_pair_nest_start() - * devlink_fmsg_binary_pair_nest_end() - */ -}; - -static struct devlink_fmsg *devlink_fmsg_alloc(void) -{ - struct devlink_fmsg *fmsg; - - fmsg = kzalloc(sizeof(*fmsg), GFP_KERNEL); - if (!fmsg) - return NULL; - - INIT_LIST_HEAD(&fmsg->item_list); - - return fmsg; -} - -static void devlink_fmsg_free(struct devlink_fmsg *fmsg) -{ - struct devlink_fmsg_item *item, *tmp; - - list_for_each_entry_safe(item, tmp, &fmsg->item_list, list) { - list_del(&item->list); - kfree(item); - } - kfree(fmsg); -} - -static int devlink_fmsg_nest_common(struct devlink_fmsg *fmsg, - int attrtype) -{ - struct devlink_fmsg_item *item; - - item = kzalloc(sizeof(*item), GFP_KERNEL); - if (!item) - return -ENOMEM; - - item->attrtype = attrtype; - list_add_tail(&item->list, &fmsg->item_list); - - return 0; -} - -int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_OBJ_NEST_START); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_start); - -static int devlink_fmsg_nest_end(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_NEST_END); -} - -int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_end(fmsg); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_end); - -#define DEVLINK_FMSG_MAX_SIZE (GENLMSG_DEFAULT_SIZE - GENL_HDRLEN - NLA_HDRLEN) - -static int devlink_fmsg_put_name(struct devlink_fmsg *fmsg, const char *name) -{ - struct devlink_fmsg_item *item; - - if (fmsg->putting_binary) - return -EINVAL; - - if (strlen(name) + 1 > DEVLINK_FMSG_MAX_SIZE) - return -EMSGSIZE; - - item = kzalloc(sizeof(*item) + strlen(name) + 1, GFP_KERNEL); - if (!item) - return -ENOMEM; - - item->nla_type = NLA_NUL_STRING; - item->len = strlen(name) + 1; - item->attrtype = DEVLINK_ATTR_FMSG_OBJ_NAME; - memcpy(&item->value, name, item->len); - list_add_tail(&item->list, &fmsg->item_list); - - return 0; -} - -int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name) -{ - int err; - - if (fmsg->putting_binary) - return -EINVAL; - - err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_PAIR_NEST_START); - if (err) - return err; - - err = devlink_fmsg_put_name(fmsg, name); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_start); - -int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_end(fmsg); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_end); - -int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg, - const char *name) -{ - int err; - - if (fmsg->putting_binary) - return -EINVAL; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_ARR_NEST_START); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_start); - -int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg) -{ - int err; - - if (fmsg->putting_binary) - return -EINVAL; - - err = devlink_fmsg_nest_end(fmsg); - if (err) - return err; - - err = devlink_fmsg_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_end); - -int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg *fmsg, - const char *name) -{ - int err; - - err = devlink_fmsg_arr_pair_nest_start(fmsg, name); - if (err) - return err; - - fmsg->putting_binary = true; - return err; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_start); - -int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg *fmsg) -{ - if (!fmsg->putting_binary) - return -EINVAL; - - fmsg->putting_binary = false; - return devlink_fmsg_arr_pair_nest_end(fmsg); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_end); - -static int devlink_fmsg_put_value(struct devlink_fmsg *fmsg, - const void *value, u16 value_len, - u8 value_nla_type) -{ - struct devlink_fmsg_item *item; - - if (value_len > DEVLINK_FMSG_MAX_SIZE) - return -EMSGSIZE; - - item = kzalloc(sizeof(*item) + value_len, GFP_KERNEL); - if (!item) - return -ENOMEM; - - item->nla_type = value_nla_type; - item->len = value_len; - item->attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; - memcpy(&item->value, value, item->len); - list_add_tail(&item->list, &fmsg->item_list); - - return 0; -} - -static int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_FLAG); -} - -static int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U8); -} - -int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U32); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u32_put); - -static int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U64); -} - -int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, value, strlen(value) + 1, - NLA_NUL_STRING); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_string_put); - -int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value, - u16 value_len) -{ - if (!fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, value, value_len, NLA_BINARY); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_put); - -int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name, - bool value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_bool_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_bool_pair_put); - -int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name, - u8 value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_u8_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u8_pair_put); - -int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name, - u32 value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_u32_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u32_pair_put); - -int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name, - u64 value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_u64_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u64_pair_put); - -int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name, - const char *value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_string_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_string_pair_put); - -int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name, - const void *value, u32 value_len) -{ - u32 data_size; - int end_err; - u32 offset; - int err; - - err = devlink_fmsg_binary_pair_nest_start(fmsg, name); - if (err) - return err; - - for (offset = 0; offset < value_len; offset += data_size) { - data_size = value_len - offset; - if (data_size > DEVLINK_FMSG_MAX_SIZE) - data_size = DEVLINK_FMSG_MAX_SIZE; - err = devlink_fmsg_binary_put(fmsg, value + offset, data_size); - if (err) - break; - /* Exit from loop with a break (instead of - * return) to make sure putting_binary is turned off in - * devlink_fmsg_binary_pair_nest_end - */ - } - - end_err = devlink_fmsg_binary_pair_nest_end(fmsg); - if (end_err) - err = end_err; - - return err; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_put); - -static int -devlink_fmsg_item_fill_type(struct devlink_fmsg_item *msg, struct sk_buff *skb) -{ - switch (msg->nla_type) { - case NLA_FLAG: - case NLA_U8: - case NLA_U32: - case NLA_U64: - case NLA_NUL_STRING: - case NLA_BINARY: - return nla_put_u8(skb, DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, - msg->nla_type); - default: - return -EINVAL; - } -} - -static int -devlink_fmsg_item_fill_data(struct devlink_fmsg_item *msg, struct sk_buff *skb) -{ - int attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; - u8 tmp; - - switch (msg->nla_type) { - case NLA_FLAG: - /* Always provide flag data, regardless of its value */ - tmp = *(bool *) msg->value; - - return nla_put_u8(skb, attrtype, tmp); - case NLA_U8: - return nla_put_u8(skb, attrtype, *(u8 *) msg->value); - case NLA_U32: - return nla_put_u32(skb, attrtype, *(u32 *) msg->value); - case NLA_U64: - return nla_put_u64_64bit(skb, attrtype, *(u64 *) msg->value, - DEVLINK_ATTR_PAD); - case NLA_NUL_STRING: - return nla_put_string(skb, attrtype, (char *) &msg->value); - case NLA_BINARY: - return nla_put(skb, attrtype, msg->len, (void *) &msg->value); - default: - return -EINVAL; - } -} - -static int -devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb, - int *start) -{ - struct devlink_fmsg_item *item; - struct nlattr *fmsg_nlattr; - int i = 0; - int err; - - fmsg_nlattr = nla_nest_start_noflag(skb, DEVLINK_ATTR_FMSG); - if (!fmsg_nlattr) - return -EMSGSIZE; - - list_for_each_entry(item, &fmsg->item_list, list) { - if (i < *start) { - i++; - continue; - } - - switch (item->attrtype) { - case DEVLINK_ATTR_FMSG_OBJ_NEST_START: - case DEVLINK_ATTR_FMSG_PAIR_NEST_START: - case DEVLINK_ATTR_FMSG_ARR_NEST_START: - case DEVLINK_ATTR_FMSG_NEST_END: - err = nla_put_flag(skb, item->attrtype); - break; - case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA: - err = devlink_fmsg_item_fill_type(item, skb); - if (err) - break; - err = devlink_fmsg_item_fill_data(item, skb); - break; - case DEVLINK_ATTR_FMSG_OBJ_NAME: - err = nla_put_string(skb, item->attrtype, - (char *) &item->value); - break; - default: - err = -EINVAL; - break; - } - if (!err) - *start = ++i; - else - break; - } - - nla_nest_end(skb, fmsg_nlattr); - return err; -} - -static int devlink_fmsg_snd(struct devlink_fmsg *fmsg, - struct genl_info *info, - enum devlink_command cmd, int flags) -{ - struct nlmsghdr *nlh; - struct sk_buff *skb; - bool last = false; - int index = 0; - void *hdr; - int err; - - while (!last) { - int tmp_index = index; - - skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!skb) - return -ENOMEM; - - hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, - &devlink_nl_family, flags | NLM_F_MULTI, cmd); - if (!hdr) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - err = devlink_fmsg_prepare_skb(fmsg, skb, &index); - if (!err) - last = true; - else if (err != -EMSGSIZE || tmp_index == index) - goto nla_put_failure; - - genlmsg_end(skb, hdr); - err = genlmsg_reply(skb, info); - if (err) - return err; - } - - skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!skb) - return -ENOMEM; - nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, - NLMSG_DONE, 0, flags | NLM_F_MULTI); - if (!nlh) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - return genlmsg_reply(skb, info); - -nla_put_failure: - nlmsg_free(skb); - return err; -} - -static int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, - struct netlink_callback *cb, - enum devlink_command cmd) -{ - int index = cb->args[0]; - int tmp_index = index; - void *hdr; - int err; - - hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, - &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI, cmd); - if (!hdr) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - err = devlink_fmsg_prepare_skb(fmsg, skb, &index); - if ((err && err != -EMSGSIZE) || tmp_index == index) - goto nla_put_failure; - - cb->args[0] = index; - genlmsg_end(skb, hdr); - return skb->len; - -nla_put_failure: - genlmsg_cancel(skb, hdr); - return err; -} - -struct devlink_health_reporter { - struct list_head list; - void *priv; - const struct devlink_health_reporter_ops *ops; - struct devlink *devlink; - struct devlink_port *devlink_port; - struct devlink_fmsg *dump_fmsg; - struct mutex dump_lock; /* lock parallel read/write from dump buffers */ - u64 graceful_period; - bool auto_recover; - bool auto_dump; - u8 health_state; - u64 dump_ts; - u64 dump_real_ts; - u64 error_count; - u64 recovery_count; - u64 last_recovery_ts; - refcount_t refcount; -}; - -void * -devlink_health_reporter_priv(struct devlink_health_reporter *reporter) -{ - return reporter->priv; -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_priv); - -static struct devlink_health_reporter * -__devlink_health_reporter_find_by_name(struct list_head *reporter_list, - struct mutex *list_lock, - const char *reporter_name) -{ - struct devlink_health_reporter *reporter; - - lockdep_assert_held(list_lock); - list_for_each_entry(reporter, reporter_list, list) - if (!strcmp(reporter->ops->name, reporter_name)) - return reporter; - return NULL; -} - -static struct devlink_health_reporter * -devlink_health_reporter_find_by_name(struct devlink *devlink, - const char *reporter_name) -{ - return __devlink_health_reporter_find_by_name(&devlink->reporter_list, - &devlink->reporters_lock, - reporter_name); -} - -static struct devlink_health_reporter * -devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, - const char *reporter_name) -{ - return __devlink_health_reporter_find_by_name(&devlink_port->reporter_list, - &devlink_port->reporters_lock, - reporter_name); -} - -static struct devlink_health_reporter * -__devlink_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - if (WARN_ON(graceful_period && !ops->recover)) - return ERR_PTR(-EINVAL); - - reporter = kzalloc(sizeof(*reporter), GFP_KERNEL); - if (!reporter) - return ERR_PTR(-ENOMEM); - - reporter->priv = priv; - reporter->ops = ops; - reporter->devlink = devlink; - reporter->graceful_period = graceful_period; - reporter->auto_recover = !!ops->recover; - reporter->auto_dump = !!ops->dump; - mutex_init(&reporter->dump_lock); - refcount_set(&reporter->refcount, 1); - return reporter; -} - -/** - * devlink_port_health_reporter_create - create devlink health reporter for - * specified port instance - * - * @port: devlink_port which should contain the new reporter - * @ops: ops - * @graceful_period: to avoid recovery loops, in msecs - * @priv: priv - */ -struct devlink_health_reporter * -devlink_port_health_reporter_create(struct devlink_port *port, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - mutex_lock(&port->reporters_lock); - if (__devlink_health_reporter_find_by_name(&port->reporter_list, - &port->reporters_lock, ops->name)) { - reporter = ERR_PTR(-EEXIST); - goto unlock; - } - - reporter = __devlink_health_reporter_create(port->devlink, ops, - graceful_period, priv); - if (IS_ERR(reporter)) - goto unlock; - - reporter->devlink_port = port; - list_add_tail(&reporter->list, &port->reporter_list); -unlock: - mutex_unlock(&port->reporters_lock); - return reporter; -} -EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create); - -/** - * devlink_health_reporter_create - create devlink health reporter - * - * @devlink: devlink - * @ops: ops - * @graceful_period: to avoid recovery loops, in msecs - * @priv: priv - */ -struct devlink_health_reporter * -devlink_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - mutex_lock(&devlink->reporters_lock); - if (devlink_health_reporter_find_by_name(devlink, ops->name)) { - reporter = ERR_PTR(-EEXIST); - goto unlock; - } - - reporter = __devlink_health_reporter_create(devlink, ops, - graceful_period, priv); - if (IS_ERR(reporter)) - goto unlock; - - list_add_tail(&reporter->list, &devlink->reporter_list); -unlock: - mutex_unlock(&devlink->reporters_lock); - return reporter; -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_create); - -static void -devlink_health_reporter_free(struct devlink_health_reporter *reporter) -{ - mutex_destroy(&reporter->dump_lock); - if (reporter->dump_fmsg) - devlink_fmsg_free(reporter->dump_fmsg); - kfree(reporter); -} - -static void -devlink_health_reporter_put(struct devlink_health_reporter *reporter) -{ - if (refcount_dec_and_test(&reporter->refcount)) - devlink_health_reporter_free(reporter); -} - -static void -__devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - list_del(&reporter->list); - devlink_health_reporter_put(reporter); -} - -/** - * devlink_health_reporter_destroy - destroy devlink health reporter - * - * @reporter: devlink health reporter to destroy - */ -void -devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - struct mutex *lock = &reporter->devlink->reporters_lock; - - mutex_lock(lock); - __devlink_health_reporter_destroy(reporter); - mutex_unlock(lock); -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); - -/** - * devlink_port_health_reporter_destroy - destroy devlink port health reporter - * - * @reporter: devlink health reporter to destroy - */ -void -devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - struct mutex *lock = &reporter->devlink_port->reporters_lock; - - mutex_lock(lock); - __devlink_health_reporter_destroy(reporter); - mutex_unlock(lock); -} -EXPORT_SYMBOL_GPL(devlink_port_health_reporter_destroy); - -static int -devlink_nl_health_reporter_fill(struct sk_buff *msg, - struct devlink_health_reporter *reporter, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - struct devlink *devlink = reporter->devlink; - struct nlattr *reporter_attr; - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto genlmsg_cancel; - - if (reporter->devlink_port) { - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, reporter->devlink_port->index)) - goto genlmsg_cancel; - } - reporter_attr = nla_nest_start_noflag(msg, - DEVLINK_ATTR_HEALTH_REPORTER); - if (!reporter_attr) - goto genlmsg_cancel; - if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME, - reporter->ops->name)) - goto reporter_nest_cancel; - if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE, - reporter->health_state)) - goto reporter_nest_cancel; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR_COUNT, - reporter->error_count, DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT, - reporter->recovery_count, DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->ops->recover && - nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, - reporter->graceful_period, - DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->ops->recover && - nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, - reporter->auto_recover)) - goto reporter_nest_cancel; - if (reporter->dump_fmsg && - nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, - jiffies_to_msecs(reporter->dump_ts), - DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->dump_fmsg && - nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS, - reporter->dump_real_ts, DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->ops->dump && - nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP, - reporter->auto_dump)) - goto reporter_nest_cancel; - - nla_nest_end(msg, reporter_attr); - genlmsg_end(msg, hdr); - return 0; - -reporter_nest_cancel: - nla_nest_end(msg, reporter_attr); -genlmsg_cancel: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void devlink_recover_notify(struct devlink_health_reporter *reporter, - enum devlink_command cmd) -{ - struct devlink *devlink = reporter->devlink; - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_HEALTH_REPORTER_RECOVER); - WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_health_reporter_fill(msg, reporter, cmd, 0, 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, - 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -void -devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter) -{ - reporter->recovery_count++; - reporter->last_recovery_ts = jiffies; -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done); - -static int -devlink_health_reporter_recover(struct devlink_health_reporter *reporter, - void *priv_ctx, struct netlink_ext_ack *extack) -{ - int err; - - if (reporter->health_state == DEVLINK_HEALTH_REPORTER_STATE_HEALTHY) - return 0; - - if (!reporter->ops->recover) - return -EOPNOTSUPP; - - err = reporter->ops->recover(reporter, priv_ctx, extack); - if (err) - return err; - - devlink_health_reporter_recovery_done(reporter); - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; - devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); - - return 0; -} - -static void -devlink_health_dump_clear(struct devlink_health_reporter *reporter) -{ - if (!reporter->dump_fmsg) - return; - devlink_fmsg_free(reporter->dump_fmsg); - reporter->dump_fmsg = NULL; -} - -static int devlink_health_do_dump(struct devlink_health_reporter *reporter, - void *priv_ctx, - struct netlink_ext_ack *extack) -{ - int err; - - if (!reporter->ops->dump) - return 0; - - if (reporter->dump_fmsg) - return 0; - - reporter->dump_fmsg = devlink_fmsg_alloc(); - if (!reporter->dump_fmsg) { - err = -ENOMEM; - return err; - } - - err = devlink_fmsg_obj_nest_start(reporter->dump_fmsg); - if (err) - goto dump_err; - - err = reporter->ops->dump(reporter, reporter->dump_fmsg, - priv_ctx, extack); - if (err) - goto dump_err; - - err = devlink_fmsg_obj_nest_end(reporter->dump_fmsg); - if (err) - goto dump_err; - - reporter->dump_ts = jiffies; - reporter->dump_real_ts = ktime_get_real_ns(); - - return 0; - -dump_err: - devlink_health_dump_clear(reporter); - return err; -} - -int devlink_health_report(struct devlink_health_reporter *reporter, - const char *msg, void *priv_ctx) -{ - enum devlink_health_reporter_state prev_health_state; - struct devlink *devlink = reporter->devlink; - unsigned long recover_ts_threshold; - int ret; - - /* write a log message of the current error */ - WARN_ON(!msg); - trace_devlink_health_report(devlink, reporter->ops->name, msg); - reporter->error_count++; - prev_health_state = reporter->health_state; - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; - devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); - - /* abort if the previous error wasn't recovered */ - recover_ts_threshold = reporter->last_recovery_ts + - msecs_to_jiffies(reporter->graceful_period); - if (reporter->auto_recover && - (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY || - (reporter->last_recovery_ts && reporter->recovery_count && - time_is_after_jiffies(recover_ts_threshold)))) { - trace_devlink_health_recover_aborted(devlink, - reporter->ops->name, - reporter->health_state, - jiffies - - reporter->last_recovery_ts); - return -ECANCELED; - } - - if (reporter->auto_dump) { - mutex_lock(&reporter->dump_lock); - /* store current dump of current error, for later analysis */ - devlink_health_do_dump(reporter, priv_ctx, NULL); - mutex_unlock(&reporter->dump_lock); - } - - if (!reporter->auto_recover) - return 0; - - devl_lock(devlink); - ret = devlink_health_reporter_recover(reporter, priv_ctx, NULL); - devl_unlock(devlink); - - return ret; -} -EXPORT_SYMBOL_GPL(devlink_health_report); - -static struct devlink_health_reporter * -devlink_health_reporter_get_from_attrs(struct devlink *devlink, - struct nlattr **attrs) -{ - struct devlink_health_reporter *reporter; - struct devlink_port *devlink_port; - char *reporter_name; - - if (!attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]) - return NULL; - - reporter_name = nla_data(attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]); - devlink_port = devlink_port_get_from_attrs(devlink, attrs); - if (IS_ERR(devlink_port)) { - mutex_lock(&devlink->reporters_lock); - reporter = devlink_health_reporter_find_by_name(devlink, reporter_name); - if (reporter) - refcount_inc(&reporter->refcount); - mutex_unlock(&devlink->reporters_lock); - } else { - mutex_lock(&devlink_port->reporters_lock); - reporter = devlink_port_health_reporter_find_by_name(devlink_port, reporter_name); - if (reporter) - refcount_inc(&reporter->refcount); - mutex_unlock(&devlink_port->reporters_lock); - } - - return reporter; -} - -static struct devlink_health_reporter * -devlink_health_reporter_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - return devlink_health_reporter_get_from_attrs(devlink, info->attrs); -} - -static struct devlink_health_reporter * -devlink_health_reporter_get_from_cb(struct netlink_callback *cb) -{ - const struct genl_dumpit_info *info = genl_dumpit_info(cb); - struct devlink_health_reporter *reporter; - struct nlattr **attrs = info->attrs; - struct devlink *devlink; - - devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); - if (IS_ERR(devlink)) - return NULL; - - reporter = devlink_health_reporter_get_from_attrs(devlink, attrs); - devlink_put(devlink); - return reporter; -} - -void -devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, - enum devlink_health_reporter_state state) -{ - if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY && - state != DEVLINK_HEALTH_REPORTER_STATE_ERROR)) - return; - - if (reporter->health_state == state) - return; - - reporter->health_state = state; - trace_devlink_health_reporter_state_update(reporter->devlink, - reporter->ops->name, state); - devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update); - -static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - struct sk_buff *msg; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) { - err = -ENOMEM; - goto out; - } - - err = devlink_nl_health_reporter_fill(msg, reporter, - DEVLINK_CMD_HEALTH_REPORTER_GET, - info->snd_portid, info->snd_seq, - 0); - if (err) { - nlmsg_free(msg); - goto out; - } - - err = genlmsg_reply(msg, info); -out: - devlink_health_reporter_put(reporter); - return err; -} - -static int -devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink_health_reporter *reporter; - unsigned long index, port_index; - struct devlink_port *port; - struct devlink *devlink; - int start = cb->args[0]; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - mutex_lock(&devlink->reporters_lock); - list_for_each_entry(reporter, &devlink->reporter_list, - list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_health_reporter_fill( - msg, reporter, DEVLINK_CMD_HEALTH_REPORTER_GET, - NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - mutex_unlock(&devlink->reporters_lock); - devlink_put(devlink); - goto out; - } - idx++; - } - mutex_unlock(&devlink->reporters_lock); - devlink_put(devlink); - } - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - xa_for_each(&devlink->ports, port_index, port) { - mutex_lock(&port->reporters_lock); - list_for_each_entry(reporter, &port->reporter_list, list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_health_reporter_fill( - msg, reporter, - DEVLINK_CMD_HEALTH_REPORTER_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI); - if (err) { - mutex_unlock(&port->reporters_lock); - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - mutex_unlock(&port->reporters_lock); - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int -devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->recover && - (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] || - info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])) { - err = -EOPNOTSUPP; - goto out; - } - if (!reporter->ops->dump && - info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) { - err = -EOPNOTSUPP; - goto out; - } - - if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) - reporter->graceful_period = - nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]); - - if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]) - reporter->auto_recover = - nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]); - - if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) - reporter->auto_dump = - nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]); - - devlink_health_reporter_put(reporter); - return 0; -out: - devlink_health_reporter_put(reporter); - return err; -} - -static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - err = devlink_health_reporter_recover(reporter, NULL, info->extack); - - devlink_health_reporter_put(reporter); - return err; -} - -static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - struct devlink_fmsg *fmsg; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->diagnose) { - devlink_health_reporter_put(reporter); - return -EOPNOTSUPP; - } - - fmsg = devlink_fmsg_alloc(); - if (!fmsg) { - devlink_health_reporter_put(reporter); - return -ENOMEM; - } - - err = devlink_fmsg_obj_nest_start(fmsg); - if (err) - goto out; - - err = reporter->ops->diagnose(reporter, fmsg, info->extack); - if (err) - goto out; - - err = devlink_fmsg_obj_nest_end(fmsg); - if (err) - goto out; - - err = devlink_fmsg_snd(fmsg, info, - DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); - -out: - devlink_fmsg_free(fmsg); - devlink_health_reporter_put(reporter); - return err; -} - -static int -devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, - struct netlink_callback *cb) -{ - struct devlink_health_reporter *reporter; - u64 start = cb->args[0]; - int err; - - reporter = devlink_health_reporter_get_from_cb(cb); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->dump) { - err = -EOPNOTSUPP; - goto out; - } - mutex_lock(&reporter->dump_lock); - if (!start) { - err = devlink_health_do_dump(reporter, NULL, cb->extack); - if (err) - goto unlock; - cb->args[1] = reporter->dump_ts; - } - if (!reporter->dump_fmsg || cb->args[1] != reporter->dump_ts) { - NL_SET_ERR_MSG_MOD(cb->extack, "Dump trampled, please retry"); - err = -EAGAIN; - goto unlock; - } - - err = devlink_fmsg_dumpit(reporter->dump_fmsg, skb, cb, - DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET); -unlock: - mutex_unlock(&reporter->dump_lock); -out: - devlink_health_reporter_put(reporter); - return err; -} - -static int -devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->dump) { - devlink_health_reporter_put(reporter); - return -EOPNOTSUPP; - } - - mutex_lock(&reporter->dump_lock); - devlink_health_dump_clear(reporter); - mutex_unlock(&reporter->dump_lock); - devlink_health_reporter_put(reporter); - return 0; -} - -static int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->test) { - devlink_health_reporter_put(reporter); - return -EOPNOTSUPP; - } - - err = reporter->ops->test(reporter, info->extack); - - devlink_health_reporter_put(reporter); - return err; -} - -struct devlink_stats { - u64_stats_t rx_bytes; - u64_stats_t rx_packets; - struct u64_stats_sync syncp; -}; - -/** - * struct devlink_trap_policer_item - Packet trap policer attributes. - * @policer: Immutable packet trap policer attributes. - * @rate: Rate in packets / sec. - * @burst: Burst size in packets. - * @list: trap_policer_list member. - * - * Describes packet trap policer attributes. Created by devlink during trap - * policer registration. - */ -struct devlink_trap_policer_item { - const struct devlink_trap_policer *policer; - u64 rate; - u64 burst; - struct list_head list; -}; - -/** - * struct devlink_trap_group_item - Packet trap group attributes. - * @group: Immutable packet trap group attributes. - * @policer_item: Associated policer item. Can be NULL. - * @list: trap_group_list member. - * @stats: Trap group statistics. - * - * Describes packet trap group attributes. Created by devlink during trap - * group registration. - */ -struct devlink_trap_group_item { - const struct devlink_trap_group *group; - struct devlink_trap_policer_item *policer_item; - struct list_head list; - struct devlink_stats __percpu *stats; -}; - -/** - * struct devlink_trap_item - Packet trap attributes. - * @trap: Immutable packet trap attributes. - * @group_item: Associated group item. - * @list: trap_list member. - * @action: Trap action. - * @stats: Trap statistics. - * @priv: Driver private information. - * - * Describes both mutable and immutable packet trap attributes. Created by - * devlink during trap registration and used for all trap related operations. - */ -struct devlink_trap_item { - const struct devlink_trap *trap; - struct devlink_trap_group_item *group_item; - struct list_head list; - enum devlink_trap_action action; - struct devlink_stats __percpu *stats; - void *priv; -}; - -static struct devlink_trap_policer_item * -devlink_trap_policer_item_lookup(struct devlink *devlink, u32 id) -{ - struct devlink_trap_policer_item *policer_item; - - list_for_each_entry(policer_item, &devlink->trap_policer_list, list) { - if (policer_item->policer->id == id) - return policer_item; - } - - return NULL; -} - -static struct devlink_trap_item * -devlink_trap_item_lookup(struct devlink *devlink, const char *name) -{ - struct devlink_trap_item *trap_item; - - list_for_each_entry(trap_item, &devlink->trap_list, list) { - if (!strcmp(trap_item->trap->name, name)) - return trap_item; - } - - return NULL; -} - -static struct devlink_trap_item * -devlink_trap_item_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - struct nlattr *attr; - - if (!info->attrs[DEVLINK_ATTR_TRAP_NAME]) - return NULL; - attr = info->attrs[DEVLINK_ATTR_TRAP_NAME]; - - return devlink_trap_item_lookup(devlink, nla_data(attr)); -} - -static int -devlink_trap_action_get_from_info(struct genl_info *info, - enum devlink_trap_action *p_trap_action) -{ - u8 val; - - val = nla_get_u8(info->attrs[DEVLINK_ATTR_TRAP_ACTION]); - switch (val) { - case DEVLINK_TRAP_ACTION_DROP: - case DEVLINK_TRAP_ACTION_TRAP: - case DEVLINK_TRAP_ACTION_MIRROR: - *p_trap_action = val; - break; - default: - return -EINVAL; - } - - return 0; -} - -static int devlink_trap_metadata_put(struct sk_buff *msg, - const struct devlink_trap *trap) -{ - struct nlattr *attr; - - attr = nla_nest_start(msg, DEVLINK_ATTR_TRAP_METADATA); - if (!attr) - return -EMSGSIZE; - - if ((trap->metadata_cap & DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT) && - nla_put_flag(msg, DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT)) - goto nla_put_failure; - if ((trap->metadata_cap & DEVLINK_TRAP_METADATA_TYPE_F_FA_COOKIE) && - nla_put_flag(msg, DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE)) - goto nla_put_failure; - - nla_nest_end(msg, attr); - - return 0; - -nla_put_failure: - nla_nest_cancel(msg, attr); - return -EMSGSIZE; -} - -static void devlink_trap_stats_read(struct devlink_stats __percpu *trap_stats, - struct devlink_stats *stats) -{ - int i; - - memset(stats, 0, sizeof(*stats)); - for_each_possible_cpu(i) { - struct devlink_stats *cpu_stats; - u64 rx_packets, rx_bytes; - unsigned int start; - - cpu_stats = per_cpu_ptr(trap_stats, i); - do { - start = u64_stats_fetch_begin(&cpu_stats->syncp); - rx_packets = u64_stats_read(&cpu_stats->rx_packets); - rx_bytes = u64_stats_read(&cpu_stats->rx_bytes); - } while (u64_stats_fetch_retry(&cpu_stats->syncp, start)); - - u64_stats_add(&stats->rx_packets, rx_packets); - u64_stats_add(&stats->rx_bytes, rx_bytes); - } -} - -static int -devlink_trap_group_stats_put(struct sk_buff *msg, - struct devlink_stats __percpu *trap_stats) -{ - struct devlink_stats stats; - struct nlattr *attr; - - devlink_trap_stats_read(trap_stats, &stats); - - attr = nla_nest_start(msg, DEVLINK_ATTR_STATS); - if (!attr) - return -EMSGSIZE; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_PACKETS, - u64_stats_read(&stats.rx_packets), - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_BYTES, - u64_stats_read(&stats.rx_bytes), - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - nla_nest_end(msg, attr); - - return 0; - -nla_put_failure: - nla_nest_cancel(msg, attr); - return -EMSGSIZE; -} - -static int devlink_trap_stats_put(struct sk_buff *msg, struct devlink *devlink, - const struct devlink_trap_item *trap_item) -{ - struct devlink_stats stats; - struct nlattr *attr; - u64 drops = 0; - int err; - - if (devlink->ops->trap_drop_counter_get) { - err = devlink->ops->trap_drop_counter_get(devlink, - trap_item->trap, - &drops); - if (err) - return err; - } - - devlink_trap_stats_read(trap_item->stats, &stats); - - attr = nla_nest_start(msg, DEVLINK_ATTR_STATS); - if (!attr) - return -EMSGSIZE; - - if (devlink->ops->trap_drop_counter_get && - nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops, - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_PACKETS, - u64_stats_read(&stats.rx_packets), - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_BYTES, - u64_stats_read(&stats.rx_bytes), - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - nla_nest_end(msg, attr); - - return 0; - -nla_put_failure: - nla_nest_cancel(msg, attr); - return -EMSGSIZE; -} - -static int devlink_nl_trap_fill(struct sk_buff *msg, struct devlink *devlink, - const struct devlink_trap_item *trap_item, - enum devlink_command cmd, u32 portid, u32 seq, - int flags) -{ - struct devlink_trap_group_item *group_item = trap_item->group_item; - void *hdr; - int err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME, - group_item->group->name)) - goto nla_put_failure; - - if (nla_put_string(msg, DEVLINK_ATTR_TRAP_NAME, trap_item->trap->name)) - goto nla_put_failure; - - if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_TYPE, trap_item->trap->type)) - goto nla_put_failure; - - if (trap_item->trap->generic && - nla_put_flag(msg, DEVLINK_ATTR_TRAP_GENERIC)) - goto nla_put_failure; - - if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_ACTION, trap_item->action)) - goto nla_put_failure; - - err = devlink_trap_metadata_put(msg, trap_item->trap); - if (err) - goto nla_put_failure; - - err = devlink_trap_stats_put(msg, devlink, trap_item); - if (err) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_trap_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - struct devlink_trap_item *trap_item; - struct sk_buff *msg; - int err; - - if (list_empty(&devlink->trap_list)) - return -EOPNOTSUPP; - - trap_item = devlink_trap_item_get_from_info(devlink, info); - if (!trap_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap"); - return -ENOENT; - } - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_trap_fill(msg, devlink, trap_item, - DEVLINK_CMD_TRAP_NEW, info->snd_portid, - info->snd_seq, 0); - if (err) - goto err_trap_fill; - - return genlmsg_reply(msg, info); - -err_trap_fill: - nlmsg_free(msg); - return err; -} - -static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink_trap_item *trap_item; - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - list_for_each_entry(trap_item, &devlink->trap_list, list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_trap_fill(msg, devlink, trap_item, - DEVLINK_CMD_TRAP_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int __devlink_trap_action_set(struct devlink *devlink, - struct devlink_trap_item *trap_item, - enum devlink_trap_action trap_action, - struct netlink_ext_ack *extack) -{ - int err; - - if (trap_item->action != trap_action && - trap_item->trap->type != DEVLINK_TRAP_TYPE_DROP) { - NL_SET_ERR_MSG_MOD(extack, "Cannot change action of non-drop traps. Skipping"); - return 0; - } - - err = devlink->ops->trap_action_set(devlink, trap_item->trap, - trap_action, extack); - if (err) - return err; - - trap_item->action = trap_action; - - return 0; -} - -static int devlink_trap_action_set(struct devlink *devlink, - struct devlink_trap_item *trap_item, - struct genl_info *info) -{ - enum devlink_trap_action trap_action; - int err; - - if (!info->attrs[DEVLINK_ATTR_TRAP_ACTION]) - return 0; - - err = devlink_trap_action_get_from_info(info, &trap_action); - if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action"); - return -EINVAL; - } - - return __devlink_trap_action_set(devlink, trap_item, trap_action, - info->extack); -} - -static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - struct devlink_trap_item *trap_item; - - if (list_empty(&devlink->trap_list)) - return -EOPNOTSUPP; - - trap_item = devlink_trap_item_get_from_info(devlink, info); - if (!trap_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap"); - return -ENOENT; - } - - return devlink_trap_action_set(devlink, trap_item, info); -} - -static struct devlink_trap_group_item * -devlink_trap_group_item_lookup(struct devlink *devlink, const char *name) -{ - struct devlink_trap_group_item *group_item; - - list_for_each_entry(group_item, &devlink->trap_group_list, list) { - if (!strcmp(group_item->group->name, name)) - return group_item; - } - - return NULL; -} - -static struct devlink_trap_group_item * -devlink_trap_group_item_lookup_by_id(struct devlink *devlink, u16 id) -{ - struct devlink_trap_group_item *group_item; - - list_for_each_entry(group_item, &devlink->trap_group_list, list) { - if (group_item->group->id == id) - return group_item; - } - - return NULL; -} - -static struct devlink_trap_group_item * -devlink_trap_group_item_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - char *name; - - if (!info->attrs[DEVLINK_ATTR_TRAP_GROUP_NAME]) - return NULL; - name = nla_data(info->attrs[DEVLINK_ATTR_TRAP_GROUP_NAME]); - - return devlink_trap_group_item_lookup(devlink, name); -} - -static int -devlink_nl_trap_group_fill(struct sk_buff *msg, struct devlink *devlink, - const struct devlink_trap_group_item *group_item, - enum devlink_command cmd, u32 portid, u32 seq, - int flags) -{ - void *hdr; - int err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME, - group_item->group->name)) - goto nla_put_failure; - - if (group_item->group->generic && - nla_put_flag(msg, DEVLINK_ATTR_TRAP_GENERIC)) - goto nla_put_failure; - - if (group_item->policer_item && - nla_put_u32(msg, DEVLINK_ATTR_TRAP_POLICER_ID, - group_item->policer_item->policer->id)) - goto nla_put_failure; - - err = devlink_trap_group_stats_put(msg, group_item->stats); - if (err) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_trap_group_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - struct devlink_trap_group_item *group_item; - struct sk_buff *msg; - int err; - - if (list_empty(&devlink->trap_group_list)) - return -EOPNOTSUPP; - - group_item = devlink_trap_group_item_get_from_info(devlink, info); - if (!group_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group"); - return -ENOENT; - } - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_trap_group_fill(msg, devlink, group_item, - DEVLINK_CMD_TRAP_GROUP_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) - goto err_trap_group_fill; - - return genlmsg_reply(msg, info); - -err_trap_group_fill: - nlmsg_free(msg); - return err; -} - -static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - enum devlink_command cmd = DEVLINK_CMD_TRAP_GROUP_NEW; - struct devlink_trap_group_item *group_item; - u32 portid = NETLINK_CB(cb->skb).portid; - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - list_for_each_entry(group_item, &devlink->trap_group_list, - list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_trap_group_fill(msg, devlink, - group_item, cmd, - portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int -__devlink_trap_group_action_set(struct devlink *devlink, - struct devlink_trap_group_item *group_item, - enum devlink_trap_action trap_action, - struct netlink_ext_ack *extack) -{ - const char *group_name = group_item->group->name; - struct devlink_trap_item *trap_item; - int err; - - if (devlink->ops->trap_group_action_set) { - err = devlink->ops->trap_group_action_set(devlink, group_item->group, - trap_action, extack); - if (err) - return err; - - list_for_each_entry(trap_item, &devlink->trap_list, list) { - if (strcmp(trap_item->group_item->group->name, group_name)) - continue; - if (trap_item->action != trap_action && - trap_item->trap->type != DEVLINK_TRAP_TYPE_DROP) - continue; - trap_item->action = trap_action; - } - - return 0; - } - - list_for_each_entry(trap_item, &devlink->trap_list, list) { - if (strcmp(trap_item->group_item->group->name, group_name)) - continue; - err = __devlink_trap_action_set(devlink, trap_item, - trap_action, extack); - if (err) - return err; - } - - return 0; -} - -static int -devlink_trap_group_action_set(struct devlink *devlink, - struct devlink_trap_group_item *group_item, - struct genl_info *info, bool *p_modified) -{ - enum devlink_trap_action trap_action; - int err; - - if (!info->attrs[DEVLINK_ATTR_TRAP_ACTION]) - return 0; - - err = devlink_trap_action_get_from_info(info, &trap_action); - if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action"); - return -EINVAL; - } - - err = __devlink_trap_group_action_set(devlink, group_item, trap_action, - info->extack); - if (err) - return err; - - *p_modified = true; - - return 0; -} - -static int devlink_trap_group_set(struct devlink *devlink, - struct devlink_trap_group_item *group_item, - struct genl_info *info) -{ - struct devlink_trap_policer_item *policer_item; - struct netlink_ext_ack *extack = info->extack; - const struct devlink_trap_policer *policer; - struct nlattr **attrs = info->attrs; - int err; - - if (!attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) - return 0; - - if (!devlink->ops->trap_group_set) - return -EOPNOTSUPP; - - policer_item = group_item->policer_item; - if (attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) { - u32 policer_id; - - policer_id = nla_get_u32(attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); - policer_item = devlink_trap_policer_item_lookup(devlink, - policer_id); - if (policer_id && !policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); - return -ENOENT; - } - } - policer = policer_item ? policer_item->policer : NULL; - - err = devlink->ops->trap_group_set(devlink, group_item->group, policer, - extack); - if (err) - return err; - - group_item->policer_item = policer_item; - - return 0; -} - -static int devlink_nl_cmd_trap_group_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - struct devlink_trap_group_item *group_item; - bool modified = false; - int err; - - if (list_empty(&devlink->trap_group_list)) - return -EOPNOTSUPP; - - group_item = devlink_trap_group_item_get_from_info(devlink, info); - if (!group_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group"); - return -ENOENT; - } - - err = devlink_trap_group_action_set(devlink, group_item, info, - &modified); - if (err) - return err; - - err = devlink_trap_group_set(devlink, group_item, info); - if (err) - goto err_trap_group_set; - - return 0; - -err_trap_group_set: - if (modified) - NL_SET_ERR_MSG_MOD(extack, "Trap group set failed, but some changes were committed already"); - return err; -} - -static struct devlink_trap_policer_item * -devlink_trap_policer_item_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - u32 id; - - if (!info->attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) - return NULL; - id = nla_get_u32(info->attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); - - return devlink_trap_policer_item_lookup(devlink, id); -} - -static int -devlink_trap_policer_stats_put(struct sk_buff *msg, struct devlink *devlink, - const struct devlink_trap_policer *policer) -{ - struct nlattr *attr; - u64 drops; - int err; - - if (!devlink->ops->trap_policer_counter_get) - return 0; - - err = devlink->ops->trap_policer_counter_get(devlink, policer, &drops); - if (err) - return err; - - attr = nla_nest_start(msg, DEVLINK_ATTR_STATS); - if (!attr) - return -EMSGSIZE; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops, - DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - nla_nest_end(msg, attr); - - return 0; - -nla_put_failure: - nla_nest_cancel(msg, attr); - return -EMSGSIZE; -} - -static int -devlink_nl_trap_policer_fill(struct sk_buff *msg, struct devlink *devlink, - const struct devlink_trap_policer_item *policer_item, - enum devlink_command cmd, u32 portid, u32 seq, - int flags) -{ - void *hdr; - int err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (nla_put_u32(msg, DEVLINK_ATTR_TRAP_POLICER_ID, - policer_item->policer->id)) - goto nla_put_failure; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_TRAP_POLICER_RATE, - policer_item->rate, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_TRAP_POLICER_BURST, - policer_item->burst, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - - err = devlink_trap_policer_stats_put(msg, devlink, - policer_item->policer); - if (err) - goto nla_put_failure; - - genlmsg_end(msg, hdr); - - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_trap_policer_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_trap_policer_item *policer_item; - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - if (list_empty(&devlink->trap_policer_list)) - return -EOPNOTSUPP; - - policer_item = devlink_trap_policer_item_get_from_info(devlink, info); - if (!policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); - return -ENOENT; - } - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_trap_policer_fill(msg, devlink, policer_item, - DEVLINK_CMD_TRAP_POLICER_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) - goto err_trap_policer_fill; - - return genlmsg_reply(msg, info); - -err_trap_policer_fill: - nlmsg_free(msg); - return err; -} - -static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - enum devlink_command cmd = DEVLINK_CMD_TRAP_POLICER_NEW; - struct devlink_trap_policer_item *policer_item; - u32 portid = NETLINK_CB(cb->skb).portid; - struct devlink *devlink; - int start = cb->args[0]; - unsigned long index; - int idx = 0; - int err; - - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - devl_lock(devlink); - list_for_each_entry(policer_item, &devlink->trap_policer_list, - list) { - if (idx < start) { - idx++; - continue; - } - err = devlink_nl_trap_policer_fill(msg, devlink, - policer_item, cmd, - portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - goto out; - } - idx++; - } - devl_unlock(devlink); - devlink_put(devlink); - } -out: - cb->args[0] = idx; - return msg->len; -} - -static int -devlink_trap_policer_set(struct devlink *devlink, - struct devlink_trap_policer_item *policer_item, - struct genl_info *info) -{ - struct netlink_ext_ack *extack = info->extack; - struct nlattr **attrs = info->attrs; - u64 rate, burst; - int err; - - rate = policer_item->rate; - burst = policer_item->burst; - - if (attrs[DEVLINK_ATTR_TRAP_POLICER_RATE]) - rate = nla_get_u64(attrs[DEVLINK_ATTR_TRAP_POLICER_RATE]); - - if (attrs[DEVLINK_ATTR_TRAP_POLICER_BURST]) - burst = nla_get_u64(attrs[DEVLINK_ATTR_TRAP_POLICER_BURST]); - - if (rate < policer_item->policer->min_rate) { - NL_SET_ERR_MSG_MOD(extack, "Policer rate lower than limit"); - return -EINVAL; - } - - if (rate > policer_item->policer->max_rate) { - NL_SET_ERR_MSG_MOD(extack, "Policer rate higher than limit"); - return -EINVAL; - } - - if (burst < policer_item->policer->min_burst) { - NL_SET_ERR_MSG_MOD(extack, "Policer burst size lower than limit"); - return -EINVAL; - } - - if (burst > policer_item->policer->max_burst) { - NL_SET_ERR_MSG_MOD(extack, "Policer burst size higher than limit"); - return -EINVAL; - } - - err = devlink->ops->trap_policer_set(devlink, policer_item->policer, - rate, burst, info->extack); - if (err) - return err; - - policer_item->rate = rate; - policer_item->burst = burst; - - return 0; -} - -static int devlink_nl_cmd_trap_policer_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink_trap_policer_item *policer_item; - struct netlink_ext_ack *extack = info->extack; - struct devlink *devlink = info->user_ptr[0]; - - if (list_empty(&devlink->trap_policer_list)) - return -EOPNOTSUPP; - - if (!devlink->ops->trap_policer_set) - return -EOPNOTSUPP; - - policer_item = devlink_trap_policer_item_get_from_info(devlink, info); - if (!policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); - return -ENOENT; - } - - return devlink_trap_policer_set(devlink, policer_item, info); -} - -static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { - [DEVLINK_ATTR_UNSPEC] = { .strict_start_type = - DEVLINK_ATTR_TRAP_POLICER_ID }, - [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 }, - [DEVLINK_ATTR_PORT_TYPE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_PORT_TYPE_AUTO, - DEVLINK_PORT_TYPE_IB), - [DEVLINK_ATTR_PORT_SPLIT_COUNT] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_INDEX] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_POOL_INDEX] = { .type = NLA_U16 }, - [DEVLINK_ATTR_SB_POOL_TYPE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_SB_POOL_SIZE] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 }, - [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_ESWITCH_MODE_LEGACY, - DEVLINK_ESWITCH_MODE_SWITCHDEV), - [DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 }, - [DEVLINK_ATTR_RESOURCE_ID] = { .type = NLA_U64}, - [DEVLINK_ATTR_RESOURCE_SIZE] = { .type = NLA_U64}, - [DEVLINK_ATTR_PARAM_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_REGION_CHUNK_ADDR] = { .type = NLA_U64 }, - [DEVLINK_ATTR_REGION_CHUNK_LEN] = { .type = NLA_U64 }, - [DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 }, - [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 }, - [DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK] = - NLA_POLICY_BITFIELD32(DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS), - [DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 }, - [DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 }, - [DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 }, - [DEVLINK_ATTR_TRAP_POLICER_ID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 }, - [DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 }, - [DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED }, - [DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - DEVLINK_RELOAD_ACTION_MAX), - [DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK), - [DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 }, - [DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 }, - [DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 }, - [DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 }, - [DEVLINK_ATTR_RATE_TYPE] = { .type = NLA_U16 }, - [DEVLINK_ATTR_RATE_TX_SHARE] = { .type = NLA_U64 }, - [DEVLINK_ATTR_RATE_TX_MAX] = { .type = NLA_U64 }, - [DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 }, - [DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED }, - [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 }, - [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 }, - [DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG }, -}; - -static const struct genl_small_ops devlink_nl_ops[] = { - { - .cmd = DEVLINK_CMD_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_get_doit, - .dumpit = devlink_nl_cmd_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_PORT_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_port_get_doit, - .dumpit = devlink_nl_cmd_port_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_PORT_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_port_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - }, - { - .cmd = DEVLINK_CMD_RATE_GET, - .doit = devlink_nl_cmd_rate_get_doit, - .dumpit = devlink_nl_cmd_rate_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_RATE, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_RATE_SET, - .doit = devlink_nl_cmd_rate_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_RATE, - }, - { - .cmd = DEVLINK_CMD_RATE_NEW, - .doit = devlink_nl_cmd_rate_new_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_RATE_DEL, - .doit = devlink_nl_cmd_rate_del_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_RATE_NODE, - }, - { - .cmd = DEVLINK_CMD_PORT_SPLIT, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_port_split_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - }, - { - .cmd = DEVLINK_CMD_PORT_UNSPLIT, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_port_unsplit_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - }, - { - .cmd = DEVLINK_CMD_PORT_NEW, - .doit = devlink_nl_cmd_port_new_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_PORT_DEL, - .doit = devlink_nl_cmd_port_del_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_LINECARD_GET, - .doit = devlink_nl_cmd_linecard_get_doit, - .dumpit = devlink_nl_cmd_linecard_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_LINECARD_SET, - .doit = devlink_nl_cmd_linecard_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD, - }, - { - .cmd = DEVLINK_CMD_SB_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_get_doit, - .dumpit = devlink_nl_cmd_sb_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_SB_POOL_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_pool_get_doit, - .dumpit = devlink_nl_cmd_sb_pool_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_SB_POOL_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_pool_set_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_SB_PORT_POOL_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_port_pool_get_doit, - .dumpit = devlink_nl_cmd_sb_port_pool_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_SB_PORT_POOL_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_port_pool_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - }, - { - .cmd = DEVLINK_CMD_SB_TC_POOL_BIND_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_tc_pool_bind_get_doit, - .dumpit = devlink_nl_cmd_sb_tc_pool_bind_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_SB_TC_POOL_BIND_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_tc_pool_bind_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - }, - { - .cmd = DEVLINK_CMD_SB_OCC_SNAPSHOT, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_occ_snapshot_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_SB_OCC_MAX_CLEAR, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_sb_occ_max_clear_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_ESWITCH_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_eswitch_get_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_ESWITCH_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_eswitch_set_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_DPIPE_TABLE_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_dpipe_table_get, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_DPIPE_ENTRIES_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_dpipe_entries_get, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_DPIPE_HEADERS_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_dpipe_headers_get, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_dpipe_table_counters_set, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_RESOURCE_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_resource_set, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_RESOURCE_DUMP, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_resource_dump, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_RELOAD, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_reload, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_PARAM_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_param_get_doit, - .dumpit = devlink_nl_cmd_param_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_PARAM_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_param_set_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_PORT_PARAM_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_port_param_get_doit, - .dumpit = devlink_nl_cmd_port_param_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_PORT_PARAM_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_port_param_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, - }, - { - .cmd = DEVLINK_CMD_REGION_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_region_get_doit, - .dumpit = devlink_nl_cmd_region_get_dumpit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_REGION_NEW, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_region_new, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_REGION_DEL, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_region_del, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_REGION_READ, - .validate = GENL_DONT_VALIDATE_STRICT | - GENL_DONT_VALIDATE_DUMP_STRICT, - .dumpit = devlink_nl_cmd_region_read_dumpit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_INFO_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_info_get_doit, - .dumpit = devlink_nl_cmd_info_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_health_reporter_get_doit, - .dumpit = devlink_nl_cmd_health_reporter_get_dumpit, - .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_SET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_health_reporter_set_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_health_reporter_recover_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_health_reporter_diagnose_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET, - .validate = GENL_DONT_VALIDATE_STRICT | - GENL_DONT_VALIDATE_DUMP_STRICT, - .dumpit = devlink_nl_cmd_health_reporter_dump_get_dumpit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_health_reporter_dump_clear_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, - }, - { - .cmd = DEVLINK_CMD_HEALTH_REPORTER_TEST, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_health_reporter_test_doit, - .flags = GENL_ADMIN_PERM, - .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, - }, - { - .cmd = DEVLINK_CMD_FLASH_UPDATE, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = devlink_nl_cmd_flash_update, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_TRAP_GET, - .doit = devlink_nl_cmd_trap_get_doit, - .dumpit = devlink_nl_cmd_trap_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_TRAP_SET, - .doit = devlink_nl_cmd_trap_set_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_TRAP_GROUP_GET, - .doit = devlink_nl_cmd_trap_group_get_doit, - .dumpit = devlink_nl_cmd_trap_group_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_TRAP_GROUP_SET, - .doit = devlink_nl_cmd_trap_group_set_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_TRAP_POLICER_GET, - .doit = devlink_nl_cmd_trap_policer_get_doit, - .dumpit = devlink_nl_cmd_trap_policer_get_dumpit, - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_TRAP_POLICER_SET, - .doit = devlink_nl_cmd_trap_policer_set_doit, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = DEVLINK_CMD_SELFTESTS_GET, - .doit = devlink_nl_cmd_selftests_get_doit, - .dumpit = devlink_nl_cmd_selftests_get_dumpit - /* can be retrieved by unprivileged users */ - }, - { - .cmd = DEVLINK_CMD_SELFTESTS_RUN, - .doit = devlink_nl_cmd_selftests_run, - .flags = GENL_ADMIN_PERM, - }, -}; - -static struct genl_family devlink_nl_family __ro_after_init = { - .name = DEVLINK_GENL_NAME, - .version = DEVLINK_GENL_VERSION, - .maxattr = DEVLINK_ATTR_MAX, - .policy = devlink_nl_policy, - .netnsok = true, - .parallel_ops = true, - .pre_doit = devlink_nl_pre_doit, - .post_doit = devlink_nl_post_doit, - .module = THIS_MODULE, - .small_ops = devlink_nl_ops, - .n_small_ops = ARRAY_SIZE(devlink_nl_ops), - .resv_start_op = DEVLINK_CMD_SELFTESTS_RUN + 1, - .mcgrps = devlink_nl_mcgrps, - .n_mcgrps = ARRAY_SIZE(devlink_nl_mcgrps), -}; - -static bool devlink_reload_actions_valid(const struct devlink_ops *ops) -{ - const struct devlink_reload_combination *comb; - int i; - - if (!devlink_reload_supported(ops)) { - if (WARN_ON(ops->reload_actions)) - return false; - return true; - } - - if (WARN_ON(!ops->reload_actions || - ops->reload_actions & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || - ops->reload_actions >= BIT(__DEVLINK_RELOAD_ACTION_MAX))) - return false; - - if (WARN_ON(ops->reload_limits & BIT(DEVLINK_RELOAD_LIMIT_UNSPEC) || - ops->reload_limits >= BIT(__DEVLINK_RELOAD_LIMIT_MAX))) - return false; - - for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) { - comb = &devlink_reload_invalid_combinations[i]; - if (ops->reload_actions == BIT(comb->action) && - ops->reload_limits == BIT(comb->limit)) - return false; - } - return true; -} - -/** - * devlink_set_features - Set devlink supported features - * - * @devlink: devlink - * @features: devlink support features - * - * This interface allows us to set reload ops separatelly from - * the devlink_alloc. - */ -void devlink_set_features(struct devlink *devlink, u64 features) -{ - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - WARN_ON(features & DEVLINK_F_RELOAD && - !devlink_reload_supported(devlink->ops)); - devlink->features = features; -} -EXPORT_SYMBOL_GPL(devlink_set_features); - -static int devlink_netdevice_event(struct notifier_block *nb, - unsigned long event, void *ptr); - -/** - * devlink_alloc_ns - Allocate new devlink instance resources - * in specific namespace - * - * @ops: ops - * @priv_size: size of user private data - * @net: net namespace - * @dev: parent device - * - * Allocate new devlink instance resources, including devlink index - * and name. - */ -struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, - size_t priv_size, struct net *net, - struct device *dev) -{ - struct devlink *devlink; - static u32 last_id; - int ret; - - WARN_ON(!ops || !dev); - if (!devlink_reload_actions_valid(ops)) - return NULL; - - devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL); - if (!devlink) - return NULL; - - ret = xa_alloc_cyclic(&devlinks, &devlink->index, devlink, xa_limit_31b, - &last_id, GFP_KERNEL); - if (ret < 0) - goto err_xa_alloc; - - devlink->netdevice_nb.notifier_call = devlink_netdevice_event; - ret = register_netdevice_notifier_net(net, &devlink->netdevice_nb); - if (ret) - goto err_register_netdevice_notifier; - - devlink->dev = dev; - devlink->ops = ops; - xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC); - xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC); - write_pnet(&devlink->_net, net); - INIT_LIST_HEAD(&devlink->rate_list); - INIT_LIST_HEAD(&devlink->linecard_list); - INIT_LIST_HEAD(&devlink->sb_list); - INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list); - INIT_LIST_HEAD(&devlink->resource_list); - INIT_LIST_HEAD(&devlink->param_list); - INIT_LIST_HEAD(&devlink->region_list); - INIT_LIST_HEAD(&devlink->reporter_list); - INIT_LIST_HEAD(&devlink->trap_list); - INIT_LIST_HEAD(&devlink->trap_group_list); - INIT_LIST_HEAD(&devlink->trap_policer_list); - lockdep_register_key(&devlink->lock_key); - mutex_init(&devlink->lock); - lockdep_set_class(&devlink->lock, &devlink->lock_key); - mutex_init(&devlink->reporters_lock); - mutex_init(&devlink->linecards_lock); - refcount_set(&devlink->refcount, 1); - init_completion(&devlink->comp); - - return devlink; - -err_register_netdevice_notifier: - xa_erase(&devlinks, devlink->index); -err_xa_alloc: - kfree(devlink); - return NULL; -} -EXPORT_SYMBOL_GPL(devlink_alloc_ns); - -static void -devlink_trap_policer_notify(struct devlink *devlink, - const struct devlink_trap_policer_item *policer_item, - enum devlink_command cmd); -static void -devlink_trap_group_notify(struct devlink *devlink, - const struct devlink_trap_group_item *group_item, - enum devlink_command cmd); -static void devlink_trap_notify(struct devlink *devlink, - const struct devlink_trap_item *trap_item, - enum devlink_command cmd); - -static void devlink_notify_register(struct devlink *devlink) -{ - struct devlink_trap_policer_item *policer_item; - struct devlink_trap_group_item *group_item; - struct devlink_param_item *param_item; - struct devlink_trap_item *trap_item; - struct devlink_port *devlink_port; - struct devlink_linecard *linecard; - struct devlink_rate *rate_node; - struct devlink_region *region; - unsigned long port_index; - - devlink_notify(devlink, DEVLINK_CMD_NEW); - list_for_each_entry(linecard, &devlink->linecard_list, list) - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - - xa_for_each(&devlink->ports, port_index, devlink_port) - devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); - - list_for_each_entry(policer_item, &devlink->trap_policer_list, list) - devlink_trap_policer_notify(devlink, policer_item, - DEVLINK_CMD_TRAP_POLICER_NEW); - - list_for_each_entry(group_item, &devlink->trap_group_list, list) - devlink_trap_group_notify(devlink, group_item, - DEVLINK_CMD_TRAP_GROUP_NEW); - - list_for_each_entry(trap_item, &devlink->trap_list, list) - devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_NEW); - - list_for_each_entry(rate_node, &devlink->rate_list, list) - devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); - - list_for_each_entry(region, &devlink->region_list, list) - devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); - - list_for_each_entry(param_item, &devlink->param_list, list) - devlink_param_notify(devlink, 0, param_item, - DEVLINK_CMD_PARAM_NEW); -} - -static void devlink_notify_unregister(struct devlink *devlink) -{ - struct devlink_trap_policer_item *policer_item; - struct devlink_trap_group_item *group_item; - struct devlink_param_item *param_item; - struct devlink_trap_item *trap_item; - struct devlink_port *devlink_port; - struct devlink_rate *rate_node; - struct devlink_region *region; - unsigned long port_index; - - list_for_each_entry_reverse(param_item, &devlink->param_list, list) - devlink_param_notify(devlink, 0, param_item, - DEVLINK_CMD_PARAM_DEL); - - list_for_each_entry_reverse(region, &devlink->region_list, list) - devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_DEL); - - list_for_each_entry_reverse(rate_node, &devlink->rate_list, list) - devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL); - - list_for_each_entry_reverse(trap_item, &devlink->trap_list, list) - devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_DEL); - - list_for_each_entry_reverse(group_item, &devlink->trap_group_list, list) - devlink_trap_group_notify(devlink, group_item, - DEVLINK_CMD_TRAP_GROUP_DEL); - list_for_each_entry_reverse(policer_item, &devlink->trap_policer_list, - list) - devlink_trap_policer_notify(devlink, policer_item, - DEVLINK_CMD_TRAP_POLICER_DEL); - - xa_for_each(&devlink->ports, port_index, devlink_port) - devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); - devlink_notify(devlink, DEVLINK_CMD_DEL); -} - -/** - * devlink_register - Register devlink instance - * - * @devlink: devlink - */ -void devlink_register(struct devlink *devlink) -{ - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - /* Make sure that we are in .probe() routine */ - - xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); - devlink_notify_register(devlink); -} -EXPORT_SYMBOL_GPL(devlink_register); - -/** - * devlink_unregister - Unregister devlink instance - * - * @devlink: devlink - */ -void devlink_unregister(struct devlink *devlink) -{ - ASSERT_DEVLINK_REGISTERED(devlink); - /* Make sure that we are in .remove() routine */ - - xa_set_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); - devlink_put(devlink); - wait_for_completion(&devlink->comp); - - devlink_notify_unregister(devlink); - xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); - xa_clear_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); -} -EXPORT_SYMBOL_GPL(devlink_unregister); - -/** - * devlink_free - Free devlink instance resources - * - * @devlink: devlink - */ -void devlink_free(struct devlink *devlink) -{ - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - mutex_destroy(&devlink->linecards_lock); - mutex_destroy(&devlink->reporters_lock); - mutex_destroy(&devlink->lock); - lockdep_unregister_key(&devlink->lock_key); - WARN_ON(!list_empty(&devlink->trap_policer_list)); - WARN_ON(!list_empty(&devlink->trap_group_list)); - WARN_ON(!list_empty(&devlink->trap_list)); - WARN_ON(!list_empty(&devlink->reporter_list)); - WARN_ON(!list_empty(&devlink->region_list)); - WARN_ON(!list_empty(&devlink->param_list)); - WARN_ON(!list_empty(&devlink->resource_list)); - WARN_ON(!list_empty(&devlink->dpipe_table_list)); - WARN_ON(!list_empty(&devlink->sb_list)); - WARN_ON(!list_empty(&devlink->rate_list)); - WARN_ON(!list_empty(&devlink->linecard_list)); - WARN_ON(!xa_empty(&devlink->ports)); - - xa_destroy(&devlink->snapshot_ids); - xa_destroy(&devlink->ports); - - WARN_ON_ONCE(unregister_netdevice_notifier_net(devlink_net(devlink), - &devlink->netdevice_nb)); - - xa_erase(&devlinks, devlink->index); - - kfree(devlink); -} -EXPORT_SYMBOL_GPL(devlink_free); - -static void devlink_port_type_warn(struct work_struct *work) -{ - WARN(true, "Type was not set for devlink port."); -} - -static bool devlink_port_type_should_warn(struct devlink_port *devlink_port) -{ - /* Ignore CPU and DSA flavours. */ - return devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_CPU && - devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_DSA && - devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_UNUSED; -} - -#define DEVLINK_PORT_TYPE_WARN_TIMEOUT (HZ * 3600) - -static void devlink_port_type_warn_schedule(struct devlink_port *devlink_port) -{ - if (!devlink_port_type_should_warn(devlink_port)) - return; - /* Schedule a work to WARN in case driver does not set port - * type within timeout. - */ - schedule_delayed_work(&devlink_port->type_warn_dw, - DEVLINK_PORT_TYPE_WARN_TIMEOUT); -} - -static void devlink_port_type_warn_cancel(struct devlink_port *devlink_port) -{ - if (!devlink_port_type_should_warn(devlink_port)) - return; - cancel_delayed_work_sync(&devlink_port->type_warn_dw); -} - -/** - * devlink_port_init() - Init devlink port - * - * @devlink: devlink - * @devlink_port: devlink port - * - * Initialize essencial stuff that is needed for functions - * that may be called before devlink port registration. - * Call to this function is optional and not needed - * in case the driver does not use such functions. - */ -void devlink_port_init(struct devlink *devlink, - struct devlink_port *devlink_port) -{ - if (devlink_port->initialized) - return; - devlink_port->devlink = devlink; - INIT_LIST_HEAD(&devlink_port->region_list); - devlink_port->initialized = true; -} -EXPORT_SYMBOL_GPL(devlink_port_init); - -/** - * devlink_port_fini() - Deinitialize devlink port - * - * @devlink_port: devlink port - * - * Deinitialize essencial stuff that is in use for functions - * that may be called after devlink port unregistration. - * Call to this function is optional and not needed - * in case the driver does not use such functions. - */ -void devlink_port_fini(struct devlink_port *devlink_port) -{ - WARN_ON(!list_empty(&devlink_port->region_list)); -} -EXPORT_SYMBOL_GPL(devlink_port_fini); - -/** - * devl_port_register() - Register devlink port - * - * @devlink: devlink - * @devlink_port: devlink port - * @port_index: driver-specific numerical identifier of the port - * - * Register devlink port with provided port index. User can use - * any indexing, even hw-related one. devlink_port structure - * is convenient to be embedded inside user driver private structure. - * Note that the caller should take care of zeroing the devlink_port - * structure. - */ -int devl_port_register(struct devlink *devlink, - struct devlink_port *devlink_port, - unsigned int port_index) -{ - int err; - - devl_assert_locked(devlink); - - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - - devlink_port_init(devlink, devlink_port); - devlink_port->registered = true; - devlink_port->index = port_index; - spin_lock_init(&devlink_port->type_lock); - INIT_LIST_HEAD(&devlink_port->reporter_list); - mutex_init(&devlink_port->reporters_lock); - err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL); - if (err) { - mutex_destroy(&devlink_port->reporters_lock); - return err; - } - - INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn); - devlink_port_type_warn_schedule(devlink_port); - devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); - return 0; -} -EXPORT_SYMBOL_GPL(devl_port_register); - -/** - * devlink_port_register - Register devlink port - * - * @devlink: devlink - * @devlink_port: devlink port - * @port_index: driver-specific numerical identifier of the port - * - * Register devlink port with provided port index. User can use - * any indexing, even hw-related one. devlink_port structure - * is convenient to be embedded inside user driver private structure. - * Note that the caller should take care of zeroing the devlink_port - * structure. - * - * Context: Takes and release devlink->lock . - */ -int devlink_port_register(struct devlink *devlink, - struct devlink_port *devlink_port, - unsigned int port_index) -{ - int err; - - devl_lock(devlink); - err = devl_port_register(devlink, devlink_port, port_index); - devl_unlock(devlink); - return err; -} -EXPORT_SYMBOL_GPL(devlink_port_register); - -/** - * devl_port_unregister() - Unregister devlink port - * - * @devlink_port: devlink port - */ -void devl_port_unregister(struct devlink_port *devlink_port) -{ - lockdep_assert_held(&devlink_port->devlink->lock); - WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET); - - devlink_port_type_warn_cancel(devlink_port); - devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); - xa_erase(&devlink_port->devlink->ports, devlink_port->index); - WARN_ON(!list_empty(&devlink_port->reporter_list)); - mutex_destroy(&devlink_port->reporters_lock); - devlink_port->registered = false; -} -EXPORT_SYMBOL_GPL(devl_port_unregister); - -/** - * devlink_port_unregister - Unregister devlink port - * - * @devlink_port: devlink port - * - * Context: Takes and release devlink->lock . - */ -void devlink_port_unregister(struct devlink_port *devlink_port) -{ - struct devlink *devlink = devlink_port->devlink; - - devl_lock(devlink); - devl_port_unregister(devlink_port); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_port_unregister); - -static void devlink_port_type_netdev_checks(struct devlink_port *devlink_port, - struct net_device *netdev) -{ - const struct net_device_ops *ops = netdev->netdev_ops; - - /* If driver registers devlink port, it should set devlink port - * attributes accordingly so the compat functions are called - * and the original ops are not used. - */ - if (ops->ndo_get_phys_port_name) { - /* Some drivers use the same set of ndos for netdevs - * that have devlink_port registered and also for - * those who don't. Make sure that ndo_get_phys_port_name - * returns -EOPNOTSUPP here in case it is defined. - * Warn if not. - */ - char name[IFNAMSIZ]; - int err; - - err = ops->ndo_get_phys_port_name(netdev, name, sizeof(name)); - WARN_ON(err != -EOPNOTSUPP); - } - if (ops->ndo_get_port_parent_id) { - /* Some drivers use the same set of ndos for netdevs - * that have devlink_port registered and also for - * those who don't. Make sure that ndo_get_port_parent_id - * returns -EOPNOTSUPP here in case it is defined. - * Warn if not. - */ - struct netdev_phys_item_id ppid; - int err; - - err = ops->ndo_get_port_parent_id(netdev, &ppid); - WARN_ON(err != -EOPNOTSUPP); - } -} - -static void __devlink_port_type_set(struct devlink_port *devlink_port, - enum devlink_port_type type, - void *type_dev) -{ - struct net_device *netdev = type_dev; - - ASSERT_DEVLINK_PORT_REGISTERED(devlink_port); - - if (type == DEVLINK_PORT_TYPE_NOTSET) { - devlink_port_type_warn_schedule(devlink_port); - } else { - devlink_port_type_warn_cancel(devlink_port); - if (type == DEVLINK_PORT_TYPE_ETH && netdev) - devlink_port_type_netdev_checks(devlink_port, netdev); - } - - spin_lock_bh(&devlink_port->type_lock); - devlink_port->type = type; - switch (type) { - case DEVLINK_PORT_TYPE_ETH: - devlink_port->type_eth.netdev = netdev; - if (netdev) { - ASSERT_RTNL(); - devlink_port->type_eth.ifindex = netdev->ifindex; - BUILD_BUG_ON(sizeof(devlink_port->type_eth.ifname) != - sizeof(netdev->name)); - strcpy(devlink_port->type_eth.ifname, netdev->name); - } - break; - case DEVLINK_PORT_TYPE_IB: - devlink_port->type_ib.ibdev = type_dev; - break; - default: - break; - } - spin_unlock_bh(&devlink_port->type_lock); - devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); -} - -/** - * devlink_port_type_eth_set - Set port type to Ethernet - * - * @devlink_port: devlink port - * - * If driver is calling this, most likely it is doing something wrong. - */ -void devlink_port_type_eth_set(struct devlink_port *devlink_port) -{ - dev_warn(devlink_port->devlink->dev, - "devlink port type for port %d set to Ethernet without a software interface reference, device type not supported by the kernel?\n", - devlink_port->index); - __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, NULL); -} -EXPORT_SYMBOL_GPL(devlink_port_type_eth_set); - -/** - * devlink_port_type_ib_set - Set port type to InfiniBand - * - * @devlink_port: devlink port - * @ibdev: related IB device - */ -void devlink_port_type_ib_set(struct devlink_port *devlink_port, - struct ib_device *ibdev) -{ - __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_IB, ibdev); -} -EXPORT_SYMBOL_GPL(devlink_port_type_ib_set); - -/** - * devlink_port_type_clear - Clear port type - * - * @devlink_port: devlink port - * - * If driver is calling this for clearing Ethernet type, most likely - * it is doing something wrong. - */ -void devlink_port_type_clear(struct devlink_port *devlink_port) -{ - if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) - dev_warn(devlink_port->devlink->dev, - "devlink port type for port %d cleared without a software interface reference, device type not supported by the kernel?\n", - devlink_port->index); - __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, NULL); -} -EXPORT_SYMBOL_GPL(devlink_port_type_clear); - -static int devlink_netdevice_event(struct notifier_block *nb, - unsigned long event, void *ptr) -{ - struct net_device *netdev = netdev_notifier_info_to_dev(ptr); - struct devlink_port *devlink_port = netdev->devlink_port; - struct devlink *devlink; - - devlink = container_of(nb, struct devlink, netdevice_nb); - - if (!devlink_port || devlink_port->devlink != devlink) - return NOTIFY_OK; - - switch (event) { - case NETDEV_POST_INIT: - /* Set the type but not netdev pointer. It is going to be set - * later on by NETDEV_REGISTER event. Happens once during - * netdevice register - */ - __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, - NULL); - break; - case NETDEV_REGISTER: - case NETDEV_CHANGENAME: - /* Set the netdev on top of previously set type. Note this - * event happens also during net namespace change so here - * we take into account netdev pointer appearing in this - * namespace. - */ - __devlink_port_type_set(devlink_port, devlink_port->type, - netdev); - break; - case NETDEV_UNREGISTER: - /* Clear netdev pointer, but not the type. This event happens - * also during net namespace change so we need to clear - * pointer to netdev that is going to another net namespace. - */ - __devlink_port_type_set(devlink_port, devlink_port->type, - NULL); - break; - case NETDEV_PRE_UNINIT: - /* Clear the type and the netdev pointer. Happens one during - * netdevice unregister. - */ - __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, - NULL); - break; - } - - return NOTIFY_OK; -} - -static int __devlink_port_attrs_set(struct devlink_port *devlink_port, - enum devlink_port_flavour flavour) -{ - struct devlink_port_attrs *attrs = &devlink_port->attrs; - - devlink_port->attrs_set = true; - attrs->flavour = flavour; - if (attrs->switch_id.id_len) { - devlink_port->switch_port = true; - if (WARN_ON(attrs->switch_id.id_len > MAX_PHYS_ITEM_ID_LEN)) - attrs->switch_id.id_len = MAX_PHYS_ITEM_ID_LEN; - } else { - devlink_port->switch_port = false; - } - return 0; -} - -/** - * devlink_port_attrs_set - Set port attributes - * - * @devlink_port: devlink port - * @attrs: devlink port attrs - */ -void devlink_port_attrs_set(struct devlink_port *devlink_port, - struct devlink_port_attrs *attrs) -{ - int ret; - - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - - devlink_port->attrs = *attrs; - ret = __devlink_port_attrs_set(devlink_port, attrs->flavour); - if (ret) - return; - WARN_ON(attrs->splittable && attrs->split); -} -EXPORT_SYMBOL_GPL(devlink_port_attrs_set); - -/** - * devlink_port_attrs_pci_pf_set - Set PCI PF port attributes - * - * @devlink_port: devlink port - * @controller: associated controller number for the devlink port instance - * @pf: associated PF for the devlink port instance - * @external: indicates if the port is for an external controller - */ -void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller, - u16 pf, bool external) -{ - struct devlink_port_attrs *attrs = &devlink_port->attrs; - int ret; - - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - - ret = __devlink_port_attrs_set(devlink_port, - DEVLINK_PORT_FLAVOUR_PCI_PF); - if (ret) - return; - attrs->pci_pf.controller = controller; - attrs->pci_pf.pf = pf; - attrs->pci_pf.external = external; -} -EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set); - -/** - * devlink_port_attrs_pci_vf_set - Set PCI VF port attributes - * - * @devlink_port: devlink port - * @controller: associated controller number for the devlink port instance - * @pf: associated PF for the devlink port instance - * @vf: associated VF of a PF for the devlink port instance - * @external: indicates if the port is for an external controller - */ -void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller, - u16 pf, u16 vf, bool external) -{ - struct devlink_port_attrs *attrs = &devlink_port->attrs; - int ret; - - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - - ret = __devlink_port_attrs_set(devlink_port, - DEVLINK_PORT_FLAVOUR_PCI_VF); - if (ret) - return; - attrs->pci_vf.controller = controller; - attrs->pci_vf.pf = pf; - attrs->pci_vf.vf = vf; - attrs->pci_vf.external = external; -} -EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set); - -/** - * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes - * - * @devlink_port: devlink port - * @controller: associated controller number for the devlink port instance - * @pf: associated PF for the devlink port instance - * @sf: associated SF of a PF for the devlink port instance - * @external: indicates if the port is for an external controller - */ -void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, - u16 pf, u32 sf, bool external) -{ - struct devlink_port_attrs *attrs = &devlink_port->attrs; - int ret; - - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - - ret = __devlink_port_attrs_set(devlink_port, - DEVLINK_PORT_FLAVOUR_PCI_SF); - if (ret) - return; - attrs->pci_sf.controller = controller; - attrs->pci_sf.pf = pf; - attrs->pci_sf.sf = sf; - attrs->pci_sf.external = external; -} -EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); - -/** - * devl_rate_node_create - create devlink rate node - * @devlink: devlink instance - * @priv: driver private data - * @node_name: name of the resulting node - * @parent: parent devlink_rate struct - * - * Create devlink rate object of type node - */ -struct devlink_rate * -devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, - struct devlink_rate *parent) -{ - struct devlink_rate *rate_node; - - rate_node = devlink_rate_node_get_by_name(devlink, node_name); - if (!IS_ERR(rate_node)) - return ERR_PTR(-EEXIST); - - rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL); - if (!rate_node) - return ERR_PTR(-ENOMEM); - - if (parent) { - rate_node->parent = parent; - refcount_inc(&rate_node->parent->refcnt); - } - - rate_node->type = DEVLINK_RATE_TYPE_NODE; - rate_node->devlink = devlink; - rate_node->priv = priv; - - rate_node->name = kstrdup(node_name, GFP_KERNEL); - if (!rate_node->name) { - kfree(rate_node); - return ERR_PTR(-ENOMEM); - } - - refcount_set(&rate_node->refcnt, 1); - list_add(&rate_node->list, &devlink->rate_list); - devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); - return rate_node; -} -EXPORT_SYMBOL_GPL(devl_rate_node_create); - -/** - * devl_rate_leaf_create - create devlink rate leaf - * @devlink_port: devlink port object to create rate object on - * @priv: driver private data - * @parent: parent devlink_rate struct - * - * Create devlink rate object of type leaf on provided @devlink_port. - */ -int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv, - struct devlink_rate *parent) -{ - struct devlink *devlink = devlink_port->devlink; - struct devlink_rate *devlink_rate; - - devl_assert_locked(devlink_port->devlink); - - if (WARN_ON(devlink_port->devlink_rate)) - return -EBUSY; - - devlink_rate = kzalloc(sizeof(*devlink_rate), GFP_KERNEL); - if (!devlink_rate) - return -ENOMEM; - - if (parent) { - devlink_rate->parent = parent; - refcount_inc(&devlink_rate->parent->refcnt); - } - - devlink_rate->type = DEVLINK_RATE_TYPE_LEAF; - devlink_rate->devlink = devlink; - devlink_rate->devlink_port = devlink_port; - devlink_rate->priv = priv; - list_add_tail(&devlink_rate->list, &devlink->rate_list); - devlink_port->devlink_rate = devlink_rate; - devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW); - - return 0; -} -EXPORT_SYMBOL_GPL(devl_rate_leaf_create); - -/** - * devl_rate_leaf_destroy - destroy devlink rate leaf - * - * @devlink_port: devlink port linked to the rate object - * - * Destroy the devlink rate object of type leaf on provided @devlink_port. - */ -void devl_rate_leaf_destroy(struct devlink_port *devlink_port) -{ - struct devlink_rate *devlink_rate = devlink_port->devlink_rate; - - devl_assert_locked(devlink_port->devlink); - if (!devlink_rate) - return; - - devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_DEL); - if (devlink_rate->parent) - refcount_dec(&devlink_rate->parent->refcnt); - list_del(&devlink_rate->list); - devlink_port->devlink_rate = NULL; - kfree(devlink_rate); -} -EXPORT_SYMBOL_GPL(devl_rate_leaf_destroy); - -/** - * devl_rate_nodes_destroy - destroy all devlink rate nodes on device - * @devlink: devlink instance - * - * Unset parent for all rate objects and destroy all rate nodes - * on specified device. - */ -void devl_rate_nodes_destroy(struct devlink *devlink) -{ - static struct devlink_rate *devlink_rate, *tmp; - const struct devlink_ops *ops = devlink->ops; - - devl_assert_locked(devlink); - - list_for_each_entry(devlink_rate, &devlink->rate_list, list) { - if (!devlink_rate->parent) - continue; - - refcount_dec(&devlink_rate->parent->refcnt); - if (devlink_rate_is_leaf(devlink_rate)) - ops->rate_leaf_parent_set(devlink_rate, NULL, devlink_rate->priv, - NULL, NULL); - else if (devlink_rate_is_node(devlink_rate)) - ops->rate_node_parent_set(devlink_rate, NULL, devlink_rate->priv, - NULL, NULL); - } - list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_list, list) { - if (devlink_rate_is_node(devlink_rate)) { - ops->rate_node_del(devlink_rate, devlink_rate->priv, NULL); - list_del(&devlink_rate->list); - kfree(devlink_rate->name); - kfree(devlink_rate); - } - } -} -EXPORT_SYMBOL_GPL(devl_rate_nodes_destroy); - -/** - * devlink_port_linecard_set - Link port with a linecard - * - * @devlink_port: devlink port - * @linecard: devlink linecard - */ -void devlink_port_linecard_set(struct devlink_port *devlink_port, - struct devlink_linecard *linecard) -{ - ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); - - devlink_port->linecard = linecard; -} -EXPORT_SYMBOL_GPL(devlink_port_linecard_set); - -static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, - char *name, size_t len) -{ - struct devlink_port_attrs *attrs = &devlink_port->attrs; - int n = 0; - - if (!devlink_port->attrs_set) - return -EOPNOTSUPP; - - switch (attrs->flavour) { - case DEVLINK_PORT_FLAVOUR_PHYSICAL: - if (devlink_port->linecard) - n = snprintf(name, len, "l%u", - devlink_port->linecard->index); - if (n < len) - n += snprintf(name + n, len - n, "p%u", - attrs->phys.port_number); - if (n < len && attrs->split) - n += snprintf(name + n, len - n, "s%u", - attrs->phys.split_subport_number); - break; - case DEVLINK_PORT_FLAVOUR_CPU: - case DEVLINK_PORT_FLAVOUR_DSA: - case DEVLINK_PORT_FLAVOUR_UNUSED: - /* As CPU and DSA ports do not have a netdevice associated - * case should not ever happen. - */ - WARN_ON(1); - return -EINVAL; - case DEVLINK_PORT_FLAVOUR_PCI_PF: - if (attrs->pci_pf.external) { - n = snprintf(name, len, "c%u", attrs->pci_pf.controller); - if (n >= len) - return -EINVAL; - len -= n; - name += n; - } - n = snprintf(name, len, "pf%u", attrs->pci_pf.pf); - break; - case DEVLINK_PORT_FLAVOUR_PCI_VF: - if (attrs->pci_vf.external) { - n = snprintf(name, len, "c%u", attrs->pci_vf.controller); - if (n >= len) - return -EINVAL; - len -= n; - name += n; - } - n = snprintf(name, len, "pf%uvf%u", - attrs->pci_vf.pf, attrs->pci_vf.vf); - break; - case DEVLINK_PORT_FLAVOUR_PCI_SF: - if (attrs->pci_sf.external) { - n = snprintf(name, len, "c%u", attrs->pci_sf.controller); - if (n >= len) - return -EINVAL; - len -= n; - name += n; - } - n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, - attrs->pci_sf.sf); - break; - case DEVLINK_PORT_FLAVOUR_VIRTUAL: - return -EOPNOTSUPP; - } - - if (n >= len) - return -EINVAL; - - return 0; -} - -static int devlink_linecard_types_init(struct devlink_linecard *linecard) -{ - struct devlink_linecard_type *linecard_type; - unsigned int count; - int i; - - count = linecard->ops->types_count(linecard, linecard->priv); - linecard->types = kmalloc_array(count, sizeof(*linecard_type), - GFP_KERNEL); - if (!linecard->types) - return -ENOMEM; - linecard->types_count = count; - - for (i = 0; i < count; i++) { - linecard_type = &linecard->types[i]; - linecard->ops->types_get(linecard, linecard->priv, i, - &linecard_type->type, - &linecard_type->priv); - } - return 0; -} - -static void devlink_linecard_types_fini(struct devlink_linecard *linecard) -{ - kfree(linecard->types); -} - -/** - * devlink_linecard_create - Create devlink linecard - * - * @devlink: devlink - * @linecard_index: driver-specific numerical identifier of the linecard - * @ops: linecards ops - * @priv: user priv pointer - * - * Create devlink linecard instance with provided linecard index. - * Caller can use any indexing, even hw-related one. - * - * Return: Line card structure or an ERR_PTR() encoded error code. - */ -struct devlink_linecard * -devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index, - const struct devlink_linecard_ops *ops, void *priv) -{ - struct devlink_linecard *linecard; - int err; - - if (WARN_ON(!ops || !ops->provision || !ops->unprovision || - !ops->types_count || !ops->types_get)) - return ERR_PTR(-EINVAL); - - mutex_lock(&devlink->linecards_lock); - if (devlink_linecard_index_exists(devlink, linecard_index)) { - mutex_unlock(&devlink->linecards_lock); - return ERR_PTR(-EEXIST); - } - - linecard = kzalloc(sizeof(*linecard), GFP_KERNEL); - if (!linecard) { - mutex_unlock(&devlink->linecards_lock); - return ERR_PTR(-ENOMEM); - } - - linecard->devlink = devlink; - linecard->index = linecard_index; - linecard->ops = ops; - linecard->priv = priv; - linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; - mutex_init(&linecard->state_lock); - - err = devlink_linecard_types_init(linecard); - if (err) { - mutex_destroy(&linecard->state_lock); - kfree(linecard); - mutex_unlock(&devlink->linecards_lock); - return ERR_PTR(err); - } - - list_add_tail(&linecard->list, &devlink->linecard_list); - refcount_set(&linecard->refcount, 1); - mutex_unlock(&devlink->linecards_lock); - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - return linecard; -} -EXPORT_SYMBOL_GPL(devlink_linecard_create); - -/** - * devlink_linecard_destroy - Destroy devlink linecard - * - * @linecard: devlink linecard - */ -void devlink_linecard_destroy(struct devlink_linecard *linecard) -{ - struct devlink *devlink = linecard->devlink; - - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); - mutex_lock(&devlink->linecards_lock); - list_del(&linecard->list); - devlink_linecard_types_fini(linecard); - mutex_unlock(&devlink->linecards_lock); - devlink_linecard_put(linecard); -} -EXPORT_SYMBOL_GPL(devlink_linecard_destroy); - -/** - * devlink_linecard_provision_set - Set provisioning on linecard - * - * @linecard: devlink linecard - * @type: linecard type - * - * This is either called directly from the provision() op call or - * as a result of the provision() op call asynchronously. - */ -void devlink_linecard_provision_set(struct devlink_linecard *linecard, - const char *type) -{ - mutex_lock(&linecard->state_lock); - WARN_ON(linecard->type && strcmp(linecard->type, type)); - linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED; - linecard->type = type; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); -} -EXPORT_SYMBOL_GPL(devlink_linecard_provision_set); - -/** - * devlink_linecard_provision_clear - Clear provisioning on linecard - * - * @linecard: devlink linecard - * - * This is either called directly from the unprovision() op call or - * as a result of the unprovision() op call asynchronously. - */ -void devlink_linecard_provision_clear(struct devlink_linecard *linecard) -{ - mutex_lock(&linecard->state_lock); - WARN_ON(linecard->nested_devlink); - linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; - linecard->type = NULL; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); -} -EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear); - -/** - * devlink_linecard_provision_fail - Fail provisioning on linecard - * - * @linecard: devlink linecard - * - * This is either called directly from the provision() op call or - * as a result of the provision() op call asynchronously. - */ -void devlink_linecard_provision_fail(struct devlink_linecard *linecard) -{ - mutex_lock(&linecard->state_lock); - WARN_ON(linecard->nested_devlink); - linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING_FAILED; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); -} -EXPORT_SYMBOL_GPL(devlink_linecard_provision_fail); - -/** - * devlink_linecard_activate - Set linecard active - * - * @linecard: devlink linecard - */ -void devlink_linecard_activate(struct devlink_linecard *linecard) -{ - mutex_lock(&linecard->state_lock); - WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_PROVISIONED); - linecard->state = DEVLINK_LINECARD_STATE_ACTIVE; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); -} -EXPORT_SYMBOL_GPL(devlink_linecard_activate); - -/** - * devlink_linecard_deactivate - Set linecard inactive - * - * @linecard: devlink linecard - */ -void devlink_linecard_deactivate(struct devlink_linecard *linecard) -{ - mutex_lock(&linecard->state_lock); - switch (linecard->state) { - case DEVLINK_LINECARD_STATE_ACTIVE: - linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - break; - case DEVLINK_LINECARD_STATE_UNPROVISIONING: - /* Line card is being deactivated as part - * of unprovisioning flow. - */ - break; - default: - WARN_ON(1); - break; - } - mutex_unlock(&linecard->state_lock); -} -EXPORT_SYMBOL_GPL(devlink_linecard_deactivate); - -/** - * devlink_linecard_nested_dl_set - Attach/detach nested devlink - * instance to linecard. - * - * @linecard: devlink linecard - * @nested_devlink: devlink instance to attach or NULL to detach - */ -void devlink_linecard_nested_dl_set(struct devlink_linecard *linecard, - struct devlink *nested_devlink) -{ - mutex_lock(&linecard->state_lock); - linecard->nested_devlink = nested_devlink; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); - mutex_unlock(&linecard->state_lock); -} -EXPORT_SYMBOL_GPL(devlink_linecard_nested_dl_set); - -int devl_sb_register(struct devlink *devlink, unsigned int sb_index, - u32 size, u16 ingress_pools_count, - u16 egress_pools_count, u16 ingress_tc_count, - u16 egress_tc_count) -{ - struct devlink_sb *devlink_sb; - - lockdep_assert_held(&devlink->lock); - - if (devlink_sb_index_exists(devlink, sb_index)) - return -EEXIST; - - devlink_sb = kzalloc(sizeof(*devlink_sb), GFP_KERNEL); - if (!devlink_sb) - return -ENOMEM; - devlink_sb->index = sb_index; - devlink_sb->size = size; - devlink_sb->ingress_pools_count = ingress_pools_count; - devlink_sb->egress_pools_count = egress_pools_count; - devlink_sb->ingress_tc_count = ingress_tc_count; - devlink_sb->egress_tc_count = egress_tc_count; - list_add_tail(&devlink_sb->list, &devlink->sb_list); - return 0; -} -EXPORT_SYMBOL_GPL(devl_sb_register); - -int devlink_sb_register(struct devlink *devlink, unsigned int sb_index, - u32 size, u16 ingress_pools_count, - u16 egress_pools_count, u16 ingress_tc_count, - u16 egress_tc_count) -{ - int err; - - devl_lock(devlink); - err = devl_sb_register(devlink, sb_index, size, ingress_pools_count, - egress_pools_count, ingress_tc_count, - egress_tc_count); - devl_unlock(devlink); - return err; -} -EXPORT_SYMBOL_GPL(devlink_sb_register); - -void devl_sb_unregister(struct devlink *devlink, unsigned int sb_index) -{ - struct devlink_sb *devlink_sb; - - lockdep_assert_held(&devlink->lock); - - devlink_sb = devlink_sb_get_by_index(devlink, sb_index); - WARN_ON(!devlink_sb); - list_del(&devlink_sb->list); - kfree(devlink_sb); -} -EXPORT_SYMBOL_GPL(devl_sb_unregister); - -void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index) -{ - devl_lock(devlink); - devl_sb_unregister(devlink, sb_index); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_sb_unregister); - -/** - * devl_dpipe_headers_register - register dpipe headers - * - * @devlink: devlink - * @dpipe_headers: dpipe header array - * - * Register the headers supported by hardware. - */ -void devl_dpipe_headers_register(struct devlink *devlink, - struct devlink_dpipe_headers *dpipe_headers) -{ - lockdep_assert_held(&devlink->lock); - - devlink->dpipe_headers = dpipe_headers; -} -EXPORT_SYMBOL_GPL(devl_dpipe_headers_register); - -/** - * devl_dpipe_headers_unregister - unregister dpipe headers - * - * @devlink: devlink - * - * Unregister the headers supported by hardware. - */ -void devl_dpipe_headers_unregister(struct devlink *devlink) -{ - lockdep_assert_held(&devlink->lock); - - devlink->dpipe_headers = NULL; -} -EXPORT_SYMBOL_GPL(devl_dpipe_headers_unregister); - -/** - * devlink_dpipe_table_counter_enabled - check if counter allocation - * required - * @devlink: devlink - * @table_name: tables name - * - * Used by driver to check if counter allocation is required. - * After counter allocation is turned on the table entries - * are updated to include counter statistics. - * - * After that point on the driver must respect the counter - * state so that each entry added to the table is added - * with a counter. - */ -bool devlink_dpipe_table_counter_enabled(struct devlink *devlink, - const char *table_name) -{ - struct devlink_dpipe_table *table; - bool enabled; - - rcu_read_lock(); - table = devlink_dpipe_table_find(&devlink->dpipe_table_list, - table_name, devlink); - enabled = false; - if (table) - enabled = table->counters_enabled; - rcu_read_unlock(); - return enabled; -} -EXPORT_SYMBOL_GPL(devlink_dpipe_table_counter_enabled); - -/** - * devl_dpipe_table_register - register dpipe table - * - * @devlink: devlink - * @table_name: table name - * @table_ops: table ops - * @priv: priv - * @counter_control_extern: external control for counters - */ -int devl_dpipe_table_register(struct devlink *devlink, - const char *table_name, - struct devlink_dpipe_table_ops *table_ops, - void *priv, bool counter_control_extern) -{ - struct devlink_dpipe_table *table; - - lockdep_assert_held(&devlink->lock); - - if (WARN_ON(!table_ops->size_get)) - return -EINVAL; - - if (devlink_dpipe_table_find(&devlink->dpipe_table_list, table_name, - devlink)) - return -EEXIST; - - table = kzalloc(sizeof(*table), GFP_KERNEL); - if (!table) - return -ENOMEM; - - table->name = table_name; - table->table_ops = table_ops; - table->priv = priv; - table->counter_control_extern = counter_control_extern; - - list_add_tail_rcu(&table->list, &devlink->dpipe_table_list); - - return 0; -} -EXPORT_SYMBOL_GPL(devl_dpipe_table_register); - -/** - * devl_dpipe_table_unregister - unregister dpipe table - * - * @devlink: devlink - * @table_name: table name - */ -void devl_dpipe_table_unregister(struct devlink *devlink, - const char *table_name) -{ - struct devlink_dpipe_table *table; - - lockdep_assert_held(&devlink->lock); - - table = devlink_dpipe_table_find(&devlink->dpipe_table_list, - table_name, devlink); - if (!table) - return; - list_del_rcu(&table->list); - kfree_rcu(table, rcu); -} -EXPORT_SYMBOL_GPL(devl_dpipe_table_unregister); - -/** - * devl_resource_register - devlink resource register - * - * @devlink: devlink - * @resource_name: resource's name - * @resource_size: resource's size - * @resource_id: resource's id - * @parent_resource_id: resource's parent id - * @size_params: size parameters - * - * Generic resources should reuse the same names across drivers. - * Please see the generic resources list at: - * Documentation/networking/devlink/devlink-resource.rst - */ -int devl_resource_register(struct devlink *devlink, - const char *resource_name, - u64 resource_size, - u64 resource_id, - u64 parent_resource_id, - const struct devlink_resource_size_params *size_params) -{ - struct devlink_resource *resource; - struct list_head *resource_list; - bool top_hierarchy; - - lockdep_assert_held(&devlink->lock); - - top_hierarchy = parent_resource_id == DEVLINK_RESOURCE_ID_PARENT_TOP; - - resource = devlink_resource_find(devlink, NULL, resource_id); - if (resource) - return -EINVAL; - - resource = kzalloc(sizeof(*resource), GFP_KERNEL); - if (!resource) - return -ENOMEM; - - if (top_hierarchy) { - resource_list = &devlink->resource_list; - } else { - struct devlink_resource *parent_resource; - - parent_resource = devlink_resource_find(devlink, NULL, - parent_resource_id); - if (parent_resource) { - resource_list = &parent_resource->resource_list; - resource->parent = parent_resource; - } else { - kfree(resource); - return -EINVAL; - } - } - - resource->name = resource_name; - resource->size = resource_size; - resource->size_new = resource_size; - resource->id = resource_id; - resource->size_valid = true; - memcpy(&resource->size_params, size_params, - sizeof(resource->size_params)); - INIT_LIST_HEAD(&resource->resource_list); - list_add_tail(&resource->list, resource_list); - - return 0; -} -EXPORT_SYMBOL_GPL(devl_resource_register); - -/** - * devlink_resource_register - devlink resource register - * - * @devlink: devlink - * @resource_name: resource's name - * @resource_size: resource's size - * @resource_id: resource's id - * @parent_resource_id: resource's parent id - * @size_params: size parameters - * - * Generic resources should reuse the same names across drivers. - * Please see the generic resources list at: - * Documentation/networking/devlink/devlink-resource.rst - * - * Context: Takes and release devlink->lock . - */ -int devlink_resource_register(struct devlink *devlink, - const char *resource_name, - u64 resource_size, - u64 resource_id, - u64 parent_resource_id, - const struct devlink_resource_size_params *size_params) -{ - int err; - - devl_lock(devlink); - err = devl_resource_register(devlink, resource_name, resource_size, - resource_id, parent_resource_id, size_params); - devl_unlock(devlink); - return err; -} -EXPORT_SYMBOL_GPL(devlink_resource_register); - -static void devlink_resource_unregister(struct devlink *devlink, - struct devlink_resource *resource) -{ - struct devlink_resource *tmp, *child_resource; - - list_for_each_entry_safe(child_resource, tmp, &resource->resource_list, - list) { - devlink_resource_unregister(devlink, child_resource); - list_del(&child_resource->list); - kfree(child_resource); - } -} - -/** - * devl_resources_unregister - free all resources - * - * @devlink: devlink - */ -void devl_resources_unregister(struct devlink *devlink) -{ - struct devlink_resource *tmp, *child_resource; - - lockdep_assert_held(&devlink->lock); - - list_for_each_entry_safe(child_resource, tmp, &devlink->resource_list, - list) { - devlink_resource_unregister(devlink, child_resource); - list_del(&child_resource->list); - kfree(child_resource); - } -} -EXPORT_SYMBOL_GPL(devl_resources_unregister); - -/** - * devlink_resources_unregister - free all resources - * - * @devlink: devlink - * - * Context: Takes and release devlink->lock . - */ -void devlink_resources_unregister(struct devlink *devlink) -{ - devl_lock(devlink); - devl_resources_unregister(devlink); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_resources_unregister); - -/** - * devl_resource_size_get - get and update size - * - * @devlink: devlink - * @resource_id: the requested resource id - * @p_resource_size: ptr to update - */ -int devl_resource_size_get(struct devlink *devlink, - u64 resource_id, - u64 *p_resource_size) -{ - struct devlink_resource *resource; - - lockdep_assert_held(&devlink->lock); - - resource = devlink_resource_find(devlink, NULL, resource_id); - if (!resource) - return -EINVAL; - *p_resource_size = resource->size_new; - resource->size = resource->size_new; - return 0; -} -EXPORT_SYMBOL_GPL(devl_resource_size_get); - -/** - * devl_dpipe_table_resource_set - set the resource id - * - * @devlink: devlink - * @table_name: table name - * @resource_id: resource id - * @resource_units: number of resource's units consumed per table's entry - */ -int devl_dpipe_table_resource_set(struct devlink *devlink, - const char *table_name, u64 resource_id, - u64 resource_units) -{ - struct devlink_dpipe_table *table; - - table = devlink_dpipe_table_find(&devlink->dpipe_table_list, - table_name, devlink); - if (!table) - return -EINVAL; - - table->resource_id = resource_id; - table->resource_units = resource_units; - table->resource_valid = true; - return 0; -} -EXPORT_SYMBOL_GPL(devl_dpipe_table_resource_set); - -/** - * devl_resource_occ_get_register - register occupancy getter - * - * @devlink: devlink - * @resource_id: resource id - * @occ_get: occupancy getter callback - * @occ_get_priv: occupancy getter callback priv - */ -void devl_resource_occ_get_register(struct devlink *devlink, - u64 resource_id, - devlink_resource_occ_get_t *occ_get, - void *occ_get_priv) -{ - struct devlink_resource *resource; - - lockdep_assert_held(&devlink->lock); - - resource = devlink_resource_find(devlink, NULL, resource_id); - if (WARN_ON(!resource)) - return; - WARN_ON(resource->occ_get); - - resource->occ_get = occ_get; - resource->occ_get_priv = occ_get_priv; -} -EXPORT_SYMBOL_GPL(devl_resource_occ_get_register); - -/** - * devlink_resource_occ_get_register - register occupancy getter - * - * @devlink: devlink - * @resource_id: resource id - * @occ_get: occupancy getter callback - * @occ_get_priv: occupancy getter callback priv - * - * Context: Takes and release devlink->lock . - */ -void devlink_resource_occ_get_register(struct devlink *devlink, - u64 resource_id, - devlink_resource_occ_get_t *occ_get, - void *occ_get_priv) -{ - devl_lock(devlink); - devl_resource_occ_get_register(devlink, resource_id, - occ_get, occ_get_priv); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_resource_occ_get_register); - -/** - * devl_resource_occ_get_unregister - unregister occupancy getter - * - * @devlink: devlink - * @resource_id: resource id - */ -void devl_resource_occ_get_unregister(struct devlink *devlink, - u64 resource_id) -{ - struct devlink_resource *resource; - - lockdep_assert_held(&devlink->lock); - - resource = devlink_resource_find(devlink, NULL, resource_id); - if (WARN_ON(!resource)) - return; - WARN_ON(!resource->occ_get); - - resource->occ_get = NULL; - resource->occ_get_priv = NULL; -} -EXPORT_SYMBOL_GPL(devl_resource_occ_get_unregister); - -/** - * devlink_resource_occ_get_unregister - unregister occupancy getter - * - * @devlink: devlink - * @resource_id: resource id - * - * Context: Takes and release devlink->lock . - */ -void devlink_resource_occ_get_unregister(struct devlink *devlink, - u64 resource_id) -{ - devl_lock(devlink); - devl_resource_occ_get_unregister(devlink, resource_id); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_resource_occ_get_unregister); - -static int devlink_param_verify(const struct devlink_param *param) -{ - if (!param || !param->name || !param->supported_cmodes) - return -EINVAL; - if (param->generic) - return devlink_param_generic_verify(param); - else - return devlink_param_driver_verify(param); -} - -/** - * devlink_params_register - register configuration parameters - * - * @devlink: devlink - * @params: configuration parameters array - * @params_count: number of parameters provided - * - * Register the configuration parameters supported by the driver. - */ -int devlink_params_register(struct devlink *devlink, - const struct devlink_param *params, - size_t params_count) -{ - const struct devlink_param *param = params; - int i, err; - - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - for (i = 0; i < params_count; i++, param++) { - err = devlink_param_register(devlink, param); - if (err) - goto rollback; - } - return 0; - -rollback: - if (!i) - return err; - - for (param--; i > 0; i--, param--) - devlink_param_unregister(devlink, param); - return err; -} -EXPORT_SYMBOL_GPL(devlink_params_register); - -/** - * devlink_params_unregister - unregister configuration parameters - * @devlink: devlink - * @params: configuration parameters to unregister - * @params_count: number of parameters provided - */ -void devlink_params_unregister(struct devlink *devlink, - const struct devlink_param *params, - size_t params_count) -{ - const struct devlink_param *param = params; - int i; - - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - for (i = 0; i < params_count; i++, param++) - devlink_param_unregister(devlink, param); -} -EXPORT_SYMBOL_GPL(devlink_params_unregister); - -/** - * devlink_param_register - register one configuration parameter - * - * @devlink: devlink - * @param: one configuration parameter - * - * Register the configuration parameter supported by the driver. - * Return: returns 0 on successful registration or error code otherwise. - */ -int devlink_param_register(struct devlink *devlink, - const struct devlink_param *param) -{ - struct devlink_param_item *param_item; - - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - WARN_ON(devlink_param_verify(param)); - WARN_ON(devlink_param_find_by_name(&devlink->param_list, param->name)); - - if (param->supported_cmodes == BIT(DEVLINK_PARAM_CMODE_DRIVERINIT)) - WARN_ON(param->get || param->set); - else - WARN_ON(!param->get || !param->set); - - param_item = kzalloc(sizeof(*param_item), GFP_KERNEL); - if (!param_item) - return -ENOMEM; - - param_item->param = param; - - list_add_tail(¶m_item->list, &devlink->param_list); - return 0; -} -EXPORT_SYMBOL_GPL(devlink_param_register); - -/** - * devlink_param_unregister - unregister one configuration parameter - * @devlink: devlink - * @param: configuration parameter to unregister - */ -void devlink_param_unregister(struct devlink *devlink, - const struct devlink_param *param) -{ - struct devlink_param_item *param_item; - - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - param_item = - devlink_param_find_by_name(&devlink->param_list, param->name); - WARN_ON(!param_item); - list_del(¶m_item->list); - kfree(param_item); -} -EXPORT_SYMBOL_GPL(devlink_param_unregister); - -/** - * devlink_param_driverinit_value_get - get configuration parameter - * value for driver initializing - * - * @devlink: devlink - * @param_id: parameter ID - * @init_val: value of parameter in driverinit configuration mode - * - * This function should be used by the driver to get driverinit - * configuration for initialization after reload command. - */ -int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, - union devlink_param_value *init_val) -{ - struct devlink_param_item *param_item; - - if (!devlink_reload_supported(devlink->ops)) - return -EOPNOTSUPP; - - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); - if (!param_item) - return -EINVAL; - - if (!param_item->driverinit_value_valid || - !devlink_param_cmode_is_supported(param_item->param, - DEVLINK_PARAM_CMODE_DRIVERINIT)) - return -EOPNOTSUPP; - - if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) - strcpy(init_val->vstr, param_item->driverinit_value.vstr); - else - *init_val = param_item->driverinit_value; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_get); - -/** - * devlink_param_driverinit_value_set - set value of configuration - * parameter for driverinit - * configuration mode - * - * @devlink: devlink - * @param_id: parameter ID - * @init_val: value of parameter to set for driverinit configuration mode - * - * This function should be used by the driver to set driverinit - * configuration mode default value. - */ -int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, - union devlink_param_value init_val) -{ - struct devlink_param_item *param_item; - - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); - if (!param_item) - return -EINVAL; - - if (!devlink_param_cmode_is_supported(param_item->param, - DEVLINK_PARAM_CMODE_DRIVERINIT)) - return -EOPNOTSUPP; - - if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) - strcpy(param_item->driverinit_value.vstr, init_val.vstr); - else - param_item->driverinit_value = init_val; - param_item->driverinit_value_valid = true; - return 0; -} -EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_set); - -/** - * devlink_param_value_changed - notify devlink on a parameter's value - * change. Should be called by the driver - * right after the change. - * - * @devlink: devlink - * @param_id: parameter ID - * - * This function should be used by the driver to notify devlink on value - * change, excluding driverinit configuration mode. - * For driverinit configuration mode driver should use the function - */ -void devlink_param_value_changed(struct devlink *devlink, u32 param_id) -{ - struct devlink_param_item *param_item; - - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); - WARN_ON(!param_item); - - devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); -} -EXPORT_SYMBOL_GPL(devlink_param_value_changed); - -/** - * devl_region_create - create a new address region - * - * @devlink: devlink - * @ops: region operations and name - * @region_max_snapshots: Maximum supported number of snapshots for region - * @region_size: size of region - */ -struct devlink_region *devl_region_create(struct devlink *devlink, - const struct devlink_region_ops *ops, - u32 region_max_snapshots, - u64 region_size) -{ - struct devlink_region *region; - - devl_assert_locked(devlink); - - if (WARN_ON(!ops) || WARN_ON(!ops->destructor)) - return ERR_PTR(-EINVAL); - - if (devlink_region_get_by_name(devlink, ops->name)) - return ERR_PTR(-EEXIST); - - region = kzalloc(sizeof(*region), GFP_KERNEL); - if (!region) - return ERR_PTR(-ENOMEM); - - region->devlink = devlink; - region->max_snapshots = region_max_snapshots; - region->ops = ops; - region->size = region_size; - INIT_LIST_HEAD(®ion->snapshot_list); - mutex_init(®ion->snapshot_lock); - list_add_tail(®ion->list, &devlink->region_list); - devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); - - return region; -} -EXPORT_SYMBOL_GPL(devl_region_create); - -/** - * devlink_region_create - create a new address region - * - * @devlink: devlink - * @ops: region operations and name - * @region_max_snapshots: Maximum supported number of snapshots for region - * @region_size: size of region - * - * Context: Takes and release devlink->lock . - */ -struct devlink_region * -devlink_region_create(struct devlink *devlink, - const struct devlink_region_ops *ops, - u32 region_max_snapshots, u64 region_size) -{ - struct devlink_region *region; - - devl_lock(devlink); - region = devl_region_create(devlink, ops, region_max_snapshots, - region_size); - devl_unlock(devlink); - return region; -} -EXPORT_SYMBOL_GPL(devlink_region_create); - -/** - * devlink_port_region_create - create a new address region for a port - * - * @port: devlink port - * @ops: region operations and name - * @region_max_snapshots: Maximum supported number of snapshots for region - * @region_size: size of region - * - * Context: Takes and release devlink->lock . - */ -struct devlink_region * -devlink_port_region_create(struct devlink_port *port, - const struct devlink_port_region_ops *ops, - u32 region_max_snapshots, u64 region_size) -{ - struct devlink *devlink = port->devlink; - struct devlink_region *region; - int err = 0; - - ASSERT_DEVLINK_PORT_INITIALIZED(port); - - if (WARN_ON(!ops) || WARN_ON(!ops->destructor)) - return ERR_PTR(-EINVAL); - - devl_lock(devlink); - - if (devlink_port_region_get_by_name(port, ops->name)) { - err = -EEXIST; - goto unlock; - } - - region = kzalloc(sizeof(*region), GFP_KERNEL); - if (!region) { - err = -ENOMEM; - goto unlock; - } - - region->devlink = devlink; - region->port = port; - region->max_snapshots = region_max_snapshots; - region->port_ops = ops; - region->size = region_size; - INIT_LIST_HEAD(®ion->snapshot_list); - mutex_init(®ion->snapshot_lock); - list_add_tail(®ion->list, &port->region_list); - devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); - - devl_unlock(devlink); - return region; - -unlock: - devl_unlock(devlink); - return ERR_PTR(err); -} -EXPORT_SYMBOL_GPL(devlink_port_region_create); - -/** - * devl_region_destroy - destroy address region - * - * @region: devlink region to destroy - */ -void devl_region_destroy(struct devlink_region *region) -{ - struct devlink *devlink = region->devlink; - struct devlink_snapshot *snapshot, *ts; - - devl_assert_locked(devlink); - - /* Free all snapshots of region */ - mutex_lock(®ion->snapshot_lock); - list_for_each_entry_safe(snapshot, ts, ®ion->snapshot_list, list) - devlink_region_snapshot_del(region, snapshot); - mutex_unlock(®ion->snapshot_lock); - - list_del(®ion->list); - mutex_destroy(®ion->snapshot_lock); - - devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_DEL); - kfree(region); -} -EXPORT_SYMBOL_GPL(devl_region_destroy); - -/** - * devlink_region_destroy - destroy address region - * - * @region: devlink region to destroy - * - * Context: Takes and release devlink->lock . - */ -void devlink_region_destroy(struct devlink_region *region) -{ - struct devlink *devlink = region->devlink; - - devl_lock(devlink); - devl_region_destroy(region); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_region_destroy); - -/** - * devlink_region_snapshot_id_get - get snapshot ID - * - * This callback should be called when adding a new snapshot, - * Driver should use the same id for multiple snapshots taken - * on multiple regions at the same time/by the same trigger. - * - * The caller of this function must use devlink_region_snapshot_id_put - * when finished creating regions using this id. - * - * Returns zero on success, or a negative error code on failure. - * - * @devlink: devlink - * @id: storage to return id - */ -int devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id) -{ - return __devlink_region_snapshot_id_get(devlink, id); -} -EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get); - -/** - * devlink_region_snapshot_id_put - put snapshot ID reference - * - * This should be called by a driver after finishing creating snapshots - * with an id. Doing so ensures that the ID can later be released in the - * event that all snapshots using it have been destroyed. - * - * @devlink: devlink - * @id: id to release reference on - */ -void devlink_region_snapshot_id_put(struct devlink *devlink, u32 id) -{ - __devlink_snapshot_id_decrement(devlink, id); -} -EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_put); - -/** - * devlink_region_snapshot_create - create a new snapshot - * This will add a new snapshot of a region. The snapshot - * will be stored on the region struct and can be accessed - * from devlink. This is useful for future analyses of snapshots. - * Multiple snapshots can be created on a region. - * The @snapshot_id should be obtained using the getter function. - * - * @region: devlink region of the snapshot - * @data: snapshot data - * @snapshot_id: snapshot id to be created - */ -int devlink_region_snapshot_create(struct devlink_region *region, - u8 *data, u32 snapshot_id) -{ - int err; - - mutex_lock(®ion->snapshot_lock); - err = __devlink_region_snapshot_create(region, data, snapshot_id); - mutex_unlock(®ion->snapshot_lock); - return err; -} -EXPORT_SYMBOL_GPL(devlink_region_snapshot_create); - -#define DEVLINK_TRAP(_id, _type) \ - { \ - .type = DEVLINK_TRAP_TYPE_##_type, \ - .id = DEVLINK_TRAP_GENERIC_ID_##_id, \ - .name = DEVLINK_TRAP_GENERIC_NAME_##_id, \ - } - -static const struct devlink_trap devlink_trap_generic[] = { - DEVLINK_TRAP(SMAC_MC, DROP), - DEVLINK_TRAP(VLAN_TAG_MISMATCH, DROP), - DEVLINK_TRAP(INGRESS_VLAN_FILTER, DROP), - DEVLINK_TRAP(INGRESS_STP_FILTER, DROP), - DEVLINK_TRAP(EMPTY_TX_LIST, DROP), - DEVLINK_TRAP(PORT_LOOPBACK_FILTER, DROP), - DEVLINK_TRAP(BLACKHOLE_ROUTE, DROP), - DEVLINK_TRAP(TTL_ERROR, EXCEPTION), - DEVLINK_TRAP(TAIL_DROP, DROP), - DEVLINK_TRAP(NON_IP_PACKET, DROP), - DEVLINK_TRAP(UC_DIP_MC_DMAC, DROP), - DEVLINK_TRAP(DIP_LB, DROP), - DEVLINK_TRAP(SIP_MC, DROP), - DEVLINK_TRAP(SIP_LB, DROP), - DEVLINK_TRAP(CORRUPTED_IP_HDR, DROP), - DEVLINK_TRAP(IPV4_SIP_BC, DROP), - DEVLINK_TRAP(IPV6_MC_DIP_RESERVED_SCOPE, DROP), - DEVLINK_TRAP(IPV6_MC_DIP_INTERFACE_LOCAL_SCOPE, DROP), - DEVLINK_TRAP(MTU_ERROR, EXCEPTION), - DEVLINK_TRAP(UNRESOLVED_NEIGH, EXCEPTION), - DEVLINK_TRAP(RPF, EXCEPTION), - DEVLINK_TRAP(REJECT_ROUTE, EXCEPTION), - DEVLINK_TRAP(IPV4_LPM_UNICAST_MISS, EXCEPTION), - DEVLINK_TRAP(IPV6_LPM_UNICAST_MISS, EXCEPTION), - DEVLINK_TRAP(NON_ROUTABLE, DROP), - DEVLINK_TRAP(DECAP_ERROR, EXCEPTION), - DEVLINK_TRAP(OVERLAY_SMAC_MC, DROP), - DEVLINK_TRAP(INGRESS_FLOW_ACTION_DROP, DROP), - DEVLINK_TRAP(EGRESS_FLOW_ACTION_DROP, DROP), - DEVLINK_TRAP(STP, CONTROL), - DEVLINK_TRAP(LACP, CONTROL), - DEVLINK_TRAP(LLDP, CONTROL), - DEVLINK_TRAP(IGMP_QUERY, CONTROL), - DEVLINK_TRAP(IGMP_V1_REPORT, CONTROL), - DEVLINK_TRAP(IGMP_V2_REPORT, CONTROL), - DEVLINK_TRAP(IGMP_V3_REPORT, CONTROL), - DEVLINK_TRAP(IGMP_V2_LEAVE, CONTROL), - DEVLINK_TRAP(MLD_QUERY, CONTROL), - DEVLINK_TRAP(MLD_V1_REPORT, CONTROL), - DEVLINK_TRAP(MLD_V2_REPORT, CONTROL), - DEVLINK_TRAP(MLD_V1_DONE, CONTROL), - DEVLINK_TRAP(IPV4_DHCP, CONTROL), - DEVLINK_TRAP(IPV6_DHCP, CONTROL), - DEVLINK_TRAP(ARP_REQUEST, CONTROL), - DEVLINK_TRAP(ARP_RESPONSE, CONTROL), - DEVLINK_TRAP(ARP_OVERLAY, CONTROL), - DEVLINK_TRAP(IPV6_NEIGH_SOLICIT, CONTROL), - DEVLINK_TRAP(IPV6_NEIGH_ADVERT, CONTROL), - DEVLINK_TRAP(IPV4_BFD, CONTROL), - DEVLINK_TRAP(IPV6_BFD, CONTROL), - DEVLINK_TRAP(IPV4_OSPF, CONTROL), - DEVLINK_TRAP(IPV6_OSPF, CONTROL), - DEVLINK_TRAP(IPV4_BGP, CONTROL), - DEVLINK_TRAP(IPV6_BGP, CONTROL), - DEVLINK_TRAP(IPV4_VRRP, CONTROL), - DEVLINK_TRAP(IPV6_VRRP, CONTROL), - DEVLINK_TRAP(IPV4_PIM, CONTROL), - DEVLINK_TRAP(IPV6_PIM, CONTROL), - DEVLINK_TRAP(UC_LB, CONTROL), - DEVLINK_TRAP(LOCAL_ROUTE, CONTROL), - DEVLINK_TRAP(EXTERNAL_ROUTE, CONTROL), - DEVLINK_TRAP(IPV6_UC_DIP_LINK_LOCAL_SCOPE, CONTROL), - DEVLINK_TRAP(IPV6_DIP_ALL_NODES, CONTROL), - DEVLINK_TRAP(IPV6_DIP_ALL_ROUTERS, CONTROL), - DEVLINK_TRAP(IPV6_ROUTER_SOLICIT, CONTROL), - DEVLINK_TRAP(IPV6_ROUTER_ADVERT, CONTROL), - DEVLINK_TRAP(IPV6_REDIRECT, CONTROL), - DEVLINK_TRAP(IPV4_ROUTER_ALERT, CONTROL), - DEVLINK_TRAP(IPV6_ROUTER_ALERT, CONTROL), - DEVLINK_TRAP(PTP_EVENT, CONTROL), - DEVLINK_TRAP(PTP_GENERAL, CONTROL), - DEVLINK_TRAP(FLOW_ACTION_SAMPLE, CONTROL), - DEVLINK_TRAP(FLOW_ACTION_TRAP, CONTROL), - DEVLINK_TRAP(EARLY_DROP, DROP), - DEVLINK_TRAP(VXLAN_PARSING, DROP), - DEVLINK_TRAP(LLC_SNAP_PARSING, DROP), - DEVLINK_TRAP(VLAN_PARSING, DROP), - DEVLINK_TRAP(PPPOE_PPP_PARSING, DROP), - DEVLINK_TRAP(MPLS_PARSING, DROP), - DEVLINK_TRAP(ARP_PARSING, DROP), - DEVLINK_TRAP(IP_1_PARSING, DROP), - DEVLINK_TRAP(IP_N_PARSING, DROP), - DEVLINK_TRAP(GRE_PARSING, DROP), - DEVLINK_TRAP(UDP_PARSING, DROP), - DEVLINK_TRAP(TCP_PARSING, DROP), - DEVLINK_TRAP(IPSEC_PARSING, DROP), - DEVLINK_TRAP(SCTP_PARSING, DROP), - DEVLINK_TRAP(DCCP_PARSING, DROP), - DEVLINK_TRAP(GTP_PARSING, DROP), - DEVLINK_TRAP(ESP_PARSING, DROP), - DEVLINK_TRAP(BLACKHOLE_NEXTHOP, DROP), - DEVLINK_TRAP(DMAC_FILTER, DROP), - DEVLINK_TRAP(EAPOL, CONTROL), - DEVLINK_TRAP(LOCKED_PORT, DROP), -}; - -#define DEVLINK_TRAP_GROUP(_id) \ - { \ - .id = DEVLINK_TRAP_GROUP_GENERIC_ID_##_id, \ - .name = DEVLINK_TRAP_GROUP_GENERIC_NAME_##_id, \ - } - -static const struct devlink_trap_group devlink_trap_group_generic[] = { - DEVLINK_TRAP_GROUP(L2_DROPS), - DEVLINK_TRAP_GROUP(L3_DROPS), - DEVLINK_TRAP_GROUP(L3_EXCEPTIONS), - DEVLINK_TRAP_GROUP(BUFFER_DROPS), - DEVLINK_TRAP_GROUP(TUNNEL_DROPS), - DEVLINK_TRAP_GROUP(ACL_DROPS), - DEVLINK_TRAP_GROUP(STP), - DEVLINK_TRAP_GROUP(LACP), - DEVLINK_TRAP_GROUP(LLDP), - DEVLINK_TRAP_GROUP(MC_SNOOPING), - DEVLINK_TRAP_GROUP(DHCP), - DEVLINK_TRAP_GROUP(NEIGH_DISCOVERY), - DEVLINK_TRAP_GROUP(BFD), - DEVLINK_TRAP_GROUP(OSPF), - DEVLINK_TRAP_GROUP(BGP), - DEVLINK_TRAP_GROUP(VRRP), - DEVLINK_TRAP_GROUP(PIM), - DEVLINK_TRAP_GROUP(UC_LB), - DEVLINK_TRAP_GROUP(LOCAL_DELIVERY), - DEVLINK_TRAP_GROUP(EXTERNAL_DELIVERY), - DEVLINK_TRAP_GROUP(IPV6), - DEVLINK_TRAP_GROUP(PTP_EVENT), - DEVLINK_TRAP_GROUP(PTP_GENERAL), - DEVLINK_TRAP_GROUP(ACL_SAMPLE), - DEVLINK_TRAP_GROUP(ACL_TRAP), - DEVLINK_TRAP_GROUP(PARSER_ERROR_DROPS), - DEVLINK_TRAP_GROUP(EAPOL), -}; - -static int devlink_trap_generic_verify(const struct devlink_trap *trap) -{ - if (trap->id > DEVLINK_TRAP_GENERIC_ID_MAX) - return -EINVAL; - - if (strcmp(trap->name, devlink_trap_generic[trap->id].name)) - return -EINVAL; - - if (trap->type != devlink_trap_generic[trap->id].type) - return -EINVAL; - - return 0; -} - -static int devlink_trap_driver_verify(const struct devlink_trap *trap) -{ - int i; - - if (trap->id <= DEVLINK_TRAP_GENERIC_ID_MAX) - return -EINVAL; - - for (i = 0; i < ARRAY_SIZE(devlink_trap_generic); i++) { - if (!strcmp(trap->name, devlink_trap_generic[i].name)) - return -EEXIST; - } - - return 0; -} - -static int devlink_trap_verify(const struct devlink_trap *trap) -{ - if (!trap || !trap->name) - return -EINVAL; - - if (trap->generic) - return devlink_trap_generic_verify(trap); - else - return devlink_trap_driver_verify(trap); -} - -static int -devlink_trap_group_generic_verify(const struct devlink_trap_group *group) -{ - if (group->id > DEVLINK_TRAP_GROUP_GENERIC_ID_MAX) - return -EINVAL; - - if (strcmp(group->name, devlink_trap_group_generic[group->id].name)) - return -EINVAL; - - return 0; -} - -static int -devlink_trap_group_driver_verify(const struct devlink_trap_group *group) -{ - int i; - - if (group->id <= DEVLINK_TRAP_GROUP_GENERIC_ID_MAX) - return -EINVAL; - - for (i = 0; i < ARRAY_SIZE(devlink_trap_group_generic); i++) { - if (!strcmp(group->name, devlink_trap_group_generic[i].name)) - return -EEXIST; - } - - return 0; -} - -static int devlink_trap_group_verify(const struct devlink_trap_group *group) -{ - if (group->generic) - return devlink_trap_group_generic_verify(group); - else - return devlink_trap_group_driver_verify(group); -} - -static void -devlink_trap_group_notify(struct devlink *devlink, - const struct devlink_trap_group_item *group_item, - enum devlink_command cmd) -{ - struct sk_buff *msg; - int err; - - WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_GROUP_NEW && - cmd != DEVLINK_CMD_TRAP_GROUP_DEL); - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_trap_group_fill(msg, devlink, group_item, cmd, 0, 0, - 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int -devlink_trap_item_group_link(struct devlink *devlink, - struct devlink_trap_item *trap_item) -{ - u16 group_id = trap_item->trap->init_group_id; - struct devlink_trap_group_item *group_item; - - group_item = devlink_trap_group_item_lookup_by_id(devlink, group_id); - if (WARN_ON_ONCE(!group_item)) - return -EINVAL; - - trap_item->group_item = group_item; - - return 0; -} - -static void devlink_trap_notify(struct devlink *devlink, - const struct devlink_trap_item *trap_item, - enum devlink_command cmd) -{ - struct sk_buff *msg; - int err; - - WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_NEW && - cmd != DEVLINK_CMD_TRAP_DEL); - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_trap_fill(msg, devlink, trap_item, cmd, 0, 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int -devlink_trap_register(struct devlink *devlink, - const struct devlink_trap *trap, void *priv) -{ - struct devlink_trap_item *trap_item; - int err; - - if (devlink_trap_item_lookup(devlink, trap->name)) - return -EEXIST; - - trap_item = kzalloc(sizeof(*trap_item), GFP_KERNEL); - if (!trap_item) - return -ENOMEM; - - trap_item->stats = netdev_alloc_pcpu_stats(struct devlink_stats); - if (!trap_item->stats) { - err = -ENOMEM; - goto err_stats_alloc; - } - - trap_item->trap = trap; - trap_item->action = trap->init_action; - trap_item->priv = priv; - - err = devlink_trap_item_group_link(devlink, trap_item); - if (err) - goto err_group_link; - - err = devlink->ops->trap_init(devlink, trap, trap_item); - if (err) - goto err_trap_init; - - list_add_tail(&trap_item->list, &devlink->trap_list); - devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_NEW); - - return 0; - -err_trap_init: -err_group_link: - free_percpu(trap_item->stats); -err_stats_alloc: - kfree(trap_item); - return err; -} - -static void devlink_trap_unregister(struct devlink *devlink, - const struct devlink_trap *trap) -{ - struct devlink_trap_item *trap_item; - - trap_item = devlink_trap_item_lookup(devlink, trap->name); - if (WARN_ON_ONCE(!trap_item)) - return; - - devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_DEL); - list_del(&trap_item->list); - if (devlink->ops->trap_fini) - devlink->ops->trap_fini(devlink, trap, trap_item); - free_percpu(trap_item->stats); - kfree(trap_item); -} - -static void devlink_trap_disable(struct devlink *devlink, - const struct devlink_trap *trap) -{ - struct devlink_trap_item *trap_item; - - trap_item = devlink_trap_item_lookup(devlink, trap->name); - if (WARN_ON_ONCE(!trap_item)) - return; - - devlink->ops->trap_action_set(devlink, trap, DEVLINK_TRAP_ACTION_DROP, - NULL); - trap_item->action = DEVLINK_TRAP_ACTION_DROP; -} - -/** - * devl_traps_register - Register packet traps with devlink. - * @devlink: devlink. - * @traps: Packet traps. - * @traps_count: Count of provided packet traps. - * @priv: Driver private information. - * - * Return: Non-zero value on failure. - */ -int devl_traps_register(struct devlink *devlink, - const struct devlink_trap *traps, - size_t traps_count, void *priv) -{ - int i, err; - - if (!devlink->ops->trap_init || !devlink->ops->trap_action_set) - return -EINVAL; - - devl_assert_locked(devlink); - for (i = 0; i < traps_count; i++) { - const struct devlink_trap *trap = &traps[i]; - - err = devlink_trap_verify(trap); - if (err) - goto err_trap_verify; - - err = devlink_trap_register(devlink, trap, priv); - if (err) - goto err_trap_register; - } - - return 0; - -err_trap_register: -err_trap_verify: - for (i--; i >= 0; i--) - devlink_trap_unregister(devlink, &traps[i]); - return err; -} -EXPORT_SYMBOL_GPL(devl_traps_register); - -/** - * devlink_traps_register - Register packet traps with devlink. - * @devlink: devlink. - * @traps: Packet traps. - * @traps_count: Count of provided packet traps. - * @priv: Driver private information. - * - * Context: Takes and release devlink->lock . - * - * Return: Non-zero value on failure. - */ -int devlink_traps_register(struct devlink *devlink, - const struct devlink_trap *traps, - size_t traps_count, void *priv) -{ - int err; - - devl_lock(devlink); - err = devl_traps_register(devlink, traps, traps_count, priv); - devl_unlock(devlink); - return err; -} -EXPORT_SYMBOL_GPL(devlink_traps_register); - -/** - * devl_traps_unregister - Unregister packet traps from devlink. - * @devlink: devlink. - * @traps: Packet traps. - * @traps_count: Count of provided packet traps. - */ -void devl_traps_unregister(struct devlink *devlink, - const struct devlink_trap *traps, - size_t traps_count) -{ - int i; - - devl_assert_locked(devlink); - /* Make sure we do not have any packets in-flight while unregistering - * traps by disabling all of them and waiting for a grace period. - */ - for (i = traps_count - 1; i >= 0; i--) - devlink_trap_disable(devlink, &traps[i]); - synchronize_rcu(); - for (i = traps_count - 1; i >= 0; i--) - devlink_trap_unregister(devlink, &traps[i]); -} -EXPORT_SYMBOL_GPL(devl_traps_unregister); - -/** - * devlink_traps_unregister - Unregister packet traps from devlink. - * @devlink: devlink. - * @traps: Packet traps. - * @traps_count: Count of provided packet traps. - * - * Context: Takes and release devlink->lock . - */ -void devlink_traps_unregister(struct devlink *devlink, - const struct devlink_trap *traps, - size_t traps_count) -{ - devl_lock(devlink); - devl_traps_unregister(devlink, traps, traps_count); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_traps_unregister); - -static void -devlink_trap_stats_update(struct devlink_stats __percpu *trap_stats, - size_t skb_len) -{ - struct devlink_stats *stats; - - stats = this_cpu_ptr(trap_stats); - u64_stats_update_begin(&stats->syncp); - u64_stats_add(&stats->rx_bytes, skb_len); - u64_stats_inc(&stats->rx_packets); - u64_stats_update_end(&stats->syncp); -} - -static void -devlink_trap_report_metadata_set(struct devlink_trap_metadata *metadata, - const struct devlink_trap_item *trap_item, - struct devlink_port *in_devlink_port, - const struct flow_action_cookie *fa_cookie) -{ - metadata->trap_name = trap_item->trap->name; - metadata->trap_group_name = trap_item->group_item->group->name; - metadata->fa_cookie = fa_cookie; - metadata->trap_type = trap_item->trap->type; - - spin_lock(&in_devlink_port->type_lock); - if (in_devlink_port->type == DEVLINK_PORT_TYPE_ETH) - metadata->input_dev = in_devlink_port->type_eth.netdev; - spin_unlock(&in_devlink_port->type_lock); -} - -/** - * devlink_trap_report - Report trapped packet to drop monitor. - * @devlink: devlink. - * @skb: Trapped packet. - * @trap_ctx: Trap context. - * @in_devlink_port: Input devlink port. - * @fa_cookie: Flow action cookie. Could be NULL. - */ -void devlink_trap_report(struct devlink *devlink, struct sk_buff *skb, - void *trap_ctx, struct devlink_port *in_devlink_port, - const struct flow_action_cookie *fa_cookie) - -{ - struct devlink_trap_item *trap_item = trap_ctx; - - devlink_trap_stats_update(trap_item->stats, skb->len); - devlink_trap_stats_update(trap_item->group_item->stats, skb->len); - - if (trace_devlink_trap_report_enabled()) { - struct devlink_trap_metadata metadata = {}; - - devlink_trap_report_metadata_set(&metadata, trap_item, - in_devlink_port, fa_cookie); - trace_devlink_trap_report(devlink, skb, &metadata); - } -} -EXPORT_SYMBOL_GPL(devlink_trap_report); - -/** - * devlink_trap_ctx_priv - Trap context to driver private information. - * @trap_ctx: Trap context. - * - * Return: Driver private information passed during registration. - */ -void *devlink_trap_ctx_priv(void *trap_ctx) -{ - struct devlink_trap_item *trap_item = trap_ctx; - - return trap_item->priv; -} -EXPORT_SYMBOL_GPL(devlink_trap_ctx_priv); - -static int -devlink_trap_group_item_policer_link(struct devlink *devlink, - struct devlink_trap_group_item *group_item) -{ - u32 policer_id = group_item->group->init_policer_id; - struct devlink_trap_policer_item *policer_item; - - if (policer_id == 0) - return 0; - - policer_item = devlink_trap_policer_item_lookup(devlink, policer_id); - if (WARN_ON_ONCE(!policer_item)) - return -EINVAL; - - group_item->policer_item = policer_item; - - return 0; -} - -static int -devlink_trap_group_register(struct devlink *devlink, - const struct devlink_trap_group *group) -{ - struct devlink_trap_group_item *group_item; - int err; - - if (devlink_trap_group_item_lookup(devlink, group->name)) - return -EEXIST; - - group_item = kzalloc(sizeof(*group_item), GFP_KERNEL); - if (!group_item) - return -ENOMEM; - - group_item->stats = netdev_alloc_pcpu_stats(struct devlink_stats); - if (!group_item->stats) { - err = -ENOMEM; - goto err_stats_alloc; - } - - group_item->group = group; - - err = devlink_trap_group_item_policer_link(devlink, group_item); - if (err) - goto err_policer_link; - - if (devlink->ops->trap_group_init) { - err = devlink->ops->trap_group_init(devlink, group); - if (err) - goto err_group_init; - } - - list_add_tail(&group_item->list, &devlink->trap_group_list); - devlink_trap_group_notify(devlink, group_item, - DEVLINK_CMD_TRAP_GROUP_NEW); - - return 0; - -err_group_init: -err_policer_link: - free_percpu(group_item->stats); -err_stats_alloc: - kfree(group_item); - return err; -} - -static void -devlink_trap_group_unregister(struct devlink *devlink, - const struct devlink_trap_group *group) -{ - struct devlink_trap_group_item *group_item; - - group_item = devlink_trap_group_item_lookup(devlink, group->name); - if (WARN_ON_ONCE(!group_item)) - return; - - devlink_trap_group_notify(devlink, group_item, - DEVLINK_CMD_TRAP_GROUP_DEL); - list_del(&group_item->list); - free_percpu(group_item->stats); - kfree(group_item); -} - -/** - * devl_trap_groups_register - Register packet trap groups with devlink. - * @devlink: devlink. - * @groups: Packet trap groups. - * @groups_count: Count of provided packet trap groups. - * - * Return: Non-zero value on failure. - */ -int devl_trap_groups_register(struct devlink *devlink, - const struct devlink_trap_group *groups, - size_t groups_count) -{ - int i, err; - - devl_assert_locked(devlink); - for (i = 0; i < groups_count; i++) { - const struct devlink_trap_group *group = &groups[i]; - - err = devlink_trap_group_verify(group); - if (err) - goto err_trap_group_verify; - - err = devlink_trap_group_register(devlink, group); - if (err) - goto err_trap_group_register; - } - - return 0; - -err_trap_group_register: -err_trap_group_verify: - for (i--; i >= 0; i--) - devlink_trap_group_unregister(devlink, &groups[i]); - return err; -} -EXPORT_SYMBOL_GPL(devl_trap_groups_register); - -/** - * devlink_trap_groups_register - Register packet trap groups with devlink. - * @devlink: devlink. - * @groups: Packet trap groups. - * @groups_count: Count of provided packet trap groups. - * - * Context: Takes and release devlink->lock . - * - * Return: Non-zero value on failure. - */ -int devlink_trap_groups_register(struct devlink *devlink, - const struct devlink_trap_group *groups, - size_t groups_count) -{ - int err; - - devl_lock(devlink); - err = devl_trap_groups_register(devlink, groups, groups_count); - devl_unlock(devlink); - return err; -} -EXPORT_SYMBOL_GPL(devlink_trap_groups_register); - -/** - * devl_trap_groups_unregister - Unregister packet trap groups from devlink. - * @devlink: devlink. - * @groups: Packet trap groups. - * @groups_count: Count of provided packet trap groups. - */ -void devl_trap_groups_unregister(struct devlink *devlink, - const struct devlink_trap_group *groups, - size_t groups_count) -{ - int i; - - devl_assert_locked(devlink); - for (i = groups_count - 1; i >= 0; i--) - devlink_trap_group_unregister(devlink, &groups[i]); -} -EXPORT_SYMBOL_GPL(devl_trap_groups_unregister); - -/** - * devlink_trap_groups_unregister - Unregister packet trap groups from devlink. - * @devlink: devlink. - * @groups: Packet trap groups. - * @groups_count: Count of provided packet trap groups. - * - * Context: Takes and release devlink->lock . - */ -void devlink_trap_groups_unregister(struct devlink *devlink, - const struct devlink_trap_group *groups, - size_t groups_count) -{ - devl_lock(devlink); - devl_trap_groups_unregister(devlink, groups, groups_count); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_trap_groups_unregister); - -static void -devlink_trap_policer_notify(struct devlink *devlink, - const struct devlink_trap_policer_item *policer_item, - enum devlink_command cmd) -{ - struct sk_buff *msg; - int err; - - WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_POLICER_NEW && - cmd != DEVLINK_CMD_TRAP_POLICER_DEL); - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_trap_policer_fill(msg, devlink, policer_item, cmd, 0, - 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -static int -devlink_trap_policer_register(struct devlink *devlink, - const struct devlink_trap_policer *policer) -{ - struct devlink_trap_policer_item *policer_item; - int err; - - if (devlink_trap_policer_item_lookup(devlink, policer->id)) - return -EEXIST; - - policer_item = kzalloc(sizeof(*policer_item), GFP_KERNEL); - if (!policer_item) - return -ENOMEM; - - policer_item->policer = policer; - policer_item->rate = policer->init_rate; - policer_item->burst = policer->init_burst; - - if (devlink->ops->trap_policer_init) { - err = devlink->ops->trap_policer_init(devlink, policer); - if (err) - goto err_policer_init; - } - - list_add_tail(&policer_item->list, &devlink->trap_policer_list); - devlink_trap_policer_notify(devlink, policer_item, - DEVLINK_CMD_TRAP_POLICER_NEW); - - return 0; - -err_policer_init: - kfree(policer_item); - return err; -} - -static void -devlink_trap_policer_unregister(struct devlink *devlink, - const struct devlink_trap_policer *policer) -{ - struct devlink_trap_policer_item *policer_item; - - policer_item = devlink_trap_policer_item_lookup(devlink, policer->id); - if (WARN_ON_ONCE(!policer_item)) - return; - - devlink_trap_policer_notify(devlink, policer_item, - DEVLINK_CMD_TRAP_POLICER_DEL); - list_del(&policer_item->list); - if (devlink->ops->trap_policer_fini) - devlink->ops->trap_policer_fini(devlink, policer); - kfree(policer_item); -} - -/** - * devl_trap_policers_register - Register packet trap policers with devlink. - * @devlink: devlink. - * @policers: Packet trap policers. - * @policers_count: Count of provided packet trap policers. - * - * Return: Non-zero value on failure. - */ -int -devl_trap_policers_register(struct devlink *devlink, - const struct devlink_trap_policer *policers, - size_t policers_count) -{ - int i, err; - - devl_assert_locked(devlink); - for (i = 0; i < policers_count; i++) { - const struct devlink_trap_policer *policer = &policers[i]; - - if (WARN_ON(policer->id == 0 || - policer->max_rate < policer->min_rate || - policer->max_burst < policer->min_burst)) { - err = -EINVAL; - goto err_trap_policer_verify; - } - - err = devlink_trap_policer_register(devlink, policer); - if (err) - goto err_trap_policer_register; - } - return 0; - -err_trap_policer_register: -err_trap_policer_verify: - for (i--; i >= 0; i--) - devlink_trap_policer_unregister(devlink, &policers[i]); - return err; -} -EXPORT_SYMBOL_GPL(devl_trap_policers_register); - -/** - * devl_trap_policers_unregister - Unregister packet trap policers from devlink. - * @devlink: devlink. - * @policers: Packet trap policers. - * @policers_count: Count of provided packet trap policers. - */ -void -devl_trap_policers_unregister(struct devlink *devlink, - const struct devlink_trap_policer *policers, - size_t policers_count) -{ - int i; - - devl_assert_locked(devlink); - for (i = policers_count - 1; i >= 0; i--) - devlink_trap_policer_unregister(devlink, &policers[i]); -} -EXPORT_SYMBOL_GPL(devl_trap_policers_unregister); - -static void __devlink_compat_running_version(struct devlink *devlink, - char *buf, size_t len) -{ - struct devlink_info_req req = {}; - const struct nlattr *nlattr; - struct sk_buff *msg; - int rem, err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - req.msg = msg; - err = devlink->ops->info_get(devlink, &req, NULL); - if (err) - goto free_msg; - - nla_for_each_attr(nlattr, (void *)msg->data, msg->len, rem) { - const struct nlattr *kv; - int rem_kv; - - if (nla_type(nlattr) != DEVLINK_ATTR_INFO_VERSION_RUNNING) - continue; - - nla_for_each_nested(kv, nlattr, rem_kv) { - if (nla_type(kv) != DEVLINK_ATTR_INFO_VERSION_VALUE) - continue; - - strlcat(buf, nla_data(kv), len); - strlcat(buf, " ", len); - } - } -free_msg: - nlmsg_free(msg); -} - -void devlink_compat_running_version(struct devlink *devlink, - char *buf, size_t len) -{ - if (!devlink->ops->info_get) - return; - - devl_lock(devlink); - __devlink_compat_running_version(devlink, buf, len); - devl_unlock(devlink); -} - -int devlink_compat_flash_update(struct devlink *devlink, const char *file_name) -{ - struct devlink_flash_update_params params = {}; - int ret; - - if (!devlink->ops->flash_update) - return -EOPNOTSUPP; - - ret = request_firmware(¶ms.fw, file_name, devlink->dev); - if (ret) - return ret; - - devl_lock(devlink); - devlink_flash_update_begin_notify(devlink); - ret = devlink->ops->flash_update(devlink, ¶ms, NULL); - devlink_flash_update_end_notify(devlink); - devl_unlock(devlink); - - release_firmware(params.fw); - - return ret; -} - -int devlink_compat_phys_port_name_get(struct net_device *dev, - char *name, size_t len) -{ - struct devlink_port *devlink_port; - - /* RTNL mutex is held here which ensures that devlink_port - * instance cannot disappear in the middle. No need to take - * any devlink lock as only permanent values are accessed. - */ - ASSERT_RTNL(); - - devlink_port = dev->devlink_port; - if (!devlink_port) - return -EOPNOTSUPP; - - return __devlink_port_phys_port_name_get(devlink_port, name, len); -} - -int devlink_compat_switch_id_get(struct net_device *dev, - struct netdev_phys_item_id *ppid) -{ - struct devlink_port *devlink_port; - - /* Caller must hold RTNL mutex or reference to dev, which ensures that - * devlink_port instance cannot disappear in the middle. No need to take - * any devlink lock as only permanent values are accessed. - */ - devlink_port = dev->devlink_port; - if (!devlink_port || !devlink_port->switch_port) - return -EOPNOTSUPP; - - memcpy(ppid, &devlink_port->attrs.switch_id, sizeof(*ppid)); - - return 0; -} - -static void __net_exit devlink_pernet_pre_exit(struct net *net) -{ - struct devlink *devlink; - u32 actions_performed; - unsigned long index; - int err; - - /* In case network namespace is getting destroyed, reload - * all devlink instances from this namespace into init_net. - */ - devlinks_xa_for_each_registered_get(net, index, devlink) { - WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); - mutex_lock(&devlink->lock); - err = devlink_reload(devlink, &init_net, - DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - DEVLINK_RELOAD_LIMIT_UNSPEC, - &actions_performed, NULL); - mutex_unlock(&devlink->lock); - if (err && err != -EOPNOTSUPP) - pr_warn("Failed to reload devlink instance into init_net\n"); - devlink_put(devlink); - } -} - -static struct pernet_operations devlink_pernet_ops __net_initdata = { - .pre_exit = devlink_pernet_pre_exit, -}; - -static int __init devlink_init(void) -{ - int err; - - err = genl_register_family(&devlink_nl_family); - if (err) - goto out; - err = register_pernet_subsys(&devlink_pernet_ops); - -out: - WARN_ON(err); - return err; -} - -subsys_initcall(devlink_init); diff --git a/net/devlink/Makefile b/net/devlink/Makefile new file mode 100644 index 000000000000..3a60959f71ee --- /dev/null +++ b/net/devlink/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-y := leftover.o diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c new file mode 100644 index 000000000000..032d6d0a5ce6 --- /dev/null +++ b/net/devlink/leftover.c @@ -0,0 +1,13029 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * net/core/devlink.c - Network physical/parent device Netlink interface + * + * Heavily inspired by net/wireless/ + * Copyright (c) 2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2016 Jiri Pirko + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#define CREATE_TRACE_POINTS +#include + +#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \ + (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX) + +struct devlink_dev_stats { + u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; + u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; +}; + +struct devlink { + u32 index; + struct xarray ports; + struct list_head rate_list; + struct list_head sb_list; + struct list_head dpipe_table_list; + struct list_head resource_list; + struct list_head param_list; + struct list_head region_list; + struct list_head reporter_list; + struct mutex reporters_lock; /* protects reporter_list */ + struct devlink_dpipe_headers *dpipe_headers; + struct list_head trap_list; + struct list_head trap_group_list; + struct list_head trap_policer_list; + struct list_head linecard_list; + struct mutex linecards_lock; /* protects linecard_list */ + const struct devlink_ops *ops; + u64 features; + struct xarray snapshot_ids; + struct devlink_dev_stats stats; + struct device *dev; + possible_net_t _net; + /* Serializes access to devlink instance specific objects such as + * port, sb, dpipe, resource, params, region, traps and more. + */ + struct mutex lock; + struct lock_class_key lock_key; + u8 reload_failed:1; + refcount_t refcount; + struct completion comp; + struct rcu_head rcu; + struct notifier_block netdevice_nb; + char priv[] __aligned(NETDEV_ALIGN); +}; + +struct devlink_linecard_ops; +struct devlink_linecard_type; + +struct devlink_linecard { + struct list_head list; + struct devlink *devlink; + unsigned int index; + refcount_t refcount; + const struct devlink_linecard_ops *ops; + void *priv; + enum devlink_linecard_state state; + struct mutex state_lock; /* Protects state */ + const char *type; + struct devlink_linecard_type *types; + unsigned int types_count; + struct devlink *nested_devlink; +}; + +/** + * struct devlink_resource - devlink resource + * @name: name of the resource + * @id: id, per devlink instance + * @size: size of the resource + * @size_new: updated size of the resource, reload is needed + * @size_valid: valid in case the total size of the resource is valid + * including its children + * @parent: parent resource + * @size_params: size parameters + * @list: parent list + * @resource_list: list of child resources + * @occ_get: occupancy getter callback + * @occ_get_priv: occupancy getter callback priv + */ +struct devlink_resource { + const char *name; + u64 id; + u64 size; + u64 size_new; + bool size_valid; + struct devlink_resource *parent; + struct devlink_resource_size_params size_params; + struct list_head list; + struct list_head resource_list; + devlink_resource_occ_get_t *occ_get; + void *occ_get_priv; +}; + +void *devlink_priv(struct devlink *devlink) +{ + return &devlink->priv; +} +EXPORT_SYMBOL_GPL(devlink_priv); + +struct devlink *priv_to_devlink(void *priv) +{ + return container_of(priv, struct devlink, priv); +} +EXPORT_SYMBOL_GPL(priv_to_devlink); + +struct device *devlink_to_dev(const struct devlink *devlink) +{ + return devlink->dev; +} +EXPORT_SYMBOL_GPL(devlink_to_dev); + +static struct devlink_dpipe_field devlink_dpipe_fields_ethernet[] = { + { + .name = "destination mac", + .id = DEVLINK_DPIPE_FIELD_ETHERNET_DST_MAC, + .bitwidth = 48, + }, +}; + +struct devlink_dpipe_header devlink_dpipe_header_ethernet = { + .name = "ethernet", + .id = DEVLINK_DPIPE_HEADER_ETHERNET, + .fields = devlink_dpipe_fields_ethernet, + .fields_count = ARRAY_SIZE(devlink_dpipe_fields_ethernet), + .global = true, +}; +EXPORT_SYMBOL_GPL(devlink_dpipe_header_ethernet); + +static struct devlink_dpipe_field devlink_dpipe_fields_ipv4[] = { + { + .name = "destination ip", + .id = DEVLINK_DPIPE_FIELD_IPV4_DST_IP, + .bitwidth = 32, + }, +}; + +struct devlink_dpipe_header devlink_dpipe_header_ipv4 = { + .name = "ipv4", + .id = DEVLINK_DPIPE_HEADER_IPV4, + .fields = devlink_dpipe_fields_ipv4, + .fields_count = ARRAY_SIZE(devlink_dpipe_fields_ipv4), + .global = true, +}; +EXPORT_SYMBOL_GPL(devlink_dpipe_header_ipv4); + +static struct devlink_dpipe_field devlink_dpipe_fields_ipv6[] = { + { + .name = "destination ip", + .id = DEVLINK_DPIPE_FIELD_IPV6_DST_IP, + .bitwidth = 128, + }, +}; + +struct devlink_dpipe_header devlink_dpipe_header_ipv6 = { + .name = "ipv6", + .id = DEVLINK_DPIPE_HEADER_IPV6, + .fields = devlink_dpipe_fields_ipv6, + .fields_count = ARRAY_SIZE(devlink_dpipe_fields_ipv6), + .global = true, +}; +EXPORT_SYMBOL_GPL(devlink_dpipe_header_ipv6); + +EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwmsg); +EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwerr); +EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report); + +#define DEVLINK_PORT_FN_CAPS_VALID_MASK \ + (_BITUL(__DEVLINK_PORT_FN_ATTR_CAPS_MAX) - 1) + +static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = { + [DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY }, + [DEVLINK_PORT_FN_ATTR_STATE] = + NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FN_STATE_INACTIVE, + DEVLINK_PORT_FN_STATE_ACTIVE), + [DEVLINK_PORT_FN_ATTR_CAPS] = + NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK), +}; + +static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ID_MAX + 1] = { + [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG }, +}; + +static DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC); +#define DEVLINK_REGISTERED XA_MARK_1 +#define DEVLINK_UNREGISTERING XA_MARK_2 + +/* devlink instances are open to the access from the user space after + * devlink_register() call. Such logical barrier allows us to have certain + * expectations related to locking. + * + * Before *_register() - we are in initialization stage and no parallel + * access possible to the devlink instance. All drivers perform that phase + * by implicitly holding device_lock. + * + * After *_register() - users and driver can access devlink instance at + * the same time. + */ +#define ASSERT_DEVLINK_REGISTERED(d) \ + WARN_ON_ONCE(!xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) +#define ASSERT_DEVLINK_NOT_REGISTERED(d) \ + WARN_ON_ONCE(xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) + +struct net *devlink_net(const struct devlink *devlink) +{ + return read_pnet(&devlink->_net); +} +EXPORT_SYMBOL_GPL(devlink_net); + +static void __devlink_put_rcu(struct rcu_head *head) +{ + struct devlink *devlink = container_of(head, struct devlink, rcu); + + complete(&devlink->comp); +} + +void devlink_put(struct devlink *devlink) +{ + if (refcount_dec_and_test(&devlink->refcount)) + /* Make sure unregister operation that may await the completion + * is unblocked only after all users are after the end of + * RCU grace period. + */ + call_rcu(&devlink->rcu, __devlink_put_rcu); +} + +struct devlink *__must_check devlink_try_get(struct devlink *devlink) +{ + if (refcount_inc_not_zero(&devlink->refcount)) + return devlink; + return NULL; +} + +void devl_assert_locked(struct devlink *devlink) +{ + lockdep_assert_held(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_assert_locked); + +#ifdef CONFIG_LOCKDEP +/* For use in conjunction with LOCKDEP only e.g. rcu_dereference_protected() */ +bool devl_lock_is_held(struct devlink *devlink) +{ + return lockdep_is_held(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_lock_is_held); +#endif + +void devl_lock(struct devlink *devlink) +{ + mutex_lock(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_lock); + +int devl_trylock(struct devlink *devlink) +{ + return mutex_trylock(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_trylock); + +void devl_unlock(struct devlink *devlink) +{ + mutex_unlock(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_unlock); + +static struct devlink * +devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter, + void * (*xa_find_fn)(struct xarray *, unsigned long *, + unsigned long, xa_mark_t)) +{ + struct devlink *devlink; + + rcu_read_lock(); +retry: + devlink = xa_find_fn(&devlinks, indexp, ULONG_MAX, DEVLINK_REGISTERED); + if (!devlink) + goto unlock; + + /* In case devlink_unregister() was already called and "unregistering" + * mark was set, do not allow to get a devlink reference here. + * This prevents live-lock of devlink_unregister() wait for completion. + */ + if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING)) + goto retry; + + /* For a possible retry, the xa_find_after() should be always used */ + xa_find_fn = xa_find_after; + if (!devlink_try_get(devlink)) + goto retry; + if (!net_eq(devlink_net(devlink), net)) { + devlink_put(devlink); + goto retry; + } +unlock: + rcu_read_unlock(); + return devlink; +} + +static struct devlink *devlinks_xa_find_get_first(struct net *net, + unsigned long *indexp, + xa_mark_t filter) +{ + return devlinks_xa_find_get(net, indexp, filter, xa_find); +} + +static struct devlink *devlinks_xa_find_get_next(struct net *net, + unsigned long *indexp, + xa_mark_t filter) +{ + return devlinks_xa_find_get(net, indexp, filter, xa_find_after); +} + +/* Iterate over devlink pointers which were possible to get reference to. + * devlink_put() needs to be called for each iterated devlink pointer + * in loop body in order to release the reference. + */ +#define devlinks_xa_for_each_get(net, index, devlink, filter) \ + for (index = 0, \ + devlink = devlinks_xa_find_get_first(net, &index, filter); \ + devlink; devlink = devlinks_xa_find_get_next(net, &index, filter)) + +#define devlinks_xa_for_each_registered_get(net, index, devlink) \ + devlinks_xa_for_each_get(net, index, devlink, DEVLINK_REGISTERED) + +static struct devlink *devlink_get_from_attrs(struct net *net, + struct nlattr **attrs) +{ + struct devlink *devlink; + unsigned long index; + char *busname; + char *devname; + + if (!attrs[DEVLINK_ATTR_BUS_NAME] || !attrs[DEVLINK_ATTR_DEV_NAME]) + return ERR_PTR(-EINVAL); + + busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]); + devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]); + + devlinks_xa_for_each_registered_get(net, index, devlink) { + if (strcmp(devlink->dev->bus->name, busname) == 0 && + strcmp(dev_name(devlink->dev), devname) == 0) + return devlink; + devlink_put(devlink); + } + + return ERR_PTR(-ENODEV); +} + +#define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \ + WARN_ON_ONCE(!(devlink_port)->registered) +#define ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port) \ + WARN_ON_ONCE((devlink_port)->registered) +#define ASSERT_DEVLINK_PORT_INITIALIZED(devlink_port) \ + WARN_ON_ONCE(!(devlink_port)->initialized) + +static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink, + unsigned int port_index) +{ + return xa_load(&devlink->ports, port_index); +} + +static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs) +{ + if (attrs[DEVLINK_ATTR_PORT_INDEX]) { + u32 port_index = nla_get_u32(attrs[DEVLINK_ATTR_PORT_INDEX]); + struct devlink_port *devlink_port; + + devlink_port = devlink_port_get_by_index(devlink, port_index); + if (!devlink_port) + return ERR_PTR(-ENODEV); + return devlink_port; + } + return ERR_PTR(-EINVAL); +} + +static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + return devlink_port_get_from_attrs(devlink, info->attrs); +} + +static inline bool +devlink_rate_is_leaf(struct devlink_rate *devlink_rate) +{ + return devlink_rate->type == DEVLINK_RATE_TYPE_LEAF; +} + +static inline bool +devlink_rate_is_node(struct devlink_rate *devlink_rate) +{ + return devlink_rate->type == DEVLINK_RATE_TYPE_NODE; +} + +static struct devlink_rate * +devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info) +{ + struct devlink_rate *devlink_rate; + struct devlink_port *devlink_port; + + devlink_port = devlink_port_get_from_attrs(devlink, info->attrs); + if (IS_ERR(devlink_port)) + return ERR_CAST(devlink_port); + devlink_rate = devlink_port->devlink_rate; + return devlink_rate ?: ERR_PTR(-ENODEV); +} + +static struct devlink_rate * +devlink_rate_node_get_by_name(struct devlink *devlink, const char *node_name) +{ + static struct devlink_rate *devlink_rate; + + list_for_each_entry(devlink_rate, &devlink->rate_list, list) { + if (devlink_rate_is_node(devlink_rate) && + !strcmp(node_name, devlink_rate->name)) + return devlink_rate; + } + return ERR_PTR(-ENODEV); +} + +static struct devlink_rate * +devlink_rate_node_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) +{ + const char *rate_node_name; + size_t len; + + if (!attrs[DEVLINK_ATTR_RATE_NODE_NAME]) + return ERR_PTR(-EINVAL); + rate_node_name = nla_data(attrs[DEVLINK_ATTR_RATE_NODE_NAME]); + len = strlen(rate_node_name); + /* Name cannot be empty or decimal number */ + if (!len || strspn(rate_node_name, "0123456789") == len) + return ERR_PTR(-EINVAL); + + return devlink_rate_node_get_by_name(devlink, rate_node_name); +} + +static struct devlink_rate * +devlink_rate_node_get_from_info(struct devlink *devlink, struct genl_info *info) +{ + return devlink_rate_node_get_from_attrs(devlink, info->attrs); +} + +static struct devlink_rate * +devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info) +{ + struct nlattr **attrs = info->attrs; + + if (attrs[DEVLINK_ATTR_PORT_INDEX]) + return devlink_rate_leaf_get_from_info(devlink, info); + else if (attrs[DEVLINK_ATTR_RATE_NODE_NAME]) + return devlink_rate_node_get_from_info(devlink, info); + else + return ERR_PTR(-EINVAL); +} + +static struct devlink_linecard * +devlink_linecard_get_by_index(struct devlink *devlink, + unsigned int linecard_index) +{ + struct devlink_linecard *devlink_linecard; + + list_for_each_entry(devlink_linecard, &devlink->linecard_list, list) { + if (devlink_linecard->index == linecard_index) + return devlink_linecard; + } + return NULL; +} + +static bool devlink_linecard_index_exists(struct devlink *devlink, + unsigned int linecard_index) +{ + return devlink_linecard_get_by_index(devlink, linecard_index); +} + +static struct devlink_linecard * +devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) +{ + if (attrs[DEVLINK_ATTR_LINECARD_INDEX]) { + u32 linecard_index = nla_get_u32(attrs[DEVLINK_ATTR_LINECARD_INDEX]); + struct devlink_linecard *linecard; + + mutex_lock(&devlink->linecards_lock); + linecard = devlink_linecard_get_by_index(devlink, linecard_index); + if (linecard) + refcount_inc(&linecard->refcount); + mutex_unlock(&devlink->linecards_lock); + if (!linecard) + return ERR_PTR(-ENODEV); + return linecard; + } + return ERR_PTR(-EINVAL); +} + +static struct devlink_linecard * +devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info) +{ + return devlink_linecard_get_from_attrs(devlink, info->attrs); +} + +static void devlink_linecard_put(struct devlink_linecard *linecard) +{ + if (refcount_dec_and_test(&linecard->refcount)) { + mutex_destroy(&linecard->state_lock); + kfree(linecard); + } +} + +struct devlink_sb { + struct list_head list; + unsigned int index; + u32 size; + u16 ingress_pools_count; + u16 egress_pools_count; + u16 ingress_tc_count; + u16 egress_tc_count; +}; + +static u16 devlink_sb_pool_count(struct devlink_sb *devlink_sb) +{ + return devlink_sb->ingress_pools_count + devlink_sb->egress_pools_count; +} + +static struct devlink_sb *devlink_sb_get_by_index(struct devlink *devlink, + unsigned int sb_index) +{ + struct devlink_sb *devlink_sb; + + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + if (devlink_sb->index == sb_index) + return devlink_sb; + } + return NULL; +} + +static bool devlink_sb_index_exists(struct devlink *devlink, + unsigned int sb_index) +{ + return devlink_sb_get_by_index(devlink, sb_index); +} + +static struct devlink_sb *devlink_sb_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs) +{ + if (attrs[DEVLINK_ATTR_SB_INDEX]) { + u32 sb_index = nla_get_u32(attrs[DEVLINK_ATTR_SB_INDEX]); + struct devlink_sb *devlink_sb; + + devlink_sb = devlink_sb_get_by_index(devlink, sb_index); + if (!devlink_sb) + return ERR_PTR(-ENODEV); + return devlink_sb; + } + return ERR_PTR(-EINVAL); +} + +static struct devlink_sb *devlink_sb_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + return devlink_sb_get_from_attrs(devlink, info->attrs); +} + +static int devlink_sb_pool_index_get_from_attrs(struct devlink_sb *devlink_sb, + struct nlattr **attrs, + u16 *p_pool_index) +{ + u16 val; + + if (!attrs[DEVLINK_ATTR_SB_POOL_INDEX]) + return -EINVAL; + + val = nla_get_u16(attrs[DEVLINK_ATTR_SB_POOL_INDEX]); + if (val >= devlink_sb_pool_count(devlink_sb)) + return -EINVAL; + *p_pool_index = val; + return 0; +} + +static int devlink_sb_pool_index_get_from_info(struct devlink_sb *devlink_sb, + struct genl_info *info, + u16 *p_pool_index) +{ + return devlink_sb_pool_index_get_from_attrs(devlink_sb, info->attrs, + p_pool_index); +} + +static int +devlink_sb_pool_type_get_from_attrs(struct nlattr **attrs, + enum devlink_sb_pool_type *p_pool_type) +{ + u8 val; + + if (!attrs[DEVLINK_ATTR_SB_POOL_TYPE]) + return -EINVAL; + + val = nla_get_u8(attrs[DEVLINK_ATTR_SB_POOL_TYPE]); + if (val != DEVLINK_SB_POOL_TYPE_INGRESS && + val != DEVLINK_SB_POOL_TYPE_EGRESS) + return -EINVAL; + *p_pool_type = val; + return 0; +} + +static int +devlink_sb_pool_type_get_from_info(struct genl_info *info, + enum devlink_sb_pool_type *p_pool_type) +{ + return devlink_sb_pool_type_get_from_attrs(info->attrs, p_pool_type); +} + +static int +devlink_sb_th_type_get_from_attrs(struct nlattr **attrs, + enum devlink_sb_threshold_type *p_th_type) +{ + u8 val; + + if (!attrs[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE]) + return -EINVAL; + + val = nla_get_u8(attrs[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE]); + if (val != DEVLINK_SB_THRESHOLD_TYPE_STATIC && + val != DEVLINK_SB_THRESHOLD_TYPE_DYNAMIC) + return -EINVAL; + *p_th_type = val; + return 0; +} + +static int +devlink_sb_th_type_get_from_info(struct genl_info *info, + enum devlink_sb_threshold_type *p_th_type) +{ + return devlink_sb_th_type_get_from_attrs(info->attrs, p_th_type); +} + +static int +devlink_sb_tc_index_get_from_attrs(struct devlink_sb *devlink_sb, + struct nlattr **attrs, + enum devlink_sb_pool_type pool_type, + u16 *p_tc_index) +{ + u16 val; + + if (!attrs[DEVLINK_ATTR_SB_TC_INDEX]) + return -EINVAL; + + val = nla_get_u16(attrs[DEVLINK_ATTR_SB_TC_INDEX]); + if (pool_type == DEVLINK_SB_POOL_TYPE_INGRESS && + val >= devlink_sb->ingress_tc_count) + return -EINVAL; + if (pool_type == DEVLINK_SB_POOL_TYPE_EGRESS && + val >= devlink_sb->egress_tc_count) + return -EINVAL; + *p_tc_index = val; + return 0; +} + +static void devlink_port_fn_cap_fill(struct nla_bitfield32 *caps, + u32 cap, bool is_enable) +{ + caps->selector |= cap; + if (is_enable) + caps->value |= cap; +} + +static int devlink_port_fn_roce_fill(const struct devlink_ops *ops, + struct devlink_port *devlink_port, + struct nla_bitfield32 *caps, + struct netlink_ext_ack *extack) +{ + bool is_enable; + int err; + + if (!ops->port_fn_roce_get) + return 0; + + err = ops->port_fn_roce_get(devlink_port, &is_enable, extack); + if (err) { + if (err == -EOPNOTSUPP) + return 0; + return err; + } + + devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_ROCE, is_enable); + return 0; +} + +static int devlink_port_fn_migratable_fill(const struct devlink_ops *ops, + struct devlink_port *devlink_port, + struct nla_bitfield32 *caps, + struct netlink_ext_ack *extack) +{ + bool is_enable; + int err; + + if (!ops->port_fn_migratable_get || + devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) + return 0; + + err = ops->port_fn_migratable_get(devlink_port, &is_enable, extack); + if (err) { + if (err == -EOPNOTSUPP) + return 0; + return err; + } + + devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_MIGRATABLE, is_enable); + return 0; +} + +static int devlink_port_fn_caps_fill(const struct devlink_ops *ops, + struct devlink_port *devlink_port, + struct sk_buff *msg, + struct netlink_ext_ack *extack, + bool *msg_updated) +{ + struct nla_bitfield32 caps = {}; + int err; + + err = devlink_port_fn_roce_fill(ops, devlink_port, &caps, extack); + if (err) + return err; + + err = devlink_port_fn_migratable_fill(ops, devlink_port, &caps, extack); + if (err) + return err; + + if (!caps.selector) + return 0; + err = nla_put_bitfield32(msg, DEVLINK_PORT_FN_ATTR_CAPS, caps.value, + caps.selector); + if (err) + return err; + + *msg_updated = true; + return 0; +} + +static int +devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb, + struct genl_info *info, + enum devlink_sb_pool_type pool_type, + u16 *p_tc_index) +{ + return devlink_sb_tc_index_get_from_attrs(devlink_sb, info->attrs, + pool_type, p_tc_index); +} + +struct devlink_region { + struct devlink *devlink; + struct devlink_port *port; + struct list_head list; + union { + const struct devlink_region_ops *ops; + const struct devlink_port_region_ops *port_ops; + }; + struct mutex snapshot_lock; /* protects snapshot_list, + * max_snapshots and cur_snapshots + * consistency. + */ + struct list_head snapshot_list; + u32 max_snapshots; + u32 cur_snapshots; + u64 size; +}; + +struct devlink_snapshot { + struct list_head list; + struct devlink_region *region; + u8 *data; + u32 id; +}; + +static struct devlink_region * +devlink_region_get_by_name(struct devlink *devlink, const char *region_name) +{ + struct devlink_region *region; + + list_for_each_entry(region, &devlink->region_list, list) + if (!strcmp(region->ops->name, region_name)) + return region; + + return NULL; +} + +static struct devlink_region * +devlink_port_region_get_by_name(struct devlink_port *port, + const char *region_name) +{ + struct devlink_region *region; + + list_for_each_entry(region, &port->region_list, list) + if (!strcmp(region->ops->name, region_name)) + return region; + + return NULL; +} + +static struct devlink_snapshot * +devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id) +{ + struct devlink_snapshot *snapshot; + + list_for_each_entry(snapshot, ®ion->snapshot_list, list) + if (snapshot->id == id) + return snapshot; + + return NULL; +} + +#define DEVLINK_NL_FLAG_NEED_PORT BIT(0) +#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) +#define DEVLINK_NL_FLAG_NEED_RATE BIT(2) +#define DEVLINK_NL_FLAG_NEED_RATE_NODE BIT(3) +#define DEVLINK_NL_FLAG_NEED_LINECARD BIT(4) + +static int devlink_nl_pre_doit(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + struct devlink_linecard *linecard; + struct devlink_port *devlink_port; + struct devlink *devlink; + int err; + + devlink = devlink_get_from_attrs(genl_info_net(info), info->attrs); + if (IS_ERR(devlink)) + return PTR_ERR(devlink); + devl_lock(devlink); + info->user_ptr[0] = devlink; + if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) { + devlink_port = devlink_port_get_from_info(devlink, info); + if (IS_ERR(devlink_port)) { + err = PTR_ERR(devlink_port); + goto unlock; + } + info->user_ptr[1] = devlink_port; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT) { + devlink_port = devlink_port_get_from_info(devlink, info); + if (!IS_ERR(devlink_port)) + info->user_ptr[1] = devlink_port; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE) { + struct devlink_rate *devlink_rate; + + devlink_rate = devlink_rate_get_from_info(devlink, info); + if (IS_ERR(devlink_rate)) { + err = PTR_ERR(devlink_rate); + goto unlock; + } + info->user_ptr[1] = devlink_rate; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE_NODE) { + struct devlink_rate *rate_node; + + rate_node = devlink_rate_node_get_from_info(devlink, info); + if (IS_ERR(rate_node)) { + err = PTR_ERR(rate_node); + goto unlock; + } + info->user_ptr[1] = rate_node; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { + linecard = devlink_linecard_get_from_info(devlink, info); + if (IS_ERR(linecard)) { + err = PTR_ERR(linecard); + goto unlock; + } + info->user_ptr[1] = linecard; + } + return 0; + +unlock: + devl_unlock(devlink); + devlink_put(devlink); + return err; +} + +static void devlink_nl_post_doit(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + struct devlink_linecard *linecard; + struct devlink *devlink; + + devlink = info->user_ptr[0]; + if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { + linecard = info->user_ptr[1]; + devlink_linecard_put(linecard); + } + devl_unlock(devlink); + devlink_put(devlink); +} + +static struct genl_family devlink_nl_family; + +enum devlink_multicast_groups { + DEVLINK_MCGRP_CONFIG, +}; + +static const struct genl_multicast_group devlink_nl_mcgrps[] = { + [DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME }, +}; + +static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink) +{ + if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name)) + return -EMSGSIZE; + if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(devlink->dev))) + return -EMSGSIZE; + return 0; +} + +static int devlink_nl_put_nested_handle(struct sk_buff *msg, struct devlink *devlink) +{ + struct nlattr *nested_attr; + + nested_attr = nla_nest_start(msg, DEVLINK_ATTR_NESTED_DEVLINK); + if (!nested_attr) + return -EMSGSIZE; + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + nla_nest_end(msg, nested_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(msg, nested_attr); + return -EMSGSIZE; +} + +int devlink_nl_port_handle_fill(struct sk_buff *msg, struct devlink_port *devlink_port) +{ + if (devlink_nl_put_handle(msg, devlink_port->devlink)) + return -EMSGSIZE; + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) + return -EMSGSIZE; + return 0; +} + +size_t devlink_nl_port_handle_size(struct devlink_port *devlink_port) +{ + struct devlink *devlink = devlink_port->devlink; + + return nla_total_size(strlen(devlink->dev->bus->name) + 1) /* DEVLINK_ATTR_BUS_NAME */ + + nla_total_size(strlen(dev_name(devlink->dev)) + 1) /* DEVLINK_ATTR_DEV_NAME */ + + nla_total_size(4); /* DEVLINK_ATTR_PORT_INDEX */ +} + +struct devlink_reload_combination { + enum devlink_reload_action action; + enum devlink_reload_limit limit; +}; + +static const struct devlink_reload_combination devlink_reload_invalid_combinations[] = { + { + /* can't reinitialize driver with no down time */ + .action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + .limit = DEVLINK_RELOAD_LIMIT_NO_RESET, + }, +}; + +static bool +devlink_reload_combination_is_invalid(enum devlink_reload_action action, + enum devlink_reload_limit limit) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) + if (devlink_reload_invalid_combinations[i].action == action && + devlink_reload_invalid_combinations[i].limit == limit) + return true; + return false; +} + +static bool +devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action) +{ + return test_bit(action, &devlink->ops->reload_actions); +} + +static bool +devlink_reload_limit_is_supported(struct devlink *devlink, enum devlink_reload_limit limit) +{ + return test_bit(limit, &devlink->ops->reload_limits); +} + +static int devlink_reload_stat_put(struct sk_buff *msg, + enum devlink_reload_limit limit, u32 value) +{ + struct nlattr *reload_stats_entry; + + reload_stats_entry = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS_ENTRY); + if (!reload_stats_entry) + return -EMSGSIZE; + + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_STATS_LIMIT, limit) || + nla_put_u32(msg, DEVLINK_ATTR_RELOAD_STATS_VALUE, value)) + goto nla_put_failure; + nla_nest_end(msg, reload_stats_entry); + return 0; + +nla_put_failure: + nla_nest_cancel(msg, reload_stats_entry); + return -EMSGSIZE; +} + +static int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote) +{ + struct nlattr *reload_stats_attr, *act_info, *act_stats; + int i, j, stat_idx; + u32 value; + + if (!is_remote) + reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS); + else + reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_REMOTE_RELOAD_STATS); + + if (!reload_stats_attr) + return -EMSGSIZE; + + for (i = 0; i <= DEVLINK_RELOAD_ACTION_MAX; i++) { + if ((!is_remote && + !devlink_reload_action_is_supported(devlink, i)) || + i == DEVLINK_RELOAD_ACTION_UNSPEC) + continue; + act_info = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_INFO); + if (!act_info) + goto nla_put_failure; + + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_ACTION, i)) + goto action_info_nest_cancel; + act_stats = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_STATS); + if (!act_stats) + goto action_info_nest_cancel; + + for (j = 0; j <= DEVLINK_RELOAD_LIMIT_MAX; j++) { + /* Remote stats are shown even if not locally supported. + * Stats of actions with unspecified limit are shown + * though drivers don't need to register unspecified + * limit. + */ + if ((!is_remote && j != DEVLINK_RELOAD_LIMIT_UNSPEC && + !devlink_reload_limit_is_supported(devlink, j)) || + devlink_reload_combination_is_invalid(i, j)) + continue; + + stat_idx = j * __DEVLINK_RELOAD_ACTION_MAX + i; + if (!is_remote) + value = devlink->stats.reload_stats[stat_idx]; + else + value = devlink->stats.remote_reload_stats[stat_idx]; + if (devlink_reload_stat_put(msg, j, value)) + goto action_stats_nest_cancel; + } + nla_nest_end(msg, act_stats); + nla_nest_end(msg, act_info); + } + nla_nest_end(msg, reload_stats_attr); + return 0; + +action_stats_nest_cancel: + nla_nest_cancel(msg, act_stats); +action_info_nest_cancel: + nla_nest_cancel(msg, act_info); +nla_put_failure: + nla_nest_cancel(msg, reload_stats_attr); + return -EMSGSIZE; +} + +static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + struct nlattr *dev_stats; + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed)) + goto nla_put_failure; + + dev_stats = nla_nest_start(msg, DEVLINK_ATTR_DEV_STATS); + if (!dev_stats) + goto nla_put_failure; + + if (devlink_reload_stats_put(msg, devlink, false)) + goto dev_stats_nest_cancel; + if (devlink_reload_stats_put(msg, devlink, true)) + goto dev_stats_nest_cancel; + + nla_nest_end(msg, dev_stats); + genlmsg_end(msg, hdr); + return 0; + +dev_stats_nest_cancel: + nla_nest_cancel(msg, dev_stats); +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void devlink_notify(struct devlink *devlink, enum devlink_command cmd) +{ + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_NEW && cmd != DEVLINK_CMD_DEL); + WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_fill(msg, devlink, cmd, 0, 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int devlink_nl_port_attrs_put(struct sk_buff *msg, + struct devlink_port *devlink_port) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + + if (!devlink_port->attrs_set) + return 0; + if (attrs->lanes) { + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_LANES, attrs->lanes)) + return -EMSGSIZE; + } + if (nla_put_u8(msg, DEVLINK_ATTR_PORT_SPLITTABLE, attrs->splittable)) + return -EMSGSIZE; + if (nla_put_u16(msg, DEVLINK_ATTR_PORT_FLAVOUR, attrs->flavour)) + return -EMSGSIZE; + switch (devlink_port->attrs.flavour) { + case DEVLINK_PORT_FLAVOUR_PCI_PF: + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, + attrs->pci_pf.controller) || + nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_pf.pf)) + return -EMSGSIZE; + if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external)) + return -EMSGSIZE; + break; + case DEVLINK_PORT_FLAVOUR_PCI_VF: + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, + attrs->pci_vf.controller) || + nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_vf.pf) || + nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER, attrs->pci_vf.vf)) + return -EMSGSIZE; + if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external)) + return -EMSGSIZE; + break; + case DEVLINK_PORT_FLAVOUR_PCI_SF: + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, + attrs->pci_sf.controller) || + nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, + attrs->pci_sf.pf) || + nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER, + attrs->pci_sf.sf)) + return -EMSGSIZE; + break; + case DEVLINK_PORT_FLAVOUR_PHYSICAL: + case DEVLINK_PORT_FLAVOUR_CPU: + case DEVLINK_PORT_FLAVOUR_DSA: + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_NUMBER, + attrs->phys.port_number)) + return -EMSGSIZE; + if (!attrs->split) + return 0; + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP, + attrs->phys.port_number)) + return -EMSGSIZE; + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_SUBPORT_NUMBER, + attrs->phys.split_subport_number)) + return -EMSGSIZE; + break; + default: + break; + } + return 0; +} + +static int devlink_port_fn_hw_addr_fill(const struct devlink_ops *ops, + struct devlink_port *port, + struct sk_buff *msg, + struct netlink_ext_ack *extack, + bool *msg_updated) +{ + u8 hw_addr[MAX_ADDR_LEN]; + int hw_addr_len; + int err; + + if (!ops->port_function_hw_addr_get) + return 0; + + err = ops->port_function_hw_addr_get(port, hw_addr, &hw_addr_len, + extack); + if (err) { + if (err == -EOPNOTSUPP) + return 0; + return err; + } + err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr); + if (err) + return err; + *msg_updated = true; + return 0; +} + +static int devlink_nl_rate_fill(struct sk_buff *msg, + struct devlink_rate *devlink_rate, + enum devlink_command cmd, u32 portid, u32 seq, + int flags, struct netlink_ext_ack *extack) +{ + struct devlink *devlink = devlink_rate->devlink; + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TYPE, devlink_rate->type)) + goto nla_put_failure; + + if (devlink_rate_is_leaf(devlink_rate)) { + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, + devlink_rate->devlink_port->index)) + goto nla_put_failure; + } else if (devlink_rate_is_node(devlink_rate)) { + if (nla_put_string(msg, DEVLINK_ATTR_RATE_NODE_NAME, + devlink_rate->name)) + goto nla_put_failure; + } + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE, + devlink_rate->tx_share, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX, + devlink_rate->tx_max, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + if (nla_put_u32(msg, DEVLINK_ATTR_RATE_TX_PRIORITY, + devlink_rate->tx_priority)) + goto nla_put_failure; + + if (nla_put_u32(msg, DEVLINK_ATTR_RATE_TX_WEIGHT, + devlink_rate->tx_weight)) + goto nla_put_failure; + + if (devlink_rate->parent) + if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME, + devlink_rate->parent->name)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static bool +devlink_port_fn_state_valid(enum devlink_port_fn_state state) +{ + return state == DEVLINK_PORT_FN_STATE_INACTIVE || + state == DEVLINK_PORT_FN_STATE_ACTIVE; +} + +static bool +devlink_port_fn_opstate_valid(enum devlink_port_fn_opstate opstate) +{ + return opstate == DEVLINK_PORT_FN_OPSTATE_DETACHED || + opstate == DEVLINK_PORT_FN_OPSTATE_ATTACHED; +} + +static int devlink_port_fn_state_fill(const struct devlink_ops *ops, + struct devlink_port *port, + struct sk_buff *msg, + struct netlink_ext_ack *extack, + bool *msg_updated) +{ + enum devlink_port_fn_opstate opstate; + enum devlink_port_fn_state state; + int err; + + if (!ops->port_fn_state_get) + return 0; + + err = ops->port_fn_state_get(port, &state, &opstate, extack); + if (err) { + if (err == -EOPNOTSUPP) + return 0; + return err; + } + if (!devlink_port_fn_state_valid(state)) { + WARN_ON_ONCE(1); + NL_SET_ERR_MSG_MOD(extack, "Invalid state read from driver"); + return -EINVAL; + } + if (!devlink_port_fn_opstate_valid(opstate)) { + WARN_ON_ONCE(1); + NL_SET_ERR_MSG_MOD(extack, + "Invalid operational state read from driver"); + return -EINVAL; + } + if (nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_STATE, state) || + nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_OPSTATE, opstate)) + return -EMSGSIZE; + *msg_updated = true; + return 0; +} + +static int +devlink_port_fn_mig_set(struct devlink_port *devlink_port, bool enable, + struct netlink_ext_ack *extack) +{ + const struct devlink_ops *ops = devlink_port->devlink->ops; + + return ops->port_fn_migratable_set(devlink_port, enable, extack); +} + +static int +devlink_port_fn_roce_set(struct devlink_port *devlink_port, bool enable, + struct netlink_ext_ack *extack) +{ + const struct devlink_ops *ops = devlink_port->devlink->ops; + + return ops->port_fn_roce_set(devlink_port, enable, extack); +} + +static int devlink_port_fn_caps_set(struct devlink_port *devlink_port, + const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + struct nla_bitfield32 caps; + u32 caps_value; + int err; + + caps = nla_get_bitfield32(attr); + caps_value = caps.value & caps.selector; + if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE) { + err = devlink_port_fn_roce_set(devlink_port, + caps_value & DEVLINK_PORT_FN_CAP_ROCE, + extack); + if (err) + return err; + } + if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) { + err = devlink_port_fn_mig_set(devlink_port, caps_value & + DEVLINK_PORT_FN_CAP_MIGRATABLE, + extack); + if (err) + return err; + } + return 0; +} + +static int +devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port, + struct netlink_ext_ack *extack) +{ + const struct devlink_ops *ops; + struct nlattr *function_attr; + bool msg_updated = false; + int err; + + function_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PORT_FUNCTION); + if (!function_attr) + return -EMSGSIZE; + + ops = port->devlink->ops; + err = devlink_port_fn_hw_addr_fill(ops, port, msg, extack, + &msg_updated); + if (err) + goto out; + err = devlink_port_fn_caps_fill(ops, port, msg, extack, + &msg_updated); + if (err) + goto out; + err = devlink_port_fn_state_fill(ops, port, msg, extack, &msg_updated); +out: + if (err || !msg_updated) + nla_nest_cancel(msg, function_attr); + else + nla_nest_end(msg, function_attr); + return err; +} + +static int devlink_nl_port_fill(struct sk_buff *msg, + struct devlink_port *devlink_port, + enum devlink_command cmd, u32 portid, u32 seq, + int flags, struct netlink_ext_ack *extack) +{ + struct devlink *devlink = devlink_port->devlink; + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) + goto nla_put_failure; + + spin_lock_bh(&devlink_port->type_lock); + if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type)) + goto nla_put_failure_type_locked; + if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET && + nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE, + devlink_port->desired_type)) + goto nla_put_failure_type_locked; + if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) { + if (devlink_port->type_eth.netdev && + (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX, + devlink_port->type_eth.ifindex) || + nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME, + devlink_port->type_eth.ifname))) + goto nla_put_failure_type_locked; + } + if (devlink_port->type == DEVLINK_PORT_TYPE_IB) { + struct ib_device *ibdev = devlink_port->type_ib.ibdev; + + if (ibdev && + nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME, + ibdev->name)) + goto nla_put_failure_type_locked; + } + spin_unlock_bh(&devlink_port->type_lock); + if (devlink_nl_port_attrs_put(msg, devlink_port)) + goto nla_put_failure; + if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack)) + goto nla_put_failure; + if (devlink_port->linecard && + nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, + devlink_port->linecard->index)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure_type_locked: + spin_unlock_bh(&devlink_port->type_lock); +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void devlink_port_notify(struct devlink_port *devlink_port, + enum devlink_command cmd) +{ + struct devlink *devlink = devlink_port->devlink; + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd != DEVLINK_CMD_PORT_DEL); + + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_port_fill(msg, devlink_port, cmd, 0, 0, 0, NULL); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, + 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static void devlink_rate_notify(struct devlink_rate *devlink_rate, + enum devlink_command cmd) +{ + struct devlink *devlink = devlink_rate->devlink; + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_RATE_NEW && cmd != DEVLINK_CMD_RATE_DEL); + + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_rate_fill(msg, devlink_rate, cmd, 0, 0, 0, NULL); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, + 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink_rate *devlink_rate; + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + list_for_each_entry(devlink_rate, &devlink->rate_list, list) { + enum devlink_command cmd = DEVLINK_CMD_RATE_NEW; + u32 id = NETLINK_CB(cb->skb).portid; + + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_rate_fill(msg, devlink_rate, cmd, id, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, NULL); + if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +static int devlink_nl_cmd_rate_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_rate *devlink_rate = info->user_ptr[1]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_rate_fill(msg, devlink_rate, DEVLINK_CMD_RATE_NEW, + info->snd_portid, info->snd_seq, 0, + info->extack); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static bool +devlink_rate_is_parent_node(struct devlink_rate *devlink_rate, + struct devlink_rate *parent) +{ + while (parent) { + if (parent == devlink_rate) + return true; + parent = parent->parent; + } + return false; +} + +static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + if (idx < start) { + idx++; + devlink_put(devlink); + continue; + } + + devl_lock(devlink); + err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI); + devl_unlock(devlink); + devlink_put(devlink); + + if (err) + goto out; + idx++; + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_PORT_NEW, + info->snd_portid, info->snd_seq, 0, + info->extack); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + struct devlink_port *devlink_port; + unsigned long index, port_index; + int start = cb->args[0]; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + xa_for_each(&devlink->ports, port_index, devlink_port) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_port_fill(msg, devlink_port, + DEVLINK_CMD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, cb->extack); + if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int devlink_port_type_set(struct devlink_port *devlink_port, + enum devlink_port_type port_type) + +{ + int err; + + if (!devlink_port->devlink->ops->port_type_set) + return -EOPNOTSUPP; + + if (port_type == devlink_port->type) + return 0; + + err = devlink_port->devlink->ops->port_type_set(devlink_port, + port_type); + if (err) + return err; + + devlink_port->desired_type = port_type; + devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); + return 0; +} + +static int devlink_port_function_hw_addr_set(struct devlink_port *port, + const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + const struct devlink_ops *ops = port->devlink->ops; + const u8 *hw_addr; + int hw_addr_len; + + hw_addr = nla_data(attr); + hw_addr_len = nla_len(attr); + if (hw_addr_len > MAX_ADDR_LEN) { + NL_SET_ERR_MSG_MOD(extack, "Port function hardware address too long"); + return -EINVAL; + } + if (port->type == DEVLINK_PORT_TYPE_ETH) { + if (hw_addr_len != ETH_ALEN) { + NL_SET_ERR_MSG_MOD(extack, "Address must be 6 bytes for Ethernet device"); + return -EINVAL; + } + if (!is_unicast_ether_addr(hw_addr)) { + NL_SET_ERR_MSG_MOD(extack, "Non-unicast hardware address unsupported"); + return -EINVAL; + } + } + + return ops->port_function_hw_addr_set(port, hw_addr, hw_addr_len, + extack); +} + +static int devlink_port_fn_state_set(struct devlink_port *port, + const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + enum devlink_port_fn_state state; + const struct devlink_ops *ops; + + state = nla_get_u8(attr); + ops = port->devlink->ops; + return ops->port_fn_state_set(port, state, extack); +} + +static int devlink_port_function_validate(struct devlink_port *devlink_port, + struct nlattr **tb, + struct netlink_ext_ack *extack) +{ + const struct devlink_ops *ops = devlink_port->devlink->ops; + struct nlattr *attr; + + if (tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] && + !ops->port_function_hw_addr_set) { + NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR], + "Port doesn't support function attributes"); + return -EOPNOTSUPP; + } + if (tb[DEVLINK_PORT_FN_ATTR_STATE] && !ops->port_fn_state_set) { + NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR], + "Function does not support state setting"); + return -EOPNOTSUPP; + } + attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; + if (attr) { + struct nla_bitfield32 caps; + + caps = nla_get_bitfield32(attr); + if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE && + !ops->port_fn_roce_set) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "Port doesn't support RoCE function attribute"); + return -EOPNOTSUPP; + } + if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) { + if (!ops->port_fn_migratable_set) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "Port doesn't support migratable function attribute"); + return -EOPNOTSUPP; + } + if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "migratable function attribute supported for VFs only"); + return -EOPNOTSUPP; + } + } + } + return 0; +} + +static int devlink_port_function_set(struct devlink_port *port, + const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1]; + int err; + + err = nla_parse_nested(tb, DEVLINK_PORT_FUNCTION_ATTR_MAX, attr, + devlink_function_nl_policy, extack); + if (err < 0) { + NL_SET_ERR_MSG_MOD(extack, "Fail to parse port function attributes"); + return err; + } + + err = devlink_port_function_validate(port, tb, extack); + if (err) + return err; + + attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR]; + if (attr) { + err = devlink_port_function_hw_addr_set(port, attr, extack); + if (err) + return err; + } + + attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; + if (attr) { + err = devlink_port_fn_caps_set(port, attr, extack); + if (err) + return err; + } + + /* Keep this as the last function attribute set, so that when + * multiple port function attributes are set along with state, + * Those can be applied first before activating the state. + */ + attr = tb[DEVLINK_PORT_FN_ATTR_STATE]; + if (attr) + err = devlink_port_fn_state_set(port, attr, extack); + + if (!err) + devlink_port_notify(port, DEVLINK_CMD_PORT_NEW); + return err; +} + +static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + int err; + + if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) { + enum devlink_port_type port_type; + + port_type = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_TYPE]); + err = devlink_port_type_set(devlink_port, port_type); + if (err) + return err; + } + + if (info->attrs[DEVLINK_ATTR_PORT_FUNCTION]) { + struct nlattr *attr = info->attrs[DEVLINK_ATTR_PORT_FUNCTION]; + struct netlink_ext_ack *extack = info->extack; + + err = devlink_port_function_set(devlink_port, attr, extack); + if (err) + return err; + } + + return 0; +} + +static int devlink_nl_cmd_port_split_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct devlink *devlink = info->user_ptr[0]; + u32 count; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PORT_SPLIT_COUNT)) + return -EINVAL; + if (!devlink->ops->port_split) + return -EOPNOTSUPP; + + count = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_SPLIT_COUNT]); + + if (!devlink_port->attrs.splittable) { + /* Split ports cannot be split. */ + if (devlink_port->attrs.split) + NL_SET_ERR_MSG_MOD(info->extack, "Port cannot be split further"); + else + NL_SET_ERR_MSG_MOD(info->extack, "Port cannot be split"); + return -EINVAL; + } + + if (count < 2 || !is_power_of_2(count) || count > devlink_port->attrs.lanes) { + NL_SET_ERR_MSG_MOD(info->extack, "Invalid split count"); + return -EINVAL; + } + + return devlink->ops->port_split(devlink, devlink_port, count, + info->extack); +} + +static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct devlink *devlink = info->user_ptr[0]; + + if (!devlink->ops->port_unsplit) + return -EOPNOTSUPP; + return devlink->ops->port_unsplit(devlink, devlink_port, info->extack); +} + +static int devlink_port_new_notify(struct devlink *devlink, + unsigned int port_index, + struct genl_info *info) +{ + struct devlink_port *devlink_port; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + lockdep_assert_held(&devlink->lock); + devlink_port = devlink_port_get_by_index(devlink, port_index); + if (!devlink_port) { + err = -ENODEV; + goto out; + } + + err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_NEW, + info->snd_portid, info->snd_seq, 0, NULL); + if (err) + goto out; + + return genlmsg_reply(msg, info); + +out: + nlmsg_free(msg); + return err; +} + +static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink_port_new_attrs new_attrs = {}; + struct devlink *devlink = info->user_ptr[0]; + unsigned int new_port_index; + int err; + + if (!devlink->ops->port_new || !devlink->ops->port_del) + return -EOPNOTSUPP; + + if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] || + !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) { + NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified"); + return -EINVAL; + } + new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]); + new_attrs.pfnum = + nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]); + + if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + /* Port index of the new port being created by driver. */ + new_attrs.port_index = + nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + new_attrs.port_index_valid = true; + } + if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) { + new_attrs.controller = + nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]); + new_attrs.controller_valid = true; + } + if (new_attrs.flavour == DEVLINK_PORT_FLAVOUR_PCI_SF && + info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) { + new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]); + new_attrs.sfnum_valid = true; + } + + err = devlink->ops->port_new(devlink, &new_attrs, extack, + &new_port_index); + if (err) + return err; + + err = devlink_port_new_notify(devlink, new_port_index, info); + if (err && err != -ENODEV) { + /* Fail to send the response; destroy newly created port. */ + devlink->ops->port_del(devlink, new_port_index, extack); + } + return err; +} + +static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + unsigned int port_index; + + if (!devlink->ops->port_del) + return -EOPNOTSUPP; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PORT_INDEX)) { + NL_SET_ERR_MSG_MOD(extack, "Port index is not specified"); + return -EINVAL; + } + port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + + return devlink->ops->port_del(devlink, port_index, extack); +} + +static int +devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate, + struct genl_info *info, + struct nlattr *nla_parent) +{ + struct devlink *devlink = devlink_rate->devlink; + const char *parent_name = nla_data(nla_parent); + const struct devlink_ops *ops = devlink->ops; + size_t len = strlen(parent_name); + struct devlink_rate *parent; + int err = -EOPNOTSUPP; + + parent = devlink_rate->parent; + + if (parent && !len) { + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_parent_set(devlink_rate, NULL, + devlink_rate->priv, NULL, + info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_parent_set(devlink_rate, NULL, + devlink_rate->priv, NULL, + info->extack); + if (err) + return err; + + refcount_dec(&parent->refcnt); + devlink_rate->parent = NULL; + } else if (len) { + parent = devlink_rate_node_get_by_name(devlink, parent_name); + if (IS_ERR(parent)) + return -ENODEV; + + if (parent == devlink_rate) { + NL_SET_ERR_MSG_MOD(info->extack, "Parent to self is not allowed"); + return -EINVAL; + } + + if (devlink_rate_is_node(devlink_rate) && + devlink_rate_is_parent_node(devlink_rate, parent->parent)) { + NL_SET_ERR_MSG_MOD(info->extack, "Node is already a parent of parent node."); + return -EEXIST; + } + + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_parent_set(devlink_rate, parent, + devlink_rate->priv, parent->priv, + info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_parent_set(devlink_rate, parent, + devlink_rate->priv, parent->priv, + info->extack); + if (err) + return err; + + if (devlink_rate->parent) + /* we're reassigning to other parent in this case */ + refcount_dec(&devlink_rate->parent->refcnt); + + refcount_inc(&parent->refcnt); + devlink_rate->parent = parent; + } + + return 0; +} + +static int devlink_nl_rate_set(struct devlink_rate *devlink_rate, + const struct devlink_ops *ops, + struct genl_info *info) +{ + struct nlattr *nla_parent, **attrs = info->attrs; + int err = -EOPNOTSUPP; + u32 priority; + u32 weight; + u64 rate; + + if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) { + rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_SHARE]); + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_tx_share_set(devlink_rate, devlink_rate->priv, + rate, info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_tx_share_set(devlink_rate, devlink_rate->priv, + rate, info->extack); + if (err) + return err; + devlink_rate->tx_share = rate; + } + + if (attrs[DEVLINK_ATTR_RATE_TX_MAX]) { + rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_MAX]); + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_tx_max_set(devlink_rate, devlink_rate->priv, + rate, info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_tx_max_set(devlink_rate, devlink_rate->priv, + rate, info->extack); + if (err) + return err; + devlink_rate->tx_max = rate; + } + + if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]) { + priority = nla_get_u32(attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]); + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_tx_priority_set(devlink_rate, devlink_rate->priv, + priority, info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_tx_priority_set(devlink_rate, devlink_rate->priv, + priority, info->extack); + + if (err) + return err; + devlink_rate->tx_priority = priority; + } + + if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]) { + weight = nla_get_u32(attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]); + if (devlink_rate_is_leaf(devlink_rate)) + err = ops->rate_leaf_tx_weight_set(devlink_rate, devlink_rate->priv, + weight, info->extack); + else if (devlink_rate_is_node(devlink_rate)) + err = ops->rate_node_tx_weight_set(devlink_rate, devlink_rate->priv, + weight, info->extack); + + if (err) + return err; + devlink_rate->tx_weight = weight; + } + + nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]; + if (nla_parent) { + err = devlink_nl_rate_parent_node_set(devlink_rate, info, + nla_parent); + if (err) + return err; + } + + return 0; +} + +static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops, + struct genl_info *info, + enum devlink_rate_type type) +{ + struct nlattr **attrs = info->attrs; + + if (type == DEVLINK_RATE_TYPE_LEAF) { + if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_leaf_tx_share_set) { + NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the leafs"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_leaf_tx_max_set) { + NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the leafs"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] && + !ops->rate_leaf_parent_set) { + NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_PRIORITY], + "TX priority set isn't supported for the leafs"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_leaf_tx_weight_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_WEIGHT], + "TX weight set isn't supported for the leafs"); + return false; + } + } else if (type == DEVLINK_RATE_TYPE_NODE) { + if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) { + NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_node_tx_max_set) { + NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the nodes"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] && + !ops->rate_node_parent_set) { + NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_PRIORITY], + "TX priority set isn't supported for the nodes"); + return false; + } + if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_node_tx_weight_set) { + NL_SET_ERR_MSG_ATTR(info->extack, + attrs[DEVLINK_ATTR_RATE_TX_WEIGHT], + "TX weight set isn't supported for the nodes"); + return false; + } + } else { + WARN(1, "Unknown type of rate object"); + return false; + } + + return true; +} + +static int devlink_nl_cmd_rate_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_rate *devlink_rate = info->user_ptr[1]; + struct devlink *devlink = devlink_rate->devlink; + const struct devlink_ops *ops = devlink->ops; + int err; + + if (!ops || !devlink_rate_set_ops_supported(ops, info, devlink_rate->type)) + return -EOPNOTSUPP; + + err = devlink_nl_rate_set(devlink_rate, ops, info); + + if (!err) + devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW); + return err; +} + +static int devlink_nl_cmd_rate_new_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_rate *rate_node; + const struct devlink_ops *ops; + int err; + + ops = devlink->ops; + if (!ops || !ops->rate_node_new || !ops->rate_node_del) { + NL_SET_ERR_MSG_MOD(info->extack, "Rate nodes aren't supported"); + return -EOPNOTSUPP; + } + + if (!devlink_rate_set_ops_supported(ops, info, DEVLINK_RATE_TYPE_NODE)) + return -EOPNOTSUPP; + + rate_node = devlink_rate_node_get_from_attrs(devlink, info->attrs); + if (!IS_ERR(rate_node)) + return -EEXIST; + else if (rate_node == ERR_PTR(-EINVAL)) + return -EINVAL; + + rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL); + if (!rate_node) + return -ENOMEM; + + rate_node->devlink = devlink; + rate_node->type = DEVLINK_RATE_TYPE_NODE; + rate_node->name = nla_strdup(info->attrs[DEVLINK_ATTR_RATE_NODE_NAME], GFP_KERNEL); + if (!rate_node->name) { + err = -ENOMEM; + goto err_strdup; + } + + err = ops->rate_node_new(rate_node, &rate_node->priv, info->extack); + if (err) + goto err_node_new; + + err = devlink_nl_rate_set(rate_node, ops, info); + if (err) + goto err_rate_set; + + refcount_set(&rate_node->refcnt, 1); + list_add(&rate_node->list, &devlink->rate_list); + devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); + return 0; + +err_rate_set: + ops->rate_node_del(rate_node, rate_node->priv, info->extack); +err_node_new: + kfree(rate_node->name); +err_strdup: + kfree(rate_node); + return err; +} + +static int devlink_nl_cmd_rate_del_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_rate *rate_node = info->user_ptr[1]; + struct devlink *devlink = rate_node->devlink; + const struct devlink_ops *ops = devlink->ops; + int err; + + if (refcount_read(&rate_node->refcnt) > 1) { + NL_SET_ERR_MSG_MOD(info->extack, "Node has children. Cannot delete node."); + return -EBUSY; + } + + devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL); + err = ops->rate_node_del(rate_node, rate_node->priv, info->extack); + if (rate_node->parent) + refcount_dec(&rate_node->parent->refcnt); + list_del(&rate_node->list); + kfree(rate_node->name); + kfree(rate_node); + return err; +} + +struct devlink_linecard_type { + const char *type; + const void *priv; +}; + +static int devlink_nl_linecard_fill(struct sk_buff *msg, + struct devlink *devlink, + struct devlink_linecard *linecard, + enum devlink_command cmd, u32 portid, + u32 seq, int flags, + struct netlink_ext_ack *extack) +{ + struct devlink_linecard_type *linecard_type; + struct nlattr *attr; + void *hdr; + int i; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index)) + goto nla_put_failure; + if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state)) + goto nla_put_failure; + if (linecard->type && + nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE, linecard->type)) + goto nla_put_failure; + + if (linecard->types_count) { + attr = nla_nest_start(msg, + DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES); + if (!attr) + goto nla_put_failure; + for (i = 0; i < linecard->types_count; i++) { + linecard_type = &linecard->types[i]; + if (nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE, + linecard_type->type)) { + nla_nest_cancel(msg, attr); + goto nla_put_failure; + } + } + nla_nest_end(msg, attr); + } + + if (linecard->nested_devlink && + devlink_nl_put_nested_handle(msg, linecard->nested_devlink)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void devlink_linecard_notify(struct devlink_linecard *linecard, + enum devlink_command cmd) +{ + struct devlink *devlink = linecard->devlink; + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_LINECARD_NEW && + cmd != DEVLINK_CMD_LINECARD_DEL); + + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_linecard_fill(msg, devlink, linecard, cmd, 0, 0, 0, + NULL); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int devlink_nl_cmd_linecard_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_linecard *linecard = info->user_ptr[1]; + struct devlink *devlink = linecard->devlink; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + mutex_lock(&linecard->state_lock); + err = devlink_nl_linecard_fill(msg, devlink, linecard, + DEVLINK_CMD_LINECARD_NEW, + info->snd_portid, info->snd_seq, 0, + info->extack); + mutex_unlock(&linecard->state_lock); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink_linecard *linecard; + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + mutex_lock(&devlink->linecards_lock); + list_for_each_entry(linecard, &devlink->linecard_list, list) { + if (idx < start) { + idx++; + continue; + } + mutex_lock(&linecard->state_lock); + err = devlink_nl_linecard_fill(msg, devlink, linecard, + DEVLINK_CMD_LINECARD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, + cb->extack); + mutex_unlock(&linecard->state_lock); + if (err) { + mutex_unlock(&devlink->linecards_lock); + devlink_put(devlink); + goto out; + } + idx++; + } + mutex_unlock(&devlink->linecards_lock); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static struct devlink_linecard_type * +devlink_linecard_type_lookup(struct devlink_linecard *linecard, + const char *type) +{ + struct devlink_linecard_type *linecard_type; + int i; + + for (i = 0; i < linecard->types_count; i++) { + linecard_type = &linecard->types[i]; + if (!strcmp(type, linecard_type->type)) + return linecard_type; + } + return NULL; +} + +static int devlink_linecard_type_set(struct devlink_linecard *linecard, + const char *type, + struct netlink_ext_ack *extack) +{ + const struct devlink_linecard_ops *ops = linecard->ops; + struct devlink_linecard_type *linecard_type; + int err; + + mutex_lock(&linecard->state_lock); + if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { + NL_SET_ERR_MSG_MOD(extack, "Line card is currently being provisioned"); + err = -EBUSY; + goto out; + } + if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { + NL_SET_ERR_MSG_MOD(extack, "Line card is currently being unprovisioned"); + err = -EBUSY; + goto out; + } + + linecard_type = devlink_linecard_type_lookup(linecard, type); + if (!linecard_type) { + NL_SET_ERR_MSG_MOD(extack, "Unsupported line card type provided"); + err = -EINVAL; + goto out; + } + + if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED && + linecard->state != DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { + NL_SET_ERR_MSG_MOD(extack, "Line card already provisioned"); + err = -EBUSY; + /* Check if the line card is provisioned in the same + * way the user asks. In case it is, make the operation + * to return success. + */ + if (ops->same_provision && + ops->same_provision(linecard, linecard->priv, + linecard_type->type, + linecard_type->priv)) + err = 0; + goto out; + } + + linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING; + linecard->type = linecard_type->type; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); + err = ops->provision(linecard, linecard->priv, linecard_type->type, + linecard_type->priv, extack); + if (err) { + /* Provisioning failed. Assume the linecard is unprovisioned + * for future operations. + */ + mutex_lock(&linecard->state_lock); + linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; + linecard->type = NULL; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); + } + return err; + +out: + mutex_unlock(&linecard->state_lock); + return err; +} + +static int devlink_linecard_type_unset(struct devlink_linecard *linecard, + struct netlink_ext_ack *extack) +{ + int err; + + mutex_lock(&linecard->state_lock); + if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { + NL_SET_ERR_MSG_MOD(extack, "Line card is currently being provisioned"); + err = -EBUSY; + goto out; + } + if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { + NL_SET_ERR_MSG_MOD(extack, "Line card is currently being unprovisioned"); + err = -EBUSY; + goto out; + } + if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { + linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; + linecard->type = NULL; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + err = 0; + goto out; + } + + if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) { + NL_SET_ERR_MSG_MOD(extack, "Line card is not provisioned"); + err = 0; + goto out; + } + linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONING; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); + err = linecard->ops->unprovision(linecard, linecard->priv, + extack); + if (err) { + /* Unprovisioning failed. Assume the linecard is unprovisioned + * for future operations. + */ + mutex_lock(&linecard->state_lock); + linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; + linecard->type = NULL; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); + } + return err; + +out: + mutex_unlock(&linecard->state_lock); + return err; +} + +static int devlink_nl_cmd_linecard_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_linecard *linecard = info->user_ptr[1]; + struct netlink_ext_ack *extack = info->extack; + int err; + + if (info->attrs[DEVLINK_ATTR_LINECARD_TYPE]) { + const char *type; + + type = nla_data(info->attrs[DEVLINK_ATTR_LINECARD_TYPE]); + if (strcmp(type, "")) { + err = devlink_linecard_type_set(linecard, type, extack); + if (err) + return err; + } else { + err = devlink_linecard_type_unset(linecard, extack); + if (err) + return err; + } + } + + return 0; +} + +static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink, + struct devlink_sb *devlink_sb, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_SIZE, devlink_sb->size)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_INGRESS_POOL_COUNT, + devlink_sb->ingress_pools_count)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_EGRESS_POOL_COUNT, + devlink_sb->egress_pools_count)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_INGRESS_TC_COUNT, + devlink_sb->ingress_tc_count)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_EGRESS_TC_COUNT, + devlink_sb->egress_tc_count)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_sb *devlink_sb; + struct sk_buff *msg; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_sb_fill(msg, devlink, devlink_sb, + DEVLINK_CMD_SB_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + struct devlink_sb *devlink_sb; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_sb_fill(msg, devlink, devlink_sb, + DEVLINK_CMD_SB_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int devlink_nl_sb_pool_fill(struct sk_buff *msg, struct devlink *devlink, + struct devlink_sb *devlink_sb, + u16 pool_index, enum devlink_command cmd, + u32 portid, u32 seq, int flags) +{ + struct devlink_sb_pool_info pool_info; + void *hdr; + int err; + + err = devlink->ops->sb_pool_get(devlink, devlink_sb->index, + pool_index, &pool_info); + if (err) + return err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_POOL_INDEX, pool_index)) + goto nla_put_failure; + if (nla_put_u8(msg, DEVLINK_ATTR_SB_POOL_TYPE, pool_info.pool_type)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_POOL_SIZE, pool_info.size)) + goto nla_put_failure; + if (nla_put_u8(msg, DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE, + pool_info.threshold_type)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_POOL_CELL_SIZE, + pool_info.cell_size)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_sb_pool_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_sb *devlink_sb; + struct sk_buff *msg; + u16 pool_index; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + err = devlink_sb_pool_index_get_from_info(devlink_sb, info, + &pool_index); + if (err) + return err; + + if (!devlink->ops->sb_pool_get) + return -EOPNOTSUPP; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_sb_pool_fill(msg, devlink, devlink_sb, pool_index, + DEVLINK_CMD_SB_POOL_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int __sb_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, + struct devlink *devlink, + struct devlink_sb *devlink_sb, + u32 portid, u32 seq) +{ + u16 pool_count = devlink_sb_pool_count(devlink_sb); + u16 pool_index; + int err; + + for (pool_index = 0; pool_index < pool_count; pool_index++) { + if (*p_idx < start) { + (*p_idx)++; + continue; + } + err = devlink_nl_sb_pool_fill(msg, devlink, + devlink_sb, + pool_index, + DEVLINK_CMD_SB_POOL_NEW, + portid, seq, NLM_F_MULTI); + if (err) + return err; + (*p_idx)++; + } + return 0; +} + +static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + struct devlink_sb *devlink_sb; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + if (!devlink->ops->sb_pool_get) + goto retry; + + devl_lock(devlink); + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + err = __sb_pool_get_dumpit(msg, start, &idx, devlink, + devlink_sb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + } + devl_unlock(devlink); +retry: + devlink_put(devlink); + } +out: + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +static int devlink_sb_pool_set(struct devlink *devlink, unsigned int sb_index, + u16 pool_index, u32 size, + enum devlink_sb_threshold_type threshold_type, + struct netlink_ext_ack *extack) + +{ + const struct devlink_ops *ops = devlink->ops; + + if (ops->sb_pool_set) + return ops->sb_pool_set(devlink, sb_index, pool_index, + size, threshold_type, extack); + return -EOPNOTSUPP; +} + +static int devlink_nl_cmd_sb_pool_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + enum devlink_sb_threshold_type threshold_type; + struct devlink_sb *devlink_sb; + u16 pool_index; + u32 size; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + err = devlink_sb_pool_index_get_from_info(devlink_sb, info, + &pool_index); + if (err) + return err; + + err = devlink_sb_th_type_get_from_info(info, &threshold_type); + if (err) + return err; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SB_POOL_SIZE)) + return -EINVAL; + + size = nla_get_u32(info->attrs[DEVLINK_ATTR_SB_POOL_SIZE]); + return devlink_sb_pool_set(devlink, devlink_sb->index, + pool_index, size, threshold_type, + info->extack); +} + +static int devlink_nl_sb_port_pool_fill(struct sk_buff *msg, + struct devlink *devlink, + struct devlink_port *devlink_port, + struct devlink_sb *devlink_sb, + u16 pool_index, + enum devlink_command cmd, + u32 portid, u32 seq, int flags) +{ + const struct devlink_ops *ops = devlink->ops; + u32 threshold; + void *hdr; + int err; + + err = ops->sb_port_pool_get(devlink_port, devlink_sb->index, + pool_index, &threshold); + if (err) + return err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_POOL_INDEX, pool_index)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_THRESHOLD, threshold)) + goto nla_put_failure; + + if (ops->sb_occ_port_pool_get) { + u32 cur; + u32 max; + + err = ops->sb_occ_port_pool_get(devlink_port, devlink_sb->index, + pool_index, &cur, &max); + if (err && err != -EOPNOTSUPP) + goto sb_occ_get_failure; + if (!err) { + if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_CUR, cur)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_MAX, max)) + goto nla_put_failure; + } + } + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + err = -EMSGSIZE; +sb_occ_get_failure: + genlmsg_cancel(msg, hdr); + return err; +} + +static int devlink_nl_cmd_sb_port_pool_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct devlink *devlink = devlink_port->devlink; + struct devlink_sb *devlink_sb; + struct sk_buff *msg; + u16 pool_index; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + err = devlink_sb_pool_index_get_from_info(devlink_sb, info, + &pool_index); + if (err) + return err; + + if (!devlink->ops->sb_port_pool_get) + return -EOPNOTSUPP; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_sb_port_pool_fill(msg, devlink, devlink_port, + devlink_sb, pool_index, + DEVLINK_CMD_SB_PORT_POOL_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int __sb_port_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, + struct devlink *devlink, + struct devlink_sb *devlink_sb, + u32 portid, u32 seq) +{ + struct devlink_port *devlink_port; + u16 pool_count = devlink_sb_pool_count(devlink_sb); + unsigned long port_index; + u16 pool_index; + int err; + + xa_for_each(&devlink->ports, port_index, devlink_port) { + for (pool_index = 0; pool_index < pool_count; pool_index++) { + if (*p_idx < start) { + (*p_idx)++; + continue; + } + err = devlink_nl_sb_port_pool_fill(msg, devlink, + devlink_port, + devlink_sb, + pool_index, + DEVLINK_CMD_SB_PORT_POOL_NEW, + portid, seq, + NLM_F_MULTI); + if (err) + return err; + (*p_idx)++; + } + } + return 0; +} + +static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + struct devlink_sb *devlink_sb; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + if (!devlink->ops->sb_port_pool_get) + goto retry; + + devl_lock(devlink); + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + err = __sb_port_pool_get_dumpit(msg, start, &idx, + devlink, devlink_sb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + } + devl_unlock(devlink); +retry: + devlink_put(devlink); + } +out: + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +static int devlink_sb_port_pool_set(struct devlink_port *devlink_port, + unsigned int sb_index, u16 pool_index, + u32 threshold, + struct netlink_ext_ack *extack) + +{ + const struct devlink_ops *ops = devlink_port->devlink->ops; + + if (ops->sb_port_pool_set) + return ops->sb_port_pool_set(devlink_port, sb_index, + pool_index, threshold, extack); + return -EOPNOTSUPP; +} + +static int devlink_nl_cmd_sb_port_pool_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct devlink *devlink = info->user_ptr[0]; + struct devlink_sb *devlink_sb; + u16 pool_index; + u32 threshold; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + err = devlink_sb_pool_index_get_from_info(devlink_sb, info, + &pool_index); + if (err) + return err; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SB_THRESHOLD)) + return -EINVAL; + + threshold = nla_get_u32(info->attrs[DEVLINK_ATTR_SB_THRESHOLD]); + return devlink_sb_port_pool_set(devlink_port, devlink_sb->index, + pool_index, threshold, info->extack); +} + +static int +devlink_nl_sb_tc_pool_bind_fill(struct sk_buff *msg, struct devlink *devlink, + struct devlink_port *devlink_port, + struct devlink_sb *devlink_sb, u16 tc_index, + enum devlink_sb_pool_type pool_type, + enum devlink_command cmd, + u32 portid, u32 seq, int flags) +{ + const struct devlink_ops *ops = devlink->ops; + u16 pool_index; + u32 threshold; + void *hdr; + int err; + + err = ops->sb_tc_pool_bind_get(devlink_port, devlink_sb->index, + tc_index, pool_type, + &pool_index, &threshold); + if (err) + return err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_INDEX, devlink_sb->index)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_TC_INDEX, tc_index)) + goto nla_put_failure; + if (nla_put_u8(msg, DEVLINK_ATTR_SB_POOL_TYPE, pool_type)) + goto nla_put_failure; + if (nla_put_u16(msg, DEVLINK_ATTR_SB_POOL_INDEX, pool_index)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_THRESHOLD, threshold)) + goto nla_put_failure; + + if (ops->sb_occ_tc_port_bind_get) { + u32 cur; + u32 max; + + err = ops->sb_occ_tc_port_bind_get(devlink_port, + devlink_sb->index, + tc_index, pool_type, + &cur, &max); + if (err && err != -EOPNOTSUPP) + return err; + if (!err) { + if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_CUR, cur)) + goto nla_put_failure; + if (nla_put_u32(msg, DEVLINK_ATTR_SB_OCC_MAX, max)) + goto nla_put_failure; + } + } + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_sb_tc_pool_bind_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct devlink *devlink = devlink_port->devlink; + struct devlink_sb *devlink_sb; + struct sk_buff *msg; + enum devlink_sb_pool_type pool_type; + u16 tc_index; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + err = devlink_sb_pool_type_get_from_info(info, &pool_type); + if (err) + return err; + + err = devlink_sb_tc_index_get_from_info(devlink_sb, info, + pool_type, &tc_index); + if (err) + return err; + + if (!devlink->ops->sb_tc_pool_bind_get) + return -EOPNOTSUPP; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_sb_tc_pool_bind_fill(msg, devlink, devlink_port, + devlink_sb, tc_index, pool_type, + DEVLINK_CMD_SB_TC_POOL_BIND_NEW, + info->snd_portid, + info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int __sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, + int start, int *p_idx, + struct devlink *devlink, + struct devlink_sb *devlink_sb, + u32 portid, u32 seq) +{ + struct devlink_port *devlink_port; + unsigned long port_index; + u16 tc_index; + int err; + + xa_for_each(&devlink->ports, port_index, devlink_port) { + for (tc_index = 0; + tc_index < devlink_sb->ingress_tc_count; tc_index++) { + if (*p_idx < start) { + (*p_idx)++; + continue; + } + err = devlink_nl_sb_tc_pool_bind_fill(msg, devlink, + devlink_port, + devlink_sb, + tc_index, + DEVLINK_SB_POOL_TYPE_INGRESS, + DEVLINK_CMD_SB_TC_POOL_BIND_NEW, + portid, seq, + NLM_F_MULTI); + if (err) + return err; + (*p_idx)++; + } + for (tc_index = 0; + tc_index < devlink_sb->egress_tc_count; tc_index++) { + if (*p_idx < start) { + (*p_idx)++; + continue; + } + err = devlink_nl_sb_tc_pool_bind_fill(msg, devlink, + devlink_port, + devlink_sb, + tc_index, + DEVLINK_SB_POOL_TYPE_EGRESS, + DEVLINK_CMD_SB_TC_POOL_BIND_NEW, + portid, seq, + NLM_F_MULTI); + if (err) + return err; + (*p_idx)++; + } + } + return 0; +} + +static int +devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + struct devlink_sb *devlink_sb; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + if (!devlink->ops->sb_tc_pool_bind_get) + goto retry; + + devl_lock(devlink); + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + err = __sb_tc_pool_bind_get_dumpit(msg, start, &idx, + devlink, + devlink_sb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + } + devl_unlock(devlink); +retry: + devlink_put(devlink); + } +out: + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +static int devlink_sb_tc_pool_bind_set(struct devlink_port *devlink_port, + unsigned int sb_index, u16 tc_index, + enum devlink_sb_pool_type pool_type, + u16 pool_index, u32 threshold, + struct netlink_ext_ack *extack) + +{ + const struct devlink_ops *ops = devlink_port->devlink->ops; + + if (ops->sb_tc_pool_bind_set) + return ops->sb_tc_pool_bind_set(devlink_port, sb_index, + tc_index, pool_type, + pool_index, threshold, extack); + return -EOPNOTSUPP; +} + +static int devlink_nl_cmd_sb_tc_pool_bind_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_port *devlink_port = info->user_ptr[1]; + struct devlink *devlink = info->user_ptr[0]; + enum devlink_sb_pool_type pool_type; + struct devlink_sb *devlink_sb; + u16 tc_index; + u16 pool_index; + u32 threshold; + int err; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + err = devlink_sb_pool_type_get_from_info(info, &pool_type); + if (err) + return err; + + err = devlink_sb_tc_index_get_from_info(devlink_sb, info, + pool_type, &tc_index); + if (err) + return err; + + err = devlink_sb_pool_index_get_from_info(devlink_sb, info, + &pool_index); + if (err) + return err; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SB_THRESHOLD)) + return -EINVAL; + + threshold = nla_get_u32(info->attrs[DEVLINK_ATTR_SB_THRESHOLD]); + return devlink_sb_tc_pool_bind_set(devlink_port, devlink_sb->index, + tc_index, pool_type, + pool_index, threshold, info->extack); +} + +static int devlink_nl_cmd_sb_occ_snapshot_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + const struct devlink_ops *ops = devlink->ops; + struct devlink_sb *devlink_sb; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + if (ops->sb_occ_snapshot) + return ops->sb_occ_snapshot(devlink, devlink_sb->index); + return -EOPNOTSUPP; +} + +static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + const struct devlink_ops *ops = devlink->ops; + struct devlink_sb *devlink_sb; + + devlink_sb = devlink_sb_get_from_info(devlink, info); + if (IS_ERR(devlink_sb)) + return PTR_ERR(devlink_sb); + + if (ops->sb_occ_max_clear) + return ops->sb_occ_max_clear(devlink, devlink_sb->index); + return -EOPNOTSUPP; +} + +static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + const struct devlink_ops *ops = devlink->ops; + enum devlink_eswitch_encap_mode encap_mode; + u8 inline_mode; + void *hdr; + int err = 0; + u16 mode; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + err = devlink_nl_put_handle(msg, devlink); + if (err) + goto nla_put_failure; + + if (ops->eswitch_mode_get) { + err = ops->eswitch_mode_get(devlink, &mode); + if (err) + goto nla_put_failure; + err = nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode); + if (err) + goto nla_put_failure; + } + + if (ops->eswitch_inline_mode_get) { + err = ops->eswitch_inline_mode_get(devlink, &inline_mode); + if (err) + goto nla_put_failure; + err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_INLINE_MODE, + inline_mode); + if (err) + goto nla_put_failure; + } + + if (ops->eswitch_encap_mode_get) { + err = ops->eswitch_encap_mode_get(devlink, &encap_mode); + if (err) + goto nla_put_failure; + err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, encap_mode); + if (err) + goto nla_put_failure; + } + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return err; +} + +static int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_GET, + info->snd_portid, info->snd_seq, 0); + + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, + struct netlink_ext_ack *extack) +{ + struct devlink_rate *devlink_rate; + + list_for_each_entry(devlink_rate, &devlink->rate_list, list) + if (devlink_rate_is_node(devlink_rate)) { + NL_SET_ERR_MSG_MOD(extack, "Rate node(s) exists."); + return -EBUSY; + } + return 0; +} + +static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + const struct devlink_ops *ops = devlink->ops; + enum devlink_eswitch_encap_mode encap_mode; + u8 inline_mode; + int err = 0; + u16 mode; + + if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) { + if (!ops->eswitch_mode_set) + return -EOPNOTSUPP; + mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]); + err = devlink_rate_nodes_check(devlink, mode, info->extack); + if (err) + return err; + err = ops->eswitch_mode_set(devlink, mode, info->extack); + if (err) + return err; + } + + if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) { + if (!ops->eswitch_inline_mode_set) + return -EOPNOTSUPP; + inline_mode = nla_get_u8( + info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]); + err = ops->eswitch_inline_mode_set(devlink, inline_mode, + info->extack); + if (err) + return err; + } + + if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) { + if (!ops->eswitch_encap_mode_set) + return -EOPNOTSUPP; + encap_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]); + err = ops->eswitch_encap_mode_set(devlink, encap_mode, + info->extack); + if (err) + return err; + } + + return 0; +} + +int devlink_dpipe_match_put(struct sk_buff *skb, + struct devlink_dpipe_match *match) +{ + struct devlink_dpipe_header *header = match->header; + struct devlink_dpipe_field *field = &header->fields[match->field_id]; + struct nlattr *match_attr; + + match_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_MATCH); + if (!match_attr) + return -EMSGSIZE; + + if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_MATCH_TYPE, match->type) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_INDEX, match->header_index) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) || + nla_put_u8(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL, header->global)) + goto nla_put_failure; + + nla_nest_end(skb, match_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, match_attr); + return -EMSGSIZE; +} +EXPORT_SYMBOL_GPL(devlink_dpipe_match_put); + +static int devlink_dpipe_matches_put(struct devlink_dpipe_table *table, + struct sk_buff *skb) +{ + struct nlattr *matches_attr; + + matches_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_TABLE_MATCHES); + if (!matches_attr) + return -EMSGSIZE; + + if (table->table_ops->matches_dump(table->priv, skb)) + goto nla_put_failure; + + nla_nest_end(skb, matches_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, matches_attr); + return -EMSGSIZE; +} + +int devlink_dpipe_action_put(struct sk_buff *skb, + struct devlink_dpipe_action *action) +{ + struct devlink_dpipe_header *header = action->header; + struct devlink_dpipe_field *field = &header->fields[action->field_id]; + struct nlattr *action_attr; + + action_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_ACTION); + if (!action_attr) + return -EMSGSIZE; + + if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_ACTION_TYPE, action->type) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_INDEX, action->header_index) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) || + nla_put_u8(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL, header->global)) + goto nla_put_failure; + + nla_nest_end(skb, action_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, action_attr); + return -EMSGSIZE; +} +EXPORT_SYMBOL_GPL(devlink_dpipe_action_put); + +static int devlink_dpipe_actions_put(struct devlink_dpipe_table *table, + struct sk_buff *skb) +{ + struct nlattr *actions_attr; + + actions_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_TABLE_ACTIONS); + if (!actions_attr) + return -EMSGSIZE; + + if (table->table_ops->actions_dump(table->priv, skb)) + goto nla_put_failure; + + nla_nest_end(skb, actions_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, actions_attr); + return -EMSGSIZE; +} + +static int devlink_dpipe_table_put(struct sk_buff *skb, + struct devlink_dpipe_table *table) +{ + struct nlattr *table_attr; + u64 table_size; + + table_size = table->table_ops->size_get(table->priv); + table_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_TABLE); + if (!table_attr) + return -EMSGSIZE; + + if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_TABLE_NAME, table->name) || + nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_SIZE, table_size, + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (nla_put_u8(skb, DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED, + table->counters_enabled)) + goto nla_put_failure; + + if (table->resource_valid) { + if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID, + table->resource_id, DEVLINK_ATTR_PAD) || + nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_UNITS, + table->resource_units, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + } + if (devlink_dpipe_matches_put(table, skb)) + goto nla_put_failure; + + if (devlink_dpipe_actions_put(table, skb)) + goto nla_put_failure; + + nla_nest_end(skb, table_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, table_attr); + return -EMSGSIZE; +} + +static int devlink_dpipe_send_and_alloc_skb(struct sk_buff **pskb, + struct genl_info *info) +{ + int err; + + if (*pskb) { + err = genlmsg_reply(*pskb, info); + if (err) + return err; + } + *pskb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!*pskb) + return -ENOMEM; + return 0; +} + +static int devlink_dpipe_tables_fill(struct genl_info *info, + enum devlink_command cmd, int flags, + struct list_head *dpipe_tables, + const char *table_name) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_dpipe_table *table; + struct nlattr *tables_attr; + struct sk_buff *skb = NULL; + struct nlmsghdr *nlh; + bool incomplete; + void *hdr; + int i; + int err; + + table = list_first_entry(dpipe_tables, + struct devlink_dpipe_table, list); +start_again: + err = devlink_dpipe_send_and_alloc_skb(&skb, info); + if (err) + return err; + + hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, + &devlink_nl_family, NLM_F_MULTI, cmd); + if (!hdr) { + nlmsg_free(skb); + return -EMSGSIZE; + } + + if (devlink_nl_put_handle(skb, devlink)) + goto nla_put_failure; + tables_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_TABLES); + if (!tables_attr) + goto nla_put_failure; + + i = 0; + incomplete = false; + list_for_each_entry_from(table, dpipe_tables, list) { + if (!table_name) { + err = devlink_dpipe_table_put(skb, table); + if (err) { + if (!i) + goto err_table_put; + incomplete = true; + break; + } + } else { + if (!strcmp(table->name, table_name)) { + err = devlink_dpipe_table_put(skb, table); + if (err) + break; + } + } + i++; + } + + nla_nest_end(skb, tables_attr); + genlmsg_end(skb, hdr); + if (incomplete) + goto start_again; + +send_done: + nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, flags | NLM_F_MULTI); + if (!nlh) { + err = devlink_dpipe_send_and_alloc_skb(&skb, info); + if (err) + return err; + goto send_done; + } + + return genlmsg_reply(skb, info); + +nla_put_failure: + err = -EMSGSIZE; +err_table_put: + nlmsg_free(skb); + return err; +} + +static int devlink_nl_cmd_dpipe_table_get(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + const char *table_name = NULL; + + if (info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]) + table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]); + + return devlink_dpipe_tables_fill(info, DEVLINK_CMD_DPIPE_TABLE_GET, 0, + &devlink->dpipe_table_list, + table_name); +} + +static int devlink_dpipe_value_put(struct sk_buff *skb, + struct devlink_dpipe_value *value) +{ + if (nla_put(skb, DEVLINK_ATTR_DPIPE_VALUE, + value->value_size, value->value)) + return -EMSGSIZE; + if (value->mask) + if (nla_put(skb, DEVLINK_ATTR_DPIPE_VALUE_MASK, + value->value_size, value->mask)) + return -EMSGSIZE; + if (value->mapping_valid) + if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_VALUE_MAPPING, + value->mapping_value)) + return -EMSGSIZE; + return 0; +} + +static int devlink_dpipe_action_value_put(struct sk_buff *skb, + struct devlink_dpipe_value *value) +{ + if (!value->action) + return -EINVAL; + if (devlink_dpipe_action_put(skb, value->action)) + return -EMSGSIZE; + if (devlink_dpipe_value_put(skb, value)) + return -EMSGSIZE; + return 0; +} + +static int devlink_dpipe_action_values_put(struct sk_buff *skb, + struct devlink_dpipe_value *values, + unsigned int values_count) +{ + struct nlattr *action_attr; + int i; + int err; + + for (i = 0; i < values_count; i++) { + action_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_ACTION_VALUE); + if (!action_attr) + return -EMSGSIZE; + err = devlink_dpipe_action_value_put(skb, &values[i]); + if (err) + goto err_action_value_put; + nla_nest_end(skb, action_attr); + } + return 0; + +err_action_value_put: + nla_nest_cancel(skb, action_attr); + return err; +} + +static int devlink_dpipe_match_value_put(struct sk_buff *skb, + struct devlink_dpipe_value *value) +{ + if (!value->match) + return -EINVAL; + if (devlink_dpipe_match_put(skb, value->match)) + return -EMSGSIZE; + if (devlink_dpipe_value_put(skb, value)) + return -EMSGSIZE; + return 0; +} + +static int devlink_dpipe_match_values_put(struct sk_buff *skb, + struct devlink_dpipe_value *values, + unsigned int values_count) +{ + struct nlattr *match_attr; + int i; + int err; + + for (i = 0; i < values_count; i++) { + match_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_MATCH_VALUE); + if (!match_attr) + return -EMSGSIZE; + err = devlink_dpipe_match_value_put(skb, &values[i]); + if (err) + goto err_match_value_put; + nla_nest_end(skb, match_attr); + } + return 0; + +err_match_value_put: + nla_nest_cancel(skb, match_attr); + return err; +} + +static int devlink_dpipe_entry_put(struct sk_buff *skb, + struct devlink_dpipe_entry *entry) +{ + struct nlattr *entry_attr, *matches_attr, *actions_attr; + int err; + + entry_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_ENTRY); + if (!entry_attr) + return -EMSGSIZE; + + if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_ENTRY_INDEX, entry->index, + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (entry->counter_valid) + if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_ENTRY_COUNTER, + entry->counter, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + matches_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_ENTRY_MATCH_VALUES); + if (!matches_attr) + goto nla_put_failure; + + err = devlink_dpipe_match_values_put(skb, entry->match_values, + entry->match_values_count); + if (err) { + nla_nest_cancel(skb, matches_attr); + goto err_match_values_put; + } + nla_nest_end(skb, matches_attr); + + actions_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_ENTRY_ACTION_VALUES); + if (!actions_attr) + goto nla_put_failure; + + err = devlink_dpipe_action_values_put(skb, entry->action_values, + entry->action_values_count); + if (err) { + nla_nest_cancel(skb, actions_attr); + goto err_action_values_put; + } + nla_nest_end(skb, actions_attr); + + nla_nest_end(skb, entry_attr); + return 0; + +nla_put_failure: + err = -EMSGSIZE; +err_match_values_put: +err_action_values_put: + nla_nest_cancel(skb, entry_attr); + return err; +} + +static struct devlink_dpipe_table * +devlink_dpipe_table_find(struct list_head *dpipe_tables, + const char *table_name, struct devlink *devlink) +{ + struct devlink_dpipe_table *table; + list_for_each_entry_rcu(table, dpipe_tables, list, + lockdep_is_held(&devlink->lock)) { + if (!strcmp(table->name, table_name)) + return table; + } + return NULL; +} + +int devlink_dpipe_entry_ctx_prepare(struct devlink_dpipe_dump_ctx *dump_ctx) +{ + struct devlink *devlink; + int err; + + err = devlink_dpipe_send_and_alloc_skb(&dump_ctx->skb, + dump_ctx->info); + if (err) + return err; + + dump_ctx->hdr = genlmsg_put(dump_ctx->skb, + dump_ctx->info->snd_portid, + dump_ctx->info->snd_seq, + &devlink_nl_family, NLM_F_MULTI, + dump_ctx->cmd); + if (!dump_ctx->hdr) + goto nla_put_failure; + + devlink = dump_ctx->info->user_ptr[0]; + if (devlink_nl_put_handle(dump_ctx->skb, devlink)) + goto nla_put_failure; + dump_ctx->nest = nla_nest_start_noflag(dump_ctx->skb, + DEVLINK_ATTR_DPIPE_ENTRIES); + if (!dump_ctx->nest) + goto nla_put_failure; + return 0; + +nla_put_failure: + nlmsg_free(dump_ctx->skb); + return -EMSGSIZE; +} +EXPORT_SYMBOL_GPL(devlink_dpipe_entry_ctx_prepare); + +int devlink_dpipe_entry_ctx_append(struct devlink_dpipe_dump_ctx *dump_ctx, + struct devlink_dpipe_entry *entry) +{ + return devlink_dpipe_entry_put(dump_ctx->skb, entry); +} +EXPORT_SYMBOL_GPL(devlink_dpipe_entry_ctx_append); + +int devlink_dpipe_entry_ctx_close(struct devlink_dpipe_dump_ctx *dump_ctx) +{ + nla_nest_end(dump_ctx->skb, dump_ctx->nest); + genlmsg_end(dump_ctx->skb, dump_ctx->hdr); + return 0; +} +EXPORT_SYMBOL_GPL(devlink_dpipe_entry_ctx_close); + +void devlink_dpipe_entry_clear(struct devlink_dpipe_entry *entry) + +{ + unsigned int value_count, value_index; + struct devlink_dpipe_value *value; + + value = entry->action_values; + value_count = entry->action_values_count; + for (value_index = 0; value_index < value_count; value_index++) { + kfree(value[value_index].value); + kfree(value[value_index].mask); + } + + value = entry->match_values; + value_count = entry->match_values_count; + for (value_index = 0; value_index < value_count; value_index++) { + kfree(value[value_index].value); + kfree(value[value_index].mask); + } +} +EXPORT_SYMBOL_GPL(devlink_dpipe_entry_clear); + +static int devlink_dpipe_entries_fill(struct genl_info *info, + enum devlink_command cmd, int flags, + struct devlink_dpipe_table *table) +{ + struct devlink_dpipe_dump_ctx dump_ctx; + struct nlmsghdr *nlh; + int err; + + dump_ctx.skb = NULL; + dump_ctx.cmd = cmd; + dump_ctx.info = info; + + err = table->table_ops->entries_dump(table->priv, + table->counters_enabled, + &dump_ctx); + if (err) + return err; + +send_done: + nlh = nlmsg_put(dump_ctx.skb, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, flags | NLM_F_MULTI); + if (!nlh) { + err = devlink_dpipe_send_and_alloc_skb(&dump_ctx.skb, info); + if (err) + return err; + goto send_done; + } + return genlmsg_reply(dump_ctx.skb, info); +} + +static int devlink_nl_cmd_dpipe_entries_get(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_dpipe_table *table; + const char *table_name; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_DPIPE_TABLE_NAME)) + return -EINVAL; + + table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]); + table = devlink_dpipe_table_find(&devlink->dpipe_table_list, + table_name, devlink); + if (!table) + return -EINVAL; + + if (!table->table_ops->entries_dump) + return -EINVAL; + + return devlink_dpipe_entries_fill(info, DEVLINK_CMD_DPIPE_ENTRIES_GET, + 0, table); +} + +static int devlink_dpipe_fields_put(struct sk_buff *skb, + const struct devlink_dpipe_header *header) +{ + struct devlink_dpipe_field *field; + struct nlattr *field_attr; + int i; + + for (i = 0; i < header->fields_count; i++) { + field = &header->fields[i]; + field_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_FIELD); + if (!field_attr) + return -EMSGSIZE; + if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_FIELD_NAME, field->name) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH, field->bitwidth) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE, field->mapping_type)) + goto nla_put_failure; + nla_nest_end(skb, field_attr); + } + return 0; + +nla_put_failure: + nla_nest_cancel(skb, field_attr); + return -EMSGSIZE; +} + +static int devlink_dpipe_header_put(struct sk_buff *skb, + struct devlink_dpipe_header *header) +{ + struct nlattr *fields_attr, *header_attr; + int err; + + header_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_HEADER); + if (!header_attr) + return -EMSGSIZE; + + if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_HEADER_NAME, header->name) || + nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) || + nla_put_u8(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL, header->global)) + goto nla_put_failure; + + fields_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_DPIPE_HEADER_FIELDS); + if (!fields_attr) + goto nla_put_failure; + + err = devlink_dpipe_fields_put(skb, header); + if (err) { + nla_nest_cancel(skb, fields_attr); + goto nla_put_failure; + } + nla_nest_end(skb, fields_attr); + nla_nest_end(skb, header_attr); + return 0; + +nla_put_failure: + err = -EMSGSIZE; + nla_nest_cancel(skb, header_attr); + return err; +} + +static int devlink_dpipe_headers_fill(struct genl_info *info, + enum devlink_command cmd, int flags, + struct devlink_dpipe_headers * + dpipe_headers) +{ + struct devlink *devlink = info->user_ptr[0]; + struct nlattr *headers_attr; + struct sk_buff *skb = NULL; + struct nlmsghdr *nlh; + void *hdr; + int i, j; + int err; + + i = 0; +start_again: + err = devlink_dpipe_send_and_alloc_skb(&skb, info); + if (err) + return err; + + hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, + &devlink_nl_family, NLM_F_MULTI, cmd); + if (!hdr) { + nlmsg_free(skb); + return -EMSGSIZE; + } + + if (devlink_nl_put_handle(skb, devlink)) + goto nla_put_failure; + headers_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_DPIPE_HEADERS); + if (!headers_attr) + goto nla_put_failure; + + j = 0; + for (; i < dpipe_headers->headers_count; i++) { + err = devlink_dpipe_header_put(skb, dpipe_headers->headers[i]); + if (err) { + if (!j) + goto err_table_put; + break; + } + j++; + } + nla_nest_end(skb, headers_attr); + genlmsg_end(skb, hdr); + if (i != dpipe_headers->headers_count) + goto start_again; + +send_done: + nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, flags | NLM_F_MULTI); + if (!nlh) { + err = devlink_dpipe_send_and_alloc_skb(&skb, info); + if (err) + return err; + goto send_done; + } + return genlmsg_reply(skb, info); + +nla_put_failure: + err = -EMSGSIZE; +err_table_put: + nlmsg_free(skb); + return err; +} + +static int devlink_nl_cmd_dpipe_headers_get(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + + if (!devlink->dpipe_headers) + return -EOPNOTSUPP; + return devlink_dpipe_headers_fill(info, DEVLINK_CMD_DPIPE_HEADERS_GET, + 0, devlink->dpipe_headers); +} + +static int devlink_dpipe_table_counters_set(struct devlink *devlink, + const char *table_name, + bool enable) +{ + struct devlink_dpipe_table *table; + + table = devlink_dpipe_table_find(&devlink->dpipe_table_list, + table_name, devlink); + if (!table) + return -EINVAL; + + if (table->counter_control_extern) + return -EOPNOTSUPP; + + if (!(table->counters_enabled ^ enable)) + return 0; + + table->counters_enabled = enable; + if (table->table_ops->counters_set_update) + table->table_ops->counters_set_update(table->priv, enable); + return 0; +} + +static int devlink_nl_cmd_dpipe_table_counters_set(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + const char *table_name; + bool counters_enable; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_DPIPE_TABLE_NAME) || + GENL_REQ_ATTR_CHECK(info, + DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED)) + return -EINVAL; + + table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]); + counters_enable = !!nla_get_u8(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED]); + + return devlink_dpipe_table_counters_set(devlink, table_name, + counters_enable); +} + +static struct devlink_resource * +devlink_resource_find(struct devlink *devlink, + struct devlink_resource *resource, u64 resource_id) +{ + struct list_head *resource_list; + + if (resource) + resource_list = &resource->resource_list; + else + resource_list = &devlink->resource_list; + + list_for_each_entry(resource, resource_list, list) { + struct devlink_resource *child_resource; + + if (resource->id == resource_id) + return resource; + + child_resource = devlink_resource_find(devlink, resource, + resource_id); + if (child_resource) + return child_resource; + } + return NULL; +} + +static void +devlink_resource_validate_children(struct devlink_resource *resource) +{ + struct devlink_resource *child_resource; + bool size_valid = true; + u64 parts_size = 0; + + if (list_empty(&resource->resource_list)) + goto out; + + list_for_each_entry(child_resource, &resource->resource_list, list) + parts_size += child_resource->size_new; + + if (parts_size > resource->size_new) + size_valid = false; +out: + resource->size_valid = size_valid; +} + +static int +devlink_resource_validate_size(struct devlink_resource *resource, u64 size, + struct netlink_ext_ack *extack) +{ + u64 reminder; + int err = 0; + + if (size > resource->size_params.size_max) { + NL_SET_ERR_MSG_MOD(extack, "Size larger than maximum"); + err = -EINVAL; + } + + if (size < resource->size_params.size_min) { + NL_SET_ERR_MSG_MOD(extack, "Size smaller than minimum"); + err = -EINVAL; + } + + div64_u64_rem(size, resource->size_params.size_granularity, &reminder); + if (reminder) { + NL_SET_ERR_MSG_MOD(extack, "Wrong granularity"); + err = -EINVAL; + } + + return err; +} + +static int devlink_nl_cmd_resource_set(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_resource *resource; + u64 resource_id; + u64 size; + int err; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_RESOURCE_ID) || + GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_RESOURCE_SIZE)) + return -EINVAL; + resource_id = nla_get_u64(info->attrs[DEVLINK_ATTR_RESOURCE_ID]); + + resource = devlink_resource_find(devlink, NULL, resource_id); + if (!resource) + return -EINVAL; + + size = nla_get_u64(info->attrs[DEVLINK_ATTR_RESOURCE_SIZE]); + err = devlink_resource_validate_size(resource, size, info->extack); + if (err) + return err; + + resource->size_new = size; + devlink_resource_validate_children(resource); + if (resource->parent) + devlink_resource_validate_children(resource->parent); + return 0; +} + +static int +devlink_resource_size_params_put(struct devlink_resource *resource, + struct sk_buff *skb) +{ + struct devlink_resource_size_params *size_params; + + size_params = &resource->size_params; + if (nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_GRAN, + size_params->size_granularity, DEVLINK_ATTR_PAD) || + nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_MAX, + size_params->size_max, DEVLINK_ATTR_PAD) || + nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_MIN, + size_params->size_min, DEVLINK_ATTR_PAD) || + nla_put_u8(skb, DEVLINK_ATTR_RESOURCE_UNIT, size_params->unit)) + return -EMSGSIZE; + return 0; +} + +static int devlink_resource_occ_put(struct devlink_resource *resource, + struct sk_buff *skb) +{ + if (!resource->occ_get) + return 0; + return nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_OCC, + resource->occ_get(resource->occ_get_priv), + DEVLINK_ATTR_PAD); +} + +static int devlink_resource_put(struct devlink *devlink, struct sk_buff *skb, + struct devlink_resource *resource) +{ + struct devlink_resource *child_resource; + struct nlattr *child_resource_attr; + struct nlattr *resource_attr; + + resource_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_RESOURCE); + if (!resource_attr) + return -EMSGSIZE; + + if (nla_put_string(skb, DEVLINK_ATTR_RESOURCE_NAME, resource->name) || + nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE, resource->size, + DEVLINK_ATTR_PAD) || + nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_ID, resource->id, + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (resource->size != resource->size_new && + nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_NEW, + resource->size_new, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (devlink_resource_occ_put(resource, skb)) + goto nla_put_failure; + if (devlink_resource_size_params_put(resource, skb)) + goto nla_put_failure; + if (list_empty(&resource->resource_list)) + goto out; + + if (nla_put_u8(skb, DEVLINK_ATTR_RESOURCE_SIZE_VALID, + resource->size_valid)) + goto nla_put_failure; + + child_resource_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_RESOURCE_LIST); + if (!child_resource_attr) + goto nla_put_failure; + + list_for_each_entry(child_resource, &resource->resource_list, list) { + if (devlink_resource_put(devlink, skb, child_resource)) + goto resource_put_failure; + } + + nla_nest_end(skb, child_resource_attr); +out: + nla_nest_end(skb, resource_attr); + return 0; + +resource_put_failure: + nla_nest_cancel(skb, child_resource_attr); +nla_put_failure: + nla_nest_cancel(skb, resource_attr); + return -EMSGSIZE; +} + +static int devlink_resource_fill(struct genl_info *info, + enum devlink_command cmd, int flags) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_resource *resource; + struct nlattr *resources_attr; + struct sk_buff *skb = NULL; + struct nlmsghdr *nlh; + bool incomplete; + void *hdr; + int i; + int err; + + resource = list_first_entry(&devlink->resource_list, + struct devlink_resource, list); +start_again: + err = devlink_dpipe_send_and_alloc_skb(&skb, info); + if (err) + return err; + + hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, + &devlink_nl_family, NLM_F_MULTI, cmd); + if (!hdr) { + nlmsg_free(skb); + return -EMSGSIZE; + } + + if (devlink_nl_put_handle(skb, devlink)) + goto nla_put_failure; + + resources_attr = nla_nest_start_noflag(skb, + DEVLINK_ATTR_RESOURCE_LIST); + if (!resources_attr) + goto nla_put_failure; + + incomplete = false; + i = 0; + list_for_each_entry_from(resource, &devlink->resource_list, list) { + err = devlink_resource_put(devlink, skb, resource); + if (err) { + if (!i) + goto err_resource_put; + incomplete = true; + break; + } + i++; + } + nla_nest_end(skb, resources_attr); + genlmsg_end(skb, hdr); + if (incomplete) + goto start_again; +send_done: + nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, flags | NLM_F_MULTI); + if (!nlh) { + err = devlink_dpipe_send_and_alloc_skb(&skb, info); + if (err) + return err; + goto send_done; + } + return genlmsg_reply(skb, info); + +nla_put_failure: + err = -EMSGSIZE; +err_resource_put: + nlmsg_free(skb); + return err; +} + +static int devlink_nl_cmd_resource_dump(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + + if (list_empty(&devlink->resource_list)) + return -EOPNOTSUPP; + + return devlink_resource_fill(info, DEVLINK_CMD_RESOURCE_DUMP, 0); +} + +static int +devlink_resources_validate(struct devlink *devlink, + struct devlink_resource *resource, + struct genl_info *info) +{ + struct list_head *resource_list; + int err = 0; + + if (resource) + resource_list = &resource->resource_list; + else + resource_list = &devlink->resource_list; + + list_for_each_entry(resource, resource_list, list) { + if (!resource->size_valid) + return -EINVAL; + err = devlink_resources_validate(devlink, resource, info); + if (err) + return err; + } + return err; +} + +static struct net *devlink_netns_get(struct sk_buff *skb, + struct genl_info *info) +{ + struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID]; + struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD]; + struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID]; + struct net *net; + + if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) { + NL_SET_ERR_MSG_MOD(info->extack, "multiple netns identifying attributes specified"); + return ERR_PTR(-EINVAL); + } + + if (netns_pid_attr) { + net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr)); + } else if (netns_fd_attr) { + net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr)); + } else if (netns_id_attr) { + net = get_net_ns_by_id(sock_net(skb->sk), + nla_get_u32(netns_id_attr)); + if (!net) + net = ERR_PTR(-EINVAL); + } else { + WARN_ON(1); + net = ERR_PTR(-EINVAL); + } + if (IS_ERR(net)) { + NL_SET_ERR_MSG_MOD(info->extack, "Unknown network namespace"); + return ERR_PTR(-EINVAL); + } + if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) { + put_net(net); + return ERR_PTR(-EPERM); + } + return net; +} + +static void devlink_param_notify(struct devlink *devlink, + unsigned int port_index, + struct devlink_param_item *param_item, + enum devlink_command cmd); + +static void devlink_ns_change_notify(struct devlink *devlink, + struct net *dest_net, struct net *curr_net, + bool new) +{ + struct devlink_param_item *param_item; + enum devlink_command cmd; + + /* Userspace needs to be notified about devlink objects + * removed from original and entering new network namespace. + * The rest of the devlink objects are re-created during + * reload process so the notifications are generated separatelly. + */ + + if (!dest_net || net_eq(dest_net, curr_net)) + return; + + if (new) + devlink_notify(devlink, DEVLINK_CMD_NEW); + + cmd = new ? DEVLINK_CMD_PARAM_NEW : DEVLINK_CMD_PARAM_DEL; + list_for_each_entry(param_item, &devlink->param_list, list) + devlink_param_notify(devlink, 0, param_item, cmd); + + if (!new) + devlink_notify(devlink, DEVLINK_CMD_DEL); +} + +static bool devlink_reload_supported(const struct devlink_ops *ops) +{ + return ops->reload_down && ops->reload_up; +} + +static void devlink_reload_failed_set(struct devlink *devlink, + bool reload_failed) +{ + if (devlink->reload_failed == reload_failed) + return; + devlink->reload_failed = reload_failed; + devlink_notify(devlink, DEVLINK_CMD_NEW); +} + +bool devlink_is_reload_failed(const struct devlink *devlink) +{ + return devlink->reload_failed; +} +EXPORT_SYMBOL_GPL(devlink_is_reload_failed); + +static void +__devlink_reload_stats_update(struct devlink *devlink, u32 *reload_stats, + enum devlink_reload_limit limit, u32 actions_performed) +{ + unsigned long actions = actions_performed; + int stat_idx; + int action; + + for_each_set_bit(action, &actions, __DEVLINK_RELOAD_ACTION_MAX) { + stat_idx = limit * __DEVLINK_RELOAD_ACTION_MAX + action; + reload_stats[stat_idx]++; + } + devlink_notify(devlink, DEVLINK_CMD_NEW); +} + +static void +devlink_reload_stats_update(struct devlink *devlink, enum devlink_reload_limit limit, + u32 actions_performed) +{ + __devlink_reload_stats_update(devlink, devlink->stats.reload_stats, limit, + actions_performed); +} + +/** + * devlink_remote_reload_actions_performed - Update devlink on reload actions + * performed which are not a direct result of devlink reload call. + * + * This should be called by a driver after performing reload actions in case it was not + * a result of devlink reload call. For example fw_activate was performed as a result + * of devlink reload triggered fw_activate on another host. + * The motivation for this function is to keep data on reload actions performed on this + * function whether it was done due to direct devlink reload call or not. + * + * @devlink: devlink + * @limit: reload limit + * @actions_performed: bitmask of actions performed + */ +void devlink_remote_reload_actions_performed(struct devlink *devlink, + enum devlink_reload_limit limit, + u32 actions_performed) +{ + if (WARN_ON(!actions_performed || + actions_performed & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || + actions_performed >= BIT(__DEVLINK_RELOAD_ACTION_MAX) || + limit > DEVLINK_RELOAD_LIMIT_MAX)) + return; + + __devlink_reload_stats_update(devlink, devlink->stats.remote_reload_stats, limit, + actions_performed); +} +EXPORT_SYMBOL_GPL(devlink_remote_reload_actions_performed); + +static int devlink_reload(struct devlink *devlink, struct net *dest_net, + enum devlink_reload_action action, enum devlink_reload_limit limit, + u32 *actions_performed, struct netlink_ext_ack *extack) +{ + u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; + struct net *curr_net; + int err; + + memcpy(remote_reload_stats, devlink->stats.remote_reload_stats, + sizeof(remote_reload_stats)); + + curr_net = devlink_net(devlink); + devlink_ns_change_notify(devlink, dest_net, curr_net, false); + err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack); + if (err) + return err; + + if (dest_net && !net_eq(dest_net, curr_net)) { + move_netdevice_notifier_net(curr_net, dest_net, + &devlink->netdevice_nb); + write_pnet(&devlink->_net, dest_net); + } + + err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack); + devlink_reload_failed_set(devlink, !!err); + if (err) + return err; + + devlink_ns_change_notify(devlink, dest_net, curr_net, true); + WARN_ON(!(*actions_performed & BIT(action))); + /* Catch driver on updating the remote action within devlink reload */ + WARN_ON(memcmp(remote_reload_stats, devlink->stats.remote_reload_stats, + sizeof(remote_reload_stats))); + devlink_reload_stats_update(devlink, limit, *actions_performed); + return 0; +} + +static int +devlink_nl_reload_actions_performed_snd(struct devlink *devlink, u32 actions_performed, + enum devlink_command cmd, struct genl_info *info) +{ + struct sk_buff *msg; + void *hdr; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &devlink_nl_family, 0, cmd); + if (!hdr) + goto free_msg; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (nla_put_bitfield32(msg, DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, actions_performed, + actions_performed)) + goto nla_put_failure; + genlmsg_end(msg, hdr); + + return genlmsg_reply(msg, info); + +nla_put_failure: + genlmsg_cancel(msg, hdr); +free_msg: + nlmsg_free(msg); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + enum devlink_reload_action action; + enum devlink_reload_limit limit; + struct net *dest_net = NULL; + u32 actions_performed; + int err; + + if (!(devlink->features & DEVLINK_F_RELOAD)) + return -EOPNOTSUPP; + + err = devlink_resources_validate(devlink, NULL, info); + if (err) { + NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed"); + return err; + } + + if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION]) + action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]); + else + action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT; + + if (!devlink_reload_action_is_supported(devlink, action)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Requested reload action is not supported by the driver"); + return -EOPNOTSUPP; + } + + limit = DEVLINK_RELOAD_LIMIT_UNSPEC; + if (info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) { + struct nla_bitfield32 limits; + u32 limits_selected; + + limits = nla_get_bitfield32(info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]); + limits_selected = limits.value & limits.selector; + if (!limits_selected) { + NL_SET_ERR_MSG_MOD(info->extack, "Invalid limit selected"); + return -EINVAL; + } + for (limit = 0 ; limit <= DEVLINK_RELOAD_LIMIT_MAX ; limit++) + if (limits_selected & BIT(limit)) + break; + /* UAPI enables multiselection, but currently it is not used */ + if (limits_selected != BIT(limit)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Multiselection of limit is not supported"); + return -EOPNOTSUPP; + } + if (!devlink_reload_limit_is_supported(devlink, limit)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Requested limit is not supported by the driver"); + return -EOPNOTSUPP; + } + if (devlink_reload_combination_is_invalid(action, limit)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Requested limit is invalid for this action"); + return -EINVAL; + } + } + if (info->attrs[DEVLINK_ATTR_NETNS_PID] || + info->attrs[DEVLINK_ATTR_NETNS_FD] || + info->attrs[DEVLINK_ATTR_NETNS_ID]) { + dest_net = devlink_netns_get(skb, info); + if (IS_ERR(dest_net)) + return PTR_ERR(dest_net); + } + + err = devlink_reload(devlink, dest_net, action, limit, &actions_performed, info->extack); + + if (dest_net) + put_net(dest_net); + + if (err) + return err; + /* For backward compatibility generate reply only if attributes used by user */ + if (!info->attrs[DEVLINK_ATTR_RELOAD_ACTION] && !info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) + return 0; + + return devlink_nl_reload_actions_performed_snd(devlink, actions_performed, + DEVLINK_CMD_RELOAD, info); +} + +static int devlink_nl_flash_update_fill(struct sk_buff *msg, + struct devlink *devlink, + enum devlink_command cmd, + struct devlink_flash_notify *params) +{ + void *hdr; + + hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS) + goto out; + + if (params->status_msg && + nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG, + params->status_msg)) + goto nla_put_failure; + if (params->component && + nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_COMPONENT, + params->component)) + goto nla_put_failure; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE, + params->done, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL, + params->total, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, + params->timeout, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + +out: + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void __devlink_flash_update_notify(struct devlink *devlink, + enum devlink_command cmd, + struct devlink_flash_notify *params) +{ + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_FLASH_UPDATE && + cmd != DEVLINK_CMD_FLASH_UPDATE_END && + cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS); + + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_flash_update_fill(msg, devlink, cmd, params); + if (err) + goto out_free_msg; + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); + return; + +out_free_msg: + nlmsg_free(msg); +} + +static void devlink_flash_update_begin_notify(struct devlink *devlink) +{ + struct devlink_flash_notify params = {}; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE, + ¶ms); +} + +static void devlink_flash_update_end_notify(struct devlink *devlink) +{ + struct devlink_flash_notify params = {}; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE_END, + ¶ms); +} + +void devlink_flash_update_status_notify(struct devlink *devlink, + const char *status_msg, + const char *component, + unsigned long done, + unsigned long total) +{ + struct devlink_flash_notify params = { + .status_msg = status_msg, + .component = component, + .done = done, + .total = total, + }; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE_STATUS, + ¶ms); +} +EXPORT_SYMBOL_GPL(devlink_flash_update_status_notify); + +void devlink_flash_update_timeout_notify(struct devlink *devlink, + const char *status_msg, + const char *component, + unsigned long timeout) +{ + struct devlink_flash_notify params = { + .status_msg = status_msg, + .component = component, + .timeout = timeout, + }; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE_STATUS, + ¶ms); +} +EXPORT_SYMBOL_GPL(devlink_flash_update_timeout_notify); + +struct devlink_info_req { + struct sk_buff *msg; + void (*version_cb)(const char *version_name, + enum devlink_info_version_type version_type, + void *version_cb_priv); + void *version_cb_priv; +}; + +struct devlink_flash_component_lookup_ctx { + const char *lookup_name; + bool lookup_name_found; +}; + +static void +devlink_flash_component_lookup_cb(const char *version_name, + enum devlink_info_version_type version_type, + void *version_cb_priv) +{ + struct devlink_flash_component_lookup_ctx *lookup_ctx = version_cb_priv; + + if (version_type != DEVLINK_INFO_VERSION_TYPE_COMPONENT || + lookup_ctx->lookup_name_found) + return; + + lookup_ctx->lookup_name_found = + !strcmp(lookup_ctx->lookup_name, version_name); +} + +static int devlink_flash_component_get(struct devlink *devlink, + struct nlattr *nla_component, + const char **p_component, + struct netlink_ext_ack *extack) +{ + struct devlink_flash_component_lookup_ctx lookup_ctx = {}; + struct devlink_info_req req = {}; + const char *component; + int ret; + + if (!nla_component) + return 0; + + component = nla_data(nla_component); + + if (!devlink->ops->info_get) { + NL_SET_ERR_MSG_ATTR(extack, nla_component, + "component update is not supported by this device"); + return -EOPNOTSUPP; + } + + lookup_ctx.lookup_name = component; + req.version_cb = devlink_flash_component_lookup_cb; + req.version_cb_priv = &lookup_ctx; + + ret = devlink->ops->info_get(devlink, &req, NULL); + if (ret) + return ret; + + if (!lookup_ctx.lookup_name_found) { + NL_SET_ERR_MSG_ATTR(extack, nla_component, + "selected component is not supported by this device"); + return -EINVAL; + } + *p_component = component; + return 0; +} + +static int devlink_nl_cmd_flash_update(struct sk_buff *skb, + struct genl_info *info) +{ + struct nlattr *nla_overwrite_mask, *nla_file_name; + struct devlink_flash_update_params params = {}; + struct devlink *devlink = info->user_ptr[0]; + const char *file_name; + u32 supported_params; + int ret; + + if (!devlink->ops->flash_update) + return -EOPNOTSUPP; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME)) + return -EINVAL; + + ret = devlink_flash_component_get(devlink, + info->attrs[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT], + ¶ms.component, info->extack); + if (ret) + return ret; + + supported_params = devlink->ops->supported_flash_update_params; + + nla_overwrite_mask = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK]; + if (nla_overwrite_mask) { + struct nla_bitfield32 sections; + + if (!(supported_params & DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK)) { + NL_SET_ERR_MSG_ATTR(info->extack, nla_overwrite_mask, + "overwrite settings are not supported by this device"); + return -EOPNOTSUPP; + } + sections = nla_get_bitfield32(nla_overwrite_mask); + params.overwrite_mask = sections.value & sections.selector; + } + + nla_file_name = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME]; + file_name = nla_data(nla_file_name); + ret = request_firmware(¶ms.fw, file_name, devlink->dev); + if (ret) { + NL_SET_ERR_MSG_ATTR(info->extack, nla_file_name, "failed to locate the requested firmware file"); + return ret; + } + + devlink_flash_update_begin_notify(devlink); + ret = devlink->ops->flash_update(devlink, ¶ms, info->extack); + devlink_flash_update_end_notify(devlink); + + release_firmware(params.fw); + + return ret; +} + +static int +devlink_nl_selftests_fill(struct sk_buff *msg, struct devlink *devlink, + u32 portid, u32 seq, int flags, + struct netlink_ext_ack *extack) +{ + struct nlattr *selftests; + void *hdr; + int err; + int i; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, + DEVLINK_CMD_SELFTESTS_GET); + if (!hdr) + return -EMSGSIZE; + + err = -EMSGSIZE; + if (devlink_nl_put_handle(msg, devlink)) + goto err_cancel_msg; + + selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); + if (!selftests) + goto err_cancel_msg; + + for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; + i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { + if (devlink->ops->selftest_check(devlink, i, extack)) { + err = nla_put_flag(msg, i); + if (err) + goto err_cancel_msg; + } + } + + nla_nest_end(msg, selftests); + genlmsg_end(msg, hdr); + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); + return err; +} + +static int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + if (!devlink->ops->selftest_check) + return -EOPNOTSUPP; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_selftests_fill(msg, devlink, info->snd_portid, + info->snd_seq, 0, info->extack); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + if (idx < start || !devlink->ops->selftest_check) + goto inc; + + devl_lock(devlink); + err = devlink_nl_selftests_fill(msg, devlink, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->extack); + devl_unlock(devlink); + if (err) { + devlink_put(devlink); + break; + } +inc: + idx++; + devlink_put(devlink); + } + + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +static int devlink_selftest_result_put(struct sk_buff *skb, unsigned int id, + enum devlink_selftest_status test_status) +{ + struct nlattr *result_attr; + + result_attr = nla_nest_start(skb, DEVLINK_ATTR_SELFTEST_RESULT); + if (!result_attr) + return -EMSGSIZE; + + if (nla_put_u32(skb, DEVLINK_ATTR_SELFTEST_RESULT_ID, id) || + nla_put_u8(skb, DEVLINK_ATTR_SELFTEST_RESULT_STATUS, + test_status)) + goto nla_put_failure; + + nla_nest_end(skb, result_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, result_attr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_selftests_run(struct sk_buff *skb, + struct genl_info *info) +{ + struct nlattr *tb[DEVLINK_ATTR_SELFTEST_ID_MAX + 1]; + struct devlink *devlink = info->user_ptr[0]; + struct nlattr *attrs, *selftests; + struct sk_buff *msg; + void *hdr; + int err; + int i; + + if (!devlink->ops->selftest_run || !devlink->ops->selftest_check) + return -EOPNOTSUPP; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SELFTESTS)) + return -EINVAL; + + attrs = info->attrs[DEVLINK_ATTR_SELFTESTS]; + + err = nla_parse_nested(tb, DEVLINK_ATTR_SELFTEST_ID_MAX, attrs, + devlink_selftest_nl_policy, info->extack); + if (err < 0) + return err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = -EMSGSIZE; + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, + &devlink_nl_family, 0, DEVLINK_CMD_SELFTESTS_RUN); + if (!hdr) + goto free_msg; + + if (devlink_nl_put_handle(msg, devlink)) + goto genlmsg_cancel; + + selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); + if (!selftests) + goto genlmsg_cancel; + + for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; + i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { + enum devlink_selftest_status test_status; + + if (nla_get_flag(tb[i])) { + if (!devlink->ops->selftest_check(devlink, i, + info->extack)) { + if (devlink_selftest_result_put(msg, i, + DEVLINK_SELFTEST_STATUS_SKIP)) + goto selftests_nest_cancel; + continue; + } + + test_status = devlink->ops->selftest_run(devlink, i, + info->extack); + if (devlink_selftest_result_put(msg, i, test_status)) + goto selftests_nest_cancel; + } + } + + nla_nest_end(msg, selftests); + genlmsg_end(msg, hdr); + return genlmsg_reply(msg, info); + +selftests_nest_cancel: + nla_nest_cancel(msg, selftests); +genlmsg_cancel: + genlmsg_cancel(msg, hdr); +free_msg: + nlmsg_free(msg); + return err; +} + +static const struct devlink_param devlink_param_generic[] = { + { + .id = DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, + .name = DEVLINK_PARAM_GENERIC_INT_ERR_RESET_NAME, + .type = DEVLINK_PARAM_GENERIC_INT_ERR_RESET_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + .name = DEVLINK_PARAM_GENERIC_MAX_MACS_NAME, + .type = DEVLINK_PARAM_GENERIC_MAX_MACS_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV, + .name = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_SRIOV_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, + .name = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME, + .type = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI, + .name = DEVLINK_PARAM_GENERIC_IGNORE_ARI_NAME, + .type = DEVLINK_PARAM_GENERIC_IGNORE_ARI_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX, + .name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_NAME, + .type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN, + .name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_NAME, + .type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, + .name = DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_NAME, + .type = DEVLINK_PARAM_GENERIC_FW_LOAD_POLICY_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE, + .name = DEVLINK_PARAM_GENERIC_RESET_DEV_ON_DRV_PROBE_NAME, + .type = DEVLINK_PARAM_GENERIC_RESET_DEV_ON_DRV_PROBE_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, + .name = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET, + .name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, + .name = DEVLINK_PARAM_GENERIC_ENABLE_ETH_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_ETH_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA, + .name = DEVLINK_PARAM_GENERIC_ENABLE_RDMA_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_RDMA_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET, + .name = DEVLINK_PARAM_GENERIC_ENABLE_VNET_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_VNET_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_IWARP, + .name = DEVLINK_PARAM_GENERIC_ENABLE_IWARP_NAME, + .type = DEVLINK_PARAM_GENERIC_ENABLE_IWARP_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE, + .name = DEVLINK_PARAM_GENERIC_IO_EQ_SIZE_NAME, + .type = DEVLINK_PARAM_GENERIC_IO_EQ_SIZE_TYPE, + }, + { + .id = DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE, + .name = DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_NAME, + .type = DEVLINK_PARAM_GENERIC_EVENT_EQ_SIZE_TYPE, + }, +}; + +static int devlink_param_generic_verify(const struct devlink_param *param) +{ + /* verify it match generic parameter by id and name */ + if (param->id > DEVLINK_PARAM_GENERIC_ID_MAX) + return -EINVAL; + if (strcmp(param->name, devlink_param_generic[param->id].name)) + return -ENOENT; + + WARN_ON(param->type != devlink_param_generic[param->id].type); + + return 0; +} + +static int devlink_param_driver_verify(const struct devlink_param *param) +{ + int i; + + if (param->id <= DEVLINK_PARAM_GENERIC_ID_MAX) + return -EINVAL; + /* verify no such name in generic params */ + for (i = 0; i <= DEVLINK_PARAM_GENERIC_ID_MAX; i++) + if (!strcmp(param->name, devlink_param_generic[i].name)) + return -EEXIST; + + return 0; +} + +static struct devlink_param_item * +devlink_param_find_by_name(struct list_head *param_list, + const char *param_name) +{ + struct devlink_param_item *param_item; + + list_for_each_entry(param_item, param_list, list) + if (!strcmp(param_item->param->name, param_name)) + return param_item; + return NULL; +} + +static struct devlink_param_item * +devlink_param_find_by_id(struct list_head *param_list, u32 param_id) +{ + struct devlink_param_item *param_item; + + list_for_each_entry(param_item, param_list, list) + if (param_item->param->id == param_id) + return param_item; + return NULL; +} + +static bool +devlink_param_cmode_is_supported(const struct devlink_param *param, + enum devlink_param_cmode cmode) +{ + return test_bit(cmode, ¶m->supported_cmodes); +} + +static int devlink_param_get(struct devlink *devlink, + const struct devlink_param *param, + struct devlink_param_gset_ctx *ctx) +{ + if (!param->get || devlink->reload_failed) + return -EOPNOTSUPP; + return param->get(devlink, param->id, ctx); +} + +static int devlink_param_set(struct devlink *devlink, + const struct devlink_param *param, + struct devlink_param_gset_ctx *ctx) +{ + if (!param->set || devlink->reload_failed) + return -EOPNOTSUPP; + return param->set(devlink, param->id, ctx); +} + +static int +devlink_param_type_to_nla_type(enum devlink_param_type param_type) +{ + switch (param_type) { + case DEVLINK_PARAM_TYPE_U8: + return NLA_U8; + case DEVLINK_PARAM_TYPE_U16: + return NLA_U16; + case DEVLINK_PARAM_TYPE_U32: + return NLA_U32; + case DEVLINK_PARAM_TYPE_STRING: + return NLA_STRING; + case DEVLINK_PARAM_TYPE_BOOL: + return NLA_FLAG; + default: + return -EINVAL; + } +} + +static int +devlink_nl_param_value_fill_one(struct sk_buff *msg, + enum devlink_param_type type, + enum devlink_param_cmode cmode, + union devlink_param_value val) +{ + struct nlattr *param_value_attr; + + param_value_attr = nla_nest_start_noflag(msg, + DEVLINK_ATTR_PARAM_VALUE); + if (!param_value_attr) + goto nla_put_failure; + + if (nla_put_u8(msg, DEVLINK_ATTR_PARAM_VALUE_CMODE, cmode)) + goto value_nest_cancel; + + switch (type) { + case DEVLINK_PARAM_TYPE_U8: + if (nla_put_u8(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, val.vu8)) + goto value_nest_cancel; + break; + case DEVLINK_PARAM_TYPE_U16: + if (nla_put_u16(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, val.vu16)) + goto value_nest_cancel; + break; + case DEVLINK_PARAM_TYPE_U32: + if (nla_put_u32(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, val.vu32)) + goto value_nest_cancel; + break; + case DEVLINK_PARAM_TYPE_STRING: + if (nla_put_string(msg, DEVLINK_ATTR_PARAM_VALUE_DATA, + val.vstr)) + goto value_nest_cancel; + break; + case DEVLINK_PARAM_TYPE_BOOL: + if (val.vbool && + nla_put_flag(msg, DEVLINK_ATTR_PARAM_VALUE_DATA)) + goto value_nest_cancel; + break; + } + + nla_nest_end(msg, param_value_attr); + return 0; + +value_nest_cancel: + nla_nest_cancel(msg, param_value_attr); +nla_put_failure: + return -EMSGSIZE; +} + +static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink, + unsigned int port_index, + struct devlink_param_item *param_item, + enum devlink_command cmd, + u32 portid, u32 seq, int flags) +{ + union devlink_param_value param_value[DEVLINK_PARAM_CMODE_MAX + 1]; + bool param_value_set[DEVLINK_PARAM_CMODE_MAX + 1] = {}; + const struct devlink_param *param = param_item->param; + struct devlink_param_gset_ctx ctx; + struct nlattr *param_values_list; + struct nlattr *param_attr; + int nla_type; + void *hdr; + int err; + int i; + + /* Get value from driver part to driverinit configuration mode */ + for (i = 0; i <= DEVLINK_PARAM_CMODE_MAX; i++) { + if (!devlink_param_cmode_is_supported(param, i)) + continue; + if (i == DEVLINK_PARAM_CMODE_DRIVERINIT) { + if (!param_item->driverinit_value_valid) + return -EOPNOTSUPP; + param_value[i] = param_item->driverinit_value; + } else { + ctx.cmode = i; + err = devlink_param_get(devlink, param, &ctx); + if (err) + return err; + param_value[i] = ctx.val; + } + param_value_set[i] = true; + } + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto genlmsg_cancel; + + if (cmd == DEVLINK_CMD_PORT_PARAM_GET || + cmd == DEVLINK_CMD_PORT_PARAM_NEW || + cmd == DEVLINK_CMD_PORT_PARAM_DEL) + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, port_index)) + goto genlmsg_cancel; + + param_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PARAM); + if (!param_attr) + goto genlmsg_cancel; + if (nla_put_string(msg, DEVLINK_ATTR_PARAM_NAME, param->name)) + goto param_nest_cancel; + if (param->generic && nla_put_flag(msg, DEVLINK_ATTR_PARAM_GENERIC)) + goto param_nest_cancel; + + nla_type = devlink_param_type_to_nla_type(param->type); + if (nla_type < 0) + goto param_nest_cancel; + if (nla_put_u8(msg, DEVLINK_ATTR_PARAM_TYPE, nla_type)) + goto param_nest_cancel; + + param_values_list = nla_nest_start_noflag(msg, + DEVLINK_ATTR_PARAM_VALUES_LIST); + if (!param_values_list) + goto param_nest_cancel; + + for (i = 0; i <= DEVLINK_PARAM_CMODE_MAX; i++) { + if (!param_value_set[i]) + continue; + err = devlink_nl_param_value_fill_one(msg, param->type, + i, param_value[i]); + if (err) + goto values_list_nest_cancel; + } + + nla_nest_end(msg, param_values_list); + nla_nest_end(msg, param_attr); + genlmsg_end(msg, hdr); + return 0; + +values_list_nest_cancel: + nla_nest_end(msg, param_values_list); +param_nest_cancel: + nla_nest_cancel(msg, param_attr); +genlmsg_cancel: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void devlink_param_notify(struct devlink *devlink, + unsigned int port_index, + struct devlink_param_item *param_item, + enum devlink_command cmd) +{ + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_PARAM_NEW && cmd != DEVLINK_CMD_PARAM_DEL && + cmd != DEVLINK_CMD_PORT_PARAM_NEW && + cmd != DEVLINK_CMD_PORT_PARAM_DEL); + ASSERT_DEVLINK_REGISTERED(devlink); + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + err = devlink_nl_param_fill(msg, devlink, port_index, param_item, cmd, + 0, 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink_param_item *param_item; + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + list_for_each_entry(param_item, &devlink->param_list, list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_param_fill(msg, devlink, 0, param_item, + DEVLINK_CMD_PARAM_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +static int +devlink_param_type_get_from_info(struct genl_info *info, + enum devlink_param_type *param_type) +{ + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PARAM_TYPE)) + return -EINVAL; + + switch (nla_get_u8(info->attrs[DEVLINK_ATTR_PARAM_TYPE])) { + case NLA_U8: + *param_type = DEVLINK_PARAM_TYPE_U8; + break; + case NLA_U16: + *param_type = DEVLINK_PARAM_TYPE_U16; + break; + case NLA_U32: + *param_type = DEVLINK_PARAM_TYPE_U32; + break; + case NLA_STRING: + *param_type = DEVLINK_PARAM_TYPE_STRING; + break; + case NLA_FLAG: + *param_type = DEVLINK_PARAM_TYPE_BOOL; + break; + default: + return -EINVAL; + } + + return 0; +} + +static int +devlink_param_value_get_from_info(const struct devlink_param *param, + struct genl_info *info, + union devlink_param_value *value) +{ + struct nlattr *param_data; + int len; + + param_data = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA]; + + if (param->type != DEVLINK_PARAM_TYPE_BOOL && !param_data) + return -EINVAL; + + switch (param->type) { + case DEVLINK_PARAM_TYPE_U8: + if (nla_len(param_data) != sizeof(u8)) + return -EINVAL; + value->vu8 = nla_get_u8(param_data); + break; + case DEVLINK_PARAM_TYPE_U16: + if (nla_len(param_data) != sizeof(u16)) + return -EINVAL; + value->vu16 = nla_get_u16(param_data); + break; + case DEVLINK_PARAM_TYPE_U32: + if (nla_len(param_data) != sizeof(u32)) + return -EINVAL; + value->vu32 = nla_get_u32(param_data); + break; + case DEVLINK_PARAM_TYPE_STRING: + len = strnlen(nla_data(param_data), nla_len(param_data)); + if (len == nla_len(param_data) || + len >= __DEVLINK_PARAM_MAX_STRING_VALUE) + return -EINVAL; + strcpy(value->vstr, nla_data(param_data)); + break; + case DEVLINK_PARAM_TYPE_BOOL: + if (param_data && nla_len(param_data)) + return -EINVAL; + value->vbool = nla_get_flag(param_data); + break; + } + return 0; +} + +static struct devlink_param_item * +devlink_param_get_from_info(struct list_head *param_list, + struct genl_info *info) +{ + char *param_name; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PARAM_NAME)) + return NULL; + + param_name = nla_data(info->attrs[DEVLINK_ATTR_PARAM_NAME]); + return devlink_param_find_by_name(param_list, param_name); +} + +static int devlink_nl_cmd_param_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_param_item *param_item; + struct sk_buff *msg; + int err; + + param_item = devlink_param_get_from_info(&devlink->param_list, info); + if (!param_item) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_param_fill(msg, devlink, 0, param_item, + DEVLINK_CMD_PARAM_GET, + info->snd_portid, info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink, + unsigned int port_index, + struct list_head *param_list, + struct genl_info *info, + enum devlink_command cmd) +{ + enum devlink_param_type param_type; + struct devlink_param_gset_ctx ctx; + enum devlink_param_cmode cmode; + struct devlink_param_item *param_item; + const struct devlink_param *param; + union devlink_param_value value; + int err = 0; + + param_item = devlink_param_get_from_info(param_list, info); + if (!param_item) + return -EINVAL; + param = param_item->param; + err = devlink_param_type_get_from_info(info, ¶m_type); + if (err) + return err; + if (param_type != param->type) + return -EINVAL; + err = devlink_param_value_get_from_info(param, info, &value); + if (err) + return err; + if (param->validate) { + err = param->validate(devlink, param->id, value, info->extack); + if (err) + return err; + } + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PARAM_VALUE_CMODE)) + return -EINVAL; + cmode = nla_get_u8(info->attrs[DEVLINK_ATTR_PARAM_VALUE_CMODE]); + if (!devlink_param_cmode_is_supported(param, cmode)) + return -EOPNOTSUPP; + + if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) { + if (param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(param_item->driverinit_value.vstr, value.vstr); + else + param_item->driverinit_value = value; + param_item->driverinit_value_valid = true; + } else { + if (!param->set) + return -EOPNOTSUPP; + ctx.val = value; + ctx.cmode = cmode; + err = devlink_param_set(devlink, param, &ctx); + if (err) + return err; + } + + devlink_param_notify(devlink, port_index, param_item, cmd); + return 0; +} + +static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + + return __devlink_nl_cmd_param_set_doit(devlink, 0, &devlink->param_list, + info, DEVLINK_CMD_PARAM_NEW); +} + +static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + NL_SET_ERR_MSG_MOD(cb->extack, "Port params are not supported"); + return msg->len; +} + +static int devlink_nl_cmd_port_param_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + NL_SET_ERR_MSG_MOD(info->extack, "Port params are not supported"); + return -EINVAL; +} + +static int devlink_nl_cmd_port_param_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + NL_SET_ERR_MSG_MOD(info->extack, "Port params are not supported"); + return -EINVAL; +} + +static int devlink_nl_region_snapshot_id_put(struct sk_buff *msg, + struct devlink *devlink, + struct devlink_snapshot *snapshot) +{ + struct nlattr *snap_attr; + int err; + + snap_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_REGION_SNAPSHOT); + if (!snap_attr) + return -EINVAL; + + err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, snapshot->id); + if (err) + goto nla_put_failure; + + nla_nest_end(msg, snap_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(msg, snap_attr); + return err; +} + +static int devlink_nl_region_snapshots_id_put(struct sk_buff *msg, + struct devlink *devlink, + struct devlink_region *region) +{ + struct devlink_snapshot *snapshot; + struct nlattr *snapshots_attr; + int err; + + snapshots_attr = nla_nest_start_noflag(msg, + DEVLINK_ATTR_REGION_SNAPSHOTS); + if (!snapshots_attr) + return -EINVAL; + + list_for_each_entry(snapshot, ®ion->snapshot_list, list) { + err = devlink_nl_region_snapshot_id_put(msg, devlink, snapshot); + if (err) + goto nla_put_failure; + } + + nla_nest_end(msg, snapshots_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(msg, snapshots_attr); + return err; +} + +static int devlink_nl_region_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags, + struct devlink_region *region) +{ + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + err = devlink_nl_put_handle(msg, devlink); + if (err) + goto nla_put_failure; + + if (region->port) { + err = nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, + region->port->index); + if (err) + goto nla_put_failure; + } + + err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, region->ops->name); + if (err) + goto nla_put_failure; + + err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE, + region->size, + DEVLINK_ATTR_PAD); + if (err) + goto nla_put_failure; + + err = nla_put_u32(msg, DEVLINK_ATTR_REGION_MAX_SNAPSHOTS, + region->max_snapshots); + if (err) + goto nla_put_failure; + + err = devlink_nl_region_snapshots_id_put(msg, devlink, region); + if (err) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return err; +} + +static struct sk_buff * +devlink_nl_region_notify_build(struct devlink_region *region, + struct devlink_snapshot *snapshot, + enum devlink_command cmd, u32 portid, u32 seq) +{ + struct devlink *devlink = region->devlink; + struct sk_buff *msg; + void *hdr; + int err; + + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return ERR_PTR(-ENOMEM); + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, 0, cmd); + if (!hdr) { + err = -EMSGSIZE; + goto out_free_msg; + } + + err = devlink_nl_put_handle(msg, devlink); + if (err) + goto out_cancel_msg; + + if (region->port) { + err = nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, + region->port->index); + if (err) + goto out_cancel_msg; + } + + err = nla_put_string(msg, DEVLINK_ATTR_REGION_NAME, + region->ops->name); + if (err) + goto out_cancel_msg; + + if (snapshot) { + err = nla_put_u32(msg, DEVLINK_ATTR_REGION_SNAPSHOT_ID, + snapshot->id); + if (err) + goto out_cancel_msg; + } else { + err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_SIZE, + region->size, DEVLINK_ATTR_PAD); + if (err) + goto out_cancel_msg; + } + genlmsg_end(msg, hdr); + + return msg; + +out_cancel_msg: + genlmsg_cancel(msg, hdr); +out_free_msg: + nlmsg_free(msg); + return ERR_PTR(err); +} + +static void devlink_nl_region_notify(struct devlink_region *region, + struct devlink_snapshot *snapshot, + enum devlink_command cmd) +{ + struct devlink *devlink = region->devlink; + struct sk_buff *msg; + + WARN_ON(cmd != DEVLINK_CMD_REGION_NEW && cmd != DEVLINK_CMD_REGION_DEL); + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = devlink_nl_region_notify_build(region, snapshot, cmd, 0, 0); + if (IS_ERR(msg)) + return; + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, + 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +/** + * __devlink_snapshot_id_increment - Increment number of snapshots using an id + * @devlink: devlink instance + * @id: the snapshot id + * + * Track when a new snapshot begins using an id. Load the count for the + * given id from the snapshot xarray, increment it, and store it back. + * + * Called when a new snapshot is created with the given id. + * + * The id *must* have been previously allocated by + * devlink_region_snapshot_id_get(). + * + * Returns 0 on success, or an error on failure. + */ +static int __devlink_snapshot_id_increment(struct devlink *devlink, u32 id) +{ + unsigned long count; + void *p; + int err; + + xa_lock(&devlink->snapshot_ids); + p = xa_load(&devlink->snapshot_ids, id); + if (WARN_ON(!p)) { + err = -EINVAL; + goto unlock; + } + + if (WARN_ON(!xa_is_value(p))) { + err = -EINVAL; + goto unlock; + } + + count = xa_to_value(p); + count++; + + err = xa_err(__xa_store(&devlink->snapshot_ids, id, xa_mk_value(count), + GFP_ATOMIC)); +unlock: + xa_unlock(&devlink->snapshot_ids); + return err; +} + +/** + * __devlink_snapshot_id_decrement - Decrease number of snapshots using an id + * @devlink: devlink instance + * @id: the snapshot id + * + * Track when a snapshot is deleted and stops using an id. Load the count + * for the given id from the snapshot xarray, decrement it, and store it + * back. + * + * If the count reaches zero, erase this id from the xarray, freeing it + * up for future re-use by devlink_region_snapshot_id_get(). + * + * Called when a snapshot using the given id is deleted, and when the + * initial allocator of the id is finished using it. + */ +static void __devlink_snapshot_id_decrement(struct devlink *devlink, u32 id) +{ + unsigned long count; + void *p; + + xa_lock(&devlink->snapshot_ids); + p = xa_load(&devlink->snapshot_ids, id); + if (WARN_ON(!p)) + goto unlock; + + if (WARN_ON(!xa_is_value(p))) + goto unlock; + + count = xa_to_value(p); + + if (count > 1) { + count--; + __xa_store(&devlink->snapshot_ids, id, xa_mk_value(count), + GFP_ATOMIC); + } else { + /* If this was the last user, we can erase this id */ + __xa_erase(&devlink->snapshot_ids, id); + } +unlock: + xa_unlock(&devlink->snapshot_ids); +} + +/** + * __devlink_snapshot_id_insert - Insert a specific snapshot ID + * @devlink: devlink instance + * @id: the snapshot id + * + * Mark the given snapshot id as used by inserting a zero value into the + * snapshot xarray. + * + * This must be called while holding the devlink instance lock. Unlike + * devlink_snapshot_id_get, the initial reference count is zero, not one. + * It is expected that the id will immediately be used before + * releasing the devlink instance lock. + * + * Returns zero on success, or an error code if the snapshot id could not + * be inserted. + */ +static int __devlink_snapshot_id_insert(struct devlink *devlink, u32 id) +{ + int err; + + xa_lock(&devlink->snapshot_ids); + if (xa_load(&devlink->snapshot_ids, id)) { + xa_unlock(&devlink->snapshot_ids); + return -EEXIST; + } + err = xa_err(__xa_store(&devlink->snapshot_ids, id, xa_mk_value(0), + GFP_ATOMIC)); + xa_unlock(&devlink->snapshot_ids); + return err; +} + +/** + * __devlink_region_snapshot_id_get - get snapshot ID + * @devlink: devlink instance + * @id: storage to return snapshot id + * + * Allocates a new snapshot id. Returns zero on success, or a negative + * error on failure. Must be called while holding the devlink instance + * lock. + * + * Snapshot IDs are tracked using an xarray which stores the number of + * users of the snapshot id. + * + * Note that the caller of this function counts as a 'user', in order to + * avoid race conditions. The caller must release its hold on the + * snapshot by using devlink_region_snapshot_id_put. + */ +static int __devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id) +{ + return xa_alloc(&devlink->snapshot_ids, id, xa_mk_value(1), + xa_limit_32b, GFP_KERNEL); +} + +/** + * __devlink_region_snapshot_create - create a new snapshot + * This will add a new snapshot of a region. The snapshot + * will be stored on the region struct and can be accessed + * from devlink. This is useful for future analyses of snapshots. + * Multiple snapshots can be created on a region. + * The @snapshot_id should be obtained using the getter function. + * + * Must be called only while holding the region snapshot lock. + * + * @region: devlink region of the snapshot + * @data: snapshot data + * @snapshot_id: snapshot id to be created + */ +static int +__devlink_region_snapshot_create(struct devlink_region *region, + u8 *data, u32 snapshot_id) +{ + struct devlink *devlink = region->devlink; + struct devlink_snapshot *snapshot; + int err; + + lockdep_assert_held(®ion->snapshot_lock); + + /* check if region can hold one more snapshot */ + if (region->cur_snapshots == region->max_snapshots) + return -ENOSPC; + + if (devlink_region_snapshot_get_by_id(region, snapshot_id)) + return -EEXIST; + + snapshot = kzalloc(sizeof(*snapshot), GFP_KERNEL); + if (!snapshot) + return -ENOMEM; + + err = __devlink_snapshot_id_increment(devlink, snapshot_id); + if (err) + goto err_snapshot_id_increment; + + snapshot->id = snapshot_id; + snapshot->region = region; + snapshot->data = data; + + list_add_tail(&snapshot->list, ®ion->snapshot_list); + + region->cur_snapshots++; + + devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_NEW); + return 0; + +err_snapshot_id_increment: + kfree(snapshot); + return err; +} + +static void devlink_region_snapshot_del(struct devlink_region *region, + struct devlink_snapshot *snapshot) +{ + struct devlink *devlink = region->devlink; + + lockdep_assert_held(®ion->snapshot_lock); + + devlink_nl_region_notify(region, snapshot, DEVLINK_CMD_REGION_DEL); + region->cur_snapshots--; + list_del(&snapshot->list); + region->ops->destructor(snapshot->data); + __devlink_snapshot_id_decrement(devlink, snapshot->id); + kfree(snapshot); +} + +static int devlink_nl_cmd_region_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_port *port = NULL; + struct devlink_region *region; + const char *region_name; + struct sk_buff *msg; + unsigned int index; + int err; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME)) + return -EINVAL; + + if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + + port = devlink_port_get_by_index(devlink, index); + if (!port) + return -ENODEV; + } + + region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]); + if (port) + region = devlink_port_region_get_by_name(port, region_name); + else + region = devlink_region_get_by_name(devlink, region_name); + + if (!region) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_region_fill(msg, devlink, DEVLINK_CMD_REGION_GET, + info->snd_portid, info->snd_seq, 0, + region); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_region_get_port_dumpit(struct sk_buff *msg, + struct netlink_callback *cb, + struct devlink_port *port, + int *idx, + int start) +{ + struct devlink_region *region; + int err = 0; + + list_for_each_entry(region, &port->region_list, list) { + if (*idx < start) { + (*idx)++; + continue; + } + err = devlink_nl_region_fill(msg, port->devlink, + DEVLINK_CMD_REGION_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, region); + if (err) + goto out; + (*idx)++; + } + +out: + return err; +} + +static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, + struct netlink_callback *cb, + struct devlink *devlink, + int *idx, + int start) +{ + struct devlink_region *region; + struct devlink_port *port; + unsigned long port_index; + int err = 0; + + devl_lock(devlink); + list_for_each_entry(region, &devlink->region_list, list) { + if (*idx < start) { + (*idx)++; + continue; + } + err = devlink_nl_region_fill(msg, devlink, + DEVLINK_CMD_REGION_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, region); + if (err) + goto out; + (*idx)++; + } + + xa_for_each(&devlink->ports, port_index, port) { + err = devlink_nl_cmd_region_get_port_dumpit(msg, cb, port, idx, + start); + if (err) + goto out; + } + +out: + devl_unlock(devlink); + return err; +} + +static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink, + &idx, start); + devlink_put(devlink); + if (err) + goto out; + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int devlink_nl_cmd_region_del(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_snapshot *snapshot; + struct devlink_port *port = NULL; + struct devlink_region *region; + const char *region_name; + unsigned int index; + u32 snapshot_id; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME) || + GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_SNAPSHOT_ID)) + return -EINVAL; + + region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]); + snapshot_id = nla_get_u32(info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]); + + if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + + port = devlink_port_get_by_index(devlink, index); + if (!port) + return -ENODEV; + } + + if (port) + region = devlink_port_region_get_by_name(port, region_name); + else + region = devlink_region_get_by_name(devlink, region_name); + + if (!region) + return -EINVAL; + + mutex_lock(®ion->snapshot_lock); + snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); + if (!snapshot) { + mutex_unlock(®ion->snapshot_lock); + return -EINVAL; + } + + devlink_region_snapshot_del(region, snapshot); + mutex_unlock(®ion->snapshot_lock); + return 0; +} + +static int +devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_snapshot *snapshot; + struct devlink_port *port = NULL; + struct nlattr *snapshot_id_attr; + struct devlink_region *region; + const char *region_name; + unsigned int index; + u32 snapshot_id; + u8 *data; + int err; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME)) { + NL_SET_ERR_MSG_MOD(info->extack, "No region name provided"); + return -EINVAL; + } + + region_name = nla_data(info->attrs[DEVLINK_ATTR_REGION_NAME]); + + if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + + port = devlink_port_get_by_index(devlink, index); + if (!port) + return -ENODEV; + } + + if (port) + region = devlink_port_region_get_by_name(port, region_name); + else + region = devlink_region_get_by_name(devlink, region_name); + + if (!region) { + NL_SET_ERR_MSG_MOD(info->extack, "The requested region does not exist"); + return -EINVAL; + } + + if (!region->ops->snapshot) { + NL_SET_ERR_MSG_MOD(info->extack, "The requested region does not support taking an immediate snapshot"); + return -EOPNOTSUPP; + } + + mutex_lock(®ion->snapshot_lock); + + if (region->cur_snapshots == region->max_snapshots) { + NL_SET_ERR_MSG_MOD(info->extack, "The region has reached the maximum number of stored snapshots"); + err = -ENOSPC; + goto unlock; + } + + snapshot_id_attr = info->attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]; + if (snapshot_id_attr) { + snapshot_id = nla_get_u32(snapshot_id_attr); + + if (devlink_region_snapshot_get_by_id(region, snapshot_id)) { + NL_SET_ERR_MSG_MOD(info->extack, "The requested snapshot id is already in use"); + err = -EEXIST; + goto unlock; + } + + err = __devlink_snapshot_id_insert(devlink, snapshot_id); + if (err) + goto unlock; + } else { + err = __devlink_region_snapshot_id_get(devlink, &snapshot_id); + if (err) { + NL_SET_ERR_MSG_MOD(info->extack, "Failed to allocate a new snapshot id"); + goto unlock; + } + } + + if (port) + err = region->port_ops->snapshot(port, region->port_ops, + info->extack, &data); + else + err = region->ops->snapshot(devlink, region->ops, + info->extack, &data); + if (err) + goto err_snapshot_capture; + + err = __devlink_region_snapshot_create(region, data, snapshot_id); + if (err) + goto err_snapshot_create; + + if (!snapshot_id_attr) { + struct sk_buff *msg; + + snapshot = devlink_region_snapshot_get_by_id(region, + snapshot_id); + if (WARN_ON(!snapshot)) { + err = -EINVAL; + goto unlock; + } + + msg = devlink_nl_region_notify_build(region, snapshot, + DEVLINK_CMD_REGION_NEW, + info->snd_portid, + info->snd_seq); + err = PTR_ERR_OR_ZERO(msg); + if (err) + goto err_notify; + + err = genlmsg_reply(msg, info); + if (err) + goto err_notify; + } + + mutex_unlock(®ion->snapshot_lock); + return 0; + +err_snapshot_create: + region->ops->destructor(data); +err_snapshot_capture: + __devlink_snapshot_id_decrement(devlink, snapshot_id); + mutex_unlock(®ion->snapshot_lock); + return err; + +err_notify: + devlink_region_snapshot_del(region, snapshot); +unlock: + mutex_unlock(®ion->snapshot_lock); + return err; +} + +static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg, + u8 *chunk, u32 chunk_size, + u64 addr) +{ + struct nlattr *chunk_attr; + int err; + + chunk_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_REGION_CHUNK); + if (!chunk_attr) + return -EINVAL; + + err = nla_put(msg, DEVLINK_ATTR_REGION_CHUNK_DATA, chunk_size, chunk); + if (err) + goto nla_put_failure; + + err = nla_put_u64_64bit(msg, DEVLINK_ATTR_REGION_CHUNK_ADDR, addr, + DEVLINK_ATTR_PAD); + if (err) + goto nla_put_failure; + + nla_nest_end(msg, chunk_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(msg, chunk_attr); + return err; +} + +#define DEVLINK_REGION_READ_CHUNK_SIZE 256 + +typedef int devlink_chunk_fill_t(void *cb_priv, u8 *chunk, u32 chunk_size, + u64 curr_offset, + struct netlink_ext_ack *extack); + +static int +devlink_nl_region_read_fill(struct sk_buff *skb, devlink_chunk_fill_t *cb, + void *cb_priv, u64 start_offset, u64 end_offset, + u64 *new_offset, struct netlink_ext_ack *extack) +{ + u64 curr_offset = start_offset; + int err = 0; + u8 *data; + + /* Allocate and re-use a single buffer */ + data = kmalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL); + if (!data) + return -ENOMEM; + + *new_offset = start_offset; + + while (curr_offset < end_offset) { + u32 data_size; + + data_size = min_t(u32, end_offset - curr_offset, + DEVLINK_REGION_READ_CHUNK_SIZE); + + err = cb(cb_priv, data, data_size, curr_offset, extack); + if (err) + break; + + err = devlink_nl_cmd_region_read_chunk_fill(skb, data, data_size, curr_offset); + if (err) + break; + + curr_offset += data_size; + } + *new_offset = curr_offset; + + kfree(data); + + return err; +} + +static int +devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size, + u64 curr_offset, + struct netlink_ext_ack __always_unused *extack) +{ + struct devlink_snapshot *snapshot = cb_priv; + + memcpy(chunk, &snapshot->data[curr_offset], chunk_size); + + return 0; +} + +static int +devlink_region_port_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, + u64 curr_offset, struct netlink_ext_ack *extack) +{ + struct devlink_region *region = cb_priv; + + return region->port_ops->read(region->port, region->port_ops, extack, + curr_offset, chunk_size, chunk); +} + +static int +devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, + u64 curr_offset, struct netlink_ext_ack *extack) +{ + struct devlink_region *region = cb_priv; + + return region->ops->read(region->devlink, region->ops, extack, + curr_offset, chunk_size, chunk); +} + +static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct genl_dumpit_info *info = genl_dumpit_info(cb); + struct nlattr *chunks_attr, *region_attr, *snapshot_attr; + u64 ret_offset, start_offset, end_offset = U64_MAX; + struct nlattr **attrs = info->attrs; + struct devlink_port *port = NULL; + devlink_chunk_fill_t *region_cb; + struct devlink_region *region; + const char *region_name; + struct devlink *devlink; + unsigned int index; + void *region_cb_priv; + void *hdr; + int err; + + start_offset = *((u64 *)&cb->args[0]); + + devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); + if (IS_ERR(devlink)) + return PTR_ERR(devlink); + + devl_lock(devlink); + + if (!attrs[DEVLINK_ATTR_REGION_NAME]) { + NL_SET_ERR_MSG(cb->extack, "No region name provided"); + err = -EINVAL; + goto out_unlock; + } + + if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { + index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); + + port = devlink_port_get_by_index(devlink, index); + if (!port) { + err = -ENODEV; + goto out_unlock; + } + } + + region_attr = attrs[DEVLINK_ATTR_REGION_NAME]; + region_name = nla_data(region_attr); + + if (port) + region = devlink_port_region_get_by_name(port, region_name); + else + region = devlink_region_get_by_name(devlink, region_name); + + if (!region) { + NL_SET_ERR_MSG_ATTR(cb->extack, region_attr, "Requested region does not exist"); + err = -EINVAL; + goto out_unlock; + } + + snapshot_attr = attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]; + if (!snapshot_attr) { + if (!nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) { + NL_SET_ERR_MSG(cb->extack, "No snapshot id provided"); + err = -EINVAL; + goto out_unlock; + } + + if (!region->ops->read) { + NL_SET_ERR_MSG(cb->extack, "Requested region does not support direct read"); + err = -EOPNOTSUPP; + goto out_unlock; + } + + if (port) + region_cb = &devlink_region_port_direct_fill; + else + region_cb = &devlink_region_direct_fill; + region_cb_priv = region; + } else { + struct devlink_snapshot *snapshot; + u32 snapshot_id; + + if (nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) { + NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Direct region read does not use snapshot"); + err = -EINVAL; + goto out_unlock; + } + + snapshot_id = nla_get_u32(snapshot_attr); + snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); + if (!snapshot) { + NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Requested snapshot does not exist"); + err = -EINVAL; + goto out_unlock; + } + region_cb = &devlink_region_snapshot_fill; + region_cb_priv = snapshot; + } + + if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] && + attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]) { + if (!start_offset) + start_offset = + nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR]); + + end_offset = nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR]); + end_offset += nla_get_u64(attrs[DEVLINK_ATTR_REGION_CHUNK_LEN]); + } + + if (end_offset > region->size) + end_offset = region->size; + + /* return 0 if there is no further data to read */ + if (start_offset == end_offset) { + err = 0; + goto out_unlock; + } + + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, + &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI, + DEVLINK_CMD_REGION_READ); + if (!hdr) { + err = -EMSGSIZE; + goto out_unlock; + } + + err = devlink_nl_put_handle(skb, devlink); + if (err) + goto nla_put_failure; + + if (region->port) { + err = nla_put_u32(skb, DEVLINK_ATTR_PORT_INDEX, + region->port->index); + if (err) + goto nla_put_failure; + } + + err = nla_put_string(skb, DEVLINK_ATTR_REGION_NAME, region_name); + if (err) + goto nla_put_failure; + + chunks_attr = nla_nest_start_noflag(skb, DEVLINK_ATTR_REGION_CHUNKS); + if (!chunks_attr) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv, + start_offset, end_offset, &ret_offset, + cb->extack); + + if (err && err != -EMSGSIZE) + goto nla_put_failure; + + /* Check if there was any progress done to prevent infinite loop */ + if (ret_offset == start_offset) { + err = -EINVAL; + goto nla_put_failure; + } + + *((u64 *)&cb->args[0]) = ret_offset; + + nla_nest_end(skb, chunks_attr); + genlmsg_end(skb, hdr); + devl_unlock(devlink); + devlink_put(devlink); + return skb->len; + +nla_put_failure: + genlmsg_cancel(skb, hdr); +out_unlock: + devl_unlock(devlink); + devlink_put(devlink); + return err; +} + +int devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn) +{ + if (!req->msg) + return 0; + return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn); +} +EXPORT_SYMBOL_GPL(devlink_info_serial_number_put); + +int devlink_info_board_serial_number_put(struct devlink_info_req *req, + const char *bsn) +{ + if (!req->msg) + return 0; + return nla_put_string(req->msg, DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, + bsn); +} +EXPORT_SYMBOL_GPL(devlink_info_board_serial_number_put); + +static int devlink_info_version_put(struct devlink_info_req *req, int attr, + const char *version_name, + const char *version_value, + enum devlink_info_version_type version_type) +{ + struct nlattr *nest; + int err; + + if (req->version_cb) + req->version_cb(version_name, version_type, + req->version_cb_priv); + + if (!req->msg) + return 0; + + nest = nla_nest_start_noflag(req->msg, attr); + if (!nest) + return -EMSGSIZE; + + err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_NAME, + version_name); + if (err) + goto nla_put_failure; + + err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_VALUE, + version_value); + if (err) + goto nla_put_failure; + + nla_nest_end(req->msg, nest); + + return 0; + +nla_put_failure: + nla_nest_cancel(req->msg, nest); + return err; +} + +int devlink_info_version_fixed_put(struct devlink_info_req *req, + const char *version_name, + const char *version_value) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_FIXED, + version_name, version_value, + DEVLINK_INFO_VERSION_TYPE_NONE); +} +EXPORT_SYMBOL_GPL(devlink_info_version_fixed_put); + +int devlink_info_version_stored_put(struct devlink_info_req *req, + const char *version_name, + const char *version_value) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, + version_name, version_value, + DEVLINK_INFO_VERSION_TYPE_NONE); +} +EXPORT_SYMBOL_GPL(devlink_info_version_stored_put); + +int devlink_info_version_stored_put_ext(struct devlink_info_req *req, + const char *version_name, + const char *version_value, + enum devlink_info_version_type version_type) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, + version_name, version_value, + version_type); +} +EXPORT_SYMBOL_GPL(devlink_info_version_stored_put_ext); + +int devlink_info_version_running_put(struct devlink_info_req *req, + const char *version_name, + const char *version_value) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, + version_name, version_value, + DEVLINK_INFO_VERSION_TYPE_NONE); +} +EXPORT_SYMBOL_GPL(devlink_info_version_running_put); + +int devlink_info_version_running_put_ext(struct devlink_info_req *req, + const char *version_name, + const char *version_value, + enum devlink_info_version_type version_type) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, + version_name, version_value, + version_type); +} +EXPORT_SYMBOL_GPL(devlink_info_version_running_put_ext); + +static int devlink_nl_driver_info_get(struct device_driver *drv, + struct devlink_info_req *req) +{ + if (!drv) + return 0; + + if (drv->name[0]) + return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRIVER_NAME, + drv->name); + + return 0; +} + +static int +devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags, struct netlink_ext_ack *extack) +{ + struct device *dev = devlink_to_dev(devlink); + struct devlink_info_req req = {}; + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + err = -EMSGSIZE; + if (devlink_nl_put_handle(msg, devlink)) + goto err_cancel_msg; + + req.msg = msg; + if (devlink->ops->info_get) { + err = devlink->ops->info_get(devlink, &req, extack); + if (err) + goto err_cancel_msg; + } + + err = devlink_nl_driver_info_get(dev->driver, &req); + if (err) + goto err_cancel_msg; + + genlmsg_end(msg, hdr); + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); + return err; +} + +static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, + info->snd_portid, info->snd_seq, 0, + info->extack); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err = 0; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + if (idx < start) + goto inc; + + devl_lock(devlink); + err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->extack); + devl_unlock(devlink); + if (err == -EOPNOTSUPP) + err = 0; + else if (err) { + devlink_put(devlink); + break; + } +inc: + idx++; + devlink_put(devlink); + } + + if (err != -EMSGSIZE) + return err; + + cb->args[0] = idx; + return msg->len; +} + +struct devlink_fmsg_item { + struct list_head list; + int attrtype; + u8 nla_type; + u16 len; + int value[]; +}; + +struct devlink_fmsg { + struct list_head item_list; + bool putting_binary; /* This flag forces enclosing of binary data + * in an array brackets. It forces using + * of designated API: + * devlink_fmsg_binary_pair_nest_start() + * devlink_fmsg_binary_pair_nest_end() + */ +}; + +static struct devlink_fmsg *devlink_fmsg_alloc(void) +{ + struct devlink_fmsg *fmsg; + + fmsg = kzalloc(sizeof(*fmsg), GFP_KERNEL); + if (!fmsg) + return NULL; + + INIT_LIST_HEAD(&fmsg->item_list); + + return fmsg; +} + +static void devlink_fmsg_free(struct devlink_fmsg *fmsg) +{ + struct devlink_fmsg_item *item, *tmp; + + list_for_each_entry_safe(item, tmp, &fmsg->item_list, list) { + list_del(&item->list); + kfree(item); + } + kfree(fmsg); +} + +static int devlink_fmsg_nest_common(struct devlink_fmsg *fmsg, + int attrtype) +{ + struct devlink_fmsg_item *item; + + item = kzalloc(sizeof(*item), GFP_KERNEL); + if (!item) + return -ENOMEM; + + item->attrtype = attrtype; + list_add_tail(&item->list, &fmsg->item_list); + + return 0; +} + +int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_OBJ_NEST_START); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_start); + +static int devlink_fmsg_nest_end(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_NEST_END); +} + +int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_end(fmsg); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_end); + +#define DEVLINK_FMSG_MAX_SIZE (GENLMSG_DEFAULT_SIZE - GENL_HDRLEN - NLA_HDRLEN) + +static int devlink_fmsg_put_name(struct devlink_fmsg *fmsg, const char *name) +{ + struct devlink_fmsg_item *item; + + if (fmsg->putting_binary) + return -EINVAL; + + if (strlen(name) + 1 > DEVLINK_FMSG_MAX_SIZE) + return -EMSGSIZE; + + item = kzalloc(sizeof(*item) + strlen(name) + 1, GFP_KERNEL); + if (!item) + return -ENOMEM; + + item->nla_type = NLA_NUL_STRING; + item->len = strlen(name) + 1; + item->attrtype = DEVLINK_ATTR_FMSG_OBJ_NAME; + memcpy(&item->value, name, item->len); + list_add_tail(&item->list, &fmsg->item_list); + + return 0; +} + +int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name) +{ + int err; + + if (fmsg->putting_binary) + return -EINVAL; + + err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_PAIR_NEST_START); + if (err) + return err; + + err = devlink_fmsg_put_name(fmsg, name); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_start); + +int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_end(fmsg); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_end); + +int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg, + const char *name) +{ + int err; + + if (fmsg->putting_binary) + return -EINVAL; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_ARR_NEST_START); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_start); + +int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg) +{ + int err; + + if (fmsg->putting_binary) + return -EINVAL; + + err = devlink_fmsg_nest_end(fmsg); + if (err) + return err; + + err = devlink_fmsg_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_end); + +int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg *fmsg, + const char *name) +{ + int err; + + err = devlink_fmsg_arr_pair_nest_start(fmsg, name); + if (err) + return err; + + fmsg->putting_binary = true; + return err; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_start); + +int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg *fmsg) +{ + if (!fmsg->putting_binary) + return -EINVAL; + + fmsg->putting_binary = false; + return devlink_fmsg_arr_pair_nest_end(fmsg); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_end); + +static int devlink_fmsg_put_value(struct devlink_fmsg *fmsg, + const void *value, u16 value_len, + u8 value_nla_type) +{ + struct devlink_fmsg_item *item; + + if (value_len > DEVLINK_FMSG_MAX_SIZE) + return -EMSGSIZE; + + item = kzalloc(sizeof(*item) + value_len, GFP_KERNEL); + if (!item) + return -ENOMEM; + + item->nla_type = value_nla_type; + item->len = value_len; + item->attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; + memcpy(&item->value, value, item->len); + list_add_tail(&item->list, &fmsg->item_list); + + return 0; +} + +static int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_FLAG); +} + +static int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U8); +} + +int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U32); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u32_put); + +static int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U64); +} + +int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, value, strlen(value) + 1, + NLA_NUL_STRING); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_string_put); + +int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value, + u16 value_len) +{ + if (!fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, value, value_len, NLA_BINARY); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_put); + +int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name, + bool value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_bool_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_bool_pair_put); + +int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name, + u8 value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_u8_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u8_pair_put); + +int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name, + u32 value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_u32_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u32_pair_put); + +int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name, + u64 value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_u64_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u64_pair_put); + +int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name, + const char *value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_string_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_string_pair_put); + +int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name, + const void *value, u32 value_len) +{ + u32 data_size; + int end_err; + u32 offset; + int err; + + err = devlink_fmsg_binary_pair_nest_start(fmsg, name); + if (err) + return err; + + for (offset = 0; offset < value_len; offset += data_size) { + data_size = value_len - offset; + if (data_size > DEVLINK_FMSG_MAX_SIZE) + data_size = DEVLINK_FMSG_MAX_SIZE; + err = devlink_fmsg_binary_put(fmsg, value + offset, data_size); + if (err) + break; + /* Exit from loop with a break (instead of + * return) to make sure putting_binary is turned off in + * devlink_fmsg_binary_pair_nest_end + */ + } + + end_err = devlink_fmsg_binary_pair_nest_end(fmsg); + if (end_err) + err = end_err; + + return err; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_put); + +static int +devlink_fmsg_item_fill_type(struct devlink_fmsg_item *msg, struct sk_buff *skb) +{ + switch (msg->nla_type) { + case NLA_FLAG: + case NLA_U8: + case NLA_U32: + case NLA_U64: + case NLA_NUL_STRING: + case NLA_BINARY: + return nla_put_u8(skb, DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, + msg->nla_type); + default: + return -EINVAL; + } +} + +static int +devlink_fmsg_item_fill_data(struct devlink_fmsg_item *msg, struct sk_buff *skb) +{ + int attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; + u8 tmp; + + switch (msg->nla_type) { + case NLA_FLAG: + /* Always provide flag data, regardless of its value */ + tmp = *(bool *) msg->value; + + return nla_put_u8(skb, attrtype, tmp); + case NLA_U8: + return nla_put_u8(skb, attrtype, *(u8 *) msg->value); + case NLA_U32: + return nla_put_u32(skb, attrtype, *(u32 *) msg->value); + case NLA_U64: + return nla_put_u64_64bit(skb, attrtype, *(u64 *) msg->value, + DEVLINK_ATTR_PAD); + case NLA_NUL_STRING: + return nla_put_string(skb, attrtype, (char *) &msg->value); + case NLA_BINARY: + return nla_put(skb, attrtype, msg->len, (void *) &msg->value); + default: + return -EINVAL; + } +} + +static int +devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb, + int *start) +{ + struct devlink_fmsg_item *item; + struct nlattr *fmsg_nlattr; + int i = 0; + int err; + + fmsg_nlattr = nla_nest_start_noflag(skb, DEVLINK_ATTR_FMSG); + if (!fmsg_nlattr) + return -EMSGSIZE; + + list_for_each_entry(item, &fmsg->item_list, list) { + if (i < *start) { + i++; + continue; + } + + switch (item->attrtype) { + case DEVLINK_ATTR_FMSG_OBJ_NEST_START: + case DEVLINK_ATTR_FMSG_PAIR_NEST_START: + case DEVLINK_ATTR_FMSG_ARR_NEST_START: + case DEVLINK_ATTR_FMSG_NEST_END: + err = nla_put_flag(skb, item->attrtype); + break; + case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA: + err = devlink_fmsg_item_fill_type(item, skb); + if (err) + break; + err = devlink_fmsg_item_fill_data(item, skb); + break; + case DEVLINK_ATTR_FMSG_OBJ_NAME: + err = nla_put_string(skb, item->attrtype, + (char *) &item->value); + break; + default: + err = -EINVAL; + break; + } + if (!err) + *start = ++i; + else + break; + } + + nla_nest_end(skb, fmsg_nlattr); + return err; +} + +static int devlink_fmsg_snd(struct devlink_fmsg *fmsg, + struct genl_info *info, + enum devlink_command cmd, int flags) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + bool last = false; + int index = 0; + void *hdr; + int err; + + while (!last) { + int tmp_index = index; + + skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!skb) + return -ENOMEM; + + hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, + &devlink_nl_family, flags | NLM_F_MULTI, cmd); + if (!hdr) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + err = devlink_fmsg_prepare_skb(fmsg, skb, &index); + if (!err) + last = true; + else if (err != -EMSGSIZE || tmp_index == index) + goto nla_put_failure; + + genlmsg_end(skb, hdr); + err = genlmsg_reply(skb, info); + if (err) + return err; + } + + skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!skb) + return -ENOMEM; + nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, flags | NLM_F_MULTI); + if (!nlh) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + return genlmsg_reply(skb, info); + +nla_put_failure: + nlmsg_free(skb); + return err; +} + +static int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, + struct netlink_callback *cb, + enum devlink_command cmd) +{ + int index = cb->args[0]; + int tmp_index = index; + void *hdr; + int err; + + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, + &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI, cmd); + if (!hdr) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + err = devlink_fmsg_prepare_skb(fmsg, skb, &index); + if ((err && err != -EMSGSIZE) || tmp_index == index) + goto nla_put_failure; + + cb->args[0] = index; + genlmsg_end(skb, hdr); + return skb->len; + +nla_put_failure: + genlmsg_cancel(skb, hdr); + return err; +} + +struct devlink_health_reporter { + struct list_head list; + void *priv; + const struct devlink_health_reporter_ops *ops; + struct devlink *devlink; + struct devlink_port *devlink_port; + struct devlink_fmsg *dump_fmsg; + struct mutex dump_lock; /* lock parallel read/write from dump buffers */ + u64 graceful_period; + bool auto_recover; + bool auto_dump; + u8 health_state; + u64 dump_ts; + u64 dump_real_ts; + u64 error_count; + u64 recovery_count; + u64 last_recovery_ts; + refcount_t refcount; +}; + +void * +devlink_health_reporter_priv(struct devlink_health_reporter *reporter) +{ + return reporter->priv; +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_priv); + +static struct devlink_health_reporter * +__devlink_health_reporter_find_by_name(struct list_head *reporter_list, + struct mutex *list_lock, + const char *reporter_name) +{ + struct devlink_health_reporter *reporter; + + lockdep_assert_held(list_lock); + list_for_each_entry(reporter, reporter_list, list) + if (!strcmp(reporter->ops->name, reporter_name)) + return reporter; + return NULL; +} + +static struct devlink_health_reporter * +devlink_health_reporter_find_by_name(struct devlink *devlink, + const char *reporter_name) +{ + return __devlink_health_reporter_find_by_name(&devlink->reporter_list, + &devlink->reporters_lock, + reporter_name); +} + +static struct devlink_health_reporter * +devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, + const char *reporter_name) +{ + return __devlink_health_reporter_find_by_name(&devlink_port->reporter_list, + &devlink_port->reporters_lock, + reporter_name); +} + +static struct devlink_health_reporter * +__devlink_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + if (WARN_ON(graceful_period && !ops->recover)) + return ERR_PTR(-EINVAL); + + reporter = kzalloc(sizeof(*reporter), GFP_KERNEL); + if (!reporter) + return ERR_PTR(-ENOMEM); + + reporter->priv = priv; + reporter->ops = ops; + reporter->devlink = devlink; + reporter->graceful_period = graceful_period; + reporter->auto_recover = !!ops->recover; + reporter->auto_dump = !!ops->dump; + mutex_init(&reporter->dump_lock); + refcount_set(&reporter->refcount, 1); + return reporter; +} + +/** + * devlink_port_health_reporter_create - create devlink health reporter for + * specified port instance + * + * @port: devlink_port which should contain the new reporter + * @ops: ops + * @graceful_period: to avoid recovery loops, in msecs + * @priv: priv + */ +struct devlink_health_reporter * +devlink_port_health_reporter_create(struct devlink_port *port, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + mutex_lock(&port->reporters_lock); + if (__devlink_health_reporter_find_by_name(&port->reporter_list, + &port->reporters_lock, ops->name)) { + reporter = ERR_PTR(-EEXIST); + goto unlock; + } + + reporter = __devlink_health_reporter_create(port->devlink, ops, + graceful_period, priv); + if (IS_ERR(reporter)) + goto unlock; + + reporter->devlink_port = port; + list_add_tail(&reporter->list, &port->reporter_list); +unlock: + mutex_unlock(&port->reporters_lock); + return reporter; +} +EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create); + +/** + * devlink_health_reporter_create - create devlink health reporter + * + * @devlink: devlink + * @ops: ops + * @graceful_period: to avoid recovery loops, in msecs + * @priv: priv + */ +struct devlink_health_reporter * +devlink_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + mutex_lock(&devlink->reporters_lock); + if (devlink_health_reporter_find_by_name(devlink, ops->name)) { + reporter = ERR_PTR(-EEXIST); + goto unlock; + } + + reporter = __devlink_health_reporter_create(devlink, ops, + graceful_period, priv); + if (IS_ERR(reporter)) + goto unlock; + + list_add_tail(&reporter->list, &devlink->reporter_list); +unlock: + mutex_unlock(&devlink->reporters_lock); + return reporter; +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_create); + +static void +devlink_health_reporter_free(struct devlink_health_reporter *reporter) +{ + mutex_destroy(&reporter->dump_lock); + if (reporter->dump_fmsg) + devlink_fmsg_free(reporter->dump_fmsg); + kfree(reporter); +} + +static void +devlink_health_reporter_put(struct devlink_health_reporter *reporter) +{ + if (refcount_dec_and_test(&reporter->refcount)) + devlink_health_reporter_free(reporter); +} + +static void +__devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + list_del(&reporter->list); + devlink_health_reporter_put(reporter); +} + +/** + * devlink_health_reporter_destroy - destroy devlink health reporter + * + * @reporter: devlink health reporter to destroy + */ +void +devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + struct mutex *lock = &reporter->devlink->reporters_lock; + + mutex_lock(lock); + __devlink_health_reporter_destroy(reporter); + mutex_unlock(lock); +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); + +/** + * devlink_port_health_reporter_destroy - destroy devlink port health reporter + * + * @reporter: devlink health reporter to destroy + */ +void +devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + struct mutex *lock = &reporter->devlink_port->reporters_lock; + + mutex_lock(lock); + __devlink_health_reporter_destroy(reporter); + mutex_unlock(lock); +} +EXPORT_SYMBOL_GPL(devlink_port_health_reporter_destroy); + +static int +devlink_nl_health_reporter_fill(struct sk_buff *msg, + struct devlink_health_reporter *reporter, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + struct devlink *devlink = reporter->devlink; + struct nlattr *reporter_attr; + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto genlmsg_cancel; + + if (reporter->devlink_port) { + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, reporter->devlink_port->index)) + goto genlmsg_cancel; + } + reporter_attr = nla_nest_start_noflag(msg, + DEVLINK_ATTR_HEALTH_REPORTER); + if (!reporter_attr) + goto genlmsg_cancel; + if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME, + reporter->ops->name)) + goto reporter_nest_cancel; + if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE, + reporter->health_state)) + goto reporter_nest_cancel; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR_COUNT, + reporter->error_count, DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT, + reporter->recovery_count, DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->ops->recover && + nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, + reporter->graceful_period, + DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->ops->recover && + nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, + reporter->auto_recover)) + goto reporter_nest_cancel; + if (reporter->dump_fmsg && + nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, + jiffies_to_msecs(reporter->dump_ts), + DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->dump_fmsg && + nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS, + reporter->dump_real_ts, DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->ops->dump && + nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP, + reporter->auto_dump)) + goto reporter_nest_cancel; + + nla_nest_end(msg, reporter_attr); + genlmsg_end(msg, hdr); + return 0; + +reporter_nest_cancel: + nla_nest_end(msg, reporter_attr); +genlmsg_cancel: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void devlink_recover_notify(struct devlink_health_reporter *reporter, + enum devlink_command cmd) +{ + struct devlink *devlink = reporter->devlink; + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_HEALTH_REPORTER_RECOVER); + WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_health_reporter_fill(msg, reporter, cmd, 0, 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, + 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +void +devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter) +{ + reporter->recovery_count++; + reporter->last_recovery_ts = jiffies; +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done); + +static int +devlink_health_reporter_recover(struct devlink_health_reporter *reporter, + void *priv_ctx, struct netlink_ext_ack *extack) +{ + int err; + + if (reporter->health_state == DEVLINK_HEALTH_REPORTER_STATE_HEALTHY) + return 0; + + if (!reporter->ops->recover) + return -EOPNOTSUPP; + + err = reporter->ops->recover(reporter, priv_ctx, extack); + if (err) + return err; + + devlink_health_reporter_recovery_done(reporter); + reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; + devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); + + return 0; +} + +static void +devlink_health_dump_clear(struct devlink_health_reporter *reporter) +{ + if (!reporter->dump_fmsg) + return; + devlink_fmsg_free(reporter->dump_fmsg); + reporter->dump_fmsg = NULL; +} + +static int devlink_health_do_dump(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack) +{ + int err; + + if (!reporter->ops->dump) + return 0; + + if (reporter->dump_fmsg) + return 0; + + reporter->dump_fmsg = devlink_fmsg_alloc(); + if (!reporter->dump_fmsg) { + err = -ENOMEM; + return err; + } + + err = devlink_fmsg_obj_nest_start(reporter->dump_fmsg); + if (err) + goto dump_err; + + err = reporter->ops->dump(reporter, reporter->dump_fmsg, + priv_ctx, extack); + if (err) + goto dump_err; + + err = devlink_fmsg_obj_nest_end(reporter->dump_fmsg); + if (err) + goto dump_err; + + reporter->dump_ts = jiffies; + reporter->dump_real_ts = ktime_get_real_ns(); + + return 0; + +dump_err: + devlink_health_dump_clear(reporter); + return err; +} + +int devlink_health_report(struct devlink_health_reporter *reporter, + const char *msg, void *priv_ctx) +{ + enum devlink_health_reporter_state prev_health_state; + struct devlink *devlink = reporter->devlink; + unsigned long recover_ts_threshold; + int ret; + + /* write a log message of the current error */ + WARN_ON(!msg); + trace_devlink_health_report(devlink, reporter->ops->name, msg); + reporter->error_count++; + prev_health_state = reporter->health_state; + reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; + devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); + + /* abort if the previous error wasn't recovered */ + recover_ts_threshold = reporter->last_recovery_ts + + msecs_to_jiffies(reporter->graceful_period); + if (reporter->auto_recover && + (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY || + (reporter->last_recovery_ts && reporter->recovery_count && + time_is_after_jiffies(recover_ts_threshold)))) { + trace_devlink_health_recover_aborted(devlink, + reporter->ops->name, + reporter->health_state, + jiffies - + reporter->last_recovery_ts); + return -ECANCELED; + } + + if (reporter->auto_dump) { + mutex_lock(&reporter->dump_lock); + /* store current dump of current error, for later analysis */ + devlink_health_do_dump(reporter, priv_ctx, NULL); + mutex_unlock(&reporter->dump_lock); + } + + if (!reporter->auto_recover) + return 0; + + devl_lock(devlink); + ret = devlink_health_reporter_recover(reporter, priv_ctx, NULL); + devl_unlock(devlink); + + return ret; +} +EXPORT_SYMBOL_GPL(devlink_health_report); + +static struct devlink_health_reporter * +devlink_health_reporter_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs) +{ + struct devlink_health_reporter *reporter; + struct devlink_port *devlink_port; + char *reporter_name; + + if (!attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]) + return NULL; + + reporter_name = nla_data(attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]); + devlink_port = devlink_port_get_from_attrs(devlink, attrs); + if (IS_ERR(devlink_port)) { + mutex_lock(&devlink->reporters_lock); + reporter = devlink_health_reporter_find_by_name(devlink, reporter_name); + if (reporter) + refcount_inc(&reporter->refcount); + mutex_unlock(&devlink->reporters_lock); + } else { + mutex_lock(&devlink_port->reporters_lock); + reporter = devlink_port_health_reporter_find_by_name(devlink_port, reporter_name); + if (reporter) + refcount_inc(&reporter->refcount); + mutex_unlock(&devlink_port->reporters_lock); + } + + return reporter; +} + +static struct devlink_health_reporter * +devlink_health_reporter_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + return devlink_health_reporter_get_from_attrs(devlink, info->attrs); +} + +static struct devlink_health_reporter * +devlink_health_reporter_get_from_cb(struct netlink_callback *cb) +{ + const struct genl_dumpit_info *info = genl_dumpit_info(cb); + struct devlink_health_reporter *reporter; + struct nlattr **attrs = info->attrs; + struct devlink *devlink; + + devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); + if (IS_ERR(devlink)) + return NULL; + + reporter = devlink_health_reporter_get_from_attrs(devlink, attrs); + devlink_put(devlink); + return reporter; +} + +void +devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, + enum devlink_health_reporter_state state) +{ + if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY && + state != DEVLINK_HEALTH_REPORTER_STATE_ERROR)) + return; + + if (reporter->health_state == state) + return; + + reporter->health_state = state; + trace_devlink_health_reporter_state_update(reporter->devlink, + reporter->ops->name, state); + devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update); + +static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + struct sk_buff *msg; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) { + err = -ENOMEM; + goto out; + } + + err = devlink_nl_health_reporter_fill(msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + info->snd_portid, info->snd_seq, + 0); + if (err) { + nlmsg_free(msg); + goto out; + } + + err = genlmsg_reply(msg, info); +out: + devlink_health_reporter_put(reporter); + return err; +} + +static int +devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink_health_reporter *reporter; + unsigned long index, port_index; + struct devlink_port *port; + struct devlink *devlink; + int start = cb->args[0]; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + mutex_lock(&devlink->reporters_lock); + list_for_each_entry(reporter, &devlink->reporter_list, + list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_health_reporter_fill( + msg, reporter, DEVLINK_CMD_HEALTH_REPORTER_GET, + NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + mutex_unlock(&devlink->reporters_lock); + devlink_put(devlink); + goto out; + } + idx++; + } + mutex_unlock(&devlink->reporters_lock); + devlink_put(devlink); + } + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + xa_for_each(&devlink->ports, port_index, port) { + mutex_lock(&port->reporters_lock); + list_for_each_entry(reporter, &port->reporter_list, list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_health_reporter_fill( + msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI); + if (err) { + mutex_unlock(&port->reporters_lock); + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + mutex_unlock(&port->reporters_lock); + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int +devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->recover && + (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] || + info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])) { + err = -EOPNOTSUPP; + goto out; + } + if (!reporter->ops->dump && + info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) { + err = -EOPNOTSUPP; + goto out; + } + + if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) + reporter->graceful_period = + nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]); + + if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]) + reporter->auto_recover = + nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]); + + if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) + reporter->auto_dump = + nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]); + + devlink_health_reporter_put(reporter); + return 0; +out: + devlink_health_reporter_put(reporter); + return err; +} + +static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + err = devlink_health_reporter_recover(reporter, NULL, info->extack); + + devlink_health_reporter_put(reporter); + return err; +} + +static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + struct devlink_fmsg *fmsg; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->diagnose) { + devlink_health_reporter_put(reporter); + return -EOPNOTSUPP; + } + + fmsg = devlink_fmsg_alloc(); + if (!fmsg) { + devlink_health_reporter_put(reporter); + return -ENOMEM; + } + + err = devlink_fmsg_obj_nest_start(fmsg); + if (err) + goto out; + + err = reporter->ops->diagnose(reporter, fmsg, info->extack); + if (err) + goto out; + + err = devlink_fmsg_obj_nest_end(fmsg); + if (err) + goto out; + + err = devlink_fmsg_snd(fmsg, info, + DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); + +out: + devlink_fmsg_free(fmsg); + devlink_health_reporter_put(reporter); + return err; +} + +static int +devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct devlink_health_reporter *reporter; + u64 start = cb->args[0]; + int err; + + reporter = devlink_health_reporter_get_from_cb(cb); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->dump) { + err = -EOPNOTSUPP; + goto out; + } + mutex_lock(&reporter->dump_lock); + if (!start) { + err = devlink_health_do_dump(reporter, NULL, cb->extack); + if (err) + goto unlock; + cb->args[1] = reporter->dump_ts; + } + if (!reporter->dump_fmsg || cb->args[1] != reporter->dump_ts) { + NL_SET_ERR_MSG_MOD(cb->extack, "Dump trampled, please retry"); + err = -EAGAIN; + goto unlock; + } + + err = devlink_fmsg_dumpit(reporter->dump_fmsg, skb, cb, + DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET); +unlock: + mutex_unlock(&reporter->dump_lock); +out: + devlink_health_reporter_put(reporter); + return err; +} + +static int +devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->dump) { + devlink_health_reporter_put(reporter); + return -EOPNOTSUPP; + } + + mutex_lock(&reporter->dump_lock); + devlink_health_dump_clear(reporter); + mutex_unlock(&reporter->dump_lock); + devlink_health_reporter_put(reporter); + return 0; +} + +static int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->test) { + devlink_health_reporter_put(reporter); + return -EOPNOTSUPP; + } + + err = reporter->ops->test(reporter, info->extack); + + devlink_health_reporter_put(reporter); + return err; +} + +struct devlink_stats { + u64_stats_t rx_bytes; + u64_stats_t rx_packets; + struct u64_stats_sync syncp; +}; + +/** + * struct devlink_trap_policer_item - Packet trap policer attributes. + * @policer: Immutable packet trap policer attributes. + * @rate: Rate in packets / sec. + * @burst: Burst size in packets. + * @list: trap_policer_list member. + * + * Describes packet trap policer attributes. Created by devlink during trap + * policer registration. + */ +struct devlink_trap_policer_item { + const struct devlink_trap_policer *policer; + u64 rate; + u64 burst; + struct list_head list; +}; + +/** + * struct devlink_trap_group_item - Packet trap group attributes. + * @group: Immutable packet trap group attributes. + * @policer_item: Associated policer item. Can be NULL. + * @list: trap_group_list member. + * @stats: Trap group statistics. + * + * Describes packet trap group attributes. Created by devlink during trap + * group registration. + */ +struct devlink_trap_group_item { + const struct devlink_trap_group *group; + struct devlink_trap_policer_item *policer_item; + struct list_head list; + struct devlink_stats __percpu *stats; +}; + +/** + * struct devlink_trap_item - Packet trap attributes. + * @trap: Immutable packet trap attributes. + * @group_item: Associated group item. + * @list: trap_list member. + * @action: Trap action. + * @stats: Trap statistics. + * @priv: Driver private information. + * + * Describes both mutable and immutable packet trap attributes. Created by + * devlink during trap registration and used for all trap related operations. + */ +struct devlink_trap_item { + const struct devlink_trap *trap; + struct devlink_trap_group_item *group_item; + struct list_head list; + enum devlink_trap_action action; + struct devlink_stats __percpu *stats; + void *priv; +}; + +static struct devlink_trap_policer_item * +devlink_trap_policer_item_lookup(struct devlink *devlink, u32 id) +{ + struct devlink_trap_policer_item *policer_item; + + list_for_each_entry(policer_item, &devlink->trap_policer_list, list) { + if (policer_item->policer->id == id) + return policer_item; + } + + return NULL; +} + +static struct devlink_trap_item * +devlink_trap_item_lookup(struct devlink *devlink, const char *name) +{ + struct devlink_trap_item *trap_item; + + list_for_each_entry(trap_item, &devlink->trap_list, list) { + if (!strcmp(trap_item->trap->name, name)) + return trap_item; + } + + return NULL; +} + +static struct devlink_trap_item * +devlink_trap_item_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + struct nlattr *attr; + + if (!info->attrs[DEVLINK_ATTR_TRAP_NAME]) + return NULL; + attr = info->attrs[DEVLINK_ATTR_TRAP_NAME]; + + return devlink_trap_item_lookup(devlink, nla_data(attr)); +} + +static int +devlink_trap_action_get_from_info(struct genl_info *info, + enum devlink_trap_action *p_trap_action) +{ + u8 val; + + val = nla_get_u8(info->attrs[DEVLINK_ATTR_TRAP_ACTION]); + switch (val) { + case DEVLINK_TRAP_ACTION_DROP: + case DEVLINK_TRAP_ACTION_TRAP: + case DEVLINK_TRAP_ACTION_MIRROR: + *p_trap_action = val; + break; + default: + return -EINVAL; + } + + return 0; +} + +static int devlink_trap_metadata_put(struct sk_buff *msg, + const struct devlink_trap *trap) +{ + struct nlattr *attr; + + attr = nla_nest_start(msg, DEVLINK_ATTR_TRAP_METADATA); + if (!attr) + return -EMSGSIZE; + + if ((trap->metadata_cap & DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT) && + nla_put_flag(msg, DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT)) + goto nla_put_failure; + if ((trap->metadata_cap & DEVLINK_TRAP_METADATA_TYPE_F_FA_COOKIE) && + nla_put_flag(msg, DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE)) + goto nla_put_failure; + + nla_nest_end(msg, attr); + + return 0; + +nla_put_failure: + nla_nest_cancel(msg, attr); + return -EMSGSIZE; +} + +static void devlink_trap_stats_read(struct devlink_stats __percpu *trap_stats, + struct devlink_stats *stats) +{ + int i; + + memset(stats, 0, sizeof(*stats)); + for_each_possible_cpu(i) { + struct devlink_stats *cpu_stats; + u64 rx_packets, rx_bytes; + unsigned int start; + + cpu_stats = per_cpu_ptr(trap_stats, i); + do { + start = u64_stats_fetch_begin(&cpu_stats->syncp); + rx_packets = u64_stats_read(&cpu_stats->rx_packets); + rx_bytes = u64_stats_read(&cpu_stats->rx_bytes); + } while (u64_stats_fetch_retry(&cpu_stats->syncp, start)); + + u64_stats_add(&stats->rx_packets, rx_packets); + u64_stats_add(&stats->rx_bytes, rx_bytes); + } +} + +static int +devlink_trap_group_stats_put(struct sk_buff *msg, + struct devlink_stats __percpu *trap_stats) +{ + struct devlink_stats stats; + struct nlattr *attr; + + devlink_trap_stats_read(trap_stats, &stats); + + attr = nla_nest_start(msg, DEVLINK_ATTR_STATS); + if (!attr) + return -EMSGSIZE; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_PACKETS, + u64_stats_read(&stats.rx_packets), + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_BYTES, + u64_stats_read(&stats.rx_bytes), + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + nla_nest_end(msg, attr); + + return 0; + +nla_put_failure: + nla_nest_cancel(msg, attr); + return -EMSGSIZE; +} + +static int devlink_trap_stats_put(struct sk_buff *msg, struct devlink *devlink, + const struct devlink_trap_item *trap_item) +{ + struct devlink_stats stats; + struct nlattr *attr; + u64 drops = 0; + int err; + + if (devlink->ops->trap_drop_counter_get) { + err = devlink->ops->trap_drop_counter_get(devlink, + trap_item->trap, + &drops); + if (err) + return err; + } + + devlink_trap_stats_read(trap_item->stats, &stats); + + attr = nla_nest_start(msg, DEVLINK_ATTR_STATS); + if (!attr) + return -EMSGSIZE; + + if (devlink->ops->trap_drop_counter_get && + nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops, + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_PACKETS, + u64_stats_read(&stats.rx_packets), + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_BYTES, + u64_stats_read(&stats.rx_bytes), + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + nla_nest_end(msg, attr); + + return 0; + +nla_put_failure: + nla_nest_cancel(msg, attr); + return -EMSGSIZE; +} + +static int devlink_nl_trap_fill(struct sk_buff *msg, struct devlink *devlink, + const struct devlink_trap_item *trap_item, + enum devlink_command cmd, u32 portid, u32 seq, + int flags) +{ + struct devlink_trap_group_item *group_item = trap_item->group_item; + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME, + group_item->group->name)) + goto nla_put_failure; + + if (nla_put_string(msg, DEVLINK_ATTR_TRAP_NAME, trap_item->trap->name)) + goto nla_put_failure; + + if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_TYPE, trap_item->trap->type)) + goto nla_put_failure; + + if (trap_item->trap->generic && + nla_put_flag(msg, DEVLINK_ATTR_TRAP_GENERIC)) + goto nla_put_failure; + + if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_ACTION, trap_item->action)) + goto nla_put_failure; + + err = devlink_trap_metadata_put(msg, trap_item->trap); + if (err) + goto nla_put_failure; + + err = devlink_trap_stats_put(msg, devlink, trap_item); + if (err) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_trap_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + struct devlink_trap_item *trap_item; + struct sk_buff *msg; + int err; + + if (list_empty(&devlink->trap_list)) + return -EOPNOTSUPP; + + trap_item = devlink_trap_item_get_from_info(devlink, info); + if (!trap_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap"); + return -ENOENT; + } + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_trap_fill(msg, devlink, trap_item, + DEVLINK_CMD_TRAP_NEW, info->snd_portid, + info->snd_seq, 0); + if (err) + goto err_trap_fill; + + return genlmsg_reply(msg, info); + +err_trap_fill: + nlmsg_free(msg); + return err; +} + +static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + struct devlink_trap_item *trap_item; + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + list_for_each_entry(trap_item, &devlink->trap_list, list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_trap_fill(msg, devlink, trap_item, + DEVLINK_CMD_TRAP_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int __devlink_trap_action_set(struct devlink *devlink, + struct devlink_trap_item *trap_item, + enum devlink_trap_action trap_action, + struct netlink_ext_ack *extack) +{ + int err; + + if (trap_item->action != trap_action && + trap_item->trap->type != DEVLINK_TRAP_TYPE_DROP) { + NL_SET_ERR_MSG_MOD(extack, "Cannot change action of non-drop traps. Skipping"); + return 0; + } + + err = devlink->ops->trap_action_set(devlink, trap_item->trap, + trap_action, extack); + if (err) + return err; + + trap_item->action = trap_action; + + return 0; +} + +static int devlink_trap_action_set(struct devlink *devlink, + struct devlink_trap_item *trap_item, + struct genl_info *info) +{ + enum devlink_trap_action trap_action; + int err; + + if (!info->attrs[DEVLINK_ATTR_TRAP_ACTION]) + return 0; + + err = devlink_trap_action_get_from_info(info, &trap_action); + if (err) { + NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action"); + return -EINVAL; + } + + return __devlink_trap_action_set(devlink, trap_item, trap_action, + info->extack); +} + +static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + struct devlink_trap_item *trap_item; + + if (list_empty(&devlink->trap_list)) + return -EOPNOTSUPP; + + trap_item = devlink_trap_item_get_from_info(devlink, info); + if (!trap_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap"); + return -ENOENT; + } + + return devlink_trap_action_set(devlink, trap_item, info); +} + +static struct devlink_trap_group_item * +devlink_trap_group_item_lookup(struct devlink *devlink, const char *name) +{ + struct devlink_trap_group_item *group_item; + + list_for_each_entry(group_item, &devlink->trap_group_list, list) { + if (!strcmp(group_item->group->name, name)) + return group_item; + } + + return NULL; +} + +static struct devlink_trap_group_item * +devlink_trap_group_item_lookup_by_id(struct devlink *devlink, u16 id) +{ + struct devlink_trap_group_item *group_item; + + list_for_each_entry(group_item, &devlink->trap_group_list, list) { + if (group_item->group->id == id) + return group_item; + } + + return NULL; +} + +static struct devlink_trap_group_item * +devlink_trap_group_item_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + char *name; + + if (!info->attrs[DEVLINK_ATTR_TRAP_GROUP_NAME]) + return NULL; + name = nla_data(info->attrs[DEVLINK_ATTR_TRAP_GROUP_NAME]); + + return devlink_trap_group_item_lookup(devlink, name); +} + +static int +devlink_nl_trap_group_fill(struct sk_buff *msg, struct devlink *devlink, + const struct devlink_trap_group_item *group_item, + enum devlink_command cmd, u32 portid, u32 seq, + int flags) +{ + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME, + group_item->group->name)) + goto nla_put_failure; + + if (group_item->group->generic && + nla_put_flag(msg, DEVLINK_ATTR_TRAP_GENERIC)) + goto nla_put_failure; + + if (group_item->policer_item && + nla_put_u32(msg, DEVLINK_ATTR_TRAP_POLICER_ID, + group_item->policer_item->policer->id)) + goto nla_put_failure; + + err = devlink_trap_group_stats_put(msg, group_item->stats); + if (err) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_trap_group_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + struct devlink_trap_group_item *group_item; + struct sk_buff *msg; + int err; + + if (list_empty(&devlink->trap_group_list)) + return -EOPNOTSUPP; + + group_item = devlink_trap_group_item_get_from_info(devlink, info); + if (!group_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group"); + return -ENOENT; + } + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_trap_group_fill(msg, devlink, group_item, + DEVLINK_CMD_TRAP_GROUP_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) + goto err_trap_group_fill; + + return genlmsg_reply(msg, info); + +err_trap_group_fill: + nlmsg_free(msg); + return err; +} + +static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + enum devlink_command cmd = DEVLINK_CMD_TRAP_GROUP_NEW; + struct devlink_trap_group_item *group_item; + u32 portid = NETLINK_CB(cb->skb).portid; + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + list_for_each_entry(group_item, &devlink->trap_group_list, + list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_trap_group_fill(msg, devlink, + group_item, cmd, + portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int +__devlink_trap_group_action_set(struct devlink *devlink, + struct devlink_trap_group_item *group_item, + enum devlink_trap_action trap_action, + struct netlink_ext_ack *extack) +{ + const char *group_name = group_item->group->name; + struct devlink_trap_item *trap_item; + int err; + + if (devlink->ops->trap_group_action_set) { + err = devlink->ops->trap_group_action_set(devlink, group_item->group, + trap_action, extack); + if (err) + return err; + + list_for_each_entry(trap_item, &devlink->trap_list, list) { + if (strcmp(trap_item->group_item->group->name, group_name)) + continue; + if (trap_item->action != trap_action && + trap_item->trap->type != DEVLINK_TRAP_TYPE_DROP) + continue; + trap_item->action = trap_action; + } + + return 0; + } + + list_for_each_entry(trap_item, &devlink->trap_list, list) { + if (strcmp(trap_item->group_item->group->name, group_name)) + continue; + err = __devlink_trap_action_set(devlink, trap_item, + trap_action, extack); + if (err) + return err; + } + + return 0; +} + +static int +devlink_trap_group_action_set(struct devlink *devlink, + struct devlink_trap_group_item *group_item, + struct genl_info *info, bool *p_modified) +{ + enum devlink_trap_action trap_action; + int err; + + if (!info->attrs[DEVLINK_ATTR_TRAP_ACTION]) + return 0; + + err = devlink_trap_action_get_from_info(info, &trap_action); + if (err) { + NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action"); + return -EINVAL; + } + + err = __devlink_trap_group_action_set(devlink, group_item, trap_action, + info->extack); + if (err) + return err; + + *p_modified = true; + + return 0; +} + +static int devlink_trap_group_set(struct devlink *devlink, + struct devlink_trap_group_item *group_item, + struct genl_info *info) +{ + struct devlink_trap_policer_item *policer_item; + struct netlink_ext_ack *extack = info->extack; + const struct devlink_trap_policer *policer; + struct nlattr **attrs = info->attrs; + int err; + + if (!attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) + return 0; + + if (!devlink->ops->trap_group_set) + return -EOPNOTSUPP; + + policer_item = group_item->policer_item; + if (attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) { + u32 policer_id; + + policer_id = nla_get_u32(attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); + policer_item = devlink_trap_policer_item_lookup(devlink, + policer_id); + if (policer_id && !policer_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + return -ENOENT; + } + } + policer = policer_item ? policer_item->policer : NULL; + + err = devlink->ops->trap_group_set(devlink, group_item->group, policer, + extack); + if (err) + return err; + + group_item->policer_item = policer_item; + + return 0; +} + +static int devlink_nl_cmd_trap_group_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + struct devlink_trap_group_item *group_item; + bool modified = false; + int err; + + if (list_empty(&devlink->trap_group_list)) + return -EOPNOTSUPP; + + group_item = devlink_trap_group_item_get_from_info(devlink, info); + if (!group_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group"); + return -ENOENT; + } + + err = devlink_trap_group_action_set(devlink, group_item, info, + &modified); + if (err) + return err; + + err = devlink_trap_group_set(devlink, group_item, info); + if (err) + goto err_trap_group_set; + + return 0; + +err_trap_group_set: + if (modified) + NL_SET_ERR_MSG_MOD(extack, "Trap group set failed, but some changes were committed already"); + return err; +} + +static struct devlink_trap_policer_item * +devlink_trap_policer_item_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + u32 id; + + if (!info->attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) + return NULL; + id = nla_get_u32(info->attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); + + return devlink_trap_policer_item_lookup(devlink, id); +} + +static int +devlink_trap_policer_stats_put(struct sk_buff *msg, struct devlink *devlink, + const struct devlink_trap_policer *policer) +{ + struct nlattr *attr; + u64 drops; + int err; + + if (!devlink->ops->trap_policer_counter_get) + return 0; + + err = devlink->ops->trap_policer_counter_get(devlink, policer, &drops); + if (err) + return err; + + attr = nla_nest_start(msg, DEVLINK_ATTR_STATS); + if (!attr) + return -EMSGSIZE; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops, + DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + nla_nest_end(msg, attr); + + return 0; + +nla_put_failure: + nla_nest_cancel(msg, attr); + return -EMSGSIZE; +} + +static int +devlink_nl_trap_policer_fill(struct sk_buff *msg, struct devlink *devlink, + const struct devlink_trap_policer_item *policer_item, + enum devlink_command cmd, u32 portid, u32 seq, + int flags) +{ + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (nla_put_u32(msg, DEVLINK_ATTR_TRAP_POLICER_ID, + policer_item->policer->id)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_TRAP_POLICER_RATE, + policer_item->rate, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_TRAP_POLICER_BURST, + policer_item->burst, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + + err = devlink_trap_policer_stats_put(msg, devlink, + policer_item->policer); + if (err) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int devlink_nl_cmd_trap_policer_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_trap_policer_item *policer_item; + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + if (list_empty(&devlink->trap_policer_list)) + return -EOPNOTSUPP; + + policer_item = devlink_trap_policer_item_get_from_info(devlink, info); + if (!policer_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + return -ENOENT; + } + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_trap_policer_fill(msg, devlink, policer_item, + DEVLINK_CMD_TRAP_POLICER_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) + goto err_trap_policer_fill; + + return genlmsg_reply(msg, info); + +err_trap_policer_fill: + nlmsg_free(msg); + return err; +} + +static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) +{ + enum devlink_command cmd = DEVLINK_CMD_TRAP_POLICER_NEW; + struct devlink_trap_policer_item *policer_item; + u32 portid = NETLINK_CB(cb->skb).portid; + struct devlink *devlink; + int start = cb->args[0]; + unsigned long index; + int idx = 0; + int err; + + devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devl_lock(devlink); + list_for_each_entry(policer_item, &devlink->trap_policer_list, + list) { + if (idx < start) { + idx++; + continue; + } + err = devlink_nl_trap_policer_fill(msg, devlink, + policer_item, cmd, + portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + devl_unlock(devlink); + devlink_put(devlink); + goto out; + } + idx++; + } + devl_unlock(devlink); + devlink_put(devlink); + } +out: + cb->args[0] = idx; + return msg->len; +} + +static int +devlink_trap_policer_set(struct devlink *devlink, + struct devlink_trap_policer_item *policer_item, + struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct nlattr **attrs = info->attrs; + u64 rate, burst; + int err; + + rate = policer_item->rate; + burst = policer_item->burst; + + if (attrs[DEVLINK_ATTR_TRAP_POLICER_RATE]) + rate = nla_get_u64(attrs[DEVLINK_ATTR_TRAP_POLICER_RATE]); + + if (attrs[DEVLINK_ATTR_TRAP_POLICER_BURST]) + burst = nla_get_u64(attrs[DEVLINK_ATTR_TRAP_POLICER_BURST]); + + if (rate < policer_item->policer->min_rate) { + NL_SET_ERR_MSG_MOD(extack, "Policer rate lower than limit"); + return -EINVAL; + } + + if (rate > policer_item->policer->max_rate) { + NL_SET_ERR_MSG_MOD(extack, "Policer rate higher than limit"); + return -EINVAL; + } + + if (burst < policer_item->policer->min_burst) { + NL_SET_ERR_MSG_MOD(extack, "Policer burst size lower than limit"); + return -EINVAL; + } + + if (burst > policer_item->policer->max_burst) { + NL_SET_ERR_MSG_MOD(extack, "Policer burst size higher than limit"); + return -EINVAL; + } + + err = devlink->ops->trap_policer_set(devlink, policer_item->policer, + rate, burst, info->extack); + if (err) + return err; + + policer_item->rate = rate; + policer_item->burst = burst; + + return 0; +} + +static int devlink_nl_cmd_trap_policer_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink_trap_policer_item *policer_item; + struct netlink_ext_ack *extack = info->extack; + struct devlink *devlink = info->user_ptr[0]; + + if (list_empty(&devlink->trap_policer_list)) + return -EOPNOTSUPP; + + if (!devlink->ops->trap_policer_set) + return -EOPNOTSUPP; + + policer_item = devlink_trap_policer_item_get_from_info(devlink, info); + if (!policer_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + return -ENOENT; + } + + return devlink_trap_policer_set(devlink, policer_item, info); +} + +static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { + [DEVLINK_ATTR_UNSPEC] = { .strict_start_type = + DEVLINK_ATTR_TRAP_POLICER_ID }, + [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 }, + [DEVLINK_ATTR_PORT_TYPE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_PORT_TYPE_AUTO, + DEVLINK_PORT_TYPE_IB), + [DEVLINK_ATTR_PORT_SPLIT_COUNT] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_INDEX] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_POOL_INDEX] = { .type = NLA_U16 }, + [DEVLINK_ATTR_SB_POOL_TYPE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_SB_POOL_SIZE] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 }, + [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_ESWITCH_MODE_LEGACY, + DEVLINK_ESWITCH_MODE_SWITCHDEV), + [DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 }, + [DEVLINK_ATTR_RESOURCE_ID] = { .type = NLA_U64}, + [DEVLINK_ATTR_RESOURCE_SIZE] = { .type = NLA_U64}, + [DEVLINK_ATTR_PARAM_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_REGION_CHUNK_ADDR] = { .type = NLA_U64 }, + [DEVLINK_ATTR_REGION_CHUNK_LEN] = { .type = NLA_U64 }, + [DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 }, + [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 }, + [DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK] = + NLA_POLICY_BITFIELD32(DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS), + [DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 }, + [DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 }, + [DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 }, + [DEVLINK_ATTR_TRAP_POLICER_ID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 }, + [DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 }, + [DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED }, + [DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + DEVLINK_RELOAD_ACTION_MAX), + [DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK), + [DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 }, + [DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 }, + [DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 }, + [DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 }, + [DEVLINK_ATTR_RATE_TYPE] = { .type = NLA_U16 }, + [DEVLINK_ATTR_RATE_TX_SHARE] = { .type = NLA_U64 }, + [DEVLINK_ATTR_RATE_TX_MAX] = { .type = NLA_U64 }, + [DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 }, + [DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED }, + [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 }, + [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 }, + [DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG }, +}; + +static const struct genl_small_ops devlink_nl_ops[] = { + { + .cmd = DEVLINK_CMD_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_get_doit, + .dumpit = devlink_nl_cmd_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_PORT_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_port_get_doit, + .dumpit = devlink_nl_cmd_port_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_PORT_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_port_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + }, + { + .cmd = DEVLINK_CMD_RATE_GET, + .doit = devlink_nl_cmd_rate_get_doit, + .dumpit = devlink_nl_cmd_rate_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_RATE, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_RATE_SET, + .doit = devlink_nl_cmd_rate_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_RATE, + }, + { + .cmd = DEVLINK_CMD_RATE_NEW, + .doit = devlink_nl_cmd_rate_new_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_RATE_DEL, + .doit = devlink_nl_cmd_rate_del_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_RATE_NODE, + }, + { + .cmd = DEVLINK_CMD_PORT_SPLIT, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_port_split_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + }, + { + .cmd = DEVLINK_CMD_PORT_UNSPLIT, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_port_unsplit_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + }, + { + .cmd = DEVLINK_CMD_PORT_NEW, + .doit = devlink_nl_cmd_port_new_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_PORT_DEL, + .doit = devlink_nl_cmd_port_del_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_LINECARD_GET, + .doit = devlink_nl_cmd_linecard_get_doit, + .dumpit = devlink_nl_cmd_linecard_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_LINECARD_SET, + .doit = devlink_nl_cmd_linecard_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD, + }, + { + .cmd = DEVLINK_CMD_SB_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_get_doit, + .dumpit = devlink_nl_cmd_sb_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_SB_POOL_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_pool_get_doit, + .dumpit = devlink_nl_cmd_sb_pool_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_SB_POOL_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_pool_set_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_SB_PORT_POOL_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_port_pool_get_doit, + .dumpit = devlink_nl_cmd_sb_port_pool_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_SB_PORT_POOL_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_port_pool_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + }, + { + .cmd = DEVLINK_CMD_SB_TC_POOL_BIND_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_tc_pool_bind_get_doit, + .dumpit = devlink_nl_cmd_sb_tc_pool_bind_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_SB_TC_POOL_BIND_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_tc_pool_bind_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + }, + { + .cmd = DEVLINK_CMD_SB_OCC_SNAPSHOT, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_occ_snapshot_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_SB_OCC_MAX_CLEAR, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_sb_occ_max_clear_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_ESWITCH_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_eswitch_get_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_ESWITCH_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_eswitch_set_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_DPIPE_TABLE_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_dpipe_table_get, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_DPIPE_ENTRIES_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_dpipe_entries_get, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_DPIPE_HEADERS_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_dpipe_headers_get, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_dpipe_table_counters_set, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_RESOURCE_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_resource_set, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_RESOURCE_DUMP, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_resource_dump, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_RELOAD, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_reload, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_PARAM_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_param_get_doit, + .dumpit = devlink_nl_cmd_param_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_PARAM_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_param_set_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_PORT_PARAM_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_port_param_get_doit, + .dumpit = devlink_nl_cmd_port_param_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_PORT_PARAM_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_port_param_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, + }, + { + .cmd = DEVLINK_CMD_REGION_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_region_get_doit, + .dumpit = devlink_nl_cmd_region_get_dumpit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_REGION_NEW, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_region_new, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_REGION_DEL, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_region_del, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_REGION_READ, + .validate = GENL_DONT_VALIDATE_STRICT | + GENL_DONT_VALIDATE_DUMP_STRICT, + .dumpit = devlink_nl_cmd_region_read_dumpit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_INFO_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_info_get_doit, + .dumpit = devlink_nl_cmd_info_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_health_reporter_get_doit, + .dumpit = devlink_nl_cmd_health_reporter_get_dumpit, + .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_SET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_health_reporter_set_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_health_reporter_recover_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_health_reporter_diagnose_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET, + .validate = GENL_DONT_VALIDATE_STRICT | + GENL_DONT_VALIDATE_DUMP_STRICT, + .dumpit = devlink_nl_cmd_health_reporter_dump_get_dumpit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_health_reporter_dump_clear_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, + }, + { + .cmd = DEVLINK_CMD_HEALTH_REPORTER_TEST, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_health_reporter_test_doit, + .flags = GENL_ADMIN_PERM, + .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, + }, + { + .cmd = DEVLINK_CMD_FLASH_UPDATE, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = devlink_nl_cmd_flash_update, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_TRAP_GET, + .doit = devlink_nl_cmd_trap_get_doit, + .dumpit = devlink_nl_cmd_trap_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_TRAP_SET, + .doit = devlink_nl_cmd_trap_set_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_TRAP_GROUP_GET, + .doit = devlink_nl_cmd_trap_group_get_doit, + .dumpit = devlink_nl_cmd_trap_group_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_TRAP_GROUP_SET, + .doit = devlink_nl_cmd_trap_group_set_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_TRAP_POLICER_GET, + .doit = devlink_nl_cmd_trap_policer_get_doit, + .dumpit = devlink_nl_cmd_trap_policer_get_dumpit, + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_TRAP_POLICER_SET, + .doit = devlink_nl_cmd_trap_policer_set_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = DEVLINK_CMD_SELFTESTS_GET, + .doit = devlink_nl_cmd_selftests_get_doit, + .dumpit = devlink_nl_cmd_selftests_get_dumpit + /* can be retrieved by unprivileged users */ + }, + { + .cmd = DEVLINK_CMD_SELFTESTS_RUN, + .doit = devlink_nl_cmd_selftests_run, + .flags = GENL_ADMIN_PERM, + }, +}; + +static struct genl_family devlink_nl_family __ro_after_init = { + .name = DEVLINK_GENL_NAME, + .version = DEVLINK_GENL_VERSION, + .maxattr = DEVLINK_ATTR_MAX, + .policy = devlink_nl_policy, + .netnsok = true, + .parallel_ops = true, + .pre_doit = devlink_nl_pre_doit, + .post_doit = devlink_nl_post_doit, + .module = THIS_MODULE, + .small_ops = devlink_nl_ops, + .n_small_ops = ARRAY_SIZE(devlink_nl_ops), + .resv_start_op = DEVLINK_CMD_SELFTESTS_RUN + 1, + .mcgrps = devlink_nl_mcgrps, + .n_mcgrps = ARRAY_SIZE(devlink_nl_mcgrps), +}; + +static bool devlink_reload_actions_valid(const struct devlink_ops *ops) +{ + const struct devlink_reload_combination *comb; + int i; + + if (!devlink_reload_supported(ops)) { + if (WARN_ON(ops->reload_actions)) + return false; + return true; + } + + if (WARN_ON(!ops->reload_actions || + ops->reload_actions & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || + ops->reload_actions >= BIT(__DEVLINK_RELOAD_ACTION_MAX))) + return false; + + if (WARN_ON(ops->reload_limits & BIT(DEVLINK_RELOAD_LIMIT_UNSPEC) || + ops->reload_limits >= BIT(__DEVLINK_RELOAD_LIMIT_MAX))) + return false; + + for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) { + comb = &devlink_reload_invalid_combinations[i]; + if (ops->reload_actions == BIT(comb->action) && + ops->reload_limits == BIT(comb->limit)) + return false; + } + return true; +} + +/** + * devlink_set_features - Set devlink supported features + * + * @devlink: devlink + * @features: devlink support features + * + * This interface allows us to set reload ops separatelly from + * the devlink_alloc. + */ +void devlink_set_features(struct devlink *devlink, u64 features) +{ + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + WARN_ON(features & DEVLINK_F_RELOAD && + !devlink_reload_supported(devlink->ops)); + devlink->features = features; +} +EXPORT_SYMBOL_GPL(devlink_set_features); + +static int devlink_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr); + +/** + * devlink_alloc_ns - Allocate new devlink instance resources + * in specific namespace + * + * @ops: ops + * @priv_size: size of user private data + * @net: net namespace + * @dev: parent device + * + * Allocate new devlink instance resources, including devlink index + * and name. + */ +struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, + size_t priv_size, struct net *net, + struct device *dev) +{ + struct devlink *devlink; + static u32 last_id; + int ret; + + WARN_ON(!ops || !dev); + if (!devlink_reload_actions_valid(ops)) + return NULL; + + devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL); + if (!devlink) + return NULL; + + ret = xa_alloc_cyclic(&devlinks, &devlink->index, devlink, xa_limit_31b, + &last_id, GFP_KERNEL); + if (ret < 0) + goto err_xa_alloc; + + devlink->netdevice_nb.notifier_call = devlink_netdevice_event; + ret = register_netdevice_notifier_net(net, &devlink->netdevice_nb); + if (ret) + goto err_register_netdevice_notifier; + + devlink->dev = dev; + devlink->ops = ops; + xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC); + xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC); + write_pnet(&devlink->_net, net); + INIT_LIST_HEAD(&devlink->rate_list); + INIT_LIST_HEAD(&devlink->linecard_list); + INIT_LIST_HEAD(&devlink->sb_list); + INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list); + INIT_LIST_HEAD(&devlink->resource_list); + INIT_LIST_HEAD(&devlink->param_list); + INIT_LIST_HEAD(&devlink->region_list); + INIT_LIST_HEAD(&devlink->reporter_list); + INIT_LIST_HEAD(&devlink->trap_list); + INIT_LIST_HEAD(&devlink->trap_group_list); + INIT_LIST_HEAD(&devlink->trap_policer_list); + lockdep_register_key(&devlink->lock_key); + mutex_init(&devlink->lock); + lockdep_set_class(&devlink->lock, &devlink->lock_key); + mutex_init(&devlink->reporters_lock); + mutex_init(&devlink->linecards_lock); + refcount_set(&devlink->refcount, 1); + init_completion(&devlink->comp); + + return devlink; + +err_register_netdevice_notifier: + xa_erase(&devlinks, devlink->index); +err_xa_alloc: + kfree(devlink); + return NULL; +} +EXPORT_SYMBOL_GPL(devlink_alloc_ns); + +static void +devlink_trap_policer_notify(struct devlink *devlink, + const struct devlink_trap_policer_item *policer_item, + enum devlink_command cmd); +static void +devlink_trap_group_notify(struct devlink *devlink, + const struct devlink_trap_group_item *group_item, + enum devlink_command cmd); +static void devlink_trap_notify(struct devlink *devlink, + const struct devlink_trap_item *trap_item, + enum devlink_command cmd); + +static void devlink_notify_register(struct devlink *devlink) +{ + struct devlink_trap_policer_item *policer_item; + struct devlink_trap_group_item *group_item; + struct devlink_param_item *param_item; + struct devlink_trap_item *trap_item; + struct devlink_port *devlink_port; + struct devlink_linecard *linecard; + struct devlink_rate *rate_node; + struct devlink_region *region; + unsigned long port_index; + + devlink_notify(devlink, DEVLINK_CMD_NEW); + list_for_each_entry(linecard, &devlink->linecard_list, list) + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + + xa_for_each(&devlink->ports, port_index, devlink_port) + devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); + + list_for_each_entry(policer_item, &devlink->trap_policer_list, list) + devlink_trap_policer_notify(devlink, policer_item, + DEVLINK_CMD_TRAP_POLICER_NEW); + + list_for_each_entry(group_item, &devlink->trap_group_list, list) + devlink_trap_group_notify(devlink, group_item, + DEVLINK_CMD_TRAP_GROUP_NEW); + + list_for_each_entry(trap_item, &devlink->trap_list, list) + devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_NEW); + + list_for_each_entry(rate_node, &devlink->rate_list, list) + devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); + + list_for_each_entry(region, &devlink->region_list, list) + devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); + + list_for_each_entry(param_item, &devlink->param_list, list) + devlink_param_notify(devlink, 0, param_item, + DEVLINK_CMD_PARAM_NEW); +} + +static void devlink_notify_unregister(struct devlink *devlink) +{ + struct devlink_trap_policer_item *policer_item; + struct devlink_trap_group_item *group_item; + struct devlink_param_item *param_item; + struct devlink_trap_item *trap_item; + struct devlink_port *devlink_port; + struct devlink_rate *rate_node; + struct devlink_region *region; + unsigned long port_index; + + list_for_each_entry_reverse(param_item, &devlink->param_list, list) + devlink_param_notify(devlink, 0, param_item, + DEVLINK_CMD_PARAM_DEL); + + list_for_each_entry_reverse(region, &devlink->region_list, list) + devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_DEL); + + list_for_each_entry_reverse(rate_node, &devlink->rate_list, list) + devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL); + + list_for_each_entry_reverse(trap_item, &devlink->trap_list, list) + devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_DEL); + + list_for_each_entry_reverse(group_item, &devlink->trap_group_list, list) + devlink_trap_group_notify(devlink, group_item, + DEVLINK_CMD_TRAP_GROUP_DEL); + list_for_each_entry_reverse(policer_item, &devlink->trap_policer_list, + list) + devlink_trap_policer_notify(devlink, policer_item, + DEVLINK_CMD_TRAP_POLICER_DEL); + + xa_for_each(&devlink->ports, port_index, devlink_port) + devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); + devlink_notify(devlink, DEVLINK_CMD_DEL); +} + +/** + * devlink_register - Register devlink instance + * + * @devlink: devlink + */ +void devlink_register(struct devlink *devlink) +{ + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + /* Make sure that we are in .probe() routine */ + + xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); + devlink_notify_register(devlink); +} +EXPORT_SYMBOL_GPL(devlink_register); + +/** + * devlink_unregister - Unregister devlink instance + * + * @devlink: devlink + */ +void devlink_unregister(struct devlink *devlink) +{ + ASSERT_DEVLINK_REGISTERED(devlink); + /* Make sure that we are in .remove() routine */ + + xa_set_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); + devlink_put(devlink); + wait_for_completion(&devlink->comp); + + devlink_notify_unregister(devlink); + xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); + xa_clear_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); +} +EXPORT_SYMBOL_GPL(devlink_unregister); + +/** + * devlink_free - Free devlink instance resources + * + * @devlink: devlink + */ +void devlink_free(struct devlink *devlink) +{ + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + mutex_destroy(&devlink->linecards_lock); + mutex_destroy(&devlink->reporters_lock); + mutex_destroy(&devlink->lock); + lockdep_unregister_key(&devlink->lock_key); + WARN_ON(!list_empty(&devlink->trap_policer_list)); + WARN_ON(!list_empty(&devlink->trap_group_list)); + WARN_ON(!list_empty(&devlink->trap_list)); + WARN_ON(!list_empty(&devlink->reporter_list)); + WARN_ON(!list_empty(&devlink->region_list)); + WARN_ON(!list_empty(&devlink->param_list)); + WARN_ON(!list_empty(&devlink->resource_list)); + WARN_ON(!list_empty(&devlink->dpipe_table_list)); + WARN_ON(!list_empty(&devlink->sb_list)); + WARN_ON(!list_empty(&devlink->rate_list)); + WARN_ON(!list_empty(&devlink->linecard_list)); + WARN_ON(!xa_empty(&devlink->ports)); + + xa_destroy(&devlink->snapshot_ids); + xa_destroy(&devlink->ports); + + WARN_ON_ONCE(unregister_netdevice_notifier_net(devlink_net(devlink), + &devlink->netdevice_nb)); + + xa_erase(&devlinks, devlink->index); + + kfree(devlink); +} +EXPORT_SYMBOL_GPL(devlink_free); + +static void devlink_port_type_warn(struct work_struct *work) +{ + WARN(true, "Type was not set for devlink port."); +} + +static bool devlink_port_type_should_warn(struct devlink_port *devlink_port) +{ + /* Ignore CPU and DSA flavours. */ + return devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_CPU && + devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_DSA && + devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_UNUSED; +} + +#define DEVLINK_PORT_TYPE_WARN_TIMEOUT (HZ * 3600) + +static void devlink_port_type_warn_schedule(struct devlink_port *devlink_port) +{ + if (!devlink_port_type_should_warn(devlink_port)) + return; + /* Schedule a work to WARN in case driver does not set port + * type within timeout. + */ + schedule_delayed_work(&devlink_port->type_warn_dw, + DEVLINK_PORT_TYPE_WARN_TIMEOUT); +} + +static void devlink_port_type_warn_cancel(struct devlink_port *devlink_port) +{ + if (!devlink_port_type_should_warn(devlink_port)) + return; + cancel_delayed_work_sync(&devlink_port->type_warn_dw); +} + +/** + * devlink_port_init() - Init devlink port + * + * @devlink: devlink + * @devlink_port: devlink port + * + * Initialize essencial stuff that is needed for functions + * that may be called before devlink port registration. + * Call to this function is optional and not needed + * in case the driver does not use such functions. + */ +void devlink_port_init(struct devlink *devlink, + struct devlink_port *devlink_port) +{ + if (devlink_port->initialized) + return; + devlink_port->devlink = devlink; + INIT_LIST_HEAD(&devlink_port->region_list); + devlink_port->initialized = true; +} +EXPORT_SYMBOL_GPL(devlink_port_init); + +/** + * devlink_port_fini() - Deinitialize devlink port + * + * @devlink_port: devlink port + * + * Deinitialize essencial stuff that is in use for functions + * that may be called after devlink port unregistration. + * Call to this function is optional and not needed + * in case the driver does not use such functions. + */ +void devlink_port_fini(struct devlink_port *devlink_port) +{ + WARN_ON(!list_empty(&devlink_port->region_list)); +} +EXPORT_SYMBOL_GPL(devlink_port_fini); + +/** + * devl_port_register() - Register devlink port + * + * @devlink: devlink + * @devlink_port: devlink port + * @port_index: driver-specific numerical identifier of the port + * + * Register devlink port with provided port index. User can use + * any indexing, even hw-related one. devlink_port structure + * is convenient to be embedded inside user driver private structure. + * Note that the caller should take care of zeroing the devlink_port + * structure. + */ +int devl_port_register(struct devlink *devlink, + struct devlink_port *devlink_port, + unsigned int port_index) +{ + int err; + + devl_assert_locked(devlink); + + ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + + devlink_port_init(devlink, devlink_port); + devlink_port->registered = true; + devlink_port->index = port_index; + spin_lock_init(&devlink_port->type_lock); + INIT_LIST_HEAD(&devlink_port->reporter_list); + mutex_init(&devlink_port->reporters_lock); + err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL); + if (err) { + mutex_destroy(&devlink_port->reporters_lock); + return err; + } + + INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn); + devlink_port_type_warn_schedule(devlink_port); + devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); + return 0; +} +EXPORT_SYMBOL_GPL(devl_port_register); + +/** + * devlink_port_register - Register devlink port + * + * @devlink: devlink + * @devlink_port: devlink port + * @port_index: driver-specific numerical identifier of the port + * + * Register devlink port with provided port index. User can use + * any indexing, even hw-related one. devlink_port structure + * is convenient to be embedded inside user driver private structure. + * Note that the caller should take care of zeroing the devlink_port + * structure. + * + * Context: Takes and release devlink->lock . + */ +int devlink_port_register(struct devlink *devlink, + struct devlink_port *devlink_port, + unsigned int port_index) +{ + int err; + + devl_lock(devlink); + err = devl_port_register(devlink, devlink_port, port_index); + devl_unlock(devlink); + return err; +} +EXPORT_SYMBOL_GPL(devlink_port_register); + +/** + * devl_port_unregister() - Unregister devlink port + * + * @devlink_port: devlink port + */ +void devl_port_unregister(struct devlink_port *devlink_port) +{ + lockdep_assert_held(&devlink_port->devlink->lock); + WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET); + + devlink_port_type_warn_cancel(devlink_port); + devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); + xa_erase(&devlink_port->devlink->ports, devlink_port->index); + WARN_ON(!list_empty(&devlink_port->reporter_list)); + mutex_destroy(&devlink_port->reporters_lock); + devlink_port->registered = false; +} +EXPORT_SYMBOL_GPL(devl_port_unregister); + +/** + * devlink_port_unregister - Unregister devlink port + * + * @devlink_port: devlink port + * + * Context: Takes and release devlink->lock . + */ +void devlink_port_unregister(struct devlink_port *devlink_port) +{ + struct devlink *devlink = devlink_port->devlink; + + devl_lock(devlink); + devl_port_unregister(devlink_port); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_port_unregister); + +static void devlink_port_type_netdev_checks(struct devlink_port *devlink_port, + struct net_device *netdev) +{ + const struct net_device_ops *ops = netdev->netdev_ops; + + /* If driver registers devlink port, it should set devlink port + * attributes accordingly so the compat functions are called + * and the original ops are not used. + */ + if (ops->ndo_get_phys_port_name) { + /* Some drivers use the same set of ndos for netdevs + * that have devlink_port registered and also for + * those who don't. Make sure that ndo_get_phys_port_name + * returns -EOPNOTSUPP here in case it is defined. + * Warn if not. + */ + char name[IFNAMSIZ]; + int err; + + err = ops->ndo_get_phys_port_name(netdev, name, sizeof(name)); + WARN_ON(err != -EOPNOTSUPP); + } + if (ops->ndo_get_port_parent_id) { + /* Some drivers use the same set of ndos for netdevs + * that have devlink_port registered and also for + * those who don't. Make sure that ndo_get_port_parent_id + * returns -EOPNOTSUPP here in case it is defined. + * Warn if not. + */ + struct netdev_phys_item_id ppid; + int err; + + err = ops->ndo_get_port_parent_id(netdev, &ppid); + WARN_ON(err != -EOPNOTSUPP); + } +} + +static void __devlink_port_type_set(struct devlink_port *devlink_port, + enum devlink_port_type type, + void *type_dev) +{ + struct net_device *netdev = type_dev; + + ASSERT_DEVLINK_PORT_REGISTERED(devlink_port); + + if (type == DEVLINK_PORT_TYPE_NOTSET) { + devlink_port_type_warn_schedule(devlink_port); + } else { + devlink_port_type_warn_cancel(devlink_port); + if (type == DEVLINK_PORT_TYPE_ETH && netdev) + devlink_port_type_netdev_checks(devlink_port, netdev); + } + + spin_lock_bh(&devlink_port->type_lock); + devlink_port->type = type; + switch (type) { + case DEVLINK_PORT_TYPE_ETH: + devlink_port->type_eth.netdev = netdev; + if (netdev) { + ASSERT_RTNL(); + devlink_port->type_eth.ifindex = netdev->ifindex; + BUILD_BUG_ON(sizeof(devlink_port->type_eth.ifname) != + sizeof(netdev->name)); + strcpy(devlink_port->type_eth.ifname, netdev->name); + } + break; + case DEVLINK_PORT_TYPE_IB: + devlink_port->type_ib.ibdev = type_dev; + break; + default: + break; + } + spin_unlock_bh(&devlink_port->type_lock); + devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); +} + +/** + * devlink_port_type_eth_set - Set port type to Ethernet + * + * @devlink_port: devlink port + * + * If driver is calling this, most likely it is doing something wrong. + */ +void devlink_port_type_eth_set(struct devlink_port *devlink_port) +{ + dev_warn(devlink_port->devlink->dev, + "devlink port type for port %d set to Ethernet without a software interface reference, device type not supported by the kernel?\n", + devlink_port->index); + __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, NULL); +} +EXPORT_SYMBOL_GPL(devlink_port_type_eth_set); + +/** + * devlink_port_type_ib_set - Set port type to InfiniBand + * + * @devlink_port: devlink port + * @ibdev: related IB device + */ +void devlink_port_type_ib_set(struct devlink_port *devlink_port, + struct ib_device *ibdev) +{ + __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_IB, ibdev); +} +EXPORT_SYMBOL_GPL(devlink_port_type_ib_set); + +/** + * devlink_port_type_clear - Clear port type + * + * @devlink_port: devlink port + * + * If driver is calling this for clearing Ethernet type, most likely + * it is doing something wrong. + */ +void devlink_port_type_clear(struct devlink_port *devlink_port) +{ + if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) + dev_warn(devlink_port->devlink->dev, + "devlink port type for port %d cleared without a software interface reference, device type not supported by the kernel?\n", + devlink_port->index); + __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, NULL); +} +EXPORT_SYMBOL_GPL(devlink_port_type_clear); + +static int devlink_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr) +{ + struct net_device *netdev = netdev_notifier_info_to_dev(ptr); + struct devlink_port *devlink_port = netdev->devlink_port; + struct devlink *devlink; + + devlink = container_of(nb, struct devlink, netdevice_nb); + + if (!devlink_port || devlink_port->devlink != devlink) + return NOTIFY_OK; + + switch (event) { + case NETDEV_POST_INIT: + /* Set the type but not netdev pointer. It is going to be set + * later on by NETDEV_REGISTER event. Happens once during + * netdevice register + */ + __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_ETH, + NULL); + break; + case NETDEV_REGISTER: + case NETDEV_CHANGENAME: + /* Set the netdev on top of previously set type. Note this + * event happens also during net namespace change so here + * we take into account netdev pointer appearing in this + * namespace. + */ + __devlink_port_type_set(devlink_port, devlink_port->type, + netdev); + break; + case NETDEV_UNREGISTER: + /* Clear netdev pointer, but not the type. This event happens + * also during net namespace change so we need to clear + * pointer to netdev that is going to another net namespace. + */ + __devlink_port_type_set(devlink_port, devlink_port->type, + NULL); + break; + case NETDEV_PRE_UNINIT: + /* Clear the type and the netdev pointer. Happens one during + * netdevice unregister. + */ + __devlink_port_type_set(devlink_port, DEVLINK_PORT_TYPE_NOTSET, + NULL); + break; + } + + return NOTIFY_OK; +} + +static int __devlink_port_attrs_set(struct devlink_port *devlink_port, + enum devlink_port_flavour flavour) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + + devlink_port->attrs_set = true; + attrs->flavour = flavour; + if (attrs->switch_id.id_len) { + devlink_port->switch_port = true; + if (WARN_ON(attrs->switch_id.id_len > MAX_PHYS_ITEM_ID_LEN)) + attrs->switch_id.id_len = MAX_PHYS_ITEM_ID_LEN; + } else { + devlink_port->switch_port = false; + } + return 0; +} + +/** + * devlink_port_attrs_set - Set port attributes + * + * @devlink_port: devlink port + * @attrs: devlink port attrs + */ +void devlink_port_attrs_set(struct devlink_port *devlink_port, + struct devlink_port_attrs *attrs) +{ + int ret; + + ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + + devlink_port->attrs = *attrs; + ret = __devlink_port_attrs_set(devlink_port, attrs->flavour); + if (ret) + return; + WARN_ON(attrs->splittable && attrs->split); +} +EXPORT_SYMBOL_GPL(devlink_port_attrs_set); + +/** + * devlink_port_attrs_pci_pf_set - Set PCI PF port attributes + * + * @devlink_port: devlink port + * @controller: associated controller number for the devlink port instance + * @pf: associated PF for the devlink port instance + * @external: indicates if the port is for an external controller + */ +void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller, + u16 pf, bool external) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + int ret; + + ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + + ret = __devlink_port_attrs_set(devlink_port, + DEVLINK_PORT_FLAVOUR_PCI_PF); + if (ret) + return; + attrs->pci_pf.controller = controller; + attrs->pci_pf.pf = pf; + attrs->pci_pf.external = external; +} +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set); + +/** + * devlink_port_attrs_pci_vf_set - Set PCI VF port attributes + * + * @devlink_port: devlink port + * @controller: associated controller number for the devlink port instance + * @pf: associated PF for the devlink port instance + * @vf: associated VF of a PF for the devlink port instance + * @external: indicates if the port is for an external controller + */ +void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller, + u16 pf, u16 vf, bool external) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + int ret; + + ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + + ret = __devlink_port_attrs_set(devlink_port, + DEVLINK_PORT_FLAVOUR_PCI_VF); + if (ret) + return; + attrs->pci_vf.controller = controller; + attrs->pci_vf.pf = pf; + attrs->pci_vf.vf = vf; + attrs->pci_vf.external = external; +} +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set); + +/** + * devlink_port_attrs_pci_sf_set - Set PCI SF port attributes + * + * @devlink_port: devlink port + * @controller: associated controller number for the devlink port instance + * @pf: associated PF for the devlink port instance + * @sf: associated SF of a PF for the devlink port instance + * @external: indicates if the port is for an external controller + */ +void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller, + u16 pf, u32 sf, bool external) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + int ret; + + ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + + ret = __devlink_port_attrs_set(devlink_port, + DEVLINK_PORT_FLAVOUR_PCI_SF); + if (ret) + return; + attrs->pci_sf.controller = controller; + attrs->pci_sf.pf = pf; + attrs->pci_sf.sf = sf; + attrs->pci_sf.external = external; +} +EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set); + +/** + * devl_rate_node_create - create devlink rate node + * @devlink: devlink instance + * @priv: driver private data + * @node_name: name of the resulting node + * @parent: parent devlink_rate struct + * + * Create devlink rate object of type node + */ +struct devlink_rate * +devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, + struct devlink_rate *parent) +{ + struct devlink_rate *rate_node; + + rate_node = devlink_rate_node_get_by_name(devlink, node_name); + if (!IS_ERR(rate_node)) + return ERR_PTR(-EEXIST); + + rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL); + if (!rate_node) + return ERR_PTR(-ENOMEM); + + if (parent) { + rate_node->parent = parent; + refcount_inc(&rate_node->parent->refcnt); + } + + rate_node->type = DEVLINK_RATE_TYPE_NODE; + rate_node->devlink = devlink; + rate_node->priv = priv; + + rate_node->name = kstrdup(node_name, GFP_KERNEL); + if (!rate_node->name) { + kfree(rate_node); + return ERR_PTR(-ENOMEM); + } + + refcount_set(&rate_node->refcnt, 1); + list_add(&rate_node->list, &devlink->rate_list); + devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW); + return rate_node; +} +EXPORT_SYMBOL_GPL(devl_rate_node_create); + +/** + * devl_rate_leaf_create - create devlink rate leaf + * @devlink_port: devlink port object to create rate object on + * @priv: driver private data + * @parent: parent devlink_rate struct + * + * Create devlink rate object of type leaf on provided @devlink_port. + */ +int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv, + struct devlink_rate *parent) +{ + struct devlink *devlink = devlink_port->devlink; + struct devlink_rate *devlink_rate; + + devl_assert_locked(devlink_port->devlink); + + if (WARN_ON(devlink_port->devlink_rate)) + return -EBUSY; + + devlink_rate = kzalloc(sizeof(*devlink_rate), GFP_KERNEL); + if (!devlink_rate) + return -ENOMEM; + + if (parent) { + devlink_rate->parent = parent; + refcount_inc(&devlink_rate->parent->refcnt); + } + + devlink_rate->type = DEVLINK_RATE_TYPE_LEAF; + devlink_rate->devlink = devlink; + devlink_rate->devlink_port = devlink_port; + devlink_rate->priv = priv; + list_add_tail(&devlink_rate->list, &devlink->rate_list); + devlink_port->devlink_rate = devlink_rate; + devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW); + + return 0; +} +EXPORT_SYMBOL_GPL(devl_rate_leaf_create); + +/** + * devl_rate_leaf_destroy - destroy devlink rate leaf + * + * @devlink_port: devlink port linked to the rate object + * + * Destroy the devlink rate object of type leaf on provided @devlink_port. + */ +void devl_rate_leaf_destroy(struct devlink_port *devlink_port) +{ + struct devlink_rate *devlink_rate = devlink_port->devlink_rate; + + devl_assert_locked(devlink_port->devlink); + if (!devlink_rate) + return; + + devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_DEL); + if (devlink_rate->parent) + refcount_dec(&devlink_rate->parent->refcnt); + list_del(&devlink_rate->list); + devlink_port->devlink_rate = NULL; + kfree(devlink_rate); +} +EXPORT_SYMBOL_GPL(devl_rate_leaf_destroy); + +/** + * devl_rate_nodes_destroy - destroy all devlink rate nodes on device + * @devlink: devlink instance + * + * Unset parent for all rate objects and destroy all rate nodes + * on specified device. + */ +void devl_rate_nodes_destroy(struct devlink *devlink) +{ + static struct devlink_rate *devlink_rate, *tmp; + const struct devlink_ops *ops = devlink->ops; + + devl_assert_locked(devlink); + + list_for_each_entry(devlink_rate, &devlink->rate_list, list) { + if (!devlink_rate->parent) + continue; + + refcount_dec(&devlink_rate->parent->refcnt); + if (devlink_rate_is_leaf(devlink_rate)) + ops->rate_leaf_parent_set(devlink_rate, NULL, devlink_rate->priv, + NULL, NULL); + else if (devlink_rate_is_node(devlink_rate)) + ops->rate_node_parent_set(devlink_rate, NULL, devlink_rate->priv, + NULL, NULL); + } + list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_list, list) { + if (devlink_rate_is_node(devlink_rate)) { + ops->rate_node_del(devlink_rate, devlink_rate->priv, NULL); + list_del(&devlink_rate->list); + kfree(devlink_rate->name); + kfree(devlink_rate); + } + } +} +EXPORT_SYMBOL_GPL(devl_rate_nodes_destroy); + +/** + * devlink_port_linecard_set - Link port with a linecard + * + * @devlink_port: devlink port + * @linecard: devlink linecard + */ +void devlink_port_linecard_set(struct devlink_port *devlink_port, + struct devlink_linecard *linecard) +{ + ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port); + + devlink_port->linecard = linecard; +} +EXPORT_SYMBOL_GPL(devlink_port_linecard_set); + +static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port, + char *name, size_t len) +{ + struct devlink_port_attrs *attrs = &devlink_port->attrs; + int n = 0; + + if (!devlink_port->attrs_set) + return -EOPNOTSUPP; + + switch (attrs->flavour) { + case DEVLINK_PORT_FLAVOUR_PHYSICAL: + if (devlink_port->linecard) + n = snprintf(name, len, "l%u", + devlink_port->linecard->index); + if (n < len) + n += snprintf(name + n, len - n, "p%u", + attrs->phys.port_number); + if (n < len && attrs->split) + n += snprintf(name + n, len - n, "s%u", + attrs->phys.split_subport_number); + break; + case DEVLINK_PORT_FLAVOUR_CPU: + case DEVLINK_PORT_FLAVOUR_DSA: + case DEVLINK_PORT_FLAVOUR_UNUSED: + /* As CPU and DSA ports do not have a netdevice associated + * case should not ever happen. + */ + WARN_ON(1); + return -EINVAL; + case DEVLINK_PORT_FLAVOUR_PCI_PF: + if (attrs->pci_pf.external) { + n = snprintf(name, len, "c%u", attrs->pci_pf.controller); + if (n >= len) + return -EINVAL; + len -= n; + name += n; + } + n = snprintf(name, len, "pf%u", attrs->pci_pf.pf); + break; + case DEVLINK_PORT_FLAVOUR_PCI_VF: + if (attrs->pci_vf.external) { + n = snprintf(name, len, "c%u", attrs->pci_vf.controller); + if (n >= len) + return -EINVAL; + len -= n; + name += n; + } + n = snprintf(name, len, "pf%uvf%u", + attrs->pci_vf.pf, attrs->pci_vf.vf); + break; + case DEVLINK_PORT_FLAVOUR_PCI_SF: + if (attrs->pci_sf.external) { + n = snprintf(name, len, "c%u", attrs->pci_sf.controller); + if (n >= len) + return -EINVAL; + len -= n; + name += n; + } + n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, + attrs->pci_sf.sf); + break; + case DEVLINK_PORT_FLAVOUR_VIRTUAL: + return -EOPNOTSUPP; + } + + if (n >= len) + return -EINVAL; + + return 0; +} + +static int devlink_linecard_types_init(struct devlink_linecard *linecard) +{ + struct devlink_linecard_type *linecard_type; + unsigned int count; + int i; + + count = linecard->ops->types_count(linecard, linecard->priv); + linecard->types = kmalloc_array(count, sizeof(*linecard_type), + GFP_KERNEL); + if (!linecard->types) + return -ENOMEM; + linecard->types_count = count; + + for (i = 0; i < count; i++) { + linecard_type = &linecard->types[i]; + linecard->ops->types_get(linecard, linecard->priv, i, + &linecard_type->type, + &linecard_type->priv); + } + return 0; +} + +static void devlink_linecard_types_fini(struct devlink_linecard *linecard) +{ + kfree(linecard->types); +} + +/** + * devlink_linecard_create - Create devlink linecard + * + * @devlink: devlink + * @linecard_index: driver-specific numerical identifier of the linecard + * @ops: linecards ops + * @priv: user priv pointer + * + * Create devlink linecard instance with provided linecard index. + * Caller can use any indexing, even hw-related one. + * + * Return: Line card structure or an ERR_PTR() encoded error code. + */ +struct devlink_linecard * +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index, + const struct devlink_linecard_ops *ops, void *priv) +{ + struct devlink_linecard *linecard; + int err; + + if (WARN_ON(!ops || !ops->provision || !ops->unprovision || + !ops->types_count || !ops->types_get)) + return ERR_PTR(-EINVAL); + + mutex_lock(&devlink->linecards_lock); + if (devlink_linecard_index_exists(devlink, linecard_index)) { + mutex_unlock(&devlink->linecards_lock); + return ERR_PTR(-EEXIST); + } + + linecard = kzalloc(sizeof(*linecard), GFP_KERNEL); + if (!linecard) { + mutex_unlock(&devlink->linecards_lock); + return ERR_PTR(-ENOMEM); + } + + linecard->devlink = devlink; + linecard->index = linecard_index; + linecard->ops = ops; + linecard->priv = priv; + linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; + mutex_init(&linecard->state_lock); + + err = devlink_linecard_types_init(linecard); + if (err) { + mutex_destroy(&linecard->state_lock); + kfree(linecard); + mutex_unlock(&devlink->linecards_lock); + return ERR_PTR(err); + } + + list_add_tail(&linecard->list, &devlink->linecard_list); + refcount_set(&linecard->refcount, 1); + mutex_unlock(&devlink->linecards_lock); + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + return linecard; +} +EXPORT_SYMBOL_GPL(devlink_linecard_create); + +/** + * devlink_linecard_destroy - Destroy devlink linecard + * + * @linecard: devlink linecard + */ +void devlink_linecard_destroy(struct devlink_linecard *linecard) +{ + struct devlink *devlink = linecard->devlink; + + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); + mutex_lock(&devlink->linecards_lock); + list_del(&linecard->list); + devlink_linecard_types_fini(linecard); + mutex_unlock(&devlink->linecards_lock); + devlink_linecard_put(linecard); +} +EXPORT_SYMBOL_GPL(devlink_linecard_destroy); + +/** + * devlink_linecard_provision_set - Set provisioning on linecard + * + * @linecard: devlink linecard + * @type: linecard type + * + * This is either called directly from the provision() op call or + * as a result of the provision() op call asynchronously. + */ +void devlink_linecard_provision_set(struct devlink_linecard *linecard, + const char *type) +{ + mutex_lock(&linecard->state_lock); + WARN_ON(linecard->type && strcmp(linecard->type, type)); + linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED; + linecard->type = type; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); +} +EXPORT_SYMBOL_GPL(devlink_linecard_provision_set); + +/** + * devlink_linecard_provision_clear - Clear provisioning on linecard + * + * @linecard: devlink linecard + * + * This is either called directly from the unprovision() op call or + * as a result of the unprovision() op call asynchronously. + */ +void devlink_linecard_provision_clear(struct devlink_linecard *linecard) +{ + mutex_lock(&linecard->state_lock); + WARN_ON(linecard->nested_devlink); + linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED; + linecard->type = NULL; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); +} +EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear); + +/** + * devlink_linecard_provision_fail - Fail provisioning on linecard + * + * @linecard: devlink linecard + * + * This is either called directly from the provision() op call or + * as a result of the provision() op call asynchronously. + */ +void devlink_linecard_provision_fail(struct devlink_linecard *linecard) +{ + mutex_lock(&linecard->state_lock); + WARN_ON(linecard->nested_devlink); + linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING_FAILED; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); +} +EXPORT_SYMBOL_GPL(devlink_linecard_provision_fail); + +/** + * devlink_linecard_activate - Set linecard active + * + * @linecard: devlink linecard + */ +void devlink_linecard_activate(struct devlink_linecard *linecard) +{ + mutex_lock(&linecard->state_lock); + WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_PROVISIONED); + linecard->state = DEVLINK_LINECARD_STATE_ACTIVE; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); +} +EXPORT_SYMBOL_GPL(devlink_linecard_activate); + +/** + * devlink_linecard_deactivate - Set linecard inactive + * + * @linecard: devlink linecard + */ +void devlink_linecard_deactivate(struct devlink_linecard *linecard) +{ + mutex_lock(&linecard->state_lock); + switch (linecard->state) { + case DEVLINK_LINECARD_STATE_ACTIVE: + linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + break; + case DEVLINK_LINECARD_STATE_UNPROVISIONING: + /* Line card is being deactivated as part + * of unprovisioning flow. + */ + break; + default: + WARN_ON(1); + break; + } + mutex_unlock(&linecard->state_lock); +} +EXPORT_SYMBOL_GPL(devlink_linecard_deactivate); + +/** + * devlink_linecard_nested_dl_set - Attach/detach nested devlink + * instance to linecard. + * + * @linecard: devlink linecard + * @nested_devlink: devlink instance to attach or NULL to detach + */ +void devlink_linecard_nested_dl_set(struct devlink_linecard *linecard, + struct devlink *nested_devlink) +{ + mutex_lock(&linecard->state_lock); + linecard->nested_devlink = nested_devlink; + devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); + mutex_unlock(&linecard->state_lock); +} +EXPORT_SYMBOL_GPL(devlink_linecard_nested_dl_set); + +int devl_sb_register(struct devlink *devlink, unsigned int sb_index, + u32 size, u16 ingress_pools_count, + u16 egress_pools_count, u16 ingress_tc_count, + u16 egress_tc_count) +{ + struct devlink_sb *devlink_sb; + + lockdep_assert_held(&devlink->lock); + + if (devlink_sb_index_exists(devlink, sb_index)) + return -EEXIST; + + devlink_sb = kzalloc(sizeof(*devlink_sb), GFP_KERNEL); + if (!devlink_sb) + return -ENOMEM; + devlink_sb->index = sb_index; + devlink_sb->size = size; + devlink_sb->ingress_pools_count = ingress_pools_count; + devlink_sb->egress_pools_count = egress_pools_count; + devlink_sb->ingress_tc_count = ingress_tc_count; + devlink_sb->egress_tc_count = egress_tc_count; + list_add_tail(&devlink_sb->list, &devlink->sb_list); + return 0; +} +EXPORT_SYMBOL_GPL(devl_sb_register); + +int devlink_sb_register(struct devlink *devlink, unsigned int sb_index, + u32 size, u16 ingress_pools_count, + u16 egress_pools_count, u16 ingress_tc_count, + u16 egress_tc_count) +{ + int err; + + devl_lock(devlink); + err = devl_sb_register(devlink, sb_index, size, ingress_pools_count, + egress_pools_count, ingress_tc_count, + egress_tc_count); + devl_unlock(devlink); + return err; +} +EXPORT_SYMBOL_GPL(devlink_sb_register); + +void devl_sb_unregister(struct devlink *devlink, unsigned int sb_index) +{ + struct devlink_sb *devlink_sb; + + lockdep_assert_held(&devlink->lock); + + devlink_sb = devlink_sb_get_by_index(devlink, sb_index); + WARN_ON(!devlink_sb); + list_del(&devlink_sb->list); + kfree(devlink_sb); +} +EXPORT_SYMBOL_GPL(devl_sb_unregister); + +void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index) +{ + devl_lock(devlink); + devl_sb_unregister(devlink, sb_index); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_sb_unregister); + +/** + * devl_dpipe_headers_register - register dpipe headers + * + * @devlink: devlink + * @dpipe_headers: dpipe header array + * + * Register the headers supported by hardware. + */ +void devl_dpipe_headers_register(struct devlink *devlink, + struct devlink_dpipe_headers *dpipe_headers) +{ + lockdep_assert_held(&devlink->lock); + + devlink->dpipe_headers = dpipe_headers; +} +EXPORT_SYMBOL_GPL(devl_dpipe_headers_register); + +/** + * devl_dpipe_headers_unregister - unregister dpipe headers + * + * @devlink: devlink + * + * Unregister the headers supported by hardware. + */ +void devl_dpipe_headers_unregister(struct devlink *devlink) +{ + lockdep_assert_held(&devlink->lock); + + devlink->dpipe_headers = NULL; +} +EXPORT_SYMBOL_GPL(devl_dpipe_headers_unregister); + +/** + * devlink_dpipe_table_counter_enabled - check if counter allocation + * required + * @devlink: devlink + * @table_name: tables name + * + * Used by driver to check if counter allocation is required. + * After counter allocation is turned on the table entries + * are updated to include counter statistics. + * + * After that point on the driver must respect the counter + * state so that each entry added to the table is added + * with a counter. + */ +bool devlink_dpipe_table_counter_enabled(struct devlink *devlink, + const char *table_name) +{ + struct devlink_dpipe_table *table; + bool enabled; + + rcu_read_lock(); + table = devlink_dpipe_table_find(&devlink->dpipe_table_list, + table_name, devlink); + enabled = false; + if (table) + enabled = table->counters_enabled; + rcu_read_unlock(); + return enabled; +} +EXPORT_SYMBOL_GPL(devlink_dpipe_table_counter_enabled); + +/** + * devl_dpipe_table_register - register dpipe table + * + * @devlink: devlink + * @table_name: table name + * @table_ops: table ops + * @priv: priv + * @counter_control_extern: external control for counters + */ +int devl_dpipe_table_register(struct devlink *devlink, + const char *table_name, + struct devlink_dpipe_table_ops *table_ops, + void *priv, bool counter_control_extern) +{ + struct devlink_dpipe_table *table; + + lockdep_assert_held(&devlink->lock); + + if (WARN_ON(!table_ops->size_get)) + return -EINVAL; + + if (devlink_dpipe_table_find(&devlink->dpipe_table_list, table_name, + devlink)) + return -EEXIST; + + table = kzalloc(sizeof(*table), GFP_KERNEL); + if (!table) + return -ENOMEM; + + table->name = table_name; + table->table_ops = table_ops; + table->priv = priv; + table->counter_control_extern = counter_control_extern; + + list_add_tail_rcu(&table->list, &devlink->dpipe_table_list); + + return 0; +} +EXPORT_SYMBOL_GPL(devl_dpipe_table_register); + +/** + * devl_dpipe_table_unregister - unregister dpipe table + * + * @devlink: devlink + * @table_name: table name + */ +void devl_dpipe_table_unregister(struct devlink *devlink, + const char *table_name) +{ + struct devlink_dpipe_table *table; + + lockdep_assert_held(&devlink->lock); + + table = devlink_dpipe_table_find(&devlink->dpipe_table_list, + table_name, devlink); + if (!table) + return; + list_del_rcu(&table->list); + kfree_rcu(table, rcu); +} +EXPORT_SYMBOL_GPL(devl_dpipe_table_unregister); + +/** + * devl_resource_register - devlink resource register + * + * @devlink: devlink + * @resource_name: resource's name + * @resource_size: resource's size + * @resource_id: resource's id + * @parent_resource_id: resource's parent id + * @size_params: size parameters + * + * Generic resources should reuse the same names across drivers. + * Please see the generic resources list at: + * Documentation/networking/devlink/devlink-resource.rst + */ +int devl_resource_register(struct devlink *devlink, + const char *resource_name, + u64 resource_size, + u64 resource_id, + u64 parent_resource_id, + const struct devlink_resource_size_params *size_params) +{ + struct devlink_resource *resource; + struct list_head *resource_list; + bool top_hierarchy; + + lockdep_assert_held(&devlink->lock); + + top_hierarchy = parent_resource_id == DEVLINK_RESOURCE_ID_PARENT_TOP; + + resource = devlink_resource_find(devlink, NULL, resource_id); + if (resource) + return -EINVAL; + + resource = kzalloc(sizeof(*resource), GFP_KERNEL); + if (!resource) + return -ENOMEM; + + if (top_hierarchy) { + resource_list = &devlink->resource_list; + } else { + struct devlink_resource *parent_resource; + + parent_resource = devlink_resource_find(devlink, NULL, + parent_resource_id); + if (parent_resource) { + resource_list = &parent_resource->resource_list; + resource->parent = parent_resource; + } else { + kfree(resource); + return -EINVAL; + } + } + + resource->name = resource_name; + resource->size = resource_size; + resource->size_new = resource_size; + resource->id = resource_id; + resource->size_valid = true; + memcpy(&resource->size_params, size_params, + sizeof(resource->size_params)); + INIT_LIST_HEAD(&resource->resource_list); + list_add_tail(&resource->list, resource_list); + + return 0; +} +EXPORT_SYMBOL_GPL(devl_resource_register); + +/** + * devlink_resource_register - devlink resource register + * + * @devlink: devlink + * @resource_name: resource's name + * @resource_size: resource's size + * @resource_id: resource's id + * @parent_resource_id: resource's parent id + * @size_params: size parameters + * + * Generic resources should reuse the same names across drivers. + * Please see the generic resources list at: + * Documentation/networking/devlink/devlink-resource.rst + * + * Context: Takes and release devlink->lock . + */ +int devlink_resource_register(struct devlink *devlink, + const char *resource_name, + u64 resource_size, + u64 resource_id, + u64 parent_resource_id, + const struct devlink_resource_size_params *size_params) +{ + int err; + + devl_lock(devlink); + err = devl_resource_register(devlink, resource_name, resource_size, + resource_id, parent_resource_id, size_params); + devl_unlock(devlink); + return err; +} +EXPORT_SYMBOL_GPL(devlink_resource_register); + +static void devlink_resource_unregister(struct devlink *devlink, + struct devlink_resource *resource) +{ + struct devlink_resource *tmp, *child_resource; + + list_for_each_entry_safe(child_resource, tmp, &resource->resource_list, + list) { + devlink_resource_unregister(devlink, child_resource); + list_del(&child_resource->list); + kfree(child_resource); + } +} + +/** + * devl_resources_unregister - free all resources + * + * @devlink: devlink + */ +void devl_resources_unregister(struct devlink *devlink) +{ + struct devlink_resource *tmp, *child_resource; + + lockdep_assert_held(&devlink->lock); + + list_for_each_entry_safe(child_resource, tmp, &devlink->resource_list, + list) { + devlink_resource_unregister(devlink, child_resource); + list_del(&child_resource->list); + kfree(child_resource); + } +} +EXPORT_SYMBOL_GPL(devl_resources_unregister); + +/** + * devlink_resources_unregister - free all resources + * + * @devlink: devlink + * + * Context: Takes and release devlink->lock . + */ +void devlink_resources_unregister(struct devlink *devlink) +{ + devl_lock(devlink); + devl_resources_unregister(devlink); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_resources_unregister); + +/** + * devl_resource_size_get - get and update size + * + * @devlink: devlink + * @resource_id: the requested resource id + * @p_resource_size: ptr to update + */ +int devl_resource_size_get(struct devlink *devlink, + u64 resource_id, + u64 *p_resource_size) +{ + struct devlink_resource *resource; + + lockdep_assert_held(&devlink->lock); + + resource = devlink_resource_find(devlink, NULL, resource_id); + if (!resource) + return -EINVAL; + *p_resource_size = resource->size_new; + resource->size = resource->size_new; + return 0; +} +EXPORT_SYMBOL_GPL(devl_resource_size_get); + +/** + * devl_dpipe_table_resource_set - set the resource id + * + * @devlink: devlink + * @table_name: table name + * @resource_id: resource id + * @resource_units: number of resource's units consumed per table's entry + */ +int devl_dpipe_table_resource_set(struct devlink *devlink, + const char *table_name, u64 resource_id, + u64 resource_units) +{ + struct devlink_dpipe_table *table; + + table = devlink_dpipe_table_find(&devlink->dpipe_table_list, + table_name, devlink); + if (!table) + return -EINVAL; + + table->resource_id = resource_id; + table->resource_units = resource_units; + table->resource_valid = true; + return 0; +} +EXPORT_SYMBOL_GPL(devl_dpipe_table_resource_set); + +/** + * devl_resource_occ_get_register - register occupancy getter + * + * @devlink: devlink + * @resource_id: resource id + * @occ_get: occupancy getter callback + * @occ_get_priv: occupancy getter callback priv + */ +void devl_resource_occ_get_register(struct devlink *devlink, + u64 resource_id, + devlink_resource_occ_get_t *occ_get, + void *occ_get_priv) +{ + struct devlink_resource *resource; + + lockdep_assert_held(&devlink->lock); + + resource = devlink_resource_find(devlink, NULL, resource_id); + if (WARN_ON(!resource)) + return; + WARN_ON(resource->occ_get); + + resource->occ_get = occ_get; + resource->occ_get_priv = occ_get_priv; +} +EXPORT_SYMBOL_GPL(devl_resource_occ_get_register); + +/** + * devlink_resource_occ_get_register - register occupancy getter + * + * @devlink: devlink + * @resource_id: resource id + * @occ_get: occupancy getter callback + * @occ_get_priv: occupancy getter callback priv + * + * Context: Takes and release devlink->lock . + */ +void devlink_resource_occ_get_register(struct devlink *devlink, + u64 resource_id, + devlink_resource_occ_get_t *occ_get, + void *occ_get_priv) +{ + devl_lock(devlink); + devl_resource_occ_get_register(devlink, resource_id, + occ_get, occ_get_priv); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_resource_occ_get_register); + +/** + * devl_resource_occ_get_unregister - unregister occupancy getter + * + * @devlink: devlink + * @resource_id: resource id + */ +void devl_resource_occ_get_unregister(struct devlink *devlink, + u64 resource_id) +{ + struct devlink_resource *resource; + + lockdep_assert_held(&devlink->lock); + + resource = devlink_resource_find(devlink, NULL, resource_id); + if (WARN_ON(!resource)) + return; + WARN_ON(!resource->occ_get); + + resource->occ_get = NULL; + resource->occ_get_priv = NULL; +} +EXPORT_SYMBOL_GPL(devl_resource_occ_get_unregister); + +/** + * devlink_resource_occ_get_unregister - unregister occupancy getter + * + * @devlink: devlink + * @resource_id: resource id + * + * Context: Takes and release devlink->lock . + */ +void devlink_resource_occ_get_unregister(struct devlink *devlink, + u64 resource_id) +{ + devl_lock(devlink); + devl_resource_occ_get_unregister(devlink, resource_id); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_resource_occ_get_unregister); + +static int devlink_param_verify(const struct devlink_param *param) +{ + if (!param || !param->name || !param->supported_cmodes) + return -EINVAL; + if (param->generic) + return devlink_param_generic_verify(param); + else + return devlink_param_driver_verify(param); +} + +/** + * devlink_params_register - register configuration parameters + * + * @devlink: devlink + * @params: configuration parameters array + * @params_count: number of parameters provided + * + * Register the configuration parameters supported by the driver. + */ +int devlink_params_register(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count) +{ + const struct devlink_param *param = params; + int i, err; + + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + for (i = 0; i < params_count; i++, param++) { + err = devlink_param_register(devlink, param); + if (err) + goto rollback; + } + return 0; + +rollback: + if (!i) + return err; + + for (param--; i > 0; i--, param--) + devlink_param_unregister(devlink, param); + return err; +} +EXPORT_SYMBOL_GPL(devlink_params_register); + +/** + * devlink_params_unregister - unregister configuration parameters + * @devlink: devlink + * @params: configuration parameters to unregister + * @params_count: number of parameters provided + */ +void devlink_params_unregister(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count) +{ + const struct devlink_param *param = params; + int i; + + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + for (i = 0; i < params_count; i++, param++) + devlink_param_unregister(devlink, param); +} +EXPORT_SYMBOL_GPL(devlink_params_unregister); + +/** + * devlink_param_register - register one configuration parameter + * + * @devlink: devlink + * @param: one configuration parameter + * + * Register the configuration parameter supported by the driver. + * Return: returns 0 on successful registration or error code otherwise. + */ +int devlink_param_register(struct devlink *devlink, + const struct devlink_param *param) +{ + struct devlink_param_item *param_item; + + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + WARN_ON(devlink_param_verify(param)); + WARN_ON(devlink_param_find_by_name(&devlink->param_list, param->name)); + + if (param->supported_cmodes == BIT(DEVLINK_PARAM_CMODE_DRIVERINIT)) + WARN_ON(param->get || param->set); + else + WARN_ON(!param->get || !param->set); + + param_item = kzalloc(sizeof(*param_item), GFP_KERNEL); + if (!param_item) + return -ENOMEM; + + param_item->param = param; + + list_add_tail(¶m_item->list, &devlink->param_list); + return 0; +} +EXPORT_SYMBOL_GPL(devlink_param_register); + +/** + * devlink_param_unregister - unregister one configuration parameter + * @devlink: devlink + * @param: configuration parameter to unregister + */ +void devlink_param_unregister(struct devlink *devlink, + const struct devlink_param *param) +{ + struct devlink_param_item *param_item; + + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + param_item = + devlink_param_find_by_name(&devlink->param_list, param->name); + WARN_ON(!param_item); + list_del(¶m_item->list); + kfree(param_item); +} +EXPORT_SYMBOL_GPL(devlink_param_unregister); + +/** + * devlink_param_driverinit_value_get - get configuration parameter + * value for driver initializing + * + * @devlink: devlink + * @param_id: parameter ID + * @init_val: value of parameter in driverinit configuration mode + * + * This function should be used by the driver to get driverinit + * configuration for initialization after reload command. + */ +int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, + union devlink_param_value *init_val) +{ + struct devlink_param_item *param_item; + + if (!devlink_reload_supported(devlink->ops)) + return -EOPNOTSUPP; + + param_item = devlink_param_find_by_id(&devlink->param_list, param_id); + if (!param_item) + return -EINVAL; + + if (!param_item->driverinit_value_valid || + !devlink_param_cmode_is_supported(param_item->param, + DEVLINK_PARAM_CMODE_DRIVERINIT)) + return -EOPNOTSUPP; + + if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(init_val->vstr, param_item->driverinit_value.vstr); + else + *init_val = param_item->driverinit_value; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_get); + +/** + * devlink_param_driverinit_value_set - set value of configuration + * parameter for driverinit + * configuration mode + * + * @devlink: devlink + * @param_id: parameter ID + * @init_val: value of parameter to set for driverinit configuration mode + * + * This function should be used by the driver to set driverinit + * configuration mode default value. + */ +int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, + union devlink_param_value init_val) +{ + struct devlink_param_item *param_item; + + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + param_item = devlink_param_find_by_id(&devlink->param_list, param_id); + if (!param_item) + return -EINVAL; + + if (!devlink_param_cmode_is_supported(param_item->param, + DEVLINK_PARAM_CMODE_DRIVERINIT)) + return -EOPNOTSUPP; + + if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) + strcpy(param_item->driverinit_value.vstr, init_val.vstr); + else + param_item->driverinit_value = init_val; + param_item->driverinit_value_valid = true; + return 0; +} +EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_set); + +/** + * devlink_param_value_changed - notify devlink on a parameter's value + * change. Should be called by the driver + * right after the change. + * + * @devlink: devlink + * @param_id: parameter ID + * + * This function should be used by the driver to notify devlink on value + * change, excluding driverinit configuration mode. + * For driverinit configuration mode driver should use the function + */ +void devlink_param_value_changed(struct devlink *devlink, u32 param_id) +{ + struct devlink_param_item *param_item; + + param_item = devlink_param_find_by_id(&devlink->param_list, param_id); + WARN_ON(!param_item); + + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); +} +EXPORT_SYMBOL_GPL(devlink_param_value_changed); + +/** + * devl_region_create - create a new address region + * + * @devlink: devlink + * @ops: region operations and name + * @region_max_snapshots: Maximum supported number of snapshots for region + * @region_size: size of region + */ +struct devlink_region *devl_region_create(struct devlink *devlink, + const struct devlink_region_ops *ops, + u32 region_max_snapshots, + u64 region_size) +{ + struct devlink_region *region; + + devl_assert_locked(devlink); + + if (WARN_ON(!ops) || WARN_ON(!ops->destructor)) + return ERR_PTR(-EINVAL); + + if (devlink_region_get_by_name(devlink, ops->name)) + return ERR_PTR(-EEXIST); + + region = kzalloc(sizeof(*region), GFP_KERNEL); + if (!region) + return ERR_PTR(-ENOMEM); + + region->devlink = devlink; + region->max_snapshots = region_max_snapshots; + region->ops = ops; + region->size = region_size; + INIT_LIST_HEAD(®ion->snapshot_list); + mutex_init(®ion->snapshot_lock); + list_add_tail(®ion->list, &devlink->region_list); + devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); + + return region; +} +EXPORT_SYMBOL_GPL(devl_region_create); + +/** + * devlink_region_create - create a new address region + * + * @devlink: devlink + * @ops: region operations and name + * @region_max_snapshots: Maximum supported number of snapshots for region + * @region_size: size of region + * + * Context: Takes and release devlink->lock . + */ +struct devlink_region * +devlink_region_create(struct devlink *devlink, + const struct devlink_region_ops *ops, + u32 region_max_snapshots, u64 region_size) +{ + struct devlink_region *region; + + devl_lock(devlink); + region = devl_region_create(devlink, ops, region_max_snapshots, + region_size); + devl_unlock(devlink); + return region; +} +EXPORT_SYMBOL_GPL(devlink_region_create); + +/** + * devlink_port_region_create - create a new address region for a port + * + * @port: devlink port + * @ops: region operations and name + * @region_max_snapshots: Maximum supported number of snapshots for region + * @region_size: size of region + * + * Context: Takes and release devlink->lock . + */ +struct devlink_region * +devlink_port_region_create(struct devlink_port *port, + const struct devlink_port_region_ops *ops, + u32 region_max_snapshots, u64 region_size) +{ + struct devlink *devlink = port->devlink; + struct devlink_region *region; + int err = 0; + + ASSERT_DEVLINK_PORT_INITIALIZED(port); + + if (WARN_ON(!ops) || WARN_ON(!ops->destructor)) + return ERR_PTR(-EINVAL); + + devl_lock(devlink); + + if (devlink_port_region_get_by_name(port, ops->name)) { + err = -EEXIST; + goto unlock; + } + + region = kzalloc(sizeof(*region), GFP_KERNEL); + if (!region) { + err = -ENOMEM; + goto unlock; + } + + region->devlink = devlink; + region->port = port; + region->max_snapshots = region_max_snapshots; + region->port_ops = ops; + region->size = region_size; + INIT_LIST_HEAD(®ion->snapshot_list); + mutex_init(®ion->snapshot_lock); + list_add_tail(®ion->list, &port->region_list); + devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); + + devl_unlock(devlink); + return region; + +unlock: + devl_unlock(devlink); + return ERR_PTR(err); +} +EXPORT_SYMBOL_GPL(devlink_port_region_create); + +/** + * devl_region_destroy - destroy address region + * + * @region: devlink region to destroy + */ +void devl_region_destroy(struct devlink_region *region) +{ + struct devlink *devlink = region->devlink; + struct devlink_snapshot *snapshot, *ts; + + devl_assert_locked(devlink); + + /* Free all snapshots of region */ + mutex_lock(®ion->snapshot_lock); + list_for_each_entry_safe(snapshot, ts, ®ion->snapshot_list, list) + devlink_region_snapshot_del(region, snapshot); + mutex_unlock(®ion->snapshot_lock); + + list_del(®ion->list); + mutex_destroy(®ion->snapshot_lock); + + devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_DEL); + kfree(region); +} +EXPORT_SYMBOL_GPL(devl_region_destroy); + +/** + * devlink_region_destroy - destroy address region + * + * @region: devlink region to destroy + * + * Context: Takes and release devlink->lock . + */ +void devlink_region_destroy(struct devlink_region *region) +{ + struct devlink *devlink = region->devlink; + + devl_lock(devlink); + devl_region_destroy(region); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_region_destroy); + +/** + * devlink_region_snapshot_id_get - get snapshot ID + * + * This callback should be called when adding a new snapshot, + * Driver should use the same id for multiple snapshots taken + * on multiple regions at the same time/by the same trigger. + * + * The caller of this function must use devlink_region_snapshot_id_put + * when finished creating regions using this id. + * + * Returns zero on success, or a negative error code on failure. + * + * @devlink: devlink + * @id: storage to return id + */ +int devlink_region_snapshot_id_get(struct devlink *devlink, u32 *id) +{ + return __devlink_region_snapshot_id_get(devlink, id); +} +EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_get); + +/** + * devlink_region_snapshot_id_put - put snapshot ID reference + * + * This should be called by a driver after finishing creating snapshots + * with an id. Doing so ensures that the ID can later be released in the + * event that all snapshots using it have been destroyed. + * + * @devlink: devlink + * @id: id to release reference on + */ +void devlink_region_snapshot_id_put(struct devlink *devlink, u32 id) +{ + __devlink_snapshot_id_decrement(devlink, id); +} +EXPORT_SYMBOL_GPL(devlink_region_snapshot_id_put); + +/** + * devlink_region_snapshot_create - create a new snapshot + * This will add a new snapshot of a region. The snapshot + * will be stored on the region struct and can be accessed + * from devlink. This is useful for future analyses of snapshots. + * Multiple snapshots can be created on a region. + * The @snapshot_id should be obtained using the getter function. + * + * @region: devlink region of the snapshot + * @data: snapshot data + * @snapshot_id: snapshot id to be created + */ +int devlink_region_snapshot_create(struct devlink_region *region, + u8 *data, u32 snapshot_id) +{ + int err; + + mutex_lock(®ion->snapshot_lock); + err = __devlink_region_snapshot_create(region, data, snapshot_id); + mutex_unlock(®ion->snapshot_lock); + return err; +} +EXPORT_SYMBOL_GPL(devlink_region_snapshot_create); + +#define DEVLINK_TRAP(_id, _type) \ + { \ + .type = DEVLINK_TRAP_TYPE_##_type, \ + .id = DEVLINK_TRAP_GENERIC_ID_##_id, \ + .name = DEVLINK_TRAP_GENERIC_NAME_##_id, \ + } + +static const struct devlink_trap devlink_trap_generic[] = { + DEVLINK_TRAP(SMAC_MC, DROP), + DEVLINK_TRAP(VLAN_TAG_MISMATCH, DROP), + DEVLINK_TRAP(INGRESS_VLAN_FILTER, DROP), + DEVLINK_TRAP(INGRESS_STP_FILTER, DROP), + DEVLINK_TRAP(EMPTY_TX_LIST, DROP), + DEVLINK_TRAP(PORT_LOOPBACK_FILTER, DROP), + DEVLINK_TRAP(BLACKHOLE_ROUTE, DROP), + DEVLINK_TRAP(TTL_ERROR, EXCEPTION), + DEVLINK_TRAP(TAIL_DROP, DROP), + DEVLINK_TRAP(NON_IP_PACKET, DROP), + DEVLINK_TRAP(UC_DIP_MC_DMAC, DROP), + DEVLINK_TRAP(DIP_LB, DROP), + DEVLINK_TRAP(SIP_MC, DROP), + DEVLINK_TRAP(SIP_LB, DROP), + DEVLINK_TRAP(CORRUPTED_IP_HDR, DROP), + DEVLINK_TRAP(IPV4_SIP_BC, DROP), + DEVLINK_TRAP(IPV6_MC_DIP_RESERVED_SCOPE, DROP), + DEVLINK_TRAP(IPV6_MC_DIP_INTERFACE_LOCAL_SCOPE, DROP), + DEVLINK_TRAP(MTU_ERROR, EXCEPTION), + DEVLINK_TRAP(UNRESOLVED_NEIGH, EXCEPTION), + DEVLINK_TRAP(RPF, EXCEPTION), + DEVLINK_TRAP(REJECT_ROUTE, EXCEPTION), + DEVLINK_TRAP(IPV4_LPM_UNICAST_MISS, EXCEPTION), + DEVLINK_TRAP(IPV6_LPM_UNICAST_MISS, EXCEPTION), + DEVLINK_TRAP(NON_ROUTABLE, DROP), + DEVLINK_TRAP(DECAP_ERROR, EXCEPTION), + DEVLINK_TRAP(OVERLAY_SMAC_MC, DROP), + DEVLINK_TRAP(INGRESS_FLOW_ACTION_DROP, DROP), + DEVLINK_TRAP(EGRESS_FLOW_ACTION_DROP, DROP), + DEVLINK_TRAP(STP, CONTROL), + DEVLINK_TRAP(LACP, CONTROL), + DEVLINK_TRAP(LLDP, CONTROL), + DEVLINK_TRAP(IGMP_QUERY, CONTROL), + DEVLINK_TRAP(IGMP_V1_REPORT, CONTROL), + DEVLINK_TRAP(IGMP_V2_REPORT, CONTROL), + DEVLINK_TRAP(IGMP_V3_REPORT, CONTROL), + DEVLINK_TRAP(IGMP_V2_LEAVE, CONTROL), + DEVLINK_TRAP(MLD_QUERY, CONTROL), + DEVLINK_TRAP(MLD_V1_REPORT, CONTROL), + DEVLINK_TRAP(MLD_V2_REPORT, CONTROL), + DEVLINK_TRAP(MLD_V1_DONE, CONTROL), + DEVLINK_TRAP(IPV4_DHCP, CONTROL), + DEVLINK_TRAP(IPV6_DHCP, CONTROL), + DEVLINK_TRAP(ARP_REQUEST, CONTROL), + DEVLINK_TRAP(ARP_RESPONSE, CONTROL), + DEVLINK_TRAP(ARP_OVERLAY, CONTROL), + DEVLINK_TRAP(IPV6_NEIGH_SOLICIT, CONTROL), + DEVLINK_TRAP(IPV6_NEIGH_ADVERT, CONTROL), + DEVLINK_TRAP(IPV4_BFD, CONTROL), + DEVLINK_TRAP(IPV6_BFD, CONTROL), + DEVLINK_TRAP(IPV4_OSPF, CONTROL), + DEVLINK_TRAP(IPV6_OSPF, CONTROL), + DEVLINK_TRAP(IPV4_BGP, CONTROL), + DEVLINK_TRAP(IPV6_BGP, CONTROL), + DEVLINK_TRAP(IPV4_VRRP, CONTROL), + DEVLINK_TRAP(IPV6_VRRP, CONTROL), + DEVLINK_TRAP(IPV4_PIM, CONTROL), + DEVLINK_TRAP(IPV6_PIM, CONTROL), + DEVLINK_TRAP(UC_LB, CONTROL), + DEVLINK_TRAP(LOCAL_ROUTE, CONTROL), + DEVLINK_TRAP(EXTERNAL_ROUTE, CONTROL), + DEVLINK_TRAP(IPV6_UC_DIP_LINK_LOCAL_SCOPE, CONTROL), + DEVLINK_TRAP(IPV6_DIP_ALL_NODES, CONTROL), + DEVLINK_TRAP(IPV6_DIP_ALL_ROUTERS, CONTROL), + DEVLINK_TRAP(IPV6_ROUTER_SOLICIT, CONTROL), + DEVLINK_TRAP(IPV6_ROUTER_ADVERT, CONTROL), + DEVLINK_TRAP(IPV6_REDIRECT, CONTROL), + DEVLINK_TRAP(IPV4_ROUTER_ALERT, CONTROL), + DEVLINK_TRAP(IPV6_ROUTER_ALERT, CONTROL), + DEVLINK_TRAP(PTP_EVENT, CONTROL), + DEVLINK_TRAP(PTP_GENERAL, CONTROL), + DEVLINK_TRAP(FLOW_ACTION_SAMPLE, CONTROL), + DEVLINK_TRAP(FLOW_ACTION_TRAP, CONTROL), + DEVLINK_TRAP(EARLY_DROP, DROP), + DEVLINK_TRAP(VXLAN_PARSING, DROP), + DEVLINK_TRAP(LLC_SNAP_PARSING, DROP), + DEVLINK_TRAP(VLAN_PARSING, DROP), + DEVLINK_TRAP(PPPOE_PPP_PARSING, DROP), + DEVLINK_TRAP(MPLS_PARSING, DROP), + DEVLINK_TRAP(ARP_PARSING, DROP), + DEVLINK_TRAP(IP_1_PARSING, DROP), + DEVLINK_TRAP(IP_N_PARSING, DROP), + DEVLINK_TRAP(GRE_PARSING, DROP), + DEVLINK_TRAP(UDP_PARSING, DROP), + DEVLINK_TRAP(TCP_PARSING, DROP), + DEVLINK_TRAP(IPSEC_PARSING, DROP), + DEVLINK_TRAP(SCTP_PARSING, DROP), + DEVLINK_TRAP(DCCP_PARSING, DROP), + DEVLINK_TRAP(GTP_PARSING, DROP), + DEVLINK_TRAP(ESP_PARSING, DROP), + DEVLINK_TRAP(BLACKHOLE_NEXTHOP, DROP), + DEVLINK_TRAP(DMAC_FILTER, DROP), + DEVLINK_TRAP(EAPOL, CONTROL), + DEVLINK_TRAP(LOCKED_PORT, DROP), +}; + +#define DEVLINK_TRAP_GROUP(_id) \ + { \ + .id = DEVLINK_TRAP_GROUP_GENERIC_ID_##_id, \ + .name = DEVLINK_TRAP_GROUP_GENERIC_NAME_##_id, \ + } + +static const struct devlink_trap_group devlink_trap_group_generic[] = { + DEVLINK_TRAP_GROUP(L2_DROPS), + DEVLINK_TRAP_GROUP(L3_DROPS), + DEVLINK_TRAP_GROUP(L3_EXCEPTIONS), + DEVLINK_TRAP_GROUP(BUFFER_DROPS), + DEVLINK_TRAP_GROUP(TUNNEL_DROPS), + DEVLINK_TRAP_GROUP(ACL_DROPS), + DEVLINK_TRAP_GROUP(STP), + DEVLINK_TRAP_GROUP(LACP), + DEVLINK_TRAP_GROUP(LLDP), + DEVLINK_TRAP_GROUP(MC_SNOOPING), + DEVLINK_TRAP_GROUP(DHCP), + DEVLINK_TRAP_GROUP(NEIGH_DISCOVERY), + DEVLINK_TRAP_GROUP(BFD), + DEVLINK_TRAP_GROUP(OSPF), + DEVLINK_TRAP_GROUP(BGP), + DEVLINK_TRAP_GROUP(VRRP), + DEVLINK_TRAP_GROUP(PIM), + DEVLINK_TRAP_GROUP(UC_LB), + DEVLINK_TRAP_GROUP(LOCAL_DELIVERY), + DEVLINK_TRAP_GROUP(EXTERNAL_DELIVERY), + DEVLINK_TRAP_GROUP(IPV6), + DEVLINK_TRAP_GROUP(PTP_EVENT), + DEVLINK_TRAP_GROUP(PTP_GENERAL), + DEVLINK_TRAP_GROUP(ACL_SAMPLE), + DEVLINK_TRAP_GROUP(ACL_TRAP), + DEVLINK_TRAP_GROUP(PARSER_ERROR_DROPS), + DEVLINK_TRAP_GROUP(EAPOL), +}; + +static int devlink_trap_generic_verify(const struct devlink_trap *trap) +{ + if (trap->id > DEVLINK_TRAP_GENERIC_ID_MAX) + return -EINVAL; + + if (strcmp(trap->name, devlink_trap_generic[trap->id].name)) + return -EINVAL; + + if (trap->type != devlink_trap_generic[trap->id].type) + return -EINVAL; + + return 0; +} + +static int devlink_trap_driver_verify(const struct devlink_trap *trap) +{ + int i; + + if (trap->id <= DEVLINK_TRAP_GENERIC_ID_MAX) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(devlink_trap_generic); i++) { + if (!strcmp(trap->name, devlink_trap_generic[i].name)) + return -EEXIST; + } + + return 0; +} + +static int devlink_trap_verify(const struct devlink_trap *trap) +{ + if (!trap || !trap->name) + return -EINVAL; + + if (trap->generic) + return devlink_trap_generic_verify(trap); + else + return devlink_trap_driver_verify(trap); +} + +static int +devlink_trap_group_generic_verify(const struct devlink_trap_group *group) +{ + if (group->id > DEVLINK_TRAP_GROUP_GENERIC_ID_MAX) + return -EINVAL; + + if (strcmp(group->name, devlink_trap_group_generic[group->id].name)) + return -EINVAL; + + return 0; +} + +static int +devlink_trap_group_driver_verify(const struct devlink_trap_group *group) +{ + int i; + + if (group->id <= DEVLINK_TRAP_GROUP_GENERIC_ID_MAX) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(devlink_trap_group_generic); i++) { + if (!strcmp(group->name, devlink_trap_group_generic[i].name)) + return -EEXIST; + } + + return 0; +} + +static int devlink_trap_group_verify(const struct devlink_trap_group *group) +{ + if (group->generic) + return devlink_trap_group_generic_verify(group); + else + return devlink_trap_group_driver_verify(group); +} + +static void +devlink_trap_group_notify(struct devlink *devlink, + const struct devlink_trap_group_item *group_item, + enum devlink_command cmd) +{ + struct sk_buff *msg; + int err; + + WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_GROUP_NEW && + cmd != DEVLINK_CMD_TRAP_GROUP_DEL); + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_trap_group_fill(msg, devlink, group_item, cmd, 0, 0, + 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int +devlink_trap_item_group_link(struct devlink *devlink, + struct devlink_trap_item *trap_item) +{ + u16 group_id = trap_item->trap->init_group_id; + struct devlink_trap_group_item *group_item; + + group_item = devlink_trap_group_item_lookup_by_id(devlink, group_id); + if (WARN_ON_ONCE(!group_item)) + return -EINVAL; + + trap_item->group_item = group_item; + + return 0; +} + +static void devlink_trap_notify(struct devlink *devlink, + const struct devlink_trap_item *trap_item, + enum devlink_command cmd) +{ + struct sk_buff *msg; + int err; + + WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_NEW && + cmd != DEVLINK_CMD_TRAP_DEL); + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_trap_fill(msg, devlink, trap_item, cmd, 0, 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int +devlink_trap_register(struct devlink *devlink, + const struct devlink_trap *trap, void *priv) +{ + struct devlink_trap_item *trap_item; + int err; + + if (devlink_trap_item_lookup(devlink, trap->name)) + return -EEXIST; + + trap_item = kzalloc(sizeof(*trap_item), GFP_KERNEL); + if (!trap_item) + return -ENOMEM; + + trap_item->stats = netdev_alloc_pcpu_stats(struct devlink_stats); + if (!trap_item->stats) { + err = -ENOMEM; + goto err_stats_alloc; + } + + trap_item->trap = trap; + trap_item->action = trap->init_action; + trap_item->priv = priv; + + err = devlink_trap_item_group_link(devlink, trap_item); + if (err) + goto err_group_link; + + err = devlink->ops->trap_init(devlink, trap, trap_item); + if (err) + goto err_trap_init; + + list_add_tail(&trap_item->list, &devlink->trap_list); + devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_NEW); + + return 0; + +err_trap_init: +err_group_link: + free_percpu(trap_item->stats); +err_stats_alloc: + kfree(trap_item); + return err; +} + +static void devlink_trap_unregister(struct devlink *devlink, + const struct devlink_trap *trap) +{ + struct devlink_trap_item *trap_item; + + trap_item = devlink_trap_item_lookup(devlink, trap->name); + if (WARN_ON_ONCE(!trap_item)) + return; + + devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_DEL); + list_del(&trap_item->list); + if (devlink->ops->trap_fini) + devlink->ops->trap_fini(devlink, trap, trap_item); + free_percpu(trap_item->stats); + kfree(trap_item); +} + +static void devlink_trap_disable(struct devlink *devlink, + const struct devlink_trap *trap) +{ + struct devlink_trap_item *trap_item; + + trap_item = devlink_trap_item_lookup(devlink, trap->name); + if (WARN_ON_ONCE(!trap_item)) + return; + + devlink->ops->trap_action_set(devlink, trap, DEVLINK_TRAP_ACTION_DROP, + NULL); + trap_item->action = DEVLINK_TRAP_ACTION_DROP; +} + +/** + * devl_traps_register - Register packet traps with devlink. + * @devlink: devlink. + * @traps: Packet traps. + * @traps_count: Count of provided packet traps. + * @priv: Driver private information. + * + * Return: Non-zero value on failure. + */ +int devl_traps_register(struct devlink *devlink, + const struct devlink_trap *traps, + size_t traps_count, void *priv) +{ + int i, err; + + if (!devlink->ops->trap_init || !devlink->ops->trap_action_set) + return -EINVAL; + + devl_assert_locked(devlink); + for (i = 0; i < traps_count; i++) { + const struct devlink_trap *trap = &traps[i]; + + err = devlink_trap_verify(trap); + if (err) + goto err_trap_verify; + + err = devlink_trap_register(devlink, trap, priv); + if (err) + goto err_trap_register; + } + + return 0; + +err_trap_register: +err_trap_verify: + for (i--; i >= 0; i--) + devlink_trap_unregister(devlink, &traps[i]); + return err; +} +EXPORT_SYMBOL_GPL(devl_traps_register); + +/** + * devlink_traps_register - Register packet traps with devlink. + * @devlink: devlink. + * @traps: Packet traps. + * @traps_count: Count of provided packet traps. + * @priv: Driver private information. + * + * Context: Takes and release devlink->lock . + * + * Return: Non-zero value on failure. + */ +int devlink_traps_register(struct devlink *devlink, + const struct devlink_trap *traps, + size_t traps_count, void *priv) +{ + int err; + + devl_lock(devlink); + err = devl_traps_register(devlink, traps, traps_count, priv); + devl_unlock(devlink); + return err; +} +EXPORT_SYMBOL_GPL(devlink_traps_register); + +/** + * devl_traps_unregister - Unregister packet traps from devlink. + * @devlink: devlink. + * @traps: Packet traps. + * @traps_count: Count of provided packet traps. + */ +void devl_traps_unregister(struct devlink *devlink, + const struct devlink_trap *traps, + size_t traps_count) +{ + int i; + + devl_assert_locked(devlink); + /* Make sure we do not have any packets in-flight while unregistering + * traps by disabling all of them and waiting for a grace period. + */ + for (i = traps_count - 1; i >= 0; i--) + devlink_trap_disable(devlink, &traps[i]); + synchronize_rcu(); + for (i = traps_count - 1; i >= 0; i--) + devlink_trap_unregister(devlink, &traps[i]); +} +EXPORT_SYMBOL_GPL(devl_traps_unregister); + +/** + * devlink_traps_unregister - Unregister packet traps from devlink. + * @devlink: devlink. + * @traps: Packet traps. + * @traps_count: Count of provided packet traps. + * + * Context: Takes and release devlink->lock . + */ +void devlink_traps_unregister(struct devlink *devlink, + const struct devlink_trap *traps, + size_t traps_count) +{ + devl_lock(devlink); + devl_traps_unregister(devlink, traps, traps_count); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_traps_unregister); + +static void +devlink_trap_stats_update(struct devlink_stats __percpu *trap_stats, + size_t skb_len) +{ + struct devlink_stats *stats; + + stats = this_cpu_ptr(trap_stats); + u64_stats_update_begin(&stats->syncp); + u64_stats_add(&stats->rx_bytes, skb_len); + u64_stats_inc(&stats->rx_packets); + u64_stats_update_end(&stats->syncp); +} + +static void +devlink_trap_report_metadata_set(struct devlink_trap_metadata *metadata, + const struct devlink_trap_item *trap_item, + struct devlink_port *in_devlink_port, + const struct flow_action_cookie *fa_cookie) +{ + metadata->trap_name = trap_item->trap->name; + metadata->trap_group_name = trap_item->group_item->group->name; + metadata->fa_cookie = fa_cookie; + metadata->trap_type = trap_item->trap->type; + + spin_lock(&in_devlink_port->type_lock); + if (in_devlink_port->type == DEVLINK_PORT_TYPE_ETH) + metadata->input_dev = in_devlink_port->type_eth.netdev; + spin_unlock(&in_devlink_port->type_lock); +} + +/** + * devlink_trap_report - Report trapped packet to drop monitor. + * @devlink: devlink. + * @skb: Trapped packet. + * @trap_ctx: Trap context. + * @in_devlink_port: Input devlink port. + * @fa_cookie: Flow action cookie. Could be NULL. + */ +void devlink_trap_report(struct devlink *devlink, struct sk_buff *skb, + void *trap_ctx, struct devlink_port *in_devlink_port, + const struct flow_action_cookie *fa_cookie) + +{ + struct devlink_trap_item *trap_item = trap_ctx; + + devlink_trap_stats_update(trap_item->stats, skb->len); + devlink_trap_stats_update(trap_item->group_item->stats, skb->len); + + if (trace_devlink_trap_report_enabled()) { + struct devlink_trap_metadata metadata = {}; + + devlink_trap_report_metadata_set(&metadata, trap_item, + in_devlink_port, fa_cookie); + trace_devlink_trap_report(devlink, skb, &metadata); + } +} +EXPORT_SYMBOL_GPL(devlink_trap_report); + +/** + * devlink_trap_ctx_priv - Trap context to driver private information. + * @trap_ctx: Trap context. + * + * Return: Driver private information passed during registration. + */ +void *devlink_trap_ctx_priv(void *trap_ctx) +{ + struct devlink_trap_item *trap_item = trap_ctx; + + return trap_item->priv; +} +EXPORT_SYMBOL_GPL(devlink_trap_ctx_priv); + +static int +devlink_trap_group_item_policer_link(struct devlink *devlink, + struct devlink_trap_group_item *group_item) +{ + u32 policer_id = group_item->group->init_policer_id; + struct devlink_trap_policer_item *policer_item; + + if (policer_id == 0) + return 0; + + policer_item = devlink_trap_policer_item_lookup(devlink, policer_id); + if (WARN_ON_ONCE(!policer_item)) + return -EINVAL; + + group_item->policer_item = policer_item; + + return 0; +} + +static int +devlink_trap_group_register(struct devlink *devlink, + const struct devlink_trap_group *group) +{ + struct devlink_trap_group_item *group_item; + int err; + + if (devlink_trap_group_item_lookup(devlink, group->name)) + return -EEXIST; + + group_item = kzalloc(sizeof(*group_item), GFP_KERNEL); + if (!group_item) + return -ENOMEM; + + group_item->stats = netdev_alloc_pcpu_stats(struct devlink_stats); + if (!group_item->stats) { + err = -ENOMEM; + goto err_stats_alloc; + } + + group_item->group = group; + + err = devlink_trap_group_item_policer_link(devlink, group_item); + if (err) + goto err_policer_link; + + if (devlink->ops->trap_group_init) { + err = devlink->ops->trap_group_init(devlink, group); + if (err) + goto err_group_init; + } + + list_add_tail(&group_item->list, &devlink->trap_group_list); + devlink_trap_group_notify(devlink, group_item, + DEVLINK_CMD_TRAP_GROUP_NEW); + + return 0; + +err_group_init: +err_policer_link: + free_percpu(group_item->stats); +err_stats_alloc: + kfree(group_item); + return err; +} + +static void +devlink_trap_group_unregister(struct devlink *devlink, + const struct devlink_trap_group *group) +{ + struct devlink_trap_group_item *group_item; + + group_item = devlink_trap_group_item_lookup(devlink, group->name); + if (WARN_ON_ONCE(!group_item)) + return; + + devlink_trap_group_notify(devlink, group_item, + DEVLINK_CMD_TRAP_GROUP_DEL); + list_del(&group_item->list); + free_percpu(group_item->stats); + kfree(group_item); +} + +/** + * devl_trap_groups_register - Register packet trap groups with devlink. + * @devlink: devlink. + * @groups: Packet trap groups. + * @groups_count: Count of provided packet trap groups. + * + * Return: Non-zero value on failure. + */ +int devl_trap_groups_register(struct devlink *devlink, + const struct devlink_trap_group *groups, + size_t groups_count) +{ + int i, err; + + devl_assert_locked(devlink); + for (i = 0; i < groups_count; i++) { + const struct devlink_trap_group *group = &groups[i]; + + err = devlink_trap_group_verify(group); + if (err) + goto err_trap_group_verify; + + err = devlink_trap_group_register(devlink, group); + if (err) + goto err_trap_group_register; + } + + return 0; + +err_trap_group_register: +err_trap_group_verify: + for (i--; i >= 0; i--) + devlink_trap_group_unregister(devlink, &groups[i]); + return err; +} +EXPORT_SYMBOL_GPL(devl_trap_groups_register); + +/** + * devlink_trap_groups_register - Register packet trap groups with devlink. + * @devlink: devlink. + * @groups: Packet trap groups. + * @groups_count: Count of provided packet trap groups. + * + * Context: Takes and release devlink->lock . + * + * Return: Non-zero value on failure. + */ +int devlink_trap_groups_register(struct devlink *devlink, + const struct devlink_trap_group *groups, + size_t groups_count) +{ + int err; + + devl_lock(devlink); + err = devl_trap_groups_register(devlink, groups, groups_count); + devl_unlock(devlink); + return err; +} +EXPORT_SYMBOL_GPL(devlink_trap_groups_register); + +/** + * devl_trap_groups_unregister - Unregister packet trap groups from devlink. + * @devlink: devlink. + * @groups: Packet trap groups. + * @groups_count: Count of provided packet trap groups. + */ +void devl_trap_groups_unregister(struct devlink *devlink, + const struct devlink_trap_group *groups, + size_t groups_count) +{ + int i; + + devl_assert_locked(devlink); + for (i = groups_count - 1; i >= 0; i--) + devlink_trap_group_unregister(devlink, &groups[i]); +} +EXPORT_SYMBOL_GPL(devl_trap_groups_unregister); + +/** + * devlink_trap_groups_unregister - Unregister packet trap groups from devlink. + * @devlink: devlink. + * @groups: Packet trap groups. + * @groups_count: Count of provided packet trap groups. + * + * Context: Takes and release devlink->lock . + */ +void devlink_trap_groups_unregister(struct devlink *devlink, + const struct devlink_trap_group *groups, + size_t groups_count) +{ + devl_lock(devlink); + devl_trap_groups_unregister(devlink, groups, groups_count); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_trap_groups_unregister); + +static void +devlink_trap_policer_notify(struct devlink *devlink, + const struct devlink_trap_policer_item *policer_item, + enum devlink_command cmd) +{ + struct sk_buff *msg; + int err; + + WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_POLICER_NEW && + cmd != DEVLINK_CMD_TRAP_POLICER_DEL); + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_trap_policer_fill(msg, devlink, policer_item, cmd, 0, + 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +static int +devlink_trap_policer_register(struct devlink *devlink, + const struct devlink_trap_policer *policer) +{ + struct devlink_trap_policer_item *policer_item; + int err; + + if (devlink_trap_policer_item_lookup(devlink, policer->id)) + return -EEXIST; + + policer_item = kzalloc(sizeof(*policer_item), GFP_KERNEL); + if (!policer_item) + return -ENOMEM; + + policer_item->policer = policer; + policer_item->rate = policer->init_rate; + policer_item->burst = policer->init_burst; + + if (devlink->ops->trap_policer_init) { + err = devlink->ops->trap_policer_init(devlink, policer); + if (err) + goto err_policer_init; + } + + list_add_tail(&policer_item->list, &devlink->trap_policer_list); + devlink_trap_policer_notify(devlink, policer_item, + DEVLINK_CMD_TRAP_POLICER_NEW); + + return 0; + +err_policer_init: + kfree(policer_item); + return err; +} + +static void +devlink_trap_policer_unregister(struct devlink *devlink, + const struct devlink_trap_policer *policer) +{ + struct devlink_trap_policer_item *policer_item; + + policer_item = devlink_trap_policer_item_lookup(devlink, policer->id); + if (WARN_ON_ONCE(!policer_item)) + return; + + devlink_trap_policer_notify(devlink, policer_item, + DEVLINK_CMD_TRAP_POLICER_DEL); + list_del(&policer_item->list); + if (devlink->ops->trap_policer_fini) + devlink->ops->trap_policer_fini(devlink, policer); + kfree(policer_item); +} + +/** + * devl_trap_policers_register - Register packet trap policers with devlink. + * @devlink: devlink. + * @policers: Packet trap policers. + * @policers_count: Count of provided packet trap policers. + * + * Return: Non-zero value on failure. + */ +int +devl_trap_policers_register(struct devlink *devlink, + const struct devlink_trap_policer *policers, + size_t policers_count) +{ + int i, err; + + devl_assert_locked(devlink); + for (i = 0; i < policers_count; i++) { + const struct devlink_trap_policer *policer = &policers[i]; + + if (WARN_ON(policer->id == 0 || + policer->max_rate < policer->min_rate || + policer->max_burst < policer->min_burst)) { + err = -EINVAL; + goto err_trap_policer_verify; + } + + err = devlink_trap_policer_register(devlink, policer); + if (err) + goto err_trap_policer_register; + } + return 0; + +err_trap_policer_register: +err_trap_policer_verify: + for (i--; i >= 0; i--) + devlink_trap_policer_unregister(devlink, &policers[i]); + return err; +} +EXPORT_SYMBOL_GPL(devl_trap_policers_register); + +/** + * devl_trap_policers_unregister - Unregister packet trap policers from devlink. + * @devlink: devlink. + * @policers: Packet trap policers. + * @policers_count: Count of provided packet trap policers. + */ +void +devl_trap_policers_unregister(struct devlink *devlink, + const struct devlink_trap_policer *policers, + size_t policers_count) +{ + int i; + + devl_assert_locked(devlink); + for (i = policers_count - 1; i >= 0; i--) + devlink_trap_policer_unregister(devlink, &policers[i]); +} +EXPORT_SYMBOL_GPL(devl_trap_policers_unregister); + +static void __devlink_compat_running_version(struct devlink *devlink, + char *buf, size_t len) +{ + struct devlink_info_req req = {}; + const struct nlattr *nlattr; + struct sk_buff *msg; + int rem, err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + req.msg = msg; + err = devlink->ops->info_get(devlink, &req, NULL); + if (err) + goto free_msg; + + nla_for_each_attr(nlattr, (void *)msg->data, msg->len, rem) { + const struct nlattr *kv; + int rem_kv; + + if (nla_type(nlattr) != DEVLINK_ATTR_INFO_VERSION_RUNNING) + continue; + + nla_for_each_nested(kv, nlattr, rem_kv) { + if (nla_type(kv) != DEVLINK_ATTR_INFO_VERSION_VALUE) + continue; + + strlcat(buf, nla_data(kv), len); + strlcat(buf, " ", len); + } + } +free_msg: + nlmsg_free(msg); +} + +void devlink_compat_running_version(struct devlink *devlink, + char *buf, size_t len) +{ + if (!devlink->ops->info_get) + return; + + devl_lock(devlink); + __devlink_compat_running_version(devlink, buf, len); + devl_unlock(devlink); +} + +int devlink_compat_flash_update(struct devlink *devlink, const char *file_name) +{ + struct devlink_flash_update_params params = {}; + int ret; + + if (!devlink->ops->flash_update) + return -EOPNOTSUPP; + + ret = request_firmware(¶ms.fw, file_name, devlink->dev); + if (ret) + return ret; + + devl_lock(devlink); + devlink_flash_update_begin_notify(devlink); + ret = devlink->ops->flash_update(devlink, ¶ms, NULL); + devlink_flash_update_end_notify(devlink); + devl_unlock(devlink); + + release_firmware(params.fw); + + return ret; +} + +int devlink_compat_phys_port_name_get(struct net_device *dev, + char *name, size_t len) +{ + struct devlink_port *devlink_port; + + /* RTNL mutex is held here which ensures that devlink_port + * instance cannot disappear in the middle. No need to take + * any devlink lock as only permanent values are accessed. + */ + ASSERT_RTNL(); + + devlink_port = dev->devlink_port; + if (!devlink_port) + return -EOPNOTSUPP; + + return __devlink_port_phys_port_name_get(devlink_port, name, len); +} + +int devlink_compat_switch_id_get(struct net_device *dev, + struct netdev_phys_item_id *ppid) +{ + struct devlink_port *devlink_port; + + /* Caller must hold RTNL mutex or reference to dev, which ensures that + * devlink_port instance cannot disappear in the middle. No need to take + * any devlink lock as only permanent values are accessed. + */ + devlink_port = dev->devlink_port; + if (!devlink_port || !devlink_port->switch_port) + return -EOPNOTSUPP; + + memcpy(ppid, &devlink_port->attrs.switch_id, sizeof(*ppid)); + + return 0; +} + +static void __net_exit devlink_pernet_pre_exit(struct net *net) +{ + struct devlink *devlink; + u32 actions_performed; + unsigned long index; + int err; + + /* In case network namespace is getting destroyed, reload + * all devlink instances from this namespace into init_net. + */ + devlinks_xa_for_each_registered_get(net, index, devlink) { + WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); + mutex_lock(&devlink->lock); + err = devlink_reload(devlink, &init_net, + DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + DEVLINK_RELOAD_LIMIT_UNSPEC, + &actions_performed, NULL); + mutex_unlock(&devlink->lock); + if (err && err != -EOPNOTSUPP) + pr_warn("Failed to reload devlink instance into init_net\n"); + devlink_put(devlink); + } +} + +static struct pernet_operations devlink_pernet_ops __net_initdata = { + .pre_exit = devlink_pernet_pre_exit, +}; + +static int __init devlink_init(void) +{ + int err; + + err = genl_register_family(&devlink_nl_family); + if (err) + goto out; + err = register_pernet_subsys(&devlink_pernet_ops); + +out: + WARN_ON(err); + return err; +} + +subsys_initcall(devlink_init); -- cgit v1.2.3 From e50ef40f9a9a35d8526d38e6b5a165178fa1857f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:18 -0800 Subject: devlink: rename devlink_netdevice_event -> devlink_port_netdevice_event To make the upcoming change a pure(er?) code move rename devlink_netdevice_event -> devlink_port_netdevice_event. This makes it clear that it only touches ports and doesn't belong cleanly in the core. Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 032d6d0a5ce6..1221c7277c0b 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -9942,8 +9942,8 @@ void devlink_set_features(struct devlink *devlink, u64 features) } EXPORT_SYMBOL_GPL(devlink_set_features); -static int devlink_netdevice_event(struct notifier_block *nb, - unsigned long event, void *ptr); +static int devlink_port_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr); /** * devlink_alloc_ns - Allocate new devlink instance resources @@ -9978,7 +9978,7 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, if (ret < 0) goto err_xa_alloc; - devlink->netdevice_nb.notifier_call = devlink_netdevice_event; + devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event; ret = register_netdevice_notifier_net(net, &devlink->netdevice_nb); if (ret) goto err_register_netdevice_notifier; @@ -10480,8 +10480,8 @@ void devlink_port_type_clear(struct devlink_port *devlink_port) } EXPORT_SYMBOL_GPL(devlink_port_type_clear); -static int devlink_netdevice_event(struct notifier_block *nb, - unsigned long event, void *ptr) +static int devlink_port_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr) { struct net_device *netdev = netdev_notifier_info_to_dev(ptr); struct devlink_port *devlink_port = netdev->devlink_port; -- cgit v1.2.3 From 687125b5799cd5120437fa455cfccbe8537916ff Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:19 -0800 Subject: devlink: split out core code Move core code into a separate file. It's spread around the main file which makes refactoring and figuring out how devlink works harder. Move the xarray, all the most core devlink instance code out like locking, ref counting, alloc, register, etc. Leave port stuff in leftover.c, if we want to move port code it'd probably be to its own file. Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/Makefile | 2 +- net/devlink/core.c | 347 ++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 117 ++++++++++++ net/devlink/leftover.c | 444 ++------------------------------------------ 4 files changed, 476 insertions(+), 434 deletions(-) create mode 100644 net/devlink/core.c create mode 100644 net/devlink/devl_internal.h (limited to 'net') diff --git a/net/devlink/Makefile b/net/devlink/Makefile index 3a60959f71ee..aff7da844e5d 100644 --- a/net/devlink/Makefile +++ b/net/devlink/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 -obj-y := leftover.o +obj-y := leftover.o core.o diff --git a/net/devlink/core.c b/net/devlink/core.c new file mode 100644 index 000000000000..c084eafa17fb --- /dev/null +++ b/net/devlink/core.c @@ -0,0 +1,347 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2016 Jiri Pirko + */ + +#include + +#include "devl_internal.h" + +DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC); + +void *devlink_priv(struct devlink *devlink) +{ + return &devlink->priv; +} +EXPORT_SYMBOL_GPL(devlink_priv); + +struct devlink *priv_to_devlink(void *priv) +{ + return container_of(priv, struct devlink, priv); +} +EXPORT_SYMBOL_GPL(priv_to_devlink); + +struct device *devlink_to_dev(const struct devlink *devlink) +{ + return devlink->dev; +} +EXPORT_SYMBOL_GPL(devlink_to_dev); + +struct net *devlink_net(const struct devlink *devlink) +{ + return read_pnet(&devlink->_net); +} +EXPORT_SYMBOL_GPL(devlink_net); + +void devl_assert_locked(struct devlink *devlink) +{ + lockdep_assert_held(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_assert_locked); + +#ifdef CONFIG_LOCKDEP +/* For use in conjunction with LOCKDEP only e.g. rcu_dereference_protected() */ +bool devl_lock_is_held(struct devlink *devlink) +{ + return lockdep_is_held(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_lock_is_held); +#endif + +void devl_lock(struct devlink *devlink) +{ + mutex_lock(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_lock); + +int devl_trylock(struct devlink *devlink) +{ + return mutex_trylock(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_trylock); + +void devl_unlock(struct devlink *devlink) +{ + mutex_unlock(&devlink->lock); +} +EXPORT_SYMBOL_GPL(devl_unlock); + +struct devlink *__must_check devlink_try_get(struct devlink *devlink) +{ + if (refcount_inc_not_zero(&devlink->refcount)) + return devlink; + return NULL; +} + +static void __devlink_put_rcu(struct rcu_head *head) +{ + struct devlink *devlink = container_of(head, struct devlink, rcu); + + complete(&devlink->comp); +} + +void devlink_put(struct devlink *devlink) +{ + if (refcount_dec_and_test(&devlink->refcount)) + /* Make sure unregister operation that may await the completion + * is unblocked only after all users are after the end of + * RCU grace period. + */ + call_rcu(&devlink->rcu, __devlink_put_rcu); +} + +static struct devlink * +devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter, + void * (*xa_find_fn)(struct xarray *, unsigned long *, + unsigned long, xa_mark_t)) +{ + struct devlink *devlink; + + rcu_read_lock(); +retry: + devlink = xa_find_fn(&devlinks, indexp, ULONG_MAX, DEVLINK_REGISTERED); + if (!devlink) + goto unlock; + + /* In case devlink_unregister() was already called and "unregistering" + * mark was set, do not allow to get a devlink reference here. + * This prevents live-lock of devlink_unregister() wait for completion. + */ + if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING)) + goto retry; + + /* For a possible retry, the xa_find_after() should be always used */ + xa_find_fn = xa_find_after; + if (!devlink_try_get(devlink)) + goto retry; + if (!net_eq(devlink_net(devlink), net)) { + devlink_put(devlink); + goto retry; + } +unlock: + rcu_read_unlock(); + return devlink; +} + +struct devlink * +devlinks_xa_find_get_first(struct net *net, unsigned long *indexp, + xa_mark_t filter) +{ + return devlinks_xa_find_get(net, indexp, filter, xa_find); +} + +struct devlink * +devlinks_xa_find_get_next(struct net *net, unsigned long *indexp, + xa_mark_t filter) +{ + return devlinks_xa_find_get(net, indexp, filter, xa_find_after); +} + +/** + * devlink_set_features - Set devlink supported features + * + * @devlink: devlink + * @features: devlink support features + * + * This interface allows us to set reload ops separatelly from + * the devlink_alloc. + */ +void devlink_set_features(struct devlink *devlink, u64 features) +{ + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + WARN_ON(features & DEVLINK_F_RELOAD && + !devlink_reload_supported(devlink->ops)); + devlink->features = features; +} +EXPORT_SYMBOL_GPL(devlink_set_features); + +/** + * devlink_register - Register devlink instance + * + * @devlink: devlink + */ +void devlink_register(struct devlink *devlink) +{ + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + /* Make sure that we are in .probe() routine */ + + xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); + devlink_notify_register(devlink); +} +EXPORT_SYMBOL_GPL(devlink_register); + +/** + * devlink_unregister - Unregister devlink instance + * + * @devlink: devlink + */ +void devlink_unregister(struct devlink *devlink) +{ + ASSERT_DEVLINK_REGISTERED(devlink); + /* Make sure that we are in .remove() routine */ + + xa_set_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); + devlink_put(devlink); + wait_for_completion(&devlink->comp); + + devlink_notify_unregister(devlink); + xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); + xa_clear_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); +} +EXPORT_SYMBOL_GPL(devlink_unregister); + +/** + * devlink_alloc_ns - Allocate new devlink instance resources + * in specific namespace + * + * @ops: ops + * @priv_size: size of user private data + * @net: net namespace + * @dev: parent device + * + * Allocate new devlink instance resources, including devlink index + * and name. + */ +struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, + size_t priv_size, struct net *net, + struct device *dev) +{ + struct devlink *devlink; + static u32 last_id; + int ret; + + WARN_ON(!ops || !dev); + if (!devlink_reload_actions_valid(ops)) + return NULL; + + devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL); + if (!devlink) + return NULL; + + ret = xa_alloc_cyclic(&devlinks, &devlink->index, devlink, xa_limit_31b, + &last_id, GFP_KERNEL); + if (ret < 0) + goto err_xa_alloc; + + devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event; + ret = register_netdevice_notifier_net(net, &devlink->netdevice_nb); + if (ret) + goto err_register_netdevice_notifier; + + devlink->dev = dev; + devlink->ops = ops; + xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC); + xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC); + write_pnet(&devlink->_net, net); + INIT_LIST_HEAD(&devlink->rate_list); + INIT_LIST_HEAD(&devlink->linecard_list); + INIT_LIST_HEAD(&devlink->sb_list); + INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list); + INIT_LIST_HEAD(&devlink->resource_list); + INIT_LIST_HEAD(&devlink->param_list); + INIT_LIST_HEAD(&devlink->region_list); + INIT_LIST_HEAD(&devlink->reporter_list); + INIT_LIST_HEAD(&devlink->trap_list); + INIT_LIST_HEAD(&devlink->trap_group_list); + INIT_LIST_HEAD(&devlink->trap_policer_list); + lockdep_register_key(&devlink->lock_key); + mutex_init(&devlink->lock); + lockdep_set_class(&devlink->lock, &devlink->lock_key); + mutex_init(&devlink->reporters_lock); + mutex_init(&devlink->linecards_lock); + refcount_set(&devlink->refcount, 1); + init_completion(&devlink->comp); + + return devlink; + +err_register_netdevice_notifier: + xa_erase(&devlinks, devlink->index); +err_xa_alloc: + kfree(devlink); + return NULL; +} +EXPORT_SYMBOL_GPL(devlink_alloc_ns); + +/** + * devlink_free - Free devlink instance resources + * + * @devlink: devlink + */ +void devlink_free(struct devlink *devlink) +{ + ASSERT_DEVLINK_NOT_REGISTERED(devlink); + + mutex_destroy(&devlink->linecards_lock); + mutex_destroy(&devlink->reporters_lock); + mutex_destroy(&devlink->lock); + lockdep_unregister_key(&devlink->lock_key); + WARN_ON(!list_empty(&devlink->trap_policer_list)); + WARN_ON(!list_empty(&devlink->trap_group_list)); + WARN_ON(!list_empty(&devlink->trap_list)); + WARN_ON(!list_empty(&devlink->reporter_list)); + WARN_ON(!list_empty(&devlink->region_list)); + WARN_ON(!list_empty(&devlink->param_list)); + WARN_ON(!list_empty(&devlink->resource_list)); + WARN_ON(!list_empty(&devlink->dpipe_table_list)); + WARN_ON(!list_empty(&devlink->sb_list)); + WARN_ON(!list_empty(&devlink->rate_list)); + WARN_ON(!list_empty(&devlink->linecard_list)); + WARN_ON(!xa_empty(&devlink->ports)); + + xa_destroy(&devlink->snapshot_ids); + xa_destroy(&devlink->ports); + + WARN_ON_ONCE(unregister_netdevice_notifier_net(devlink_net(devlink), + &devlink->netdevice_nb)); + + xa_erase(&devlinks, devlink->index); + + kfree(devlink); +} +EXPORT_SYMBOL_GPL(devlink_free); + +static void __net_exit devlink_pernet_pre_exit(struct net *net) +{ + struct devlink *devlink; + u32 actions_performed; + unsigned long index; + int err; + + /* In case network namespace is getting destroyed, reload + * all devlink instances from this namespace into init_net. + */ + devlinks_xa_for_each_registered_get(net, index, devlink) { + WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); + mutex_lock(&devlink->lock); + err = devlink_reload(devlink, &init_net, + DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + DEVLINK_RELOAD_LIMIT_UNSPEC, + &actions_performed, NULL); + mutex_unlock(&devlink->lock); + if (err && err != -EOPNOTSUPP) + pr_warn("Failed to reload devlink instance into init_net\n"); + devlink_put(devlink); + } +} + +static struct pernet_operations devlink_pernet_ops __net_initdata = { + .pre_exit = devlink_pernet_pre_exit, +}; + +static int __init devlink_init(void) +{ + int err; + + err = genl_register_family(&devlink_nl_family); + if (err) + goto out; + err = register_pernet_subsys(&devlink_pernet_ops); + +out: + WARN_ON(err); + return err; +} + +subsys_initcall(devlink_init); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h new file mode 100644 index 000000000000..1c74293c6c25 --- /dev/null +++ b/net/devlink/devl_internal.h @@ -0,0 +1,117 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Copyright (c) 2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2016 Jiri Pirko + */ + +#include +#include +#include +#include +#include +#include +#include + +#define DEVLINK_REGISTERED XA_MARK_1 +#define DEVLINK_UNREGISTERING XA_MARK_2 + +#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \ + (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX) + +struct devlink_dev_stats { + u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; + u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; +}; + +struct devlink { + u32 index; + struct xarray ports; + struct list_head rate_list; + struct list_head sb_list; + struct list_head dpipe_table_list; + struct list_head resource_list; + struct list_head param_list; + struct list_head region_list; + struct list_head reporter_list; + struct mutex reporters_lock; /* protects reporter_list */ + struct devlink_dpipe_headers *dpipe_headers; + struct list_head trap_list; + struct list_head trap_group_list; + struct list_head trap_policer_list; + struct list_head linecard_list; + struct mutex linecards_lock; /* protects linecard_list */ + const struct devlink_ops *ops; + u64 features; + struct xarray snapshot_ids; + struct devlink_dev_stats stats; + struct device *dev; + possible_net_t _net; + /* Serializes access to devlink instance specific objects such as + * port, sb, dpipe, resource, params, region, traps and more. + */ + struct mutex lock; + struct lock_class_key lock_key; + u8 reload_failed:1; + refcount_t refcount; + struct completion comp; + struct rcu_head rcu; + struct notifier_block netdevice_nb; + char priv[] __aligned(NETDEV_ALIGN); +}; + +extern struct xarray devlinks; +extern struct genl_family devlink_nl_family; + +/* devlink instances are open to the access from the user space after + * devlink_register() call. Such logical barrier allows us to have certain + * expectations related to locking. + * + * Before *_register() - we are in initialization stage and no parallel + * access possible to the devlink instance. All drivers perform that phase + * by implicitly holding device_lock. + * + * After *_register() - users and driver can access devlink instance at + * the same time. + */ +#define ASSERT_DEVLINK_REGISTERED(d) \ + WARN_ON_ONCE(!xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) +#define ASSERT_DEVLINK_NOT_REGISTERED(d) \ + WARN_ON_ONCE(xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) + +/* Iterate over devlink pointers which were possible to get reference to. + * devlink_put() needs to be called for each iterated devlink pointer + * in loop body in order to release the reference. + */ +#define devlinks_xa_for_each_get(net, index, devlink, filter) \ + for (index = 0, \ + devlink = devlinks_xa_find_get_first(net, &index, filter); \ + devlink; devlink = devlinks_xa_find_get_next(net, &index, filter)) + +#define devlinks_xa_for_each_registered_get(net, index, devlink) \ + devlinks_xa_for_each_get(net, index, devlink, DEVLINK_REGISTERED) + +struct devlink * +devlinks_xa_find_get_first(struct net *net, unsigned long *indexp, + xa_mark_t filter); +struct devlink * +devlinks_xa_find_get_next(struct net *net, unsigned long *indexp, + xa_mark_t filter); + +/* Netlink */ +void devlink_notify_unregister(struct devlink *devlink); +void devlink_notify_register(struct devlink *devlink); + +/* Ports */ +int devlink_port_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr); + +/* Reload */ +bool devlink_reload_actions_valid(const struct devlink_ops *ops); +int devlink_reload(struct devlink *devlink, struct net *dest_net, + enum devlink_reload_action action, + enum devlink_reload_limit limit, + u32 *actions_performed, struct netlink_ext_ack *extack); + +static inline bool devlink_reload_supported(const struct devlink_ops *ops) +{ + return ops->reload_down && ops->reload_up; +} diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 1221c7277c0b..05f2e75b3a03 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -31,52 +31,7 @@ #define CREATE_TRACE_POINTS #include -#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \ - (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX) - -struct devlink_dev_stats { - u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; - u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; -}; - -struct devlink { - u32 index; - struct xarray ports; - struct list_head rate_list; - struct list_head sb_list; - struct list_head dpipe_table_list; - struct list_head resource_list; - struct list_head param_list; - struct list_head region_list; - struct list_head reporter_list; - struct mutex reporters_lock; /* protects reporter_list */ - struct devlink_dpipe_headers *dpipe_headers; - struct list_head trap_list; - struct list_head trap_group_list; - struct list_head trap_policer_list; - struct list_head linecard_list; - struct mutex linecards_lock; /* protects linecard_list */ - const struct devlink_ops *ops; - u64 features; - struct xarray snapshot_ids; - struct devlink_dev_stats stats; - struct device *dev; - possible_net_t _net; - /* Serializes access to devlink instance specific objects such as - * port, sb, dpipe, resource, params, region, traps and more. - */ - struct mutex lock; - struct lock_class_key lock_key; - u8 reload_failed:1; - refcount_t refcount; - struct completion comp; - struct rcu_head rcu; - struct notifier_block netdevice_nb; - char priv[] __aligned(NETDEV_ALIGN); -}; - -struct devlink_linecard_ops; -struct devlink_linecard_type; +#include "devl_internal.h" struct devlink_linecard { struct list_head list; @@ -122,24 +77,6 @@ struct devlink_resource { void *occ_get_priv; }; -void *devlink_priv(struct devlink *devlink) -{ - return &devlink->priv; -} -EXPORT_SYMBOL_GPL(devlink_priv); - -struct devlink *priv_to_devlink(void *priv) -{ - return container_of(priv, struct devlink, priv); -} -EXPORT_SYMBOL_GPL(priv_to_devlink); - -struct device *devlink_to_dev(const struct devlink *devlink) -{ - return devlink->dev; -} -EXPORT_SYMBOL_GPL(devlink_to_dev); - static struct devlink_dpipe_field devlink_dpipe_fields_ethernet[] = { { .name = "destination mac", @@ -211,148 +148,6 @@ static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG }, }; -static DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC); -#define DEVLINK_REGISTERED XA_MARK_1 -#define DEVLINK_UNREGISTERING XA_MARK_2 - -/* devlink instances are open to the access from the user space after - * devlink_register() call. Such logical barrier allows us to have certain - * expectations related to locking. - * - * Before *_register() - we are in initialization stage and no parallel - * access possible to the devlink instance. All drivers perform that phase - * by implicitly holding device_lock. - * - * After *_register() - users and driver can access devlink instance at - * the same time. - */ -#define ASSERT_DEVLINK_REGISTERED(d) \ - WARN_ON_ONCE(!xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) -#define ASSERT_DEVLINK_NOT_REGISTERED(d) \ - WARN_ON_ONCE(xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED)) - -struct net *devlink_net(const struct devlink *devlink) -{ - return read_pnet(&devlink->_net); -} -EXPORT_SYMBOL_GPL(devlink_net); - -static void __devlink_put_rcu(struct rcu_head *head) -{ - struct devlink *devlink = container_of(head, struct devlink, rcu); - - complete(&devlink->comp); -} - -void devlink_put(struct devlink *devlink) -{ - if (refcount_dec_and_test(&devlink->refcount)) - /* Make sure unregister operation that may await the completion - * is unblocked only after all users are after the end of - * RCU grace period. - */ - call_rcu(&devlink->rcu, __devlink_put_rcu); -} - -struct devlink *__must_check devlink_try_get(struct devlink *devlink) -{ - if (refcount_inc_not_zero(&devlink->refcount)) - return devlink; - return NULL; -} - -void devl_assert_locked(struct devlink *devlink) -{ - lockdep_assert_held(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_assert_locked); - -#ifdef CONFIG_LOCKDEP -/* For use in conjunction with LOCKDEP only e.g. rcu_dereference_protected() */ -bool devl_lock_is_held(struct devlink *devlink) -{ - return lockdep_is_held(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_lock_is_held); -#endif - -void devl_lock(struct devlink *devlink) -{ - mutex_lock(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_lock); - -int devl_trylock(struct devlink *devlink) -{ - return mutex_trylock(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_trylock); - -void devl_unlock(struct devlink *devlink) -{ - mutex_unlock(&devlink->lock); -} -EXPORT_SYMBOL_GPL(devl_unlock); - -static struct devlink * -devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter, - void * (*xa_find_fn)(struct xarray *, unsigned long *, - unsigned long, xa_mark_t)) -{ - struct devlink *devlink; - - rcu_read_lock(); -retry: - devlink = xa_find_fn(&devlinks, indexp, ULONG_MAX, DEVLINK_REGISTERED); - if (!devlink) - goto unlock; - - /* In case devlink_unregister() was already called and "unregistering" - * mark was set, do not allow to get a devlink reference here. - * This prevents live-lock of devlink_unregister() wait for completion. - */ - if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING)) - goto retry; - - /* For a possible retry, the xa_find_after() should be always used */ - xa_find_fn = xa_find_after; - if (!devlink_try_get(devlink)) - goto retry; - if (!net_eq(devlink_net(devlink), net)) { - devlink_put(devlink); - goto retry; - } -unlock: - rcu_read_unlock(); - return devlink; -} - -static struct devlink *devlinks_xa_find_get_first(struct net *net, - unsigned long *indexp, - xa_mark_t filter) -{ - return devlinks_xa_find_get(net, indexp, filter, xa_find); -} - -static struct devlink *devlinks_xa_find_get_next(struct net *net, - unsigned long *indexp, - xa_mark_t filter) -{ - return devlinks_xa_find_get(net, indexp, filter, xa_find_after); -} - -/* Iterate over devlink pointers which were possible to get reference to. - * devlink_put() needs to be called for each iterated devlink pointer - * in loop body in order to release the reference. - */ -#define devlinks_xa_for_each_get(net, index, devlink, filter) \ - for (index = 0, \ - devlink = devlinks_xa_find_get_first(net, &index, filter); \ - devlink; devlink = devlinks_xa_find_get_next(net, &index, filter)) - -#define devlinks_xa_for_each_registered_get(net, index, devlink) \ - devlinks_xa_for_each_get(net, index, devlink, DEVLINK_REGISTERED) - static struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs) { @@ -917,8 +712,6 @@ static void devlink_nl_post_doit(const struct genl_split_ops *ops, devlink_put(devlink); } -static struct genl_family devlink_nl_family; - enum devlink_multicast_groups { DEVLINK_MCGRP_CONFIG, }; @@ -4653,11 +4446,6 @@ static void devlink_ns_change_notify(struct devlink *devlink, devlink_notify(devlink, DEVLINK_CMD_DEL); } -static bool devlink_reload_supported(const struct devlink_ops *ops) -{ - return ops->reload_down && ops->reload_up; -} - static void devlink_reload_failed_set(struct devlink *devlink, bool reload_failed) { @@ -4725,9 +4513,10 @@ void devlink_remote_reload_actions_performed(struct devlink *devlink, } EXPORT_SYMBOL_GPL(devlink_remote_reload_actions_performed); -static int devlink_reload(struct devlink *devlink, struct net *dest_net, - enum devlink_reload_action action, enum devlink_reload_limit limit, - u32 *actions_performed, struct netlink_ext_ack *extack) +int devlink_reload(struct devlink *devlink, struct net *dest_net, + enum devlink_reload_action action, + enum devlink_reload_limit limit, + u32 *actions_performed, struct netlink_ext_ack *extack) { u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; struct net *curr_net; @@ -9877,7 +9666,7 @@ static const struct genl_small_ops devlink_nl_ops[] = { }, }; -static struct genl_family devlink_nl_family __ro_after_init = { +struct genl_family devlink_nl_family __ro_after_init = { .name = DEVLINK_GENL_NAME, .version = DEVLINK_GENL_VERSION, .maxattr = DEVLINK_ATTR_MAX, @@ -9894,7 +9683,7 @@ static struct genl_family devlink_nl_family __ro_after_init = { .n_mcgrps = ARRAY_SIZE(devlink_nl_mcgrps), }; -static bool devlink_reload_actions_valid(const struct devlink_ops *ops) +bool devlink_reload_actions_valid(const struct devlink_ops *ops) { const struct devlink_reload_combination *comb; int i; @@ -9923,100 +9712,6 @@ static bool devlink_reload_actions_valid(const struct devlink_ops *ops) return true; } -/** - * devlink_set_features - Set devlink supported features - * - * @devlink: devlink - * @features: devlink support features - * - * This interface allows us to set reload ops separatelly from - * the devlink_alloc. - */ -void devlink_set_features(struct devlink *devlink, u64 features) -{ - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - WARN_ON(features & DEVLINK_F_RELOAD && - !devlink_reload_supported(devlink->ops)); - devlink->features = features; -} -EXPORT_SYMBOL_GPL(devlink_set_features); - -static int devlink_port_netdevice_event(struct notifier_block *nb, - unsigned long event, void *ptr); - -/** - * devlink_alloc_ns - Allocate new devlink instance resources - * in specific namespace - * - * @ops: ops - * @priv_size: size of user private data - * @net: net namespace - * @dev: parent device - * - * Allocate new devlink instance resources, including devlink index - * and name. - */ -struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, - size_t priv_size, struct net *net, - struct device *dev) -{ - struct devlink *devlink; - static u32 last_id; - int ret; - - WARN_ON(!ops || !dev); - if (!devlink_reload_actions_valid(ops)) - return NULL; - - devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL); - if (!devlink) - return NULL; - - ret = xa_alloc_cyclic(&devlinks, &devlink->index, devlink, xa_limit_31b, - &last_id, GFP_KERNEL); - if (ret < 0) - goto err_xa_alloc; - - devlink->netdevice_nb.notifier_call = devlink_port_netdevice_event; - ret = register_netdevice_notifier_net(net, &devlink->netdevice_nb); - if (ret) - goto err_register_netdevice_notifier; - - devlink->dev = dev; - devlink->ops = ops; - xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC); - xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC); - write_pnet(&devlink->_net, net); - INIT_LIST_HEAD(&devlink->rate_list); - INIT_LIST_HEAD(&devlink->linecard_list); - INIT_LIST_HEAD(&devlink->sb_list); - INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list); - INIT_LIST_HEAD(&devlink->resource_list); - INIT_LIST_HEAD(&devlink->param_list); - INIT_LIST_HEAD(&devlink->region_list); - INIT_LIST_HEAD(&devlink->reporter_list); - INIT_LIST_HEAD(&devlink->trap_list); - INIT_LIST_HEAD(&devlink->trap_group_list); - INIT_LIST_HEAD(&devlink->trap_policer_list); - lockdep_register_key(&devlink->lock_key); - mutex_init(&devlink->lock); - lockdep_set_class(&devlink->lock, &devlink->lock_key); - mutex_init(&devlink->reporters_lock); - mutex_init(&devlink->linecards_lock); - refcount_set(&devlink->refcount, 1); - init_completion(&devlink->comp); - - return devlink; - -err_register_netdevice_notifier: - xa_erase(&devlinks, devlink->index); -err_xa_alloc: - kfree(devlink); - return NULL; -} -EXPORT_SYMBOL_GPL(devlink_alloc_ns); - static void devlink_trap_policer_notify(struct devlink *devlink, const struct devlink_trap_policer_item *policer_item, @@ -10029,7 +9724,7 @@ static void devlink_trap_notify(struct devlink *devlink, const struct devlink_trap_item *trap_item, enum devlink_command cmd); -static void devlink_notify_register(struct devlink *devlink) +void devlink_notify_register(struct devlink *devlink) { struct devlink_trap_policer_item *policer_item; struct devlink_trap_group_item *group_item; @@ -10070,7 +9765,7 @@ static void devlink_notify_register(struct devlink *devlink) DEVLINK_CMD_PARAM_NEW); } -static void devlink_notify_unregister(struct devlink *devlink) +void devlink_notify_unregister(struct devlink *devlink) { struct devlink_trap_policer_item *policer_item; struct devlink_trap_group_item *group_item; @@ -10107,79 +9802,6 @@ static void devlink_notify_unregister(struct devlink *devlink) devlink_notify(devlink, DEVLINK_CMD_DEL); } -/** - * devlink_register - Register devlink instance - * - * @devlink: devlink - */ -void devlink_register(struct devlink *devlink) -{ - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - /* Make sure that we are in .probe() routine */ - - xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); - devlink_notify_register(devlink); -} -EXPORT_SYMBOL_GPL(devlink_register); - -/** - * devlink_unregister - Unregister devlink instance - * - * @devlink: devlink - */ -void devlink_unregister(struct devlink *devlink) -{ - ASSERT_DEVLINK_REGISTERED(devlink); - /* Make sure that we are in .remove() routine */ - - xa_set_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); - devlink_put(devlink); - wait_for_completion(&devlink->comp); - - devlink_notify_unregister(devlink); - xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); - xa_clear_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); -} -EXPORT_SYMBOL_GPL(devlink_unregister); - -/** - * devlink_free - Free devlink instance resources - * - * @devlink: devlink - */ -void devlink_free(struct devlink *devlink) -{ - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - - mutex_destroy(&devlink->linecards_lock); - mutex_destroy(&devlink->reporters_lock); - mutex_destroy(&devlink->lock); - lockdep_unregister_key(&devlink->lock_key); - WARN_ON(!list_empty(&devlink->trap_policer_list)); - WARN_ON(!list_empty(&devlink->trap_group_list)); - WARN_ON(!list_empty(&devlink->trap_list)); - WARN_ON(!list_empty(&devlink->reporter_list)); - WARN_ON(!list_empty(&devlink->region_list)); - WARN_ON(!list_empty(&devlink->param_list)); - WARN_ON(!list_empty(&devlink->resource_list)); - WARN_ON(!list_empty(&devlink->dpipe_table_list)); - WARN_ON(!list_empty(&devlink->sb_list)); - WARN_ON(!list_empty(&devlink->rate_list)); - WARN_ON(!list_empty(&devlink->linecard_list)); - WARN_ON(!xa_empty(&devlink->ports)); - - xa_destroy(&devlink->snapshot_ids); - xa_destroy(&devlink->ports); - - WARN_ON_ONCE(unregister_netdevice_notifier_net(devlink_net(devlink), - &devlink->netdevice_nb)); - - xa_erase(&devlinks, devlink->index); - - kfree(devlink); -} -EXPORT_SYMBOL_GPL(devlink_free); - static void devlink_port_type_warn(struct work_struct *work) { WARN(true, "Type was not set for devlink port."); @@ -10480,8 +10102,8 @@ void devlink_port_type_clear(struct devlink_port *devlink_port) } EXPORT_SYMBOL_GPL(devlink_port_type_clear); -static int devlink_port_netdevice_event(struct notifier_block *nb, - unsigned long event, void *ptr) +int devlink_port_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr) { struct net_device *netdev = netdev_notifier_info_to_dev(ptr); struct devlink_port *devlink_port = netdev->devlink_port; @@ -12983,47 +12605,3 @@ int devlink_compat_switch_id_get(struct net_device *dev, return 0; } - -static void __net_exit devlink_pernet_pre_exit(struct net *net) -{ - struct devlink *devlink; - u32 actions_performed; - unsigned long index; - int err; - - /* In case network namespace is getting destroyed, reload - * all devlink instances from this namespace into init_net. - */ - devlinks_xa_for_each_registered_get(net, index, devlink) { - WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); - mutex_lock(&devlink->lock); - err = devlink_reload(devlink, &init_net, - DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - DEVLINK_RELOAD_LIMIT_UNSPEC, - &actions_performed, NULL); - mutex_unlock(&devlink->lock); - if (err && err != -EOPNOTSUPP) - pr_warn("Failed to reload devlink instance into init_net\n"); - devlink_put(devlink); - } -} - -static struct pernet_operations devlink_pernet_ops __net_initdata = { - .pre_exit = devlink_pernet_pre_exit, -}; - -static int __init devlink_init(void) -{ - int err; - - err = genl_register_family(&devlink_nl_family); - if (err) - goto out; - err = register_pernet_subsys(&devlink_pernet_ops); - -out: - WARN_ON(err); - return err; -} - -subsys_initcall(devlink_init); -- cgit v1.2.3 From 623cd13b165486afaa2df706d49209392f3764ca Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:20 -0800 Subject: devlink: split out netlink code Move out the netlink glue into a separate file. Leave the ops in the old file because we'd have to export a ton of functions. Going forward we should switch to split ops which will let us to put the new ops in the netlink.c file. Pure code move, no functional changes. Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/Makefile | 2 +- net/devlink/devl_internal.h | 31 +++++++ net/devlink/leftover.c | 212 ++------------------------------------------ net/devlink/netlink.c | 195 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 235 insertions(+), 205 deletions(-) create mode 100644 net/devlink/netlink.c (limited to 'net') diff --git a/net/devlink/Makefile b/net/devlink/Makefile index aff7da844e5d..1b1eeac59cb3 100644 --- a/net/devlink/Makefile +++ b/net/devlink/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 -obj-y := leftover.o core.o +obj-y := leftover.o core.o netlink.o diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 1c74293c6c25..fdf7634b0cd1 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -97,6 +97,20 @@ devlinks_xa_find_get_next(struct net *net, unsigned long *indexp, xa_mark_t filter); /* Netlink */ +#define DEVLINK_NL_FLAG_NEED_PORT BIT(0) +#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) +#define DEVLINK_NL_FLAG_NEED_RATE BIT(2) +#define DEVLINK_NL_FLAG_NEED_RATE_NODE BIT(3) +#define DEVLINK_NL_FLAG_NEED_LINECARD BIT(4) + +enum devlink_multicast_groups { + DEVLINK_MCGRP_CONFIG, +}; + +extern const struct genl_small_ops devlink_nl_ops[56]; + +struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs); + void devlink_notify_unregister(struct devlink *devlink); void devlink_notify_register(struct devlink *devlink); @@ -104,6 +118,9 @@ void devlink_notify_register(struct devlink *devlink); int devlink_port_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr); +struct devlink_port * +devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info); + /* Reload */ bool devlink_reload_actions_valid(const struct devlink_ops *ops); int devlink_reload(struct devlink *devlink, struct net *dest_net, @@ -115,3 +132,17 @@ static inline bool devlink_reload_supported(const struct devlink_ops *ops) { return ops->reload_down && ops->reload_up; } + +/* Line cards */ +struct devlink_linecard; + +struct devlink_linecard * +devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info); +void devlink_linecard_put(struct devlink_linecard *linecard); + +/* Rates */ +struct devlink_rate * +devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info); +struct devlink_rate * +devlink_rate_node_get_from_info(struct devlink *devlink, + struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 05f2e75b3a03..e01ba7999b91 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -148,30 +148,6 @@ static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG }, }; -static struct devlink *devlink_get_from_attrs(struct net *net, - struct nlattr **attrs) -{ - struct devlink *devlink; - unsigned long index; - char *busname; - char *devname; - - if (!attrs[DEVLINK_ATTR_BUS_NAME] || !attrs[DEVLINK_ATTR_DEV_NAME]) - return ERR_PTR(-EINVAL); - - busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]); - devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]); - - devlinks_xa_for_each_registered_get(net, index, devlink) { - if (strcmp(devlink->dev->bus->name, busname) == 0 && - strcmp(dev_name(devlink->dev), devname) == 0) - return devlink; - devlink_put(devlink); - } - - return ERR_PTR(-ENODEV); -} - #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \ WARN_ON_ONCE(!(devlink_port)->registered) #define ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port) \ @@ -200,8 +176,8 @@ static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, return ERR_PTR(-EINVAL); } -static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink, - struct genl_info *info) +struct devlink_port *devlink_port_get_from_info(struct devlink *devlink, + struct genl_info *info) { return devlink_port_get_from_attrs(devlink, info->attrs); } @@ -261,13 +237,13 @@ devlink_rate_node_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) return devlink_rate_node_get_by_name(devlink, rate_node_name); } -static struct devlink_rate * +struct devlink_rate * devlink_rate_node_get_from_info(struct devlink *devlink, struct genl_info *info) { return devlink_rate_node_get_from_attrs(devlink, info->attrs); } -static struct devlink_rate * +struct devlink_rate * devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info) { struct nlattr **attrs = info->attrs; @@ -318,13 +294,13 @@ devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) return ERR_PTR(-EINVAL); } -static struct devlink_linecard * +struct devlink_linecard * devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info) { return devlink_linecard_get_from_attrs(devlink, info->attrs); } -static void devlink_linecard_put(struct devlink_linecard *linecard) +void devlink_linecard_put(struct devlink_linecard *linecard) { if (refcount_dec_and_test(&linecard->refcount)) { mutex_destroy(&linecard->state_lock); @@ -633,93 +609,6 @@ devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id) return NULL; } -#define DEVLINK_NL_FLAG_NEED_PORT BIT(0) -#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) -#define DEVLINK_NL_FLAG_NEED_RATE BIT(2) -#define DEVLINK_NL_FLAG_NEED_RATE_NODE BIT(3) -#define DEVLINK_NL_FLAG_NEED_LINECARD BIT(4) - -static int devlink_nl_pre_doit(const struct genl_split_ops *ops, - struct sk_buff *skb, struct genl_info *info) -{ - struct devlink_linecard *linecard; - struct devlink_port *devlink_port; - struct devlink *devlink; - int err; - - devlink = devlink_get_from_attrs(genl_info_net(info), info->attrs); - if (IS_ERR(devlink)) - return PTR_ERR(devlink); - devl_lock(devlink); - info->user_ptr[0] = devlink; - if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) { - devlink_port = devlink_port_get_from_info(devlink, info); - if (IS_ERR(devlink_port)) { - err = PTR_ERR(devlink_port); - goto unlock; - } - info->user_ptr[1] = devlink_port; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT) { - devlink_port = devlink_port_get_from_info(devlink, info); - if (!IS_ERR(devlink_port)) - info->user_ptr[1] = devlink_port; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE) { - struct devlink_rate *devlink_rate; - - devlink_rate = devlink_rate_get_from_info(devlink, info); - if (IS_ERR(devlink_rate)) { - err = PTR_ERR(devlink_rate); - goto unlock; - } - info->user_ptr[1] = devlink_rate; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE_NODE) { - struct devlink_rate *rate_node; - - rate_node = devlink_rate_node_get_from_info(devlink, info); - if (IS_ERR(rate_node)) { - err = PTR_ERR(rate_node); - goto unlock; - } - info->user_ptr[1] = rate_node; - } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { - linecard = devlink_linecard_get_from_info(devlink, info); - if (IS_ERR(linecard)) { - err = PTR_ERR(linecard); - goto unlock; - } - info->user_ptr[1] = linecard; - } - return 0; - -unlock: - devl_unlock(devlink); - devlink_put(devlink); - return err; -} - -static void devlink_nl_post_doit(const struct genl_split_ops *ops, - struct sk_buff *skb, struct genl_info *info) -{ - struct devlink_linecard *linecard; - struct devlink *devlink; - - devlink = info->user_ptr[0]; - if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { - linecard = info->user_ptr[1]; - devlink_linecard_put(linecard); - } - devl_unlock(devlink); - devlink_put(devlink); -} - -enum devlink_multicast_groups { - DEVLINK_MCGRP_CONFIG, -}; - -static const struct genl_multicast_group devlink_nl_mcgrps[] = { - [DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME }, -}; - static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink) { if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name)) @@ -9234,76 +9123,7 @@ static int devlink_nl_cmd_trap_policer_set_doit(struct sk_buff *skb, return devlink_trap_policer_set(devlink, policer_item, info); } -static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { - [DEVLINK_ATTR_UNSPEC] = { .strict_start_type = - DEVLINK_ATTR_TRAP_POLICER_ID }, - [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 }, - [DEVLINK_ATTR_PORT_TYPE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_PORT_TYPE_AUTO, - DEVLINK_PORT_TYPE_IB), - [DEVLINK_ATTR_PORT_SPLIT_COUNT] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_INDEX] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_POOL_INDEX] = { .type = NLA_U16 }, - [DEVLINK_ATTR_SB_POOL_TYPE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_SB_POOL_SIZE] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 }, - [DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 }, - [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_ESWITCH_MODE_LEGACY, - DEVLINK_ESWITCH_MODE_SWITCHDEV), - [DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 }, - [DEVLINK_ATTR_RESOURCE_ID] = { .type = NLA_U64}, - [DEVLINK_ATTR_RESOURCE_SIZE] = { .type = NLA_U64}, - [DEVLINK_ATTR_PARAM_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 }, - [DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_REGION_CHUNK_ADDR] = { .type = NLA_U64 }, - [DEVLINK_ATTR_REGION_CHUNK_LEN] = { .type = NLA_U64 }, - [DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 }, - [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 }, - [DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK] = - NLA_POLICY_BITFIELD32(DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS), - [DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 }, - [DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 }, - [DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 }, - [DEVLINK_ATTR_TRAP_POLICER_ID] = { .type = NLA_U32 }, - [DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 }, - [DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 }, - [DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED }, - [DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - DEVLINK_RELOAD_ACTION_MAX), - [DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK), - [DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 }, - [DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 }, - [DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 }, - [DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 }, - [DEVLINK_ATTR_RATE_TYPE] = { .type = NLA_U16 }, - [DEVLINK_ATTR_RATE_TX_SHARE] = { .type = NLA_U64 }, - [DEVLINK_ATTR_RATE_TX_MAX] = { .type = NLA_U64 }, - [DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 }, - [DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING }, - [DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED }, - [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 }, - [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 }, - [DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG }, -}; - -static const struct genl_small_ops devlink_nl_ops[] = { +const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, @@ -9664,23 +9484,7 @@ static const struct genl_small_ops devlink_nl_ops[] = { .doit = devlink_nl_cmd_selftests_run, .flags = GENL_ADMIN_PERM, }, -}; - -struct genl_family devlink_nl_family __ro_after_init = { - .name = DEVLINK_GENL_NAME, - .version = DEVLINK_GENL_VERSION, - .maxattr = DEVLINK_ATTR_MAX, - .policy = devlink_nl_policy, - .netnsok = true, - .parallel_ops = true, - .pre_doit = devlink_nl_pre_doit, - .post_doit = devlink_nl_post_doit, - .module = THIS_MODULE, - .small_ops = devlink_nl_ops, - .n_small_ops = ARRAY_SIZE(devlink_nl_ops), - .resv_start_op = DEVLINK_CMD_SELFTESTS_RUN + 1, - .mcgrps = devlink_nl_mcgrps, - .n_mcgrps = ARRAY_SIZE(devlink_nl_mcgrps), + /* -- No new ops here! Use split ops going forward! -- */ }; bool devlink_reload_actions_valid(const struct devlink_ops *ops) diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c new file mode 100644 index 000000000000..ce1a7d674d14 --- /dev/null +++ b/net/devlink/netlink.c @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2016 Jiri Pirko + */ + +#include + +#include "devl_internal.h" + +static const struct genl_multicast_group devlink_nl_mcgrps[] = { + [DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME }, +}; + +static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { + [DEVLINK_ATTR_UNSPEC] = { .strict_start_type = + DEVLINK_ATTR_TRAP_POLICER_ID }, + [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 }, + [DEVLINK_ATTR_PORT_TYPE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_PORT_TYPE_AUTO, + DEVLINK_PORT_TYPE_IB), + [DEVLINK_ATTR_PORT_SPLIT_COUNT] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_INDEX] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_POOL_INDEX] = { .type = NLA_U16 }, + [DEVLINK_ATTR_SB_POOL_TYPE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_SB_POOL_SIZE] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 }, + [DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 }, + [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_RANGE(NLA_U16, DEVLINK_ESWITCH_MODE_LEGACY, + DEVLINK_ESWITCH_MODE_SWITCHDEV), + [DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 }, + [DEVLINK_ATTR_RESOURCE_ID] = { .type = NLA_U64}, + [DEVLINK_ATTR_RESOURCE_SIZE] = { .type = NLA_U64}, + [DEVLINK_ATTR_PARAM_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_PARAM_TYPE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 }, + [DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_REGION_CHUNK_ADDR] = { .type = NLA_U64 }, + [DEVLINK_ATTR_REGION_CHUNK_LEN] = { .type = NLA_U64 }, + [DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 }, + [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 }, + [DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK] = + NLA_POLICY_BITFIELD32(DEVLINK_SUPPORTED_FLASH_OVERWRITE_SECTIONS), + [DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 }, + [DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_NETNS_PID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_NETNS_FD] = { .type = NLA_U32 }, + [DEVLINK_ATTR_NETNS_ID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8 }, + [DEVLINK_ATTR_TRAP_POLICER_ID] = { .type = NLA_U32 }, + [DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 }, + [DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 }, + [DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED }, + [DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + DEVLINK_RELOAD_ACTION_MAX), + [DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK), + [DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 }, + [DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 }, + [DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 }, + [DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 }, + [DEVLINK_ATTR_RATE_TYPE] = { .type = NLA_U16 }, + [DEVLINK_ATTR_RATE_TX_SHARE] = { .type = NLA_U64 }, + [DEVLINK_ATTR_RATE_TX_MAX] = { .type = NLA_U64 }, + [DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 }, + [DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING }, + [DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED }, + [DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 }, + [DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 }, + [DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG }, +}; + +struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs) +{ + struct devlink *devlink; + unsigned long index; + char *busname; + char *devname; + + if (!attrs[DEVLINK_ATTR_BUS_NAME] || !attrs[DEVLINK_ATTR_DEV_NAME]) + return ERR_PTR(-EINVAL); + + busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]); + devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]); + + devlinks_xa_for_each_registered_get(net, index, devlink) { + if (strcmp(devlink->dev->bus->name, busname) == 0 && + strcmp(dev_name(devlink->dev), devname) == 0) + return devlink; + devlink_put(devlink); + } + + return ERR_PTR(-ENODEV); +} + +static int devlink_nl_pre_doit(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + struct devlink_linecard *linecard; + struct devlink_port *devlink_port; + struct devlink *devlink; + int err; + + devlink = devlink_get_from_attrs(genl_info_net(info), info->attrs); + if (IS_ERR(devlink)) + return PTR_ERR(devlink); + devl_lock(devlink); + info->user_ptr[0] = devlink; + if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) { + devlink_port = devlink_port_get_from_info(devlink, info); + if (IS_ERR(devlink_port)) { + err = PTR_ERR(devlink_port); + goto unlock; + } + info->user_ptr[1] = devlink_port; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT) { + devlink_port = devlink_port_get_from_info(devlink, info); + if (!IS_ERR(devlink_port)) + info->user_ptr[1] = devlink_port; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE) { + struct devlink_rate *devlink_rate; + + devlink_rate = devlink_rate_get_from_info(devlink, info); + if (IS_ERR(devlink_rate)) { + err = PTR_ERR(devlink_rate); + goto unlock; + } + info->user_ptr[1] = devlink_rate; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_RATE_NODE) { + struct devlink_rate *rate_node; + + rate_node = devlink_rate_node_get_from_info(devlink, info); + if (IS_ERR(rate_node)) { + err = PTR_ERR(rate_node); + goto unlock; + } + info->user_ptr[1] = rate_node; + } else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { + linecard = devlink_linecard_get_from_info(devlink, info); + if (IS_ERR(linecard)) { + err = PTR_ERR(linecard); + goto unlock; + } + info->user_ptr[1] = linecard; + } + return 0; + +unlock: + devl_unlock(devlink); + devlink_put(devlink); + return err; +} + +static void devlink_nl_post_doit(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + struct devlink_linecard *linecard; + struct devlink *devlink; + + devlink = info->user_ptr[0]; + if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { + linecard = info->user_ptr[1]; + devlink_linecard_put(linecard); + } + devl_unlock(devlink); + devlink_put(devlink); +} + +struct genl_family devlink_nl_family __ro_after_init = { + .name = DEVLINK_GENL_NAME, + .version = DEVLINK_GENL_VERSION, + .maxattr = DEVLINK_ATTR_MAX, + .policy = devlink_nl_policy, + .netnsok = true, + .parallel_ops = true, + .pre_doit = devlink_nl_pre_doit, + .post_doit = devlink_nl_post_doit, + .module = THIS_MODULE, + .small_ops = devlink_nl_ops, + .n_small_ops = ARRAY_SIZE(devlink_nl_ops), + .resv_start_op = DEVLINK_CMD_SELFTESTS_RUN + 1, + .mcgrps = devlink_nl_mcgrps, + .n_mcgrps = ARRAY_SIZE(devlink_nl_mcgrps), +}; -- cgit v1.2.3 From 2c7bc10d0f7b0f66d9042ed88bb3ecd588352a00 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:21 -0800 Subject: netlink: add macro for checking dump ctx size We encourage casting struct netlink_callback::ctx to a local struct (in a comment above the field). Provide a convenience macro for checking if the local struct fits into the ctx. Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- include/linux/netlink.h | 4 ++++ net/netfilter/nf_conntrack_netlink.c | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index d81bde5a5844..38f6334f408c 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -263,6 +263,10 @@ struct netlink_callback { }; }; +#define NL_ASSET_DUMP_CTX_FITS(type_name) \ + BUILD_BUG_ON(sizeof(type_name) > \ + sizeof_field(struct netlink_callback, ctx)) + struct netlink_notify { struct net *net; u32 portid; diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 1286ae7d4609..90672e293e57 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -3866,7 +3866,7 @@ static int __init ctnetlink_init(void) { int ret; - BUILD_BUG_ON(sizeof(struct ctnetlink_list_dump_ctx) > sizeof_field(struct netlink_callback, ctx)); + NL_ASSET_DUMP_CTX_FITS(struct ctnetlink_list_dump_ctx); ret = nfnetlink_subsys_register(&ctnl_subsys); if (ret < 0) { -- cgit v1.2.3 From 3015f8224961dbc701c7e0b9e74823289efbec0a Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:22 -0800 Subject: devlink: use an explicit structure for dump context Create a dump context structure instead of using cb->args as an unsigned long array. This is a pure conversion which is intended to be as much of a noop as possible. Subsequent changes will use this to simplify the code. The two non-trivial parts are: - devlink_nl_cmd_health_reporter_dump_get_dumpit() checks args[0] to see if devlink_fmsg_dumpit() has already been called (whether this is the first msg), but doesn't use the exact value, so we can drop the local variable there already - devlink_nl_cmd_region_read_dumpit() uses args[0] for address but we'll use args[1] now, shouldn't matter Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 23 +++++++++++ net/devlink/leftover.c | 98 +++++++++++++++++++++++++++------------------ 2 files changed, 81 insertions(+), 40 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index fdf7634b0cd1..171219597cc6 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -107,6 +107,21 @@ enum devlink_multicast_groups { DEVLINK_MCGRP_CONFIG, }; +/* state held across netlink dumps */ +struct devlink_nl_dump_state { + int idx; + union { + /* DEVLINK_CMD_REGION_READ */ + struct { + u64 start_offset; + }; + /* DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET */ + struct { + u64 dump_ts; + }; + }; +}; + extern const struct genl_small_ops devlink_nl_ops[56]; struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs); @@ -114,6 +129,14 @@ struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs); void devlink_notify_unregister(struct devlink *devlink); void devlink_notify_register(struct devlink *devlink); +static inline struct devlink_nl_dump_state * +devlink_dump_state(struct netlink_callback *cb) +{ + NL_ASSET_DUMP_CTX_FITS(struct devlink_nl_dump_state); + + return (struct devlink_nl_dump_state *)cb->ctx; +} + /* Ports */ int devlink_port_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index e01ba7999b91..56ff63b41e96 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1222,9 +1222,10 @@ static void devlink_rate_notify(struct devlink_rate *devlink_rate, static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_rate *devlink_rate; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -1256,7 +1257,7 @@ out: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -1317,8 +1318,9 @@ static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info) static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err; @@ -1342,7 +1344,7 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, idx++; } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -1371,10 +1373,11 @@ static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb, static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_port *devlink_port; unsigned long index, port_index; - int start = cb->args[0]; + int start = state->idx; int idx = 0; int err; @@ -1401,7 +1404,7 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -2150,9 +2153,10 @@ static int devlink_nl_cmd_linecard_get_doit(struct sk_buff *skb, static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_linecard *linecard; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err; @@ -2183,7 +2187,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -2412,9 +2416,10 @@ static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb, static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err; @@ -2442,7 +2447,7 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -2554,9 +2559,10 @@ static int __sb_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -2587,7 +2593,7 @@ out: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -2769,9 +2775,10 @@ static int __sb_port_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -2802,7 +2809,7 @@ out: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -3012,9 +3019,10 @@ static int devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -3046,7 +3054,7 @@ out: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -4871,8 +4879,9 @@ static int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -4899,7 +4908,7 @@ inc: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -5351,9 +5360,10 @@ static void devlink_param_notify(struct devlink *devlink, static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_param_item *param_item; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -5386,7 +5396,7 @@ out: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -6095,8 +6105,9 @@ out: static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -6109,7 +6120,7 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, goto out; } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -6394,6 +6405,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { const struct genl_dumpit_info *info = genl_dumpit_info(cb); + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct nlattr *chunks_attr, *region_attr, *snapshot_attr; u64 ret_offset, start_offset, end_offset = U64_MAX; struct nlattr **attrs = info->attrs; @@ -6407,7 +6419,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, void *hdr; int err; - start_offset = *((u64 *)&cb->args[0]); + start_offset = state->start_offset; devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); if (IS_ERR(devlink)) @@ -6546,7 +6558,7 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, goto nla_put_failure; } - *((u64 *)&cb->args[0]) = ret_offset; + state->start_offset = ret_offset; nla_nest_end(skb, chunks_attr); genlmsg_end(skb, hdr); @@ -6745,8 +6757,9 @@ static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -6775,7 +6788,7 @@ inc: if (err != -EMSGSIZE) return err; - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -7344,7 +7357,8 @@ static int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, struct netlink_callback *cb, enum devlink_command cmd) { - int index = cb->args[0]; + struct devlink_nl_dump_state *state = devlink_dump_state(cb); + int index = state->idx; int tmp_index = index; void *hdr; int err; @@ -7360,7 +7374,7 @@ static int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, if ((err && err != -EMSGSIZE) || tmp_index == index) goto nla_put_failure; - cb->args[0] = index; + state->idx = index; genlmsg_end(skb, hdr); return skb->len; @@ -7911,11 +7925,12 @@ static int devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_health_reporter *reporter; unsigned long index, port_index; struct devlink_port *port; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; int idx = 0; int err; @@ -7970,7 +7985,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -8082,8 +8097,8 @@ static int devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_health_reporter *reporter; - u64 start = cb->args[0]; int err; reporter = devlink_health_reporter_get_from_cb(cb); @@ -8095,13 +8110,13 @@ devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, goto out; } mutex_lock(&reporter->dump_lock); - if (!start) { + if (!state->idx) { err = devlink_health_do_dump(reporter, NULL, cb->extack); if (err) goto unlock; - cb->args[1] = reporter->dump_ts; + state->dump_ts = reporter->dump_ts; } - if (!reporter->dump_fmsg || cb->args[1] != reporter->dump_ts) { + if (!reporter->dump_fmsg || state->dump_ts != reporter->dump_ts) { NL_SET_ERR_MSG_MOD(cb->extack, "Dump trampled, please retry"); err = -EAGAIN; goto unlock; @@ -8495,9 +8510,10 @@ err_trap_fill: static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_trap_item *trap_item; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err; @@ -8525,7 +8541,7 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -8710,11 +8726,12 @@ err_trap_group_fill: static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); enum devlink_command cmd = DEVLINK_CMD_TRAP_GROUP_NEW; struct devlink_trap_group_item *group_item; u32 portid = NETLINK_CB(cb->skb).portid; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err; @@ -8743,7 +8760,7 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } @@ -9014,11 +9031,12 @@ err_trap_policer_fill: static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); enum devlink_command cmd = DEVLINK_CMD_TRAP_POLICER_NEW; struct devlink_trap_policer_item *policer_item; u32 portid = NETLINK_CB(cb->skb).portid; struct devlink *devlink; - int start = cb->args[0]; + int start = state->idx; unsigned long index; int idx = 0; int err; @@ -9047,7 +9065,7 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - cb->args[0] = idx; + state->idx = idx; return msg->len; } -- cgit v1.2.3 From 20615659b514c8f94e9fef590e873b5ca673a49d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:23 -0800 Subject: devlink: remove start variables from dumps The start variables made the code clearer when we had to access cb->args[0] directly, as the name args doesn't explain much. Now that we use a structure to hold state this seems no longer needed. Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 55 +++++++++++++++++--------------------------------- 1 file changed, 19 insertions(+), 36 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 56ff63b41e96..d88461b33ddf 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1225,7 +1225,6 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_rate *devlink_rate; struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -1236,7 +1235,7 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, enum devlink_command cmd = DEVLINK_CMD_RATE_NEW; u32 id = NETLINK_CB(cb->skb).portid; - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -1320,13 +1319,12 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err; devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < start) { + if (idx < state->idx) { idx++; devlink_put(devlink); continue; @@ -1377,14 +1375,13 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, struct devlink *devlink; struct devlink_port *devlink_port; unsigned long index, port_index; - int start = state->idx; int idx = 0; int err; devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { devl_lock(devlink); xa_for_each(&devlink->ports, port_index, devlink_port) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -2156,7 +2153,6 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_linecard *linecard; struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err; @@ -2164,7 +2160,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { mutex_lock(&devlink->linecards_lock); list_for_each_entry(linecard, &devlink->linecard_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -2419,7 +2415,6 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = state->idx; unsigned long index; int idx = 0; int err; @@ -2427,7 +2422,7 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { devl_lock(devlink); list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -2562,7 +2557,6 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -2573,8 +2567,8 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, devl_lock(devlink); list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_pool_get_dumpit(msg, start, &idx, devlink, - devlink_sb, + err = __sb_pool_get_dumpit(msg, state->idx, &idx, + devlink, devlink_sb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq); if (err == -EOPNOTSUPP) { @@ -2778,7 +2772,6 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -2789,7 +2782,7 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, devl_lock(devlink); list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_port_pool_get_dumpit(msg, start, &idx, + err = __sb_port_pool_get_dumpit(msg, state->idx, &idx, devlink, devlink_sb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq); @@ -3022,7 +3015,6 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; struct devlink_sb *devlink_sb; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -3033,9 +3025,8 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, devl_lock(devlink); list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_tc_pool_bind_get_dumpit(msg, start, &idx, - devlink, - devlink_sb, + err = __sb_tc_pool_bind_get_dumpit(msg, state->idx, &idx, + devlink, devlink_sb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq); if (err == -EOPNOTSUPP) { @@ -4881,13 +4872,12 @@ static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < start || !devlink->ops->selftest_check) + if (idx < state->idx || !devlink->ops->selftest_check) goto inc; devl_lock(devlink); @@ -5363,7 +5353,6 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_param_item *param_item; struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; @@ -5371,7 +5360,7 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { devl_lock(devlink); list_for_each_entry(param_item, &devlink->param_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -6107,14 +6096,13 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink, - &idx, start); + &idx, state->idx); devlink_put(devlink); if (err) goto out; @@ -6759,13 +6747,12 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err = 0; devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < start) + if (idx < state->idx) goto inc; devl_lock(devlink); @@ -7930,7 +7917,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, unsigned long index, port_index; struct devlink_port *port; struct devlink *devlink; - int start = state->idx; int idx = 0; int err; @@ -7938,7 +7924,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, mutex_lock(&devlink->reporters_lock); list_for_each_entry(reporter, &devlink->reporter_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -7962,7 +7948,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, xa_for_each(&devlink->ports, port_index, port) { mutex_lock(&port->reporters_lock); list_for_each_entry(reporter, &port->reporter_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -8513,7 +8499,6 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_trap_item *trap_item; struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err; @@ -8521,7 +8506,7 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { devl_lock(devlink); list_for_each_entry(trap_item, &devlink->trap_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -8731,7 +8716,6 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, struct devlink_trap_group_item *group_item; u32 portid = NETLINK_CB(cb->skb).portid; struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err; @@ -8740,7 +8724,7 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, devl_lock(devlink); list_for_each_entry(group_item, &devlink->trap_group_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } @@ -9036,7 +9020,6 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, struct devlink_trap_policer_item *policer_item; u32 portid = NETLINK_CB(cb->skb).portid; struct devlink *devlink; - int start = state->idx; unsigned long index; int idx = 0; int err; @@ -9045,7 +9028,7 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, devl_lock(devlink); list_for_each_entry(policer_item, &devlink->trap_policer_list, list) { - if (idx < start) { + if (idx < state->idx) { idx++; continue; } -- cgit v1.2.3 From 8861c0933c78e3631fe752feadc0d2a6e5eab1e1 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:24 -0800 Subject: devlink: drop the filter argument from devlinks_xa_find_get Looks like devlinks_xa_find_get() was intended to get the mark from the @filter argument. It doesn't actually use @filter, passing DEVLINK_REGISTERED to xa_find_fn() directly. Walking marks other than registered is unlikely so drop @filter argument completely. Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/core.c | 12 +++++------- net/devlink/devl_internal.h | 15 +++++---------- 2 files changed, 10 insertions(+), 17 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index c084eafa17fb..3a99bf84632e 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -92,7 +92,7 @@ void devlink_put(struct devlink *devlink) } static struct devlink * -devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter, +devlinks_xa_find_get(struct net *net, unsigned long *indexp, void * (*xa_find_fn)(struct xarray *, unsigned long *, unsigned long, xa_mark_t)) { @@ -125,17 +125,15 @@ unlock: } struct devlink * -devlinks_xa_find_get_first(struct net *net, unsigned long *indexp, - xa_mark_t filter) +devlinks_xa_find_get_first(struct net *net, unsigned long *indexp) { - return devlinks_xa_find_get(net, indexp, filter, xa_find); + return devlinks_xa_find_get(net, indexp, xa_find); } struct devlink * -devlinks_xa_find_get_next(struct net *net, unsigned long *indexp, - xa_mark_t filter) +devlinks_xa_find_get_next(struct net *net, unsigned long *indexp) { - return devlinks_xa_find_get(net, indexp, filter, xa_find_after); + return devlinks_xa_find_get(net, indexp, xa_find_after); } /** diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 171219597cc6..86b321aef3eb 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -81,20 +81,15 @@ extern struct genl_family devlink_nl_family; * devlink_put() needs to be called for each iterated devlink pointer * in loop body in order to release the reference. */ -#define devlinks_xa_for_each_get(net, index, devlink, filter) \ - for (index = 0, \ - devlink = devlinks_xa_find_get_first(net, &index, filter); \ - devlink; devlink = devlinks_xa_find_get_next(net, &index, filter)) - #define devlinks_xa_for_each_registered_get(net, index, devlink) \ - devlinks_xa_for_each_get(net, index, devlink, DEVLINK_REGISTERED) + for (index = 0, \ + devlink = devlinks_xa_find_get_first(net, &index); \ + devlink; devlink = devlinks_xa_find_get_next(net, &index)) struct devlink * -devlinks_xa_find_get_first(struct net *net, unsigned long *indexp, - xa_mark_t filter); +devlinks_xa_find_get_first(struct net *net, unsigned long *indexp); struct devlink * -devlinks_xa_find_get_next(struct net *net, unsigned long *indexp, - xa_mark_t filter); +devlinks_xa_find_get_next(struct net *net, unsigned long *indexp); /* Netlink */ #define DEVLINK_NL_FLAG_NEED_PORT BIT(0) -- cgit v1.2.3 From a0e13dfdc391dfb2a9b165e2475fb1796c9be98d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:25 -0800 Subject: devlink: health: combine loops in dump Walk devlink instances only once. Dump the instance reporters and port reporters before moving to the next instance. User space should not depend on ordering of messages. This will make improving stability of the walk easier. Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index d88461b33ddf..83cd7bd55941 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7940,10 +7940,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, idx++; } mutex_unlock(&devlink->reporters_lock); - devlink_put(devlink); - } - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { devl_lock(devlink); xa_for_each(&devlink->ports, port_index, port) { mutex_lock(&port->reporters_lock); -- cgit v1.2.3 From 731d69a6bd13b7c0cdbd3607edfa681269d54828 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:26 -0800 Subject: devlink: restart dump based on devlink instance ids (simple) xarray gives each devlink instance an id and allows us to restart walk based on that id quite neatly. This is nice both from the perspective of code brevity and from the stability of the dump (devlink instances disappearing from before the resumption point will not cause inconsistent dumps). This patch takes care of simple cases where state->idx counts devlink instances only. Reviewed-by: Jacob Keller Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/core.c | 2 +- net/devlink/devl_internal.h | 16 ++++++++++++++++ net/devlink/leftover.c | 36 ++++++++---------------------------- 3 files changed, 25 insertions(+), 29 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index 3a99bf84632e..371d6821315d 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -91,7 +91,7 @@ void devlink_put(struct devlink *devlink) call_rcu(&devlink->rcu, __devlink_put_rcu); } -static struct devlink * +struct devlink * devlinks_xa_find_get(struct net *net, unsigned long *indexp, void * (*xa_find_fn)(struct xarray *, unsigned long *, unsigned long, xa_mark_t)) diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 86b321aef3eb..b9bb6f73a7c1 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -87,6 +87,10 @@ extern struct genl_family devlink_nl_family; devlink; devlink = devlinks_xa_find_get_next(net, &index)) struct devlink * +devlinks_xa_find_get(struct net *net, unsigned long *indexp, + void * (*xa_find_fn)(struct xarray *, unsigned long *, + unsigned long, xa_mark_t)); +struct devlink * devlinks_xa_find_get_first(struct net *net, unsigned long *indexp); struct devlink * devlinks_xa_find_get_next(struct net *net, unsigned long *indexp); @@ -104,6 +108,7 @@ enum devlink_multicast_groups { /* state held across netlink dumps */ struct devlink_nl_dump_state { + unsigned long instance; int idx; union { /* DEVLINK_CMD_REGION_READ */ @@ -117,6 +122,17 @@ struct devlink_nl_dump_state { }; }; +/* Iterate over registered devlink instances for devlink dump. + * devlink_put() needs to be called for each iterated devlink pointer + * in loop body in order to release the reference. + * Note: this is NOT a generic iterator, it makes assumptions about the use + * of @state and can only be used once per dumpit implementation. + */ +#define devlink_dump_for_each_instance_get(msg, state, devlink) \ + for (; (devlink = devlinks_xa_find_get(sock_net(msg->sk), \ + &state->instance, xa_find)); \ + state->instance++) + extern const struct genl_small_ops devlink_nl_ops[56]; struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 83cd7bd55941..db7c095a0d75 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1319,17 +1319,9 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - unsigned long index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < state->idx) { - idx++; - devlink_put(devlink); - continue; - } - + devlink_dump_for_each_instance_get(msg, state, devlink) { devl_lock(devlink); err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, NETLINK_CB(cb->skb).portid, @@ -1339,10 +1331,8 @@ static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, if (err) goto out; - idx++; } out: - state->idx = idx; return msg->len; } @@ -4872,13 +4862,13 @@ static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < state->idx || !devlink->ops->selftest_check) - goto inc; + devlink_dump_for_each_instance_get(msg, state, devlink) { + if (!devlink->ops->selftest_check) { + devlink_put(devlink); + continue; + } devl_lock(devlink); err = devlink_nl_selftests_fill(msg, devlink, @@ -4890,15 +4880,13 @@ static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, devlink_put(devlink); break; } -inc: - idx++; + devlink_put(devlink); } if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } @@ -6747,14 +6735,9 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { - if (idx < state->idx) - goto inc; - + devlink_dump_for_each_instance_get(msg, state, devlink) { devl_lock(devlink); err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, NETLINK_CB(cb->skb).portid, @@ -6767,15 +6750,12 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, devlink_put(devlink); break; } -inc: - idx++; devlink_put(devlink); } if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } -- cgit v1.2.3 From a8f947073f4ae489e8b9f3c19a17341cb260f660 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:27 -0800 Subject: devlink: restart dump based on devlink instance ids (nested) Use xarray id for cases of simple sub-object iteration. We'll now use the state->instance for the devlink instances and state->idx for subobject index. Moving the definition of idx into the inner loop makes sense, so while at it also move other sub-object local variables into the loop. Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 2 +- net/devlink/leftover.c | 93 +++++++++++++++++++++++---------------------- 2 files changed, 49 insertions(+), 46 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index b9bb6f73a7c1..1e32bded7c8a 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -131,7 +131,7 @@ struct devlink_nl_dump_state { #define devlink_dump_for_each_instance_get(msg, state, devlink) \ for (; (devlink = devlinks_xa_find_get(sock_net(msg->sk), \ &state->instance, xa_find)); \ - state->instance++) + state->instance++, state->idx = 0) extern const struct genl_small_ops devlink_nl_ops[56]; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index db7c095a0d75..358cdfbb1393 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1223,13 +1223,13 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink_rate *devlink_rate; struct devlink *devlink; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_rate *devlink_rate; + int idx = 0; + devl_lock(devlink); list_for_each_entry(devlink_rate, &devlink->rate_list, list) { enum devlink_command cmd = DEVLINK_CMD_RATE_NEW; @@ -1245,6 +1245,7 @@ static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -1256,7 +1257,6 @@ out: if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } @@ -1363,12 +1363,13 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - struct devlink_port *devlink_port; - unsigned long index, port_index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_port *devlink_port; + unsigned long port_index; + int idx = 0; + devl_lock(devlink); xa_for_each(&devlink->ports, port_index, devlink_port) { if (idx < state->idx) { @@ -1383,6 +1384,7 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -1391,7 +1393,6 @@ static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } @@ -2143,11 +2144,11 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_linecard *linecard; struct devlink *devlink; - unsigned long index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + int idx = 0; + mutex_lock(&devlink->linecards_lock); list_for_each_entry(linecard, &devlink->linecard_list, list) { if (idx < state->idx) { @@ -2165,6 +2166,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, if (err) { mutex_unlock(&devlink->linecards_lock); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -2173,7 +2175,6 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } @@ -2404,12 +2405,12 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - struct devlink_sb *devlink_sb; - unsigned long index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_sb *devlink_sb; + int idx = 0; + devl_lock(devlink); list_for_each_entry(devlink_sb, &devlink->sb_list, list) { if (idx < state->idx) { @@ -2424,6 +2425,7 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -2432,7 +2434,6 @@ static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } @@ -5339,13 +5340,13 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink_param_item *param_item; struct devlink *devlink; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_param_item *param_item; + int idx = 0; + devl_lock(devlink); list_for_each_entry(param_item, &devlink->param_list, list) { if (idx < state->idx) { @@ -5362,6 +5363,7 @@ static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, } else if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -5373,7 +5375,6 @@ out: if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } @@ -7893,14 +7894,15 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink_health_reporter *reporter; - unsigned long index, port_index; - struct devlink_port *port; struct devlink *devlink; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_health_reporter *reporter; + struct devlink_port *port; + unsigned long port_index; + int idx = 0; + mutex_lock(&devlink->reporters_lock); list_for_each_entry(reporter, &devlink->reporter_list, list) { @@ -7915,6 +7917,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, if (err) { mutex_unlock(&devlink->reporters_lock); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -7938,6 +7941,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, mutex_unlock(&port->reporters_lock); devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -7948,7 +7952,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } @@ -8474,13 +8477,13 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink_trap_item *trap_item; struct devlink *devlink; - unsigned long index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_trap_item *trap_item; + int idx = 0; + devl_lock(devlink); list_for_each_entry(trap_item, &devlink->trap_list, list) { if (idx < state->idx) { @@ -8495,6 +8498,7 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -8503,7 +8507,6 @@ static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } @@ -8690,14 +8693,14 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); enum devlink_command cmd = DEVLINK_CMD_TRAP_GROUP_NEW; - struct devlink_trap_group_item *group_item; u32 portid = NETLINK_CB(cb->skb).portid; struct devlink *devlink; - unsigned long index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_trap_group_item *group_item; + int idx = 0; + devl_lock(devlink); list_for_each_entry(group_item, &devlink->trap_group_list, list) { @@ -8713,6 +8716,7 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -8721,7 +8725,6 @@ static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } @@ -8994,14 +8997,14 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); enum devlink_command cmd = DEVLINK_CMD_TRAP_POLICER_NEW; - struct devlink_trap_policer_item *policer_item; u32 portid = NETLINK_CB(cb->skb).portid; struct devlink *devlink; - unsigned long index; - int idx = 0; int err; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_trap_policer_item *policer_item; + int idx = 0; + devl_lock(devlink); list_for_each_entry(policer_item, &devlink->trap_policer_list, list) { @@ -9017,6 +9020,7 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } idx++; @@ -9025,7 +9029,6 @@ static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, devlink_put(devlink); } out: - state->idx = idx; return msg->len; } -- cgit v1.2.3 From c9666bac537e0f98ce15ea57c5fb25fc0a9a29b1 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:28 -0800 Subject: devlink: restart dump based on devlink instance ids (function) Use xarray id for cases of sub-objects which are iterated in a function. Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 358cdfbb1393..091f8814185d 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -2547,12 +2547,12 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - struct devlink_sb *devlink_sb; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_sb *devlink_sb; + int idx = 0; + if (!devlink->ops->sb_pool_get) goto retry; @@ -2567,6 +2567,7 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, } else if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } } @@ -2578,7 +2579,6 @@ out: if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } @@ -2762,12 +2762,12 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - struct devlink_sb *devlink_sb; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_sb *devlink_sb; + int idx = 0; + if (!devlink->ops->sb_port_pool_get) goto retry; @@ -2782,6 +2782,7 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, } else if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } } @@ -2793,7 +2794,6 @@ out: if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } @@ -3005,12 +3005,12 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - struct devlink_sb *devlink_sb; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + struct devlink_sb *devlink_sb; + int idx = 0; + if (!devlink->ops->sb_tc_pool_bind_get) goto retry; @@ -3025,6 +3025,7 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, } else if (err) { devl_unlock(devlink); devlink_put(devlink); + state->idx = idx; goto out; } } @@ -3036,7 +3037,6 @@ out: if (err != -EMSGSIZE) return err; - state->idx = idx; return msg->len; } @@ -6085,19 +6085,20 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink *devlink; - unsigned long index; - int idx = 0; int err = 0; - devlinks_xa_for_each_registered_get(sock_net(msg->sk), index, devlink) { + devlink_dump_for_each_instance_get(msg, state, devlink) { + int idx = 0; + err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink, &idx, state->idx); devlink_put(devlink); - if (err) + if (err) { + state->idx = idx; goto out; + } } out: - state->idx = idx; return msg->len; } -- cgit v1.2.3 From e4d5015bc11bcc5123ec8805d8c7f1e28a77a8a9 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:29 -0800 Subject: devlink: uniformly take the devlink instance lock in the dump loop Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit(). This way all dumps will take the instance lock in the main iteration loop directly, making refactoring and reading the code easier. Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 091f8814185d..4c91143d0792 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -6050,9 +6050,8 @@ static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, struct devlink_region *region; struct devlink_port *port; unsigned long port_index; - int err = 0; + int err; - devl_lock(devlink); list_for_each_entry(region, &devlink->region_list, list) { if (*idx < start) { (*idx)++; @@ -6064,7 +6063,7 @@ static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, cb->nlh->nlmsg_seq, NLM_F_MULTI, region); if (err) - goto out; + return err; (*idx)++; } @@ -6072,12 +6071,10 @@ static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, err = devlink_nl_cmd_region_get_port_dumpit(msg, cb, port, idx, start); if (err) - goto out; + return err; } -out: - devl_unlock(devlink); - return err; + return 0; } static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, @@ -6090,8 +6087,10 @@ static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, devlink_dump_for_each_instance_get(msg, state, devlink) { int idx = 0; + devl_lock(devlink); err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink, &idx, state->idx); + devl_unlock(devlink); devlink_put(devlink); if (err) { state->idx = idx; -- cgit v1.2.3 From 07f3af66089e20fe439b219d3c9d3e68d964193f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:30 -0800 Subject: devlink: add by-instance dump infra Most dumpit implementations walk the devlink instances. This requires careful lock taking and reference dropping. Factor the loop out and provide just a callback to handle a single instance dump. Convert one user as an example, other users converted in the next change. Slightly inspired by ethtool netlink code. Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 10 +++++++++ net/devlink/leftover.c | 55 ++++++++++++++++++++------------------------- net/devlink/netlink.c | 34 ++++++++++++++++++++++++++++ 3 files changed, 68 insertions(+), 31 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 1e32bded7c8a..856954e4a02d 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -122,6 +122,11 @@ struct devlink_nl_dump_state { }; }; +struct devlink_gen_cmd { + int (*dump_one)(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb); +}; + /* Iterate over registered devlink instances for devlink dump. * devlink_put() needs to be called for each iterated devlink pointer * in loop body in order to release the reference. @@ -140,6 +145,9 @@ struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs); void devlink_notify_unregister(struct devlink *devlink); void devlink_notify_register(struct devlink *devlink); +int devlink_nl_instance_iter_dump(struct sk_buff *msg, + struct netlink_callback *cb); + static inline struct devlink_nl_dump_state * devlink_dump_state(struct netlink_callback *cb) { @@ -175,6 +183,8 @@ devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info); void devlink_linecard_put(struct devlink_linecard *linecard); /* Rates */ +extern const struct devlink_gen_cmd devl_gen_rate_get; + struct devlink_rate * devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info); struct devlink_rate * diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 4c91143d0792..6b5d60c91816 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1219,47 +1219,40 @@ static void devlink_rate_notify(struct devlink_rate *devlink_rate, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); } -static int devlink_nl_cmd_rate_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; + struct devlink_rate *devlink_rate; + int idx = 0; int err = 0; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_rate *devlink_rate; - int idx = 0; - - devl_lock(devlink); - list_for_each_entry(devlink_rate, &devlink->rate_list, list) { - enum devlink_command cmd = DEVLINK_CMD_RATE_NEW; - u32 id = NETLINK_CB(cb->skb).portid; + list_for_each_entry(devlink_rate, &devlink->rate_list, list) { + enum devlink_command cmd = DEVLINK_CMD_RATE_NEW; + u32 id = NETLINK_CB(cb->skb).portid; - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_rate_fill(msg, devlink_rate, cmd, id, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, NULL); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_rate_fill(msg, devlink_rate, cmd, id, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, NULL); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - if (err != -EMSGSIZE) - return err; - return msg->len; + return err; } +const struct devlink_gen_cmd devl_gen_rate_get = { + .dump_one = devlink_nl_cmd_rate_get_dump_one, +}; + static int devlink_nl_cmd_rate_get_doit(struct sk_buff *skb, struct genl_info *info) { @@ -9130,7 +9123,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_RATE_GET, .doit = devlink_nl_cmd_rate_get_doit, - .dumpit = devlink_nl_cmd_rate_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .internal_flags = DEVLINK_NL_FLAG_NEED_RATE, /* can be retrieved by unprivileged users */ }, diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index ce1a7d674d14..82ee5621bd9c 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -5,6 +5,7 @@ */ #include +#include #include "devl_internal.h" @@ -177,6 +178,39 @@ static void devlink_nl_post_doit(const struct genl_split_ops *ops, devlink_put(devlink); } +static const struct devlink_gen_cmd *devl_gen_cmds[] = { + [DEVLINK_CMD_RATE_GET] = &devl_gen_rate_get, +}; + +int devlink_nl_instance_iter_dump(struct sk_buff *msg, + struct netlink_callback *cb) +{ + const struct genl_dumpit_info *info = genl_dumpit_info(cb); + struct devlink_nl_dump_state *state = devlink_dump_state(cb); + const struct devlink_gen_cmd *cmd; + struct devlink *devlink; + int err = 0; + + cmd = devl_gen_cmds[info->op.cmd]; + + devlink_dump_for_each_instance_get(msg, state, devlink) { + devl_lock(devlink); + err = cmd->dump_one(msg, devlink, cb); + devl_unlock(devlink); + devlink_put(devlink); + + if (err) + break; + + /* restart sub-object walk for the next instance */ + state->idx = 0; + } + + if (err != -EMSGSIZE) + return err; + return msg->len; +} + struct genl_family devlink_nl_family __ro_after_init = { .name = DEVLINK_GENL_NAME, .version = DEVLINK_GENL_VERSION, -- cgit v1.2.3 From 5ce76d78b99650c0b22f466060d8b75df9123511 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 4 Jan 2023 20:05:31 -0800 Subject: devlink: convert remaining dumps to the by-instance scheme Soon we'll have to check if a devlink instance is alive after locking it. Convert to the by-instance dumping scheme to make refactoring easier. Most of the subobject code no longer has to worry about any devlink locking / lifetime rules (the only ones that still do are the two subject types which stubbornly use their own locking). Both dump and do callbacks are given a devlink instance which is already locked and good-to-access (do from the .pre_doit handler, dump from the new dump indirection). Note that we'll now check presence of an op (e.g. for sb_pool_get) under the devlink instance lock, that will soon be necessary anyway, because we don't hold refs on the driver modules so the memory in which ops live may be gone for a dead instance, after upcoming locking changes. Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 15 + net/devlink/leftover.c | 686 +++++++++++++++++++------------------------- net/devlink/netlink.c | 13 + 3 files changed, 320 insertions(+), 394 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 856954e4a02d..adf9f6c177db 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -156,6 +156,21 @@ devlink_dump_state(struct netlink_callback *cb) return (struct devlink_nl_dump_state *)cb->ctx; } +/* gen cmds */ +extern const struct devlink_gen_cmd devl_gen_inst; +extern const struct devlink_gen_cmd devl_gen_port; +extern const struct devlink_gen_cmd devl_gen_sb; +extern const struct devlink_gen_cmd devl_gen_sb_pool; +extern const struct devlink_gen_cmd devl_gen_sb_port_pool; +extern const struct devlink_gen_cmd devl_gen_sb_tc_pool_bind; +extern const struct devlink_gen_cmd devl_gen_selftests; +extern const struct devlink_gen_cmd devl_gen_param; +extern const struct devlink_gen_cmd devl_gen_region; +extern const struct devlink_gen_cmd devl_gen_info; +extern const struct devlink_gen_cmd devl_gen_trap; +extern const struct devlink_gen_cmd devl_gen_trap_group; +extern const struct devlink_gen_cmd devl_gen_trap_policer; + /* Ports */ int devlink_port_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 6b5d60c91816..e6d6c7f74ae7 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1307,28 +1307,19 @@ static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info) return genlmsg_reply(msg, info); } -static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - devl_lock(devlink); - err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI); - devl_unlock(devlink); - devlink_put(devlink); - - if (err) - goto out; - } -out: - return msg->len; + return devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI); } +const struct devlink_gen_cmd devl_gen_inst = { + .dump_one = devlink_nl_cmd_get_dump_one, +}; + static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb, struct genl_info *info) { @@ -1351,44 +1342,40 @@ static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb, return genlmsg_reply(msg, info); } -static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_port *devlink_port; - unsigned long port_index; - int idx = 0; + struct devlink_port *devlink_port; + unsigned long port_index; + int idx = 0; + int err = 0; - devl_lock(devlink); - xa_for_each(&devlink->ports, port_index, devlink_port) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_port_fill(msg, devlink_port, - DEVLINK_CMD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, cb->extack); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + xa_for_each(&devlink->ports, port_index, devlink_port) { + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_port_fill(msg, devlink_port, + DEVLINK_CMD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, cb->extack); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - return msg->len; + + return err; } +const struct devlink_gen_cmd devl_gen_port = { + .dump_one = devlink_nl_cmd_port_get_dump_one, +}; + static int devlink_port_type_set(struct devlink_port *devlink_port, enum devlink_port_type port_type) @@ -2393,43 +2380,39 @@ static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb, return genlmsg_reply(msg, info); } -static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_sb_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_sb *devlink_sb; - int idx = 0; + struct devlink_sb *devlink_sb; + int idx = 0; + int err = 0; - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_sb_fill(msg, devlink, devlink_sb, - DEVLINK_CMD_SB_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_sb_fill(msg, devlink, devlink_sb, + DEVLINK_CMD_SB_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - return msg->len; + + return err; } +const struct devlink_gen_cmd devl_gen_sb = { + .dump_one = devlink_nl_cmd_sb_get_dump_one, +}; + static int devlink_nl_sb_pool_fill(struct sk_buff *msg, struct devlink *devlink, struct devlink_sb *devlink_sb, u16 pool_index, enum devlink_command cmd, @@ -2535,46 +2518,39 @@ static int __sb_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, return 0; } -static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_sb_pool_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; + struct devlink_sb *devlink_sb; int err = 0; + int idx = 0; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_sb *devlink_sb; - int idx = 0; - - if (!devlink->ops->sb_pool_get) - goto retry; + if (!devlink->ops->sb_pool_get) + return 0; - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_pool_get_dumpit(msg, state->idx, &idx, - devlink, devlink_sb, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + err = __sb_pool_get_dumpit(msg, state->idx, &idx, + devlink, devlink_sb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + state->idx = idx; + break; } - devl_unlock(devlink); -retry: - devlink_put(devlink); } -out: - if (err != -EMSGSIZE) - return err; - return msg->len; + return err; } +const struct devlink_gen_cmd devl_gen_sb_pool = { + .dump_one = devlink_nl_cmd_sb_pool_get_dump_one, +}; + static int devlink_sb_pool_set(struct devlink *devlink, unsigned int sb_index, u16 pool_index, u32 size, enum devlink_sb_threshold_type threshold_type, @@ -2750,46 +2726,39 @@ static int __sb_port_pool_get_dumpit(struct sk_buff *msg, int start, int *p_idx, return 0; } -static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_sb_port_pool_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; + struct devlink_sb *devlink_sb; + int idx = 0; int err = 0; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_sb *devlink_sb; - int idx = 0; - - if (!devlink->ops->sb_port_pool_get) - goto retry; + if (!devlink->ops->sb_port_pool_get) + return 0; - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_port_pool_get_dumpit(msg, state->idx, &idx, - devlink, devlink_sb, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + err = __sb_port_pool_get_dumpit(msg, state->idx, &idx, + devlink, devlink_sb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + state->idx = idx; + break; } - devl_unlock(devlink); -retry: - devlink_put(devlink); } -out: - if (err != -EMSGSIZE) - return err; - return msg->len; + return err; } +const struct devlink_gen_cmd devl_gen_sb_port_pool = { + .dump_one = devlink_nl_cmd_sb_port_pool_get_dump_one, +}; + static int devlink_sb_port_pool_set(struct devlink_port *devlink_port, unsigned int sb_index, u16 pool_index, u32 threshold, @@ -2993,46 +2962,38 @@ static int __sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, } static int -devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +devlink_nl_cmd_sb_tc_pool_bind_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; + struct devlink_sb *devlink_sb; + int idx = 0; int err = 0; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_sb *devlink_sb; - int idx = 0; - - if (!devlink->ops->sb_tc_pool_bind_get) - goto retry; + if (!devlink->ops->sb_tc_pool_bind_get) + return 0; - devl_lock(devlink); - list_for_each_entry(devlink_sb, &devlink->sb_list, list) { - err = __sb_tc_pool_bind_get_dumpit(msg, state->idx, &idx, - devlink, devlink_sb, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(devlink_sb, &devlink->sb_list, list) { + err = __sb_tc_pool_bind_get_dumpit(msg, state->idx, &idx, + devlink, devlink_sb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + state->idx = idx; + break; } - devl_unlock(devlink); -retry: - devlink_put(devlink); } -out: - if (err != -EMSGSIZE) - return err; - return msg->len; + return err; } +const struct devlink_gen_cmd devl_gen_sb_tc_pool_bind = { + .dump_one = devlink_nl_cmd_sb_tc_pool_bind_get_dump_one, +}; + static int devlink_sb_tc_pool_bind_set(struct devlink_port *devlink_port, unsigned int sb_index, u16 tc_index, enum devlink_sb_pool_type pool_type, @@ -4851,39 +4812,24 @@ static int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, return genlmsg_reply(msg, info); } -static int devlink_nl_cmd_selftests_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_selftests_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err = 0; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - if (!devlink->ops->selftest_check) { - devlink_put(devlink); - continue; - } - - devl_lock(devlink); - err = devlink_nl_selftests_fill(msg, devlink, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->extack); - devl_unlock(devlink); - if (err) { - devlink_put(devlink); - break; - } - - devlink_put(devlink); - } - - if (err != -EMSGSIZE) - return err; + if (!devlink->ops->selftest_check) + return 0; - return msg->len; + return devlink_nl_selftests_fill(msg, devlink, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->extack); } +const struct devlink_gen_cmd devl_gen_selftests = { + .dump_one = devlink_nl_cmd_selftests_get_dump_one, +}; + static int devlink_selftest_result_put(struct sk_buff *skb, unsigned int id, enum devlink_selftest_status test_status) { @@ -5329,48 +5275,41 @@ static void devlink_param_notify(struct devlink *devlink, msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); } -static int devlink_nl_cmd_param_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_param_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; + struct devlink_param_item *param_item; + int idx = 0; int err = 0; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_param_item *param_item; - int idx = 0; - - devl_lock(devlink); - list_for_each_entry(param_item, &devlink->param_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_param_fill(msg, devlink, 0, param_item, - DEVLINK_CMD_PARAM_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err == -EOPNOTSUPP) { - err = 0; - } else if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(param_item, &devlink->param_list, list) { + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_param_fill(msg, devlink, 0, param_item, + DEVLINK_CMD_PARAM_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err == -EOPNOTSUPP) { + err = 0; + } else if (err) { + state->idx = idx; + break; + } + idx++; } -out: - if (err != -EMSGSIZE) - return err; - return msg->len; + return err; } +const struct devlink_gen_cmd devl_gen_param = { + .dump_one = devlink_nl_cmd_param_get_dump_one, +}; + static int devlink_param_type_get_from_info(struct genl_info *info, enum devlink_param_type *param_type) @@ -6034,20 +5973,20 @@ out: return err; } -static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, - struct netlink_callback *cb, - struct devlink *devlink, - int *idx, - int start) +static int +devlink_nl_cmd_region_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { + struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_region *region; struct devlink_port *port; unsigned long port_index; + int idx = 0; int err; list_for_each_entry(region, &devlink->region_list, list) { - if (*idx < start) { - (*idx)++; + if (idx < state->idx) { + idx++; continue; } err = devlink_nl_region_fill(msg, devlink, @@ -6055,44 +5994,28 @@ static int devlink_nl_cmd_region_get_devlink_dumpit(struct sk_buff *msg, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, region); - if (err) + if (err) { + state->idx = idx; return err; - (*idx)++; + } + idx++; } xa_for_each(&devlink->ports, port_index, port) { - err = devlink_nl_cmd_region_get_port_dumpit(msg, cb, port, idx, - start); - if (err) + err = devlink_nl_cmd_region_get_port_dumpit(msg, cb, port, &idx, + state->idx); + if (err) { + state->idx = idx; return err; + } } return 0; } -static int devlink_nl_cmd_region_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) -{ - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err = 0; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - int idx = 0; - - devl_lock(devlink); - err = devlink_nl_cmd_region_get_devlink_dumpit(msg, cb, devlink, - &idx, state->idx); - devl_unlock(devlink); - devlink_put(devlink); - if (err) { - state->idx = idx; - goto out; - } - } -out: - return msg->len; -} +const struct devlink_gen_cmd devl_gen_region = { + .dump_one = devlink_nl_cmd_region_get_dump_one, +}; static int devlink_nl_cmd_region_del(struct sk_buff *skb, struct genl_info *info) @@ -6724,35 +6647,25 @@ static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, return genlmsg_reply(msg, info); } -static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_info_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err = 0; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - devl_lock(devlink); - err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->extack); - devl_unlock(devlink); - if (err == -EOPNOTSUPP) - err = 0; - else if (err) { - devlink_put(devlink); - break; - } - devlink_put(devlink); - } - - if (err != -EMSGSIZE) - return err; + int err; - return msg->len; + err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->extack); + if (err == -EOPNOTSUPP) + err = 0; + return err; } +const struct devlink_gen_cmd devl_gen_info = { + .dump_one = devlink_nl_cmd_info_get_dump_one, +}; + struct devlink_fmsg_item { struct list_head list; int attrtype; @@ -8466,43 +8379,39 @@ err_trap_fill: return err; } -static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_trap_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; - int err; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_trap_item *trap_item; - int idx = 0; + struct devlink_trap_item *trap_item; + int idx = 0; + int err = 0; - devl_lock(devlink); - list_for_each_entry(trap_item, &devlink->trap_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_trap_fill(msg, devlink, trap_item, - DEVLINK_CMD_TRAP_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(trap_item, &devlink->trap_list, list) { + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_trap_fill(msg, devlink, trap_item, + DEVLINK_CMD_TRAP_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - return msg->len; + + return err; } +const struct devlink_gen_cmd devl_gen_trap = { + .dump_one = devlink_nl_cmd_trap_get_dump_one, +}; + static int __devlink_trap_action_set(struct devlink *devlink, struct devlink_trap_item *trap_item, enum devlink_trap_action trap_action, @@ -8681,46 +8590,41 @@ err_trap_group_fill: return err; } -static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_trap_group_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - enum devlink_command cmd = DEVLINK_CMD_TRAP_GROUP_NEW; - u32 portid = NETLINK_CB(cb->skb).portid; - struct devlink *devlink; - int err; + struct devlink_trap_group_item *group_item; + int idx = 0; + int err = 0; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_trap_group_item *group_item; - int idx = 0; - devl_lock(devlink); - list_for_each_entry(group_item, &devlink->trap_group_list, - list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_trap_group_fill(msg, devlink, - group_item, cmd, - portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(group_item, &devlink->trap_group_list, list) { + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_trap_group_fill(msg, devlink, group_item, + DEVLINK_CMD_TRAP_GROUP_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - return msg->len; + + return err; } +const struct devlink_gen_cmd devl_gen_trap_group = { + .dump_one = devlink_nl_cmd_trap_group_get_dump_one, +}; + static int __devlink_trap_group_action_set(struct devlink *devlink, struct devlink_trap_group_item *group_item, @@ -8985,46 +8889,40 @@ err_trap_policer_fill: return err; } -static int devlink_nl_cmd_trap_policer_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int +devlink_nl_cmd_trap_policer_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - enum devlink_command cmd = DEVLINK_CMD_TRAP_POLICER_NEW; - u32 portid = NETLINK_CB(cb->skb).portid; - struct devlink *devlink; - int err; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_trap_policer_item *policer_item; - int idx = 0; + struct devlink_trap_policer_item *policer_item; + int idx = 0; + int err = 0; - devl_lock(devlink); - list_for_each_entry(policer_item, &devlink->trap_policer_list, - list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_trap_policer_fill(msg, devlink, - policer_item, cmd, - portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(policer_item, &devlink->trap_policer_list, list) { + if (idx < state->idx) { idx++; + continue; } - devl_unlock(devlink); - devlink_put(devlink); + err = devlink_nl_trap_policer_fill(msg, devlink, policer_item, + DEVLINK_CMD_TRAP_POLICER_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - return msg->len; + + return err; } +const struct devlink_gen_cmd devl_gen_trap_policer = { + .dump_one = devlink_nl_cmd_trap_policer_get_dump_one, +}; + static int devlink_trap_policer_set(struct devlink *devlink, struct devlink_trap_policer_item *policer_item, @@ -9102,14 +9000,14 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_get_doit, - .dumpit = devlink_nl_cmd_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { .cmd = DEVLINK_CMD_PORT_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_port_get_doit, - .dumpit = devlink_nl_cmd_port_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, /* can be retrieved by unprivileged users */ }, @@ -9185,14 +9083,14 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_SB_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_get_doit, - .dumpit = devlink_nl_cmd_sb_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { .cmd = DEVLINK_CMD_SB_POOL_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_pool_get_doit, - .dumpit = devlink_nl_cmd_sb_pool_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { @@ -9205,7 +9103,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_SB_PORT_POOL_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_port_pool_get_doit, - .dumpit = devlink_nl_cmd_sb_port_pool_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, /* can be retrieved by unprivileged users */ }, @@ -9220,7 +9118,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_SB_TC_POOL_BIND_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_tc_pool_bind_get_doit, - .dumpit = devlink_nl_cmd_sb_tc_pool_bind_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, /* can be retrieved by unprivileged users */ }, @@ -9301,7 +9199,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_PARAM_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_param_get_doit, - .dumpit = devlink_nl_cmd_param_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { @@ -9329,7 +9227,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_REGION_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_region_get_doit, - .dumpit = devlink_nl_cmd_region_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .flags = GENL_ADMIN_PERM, }, { @@ -9355,7 +9253,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_INFO_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_info_get_doit, - .dumpit = devlink_nl_cmd_info_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { @@ -9417,7 +9315,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_TRAP_GET, .doit = devlink_nl_cmd_trap_get_doit, - .dumpit = devlink_nl_cmd_trap_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { @@ -9428,7 +9326,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_TRAP_GROUP_GET, .doit = devlink_nl_cmd_trap_group_get_doit, - .dumpit = devlink_nl_cmd_trap_group_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { @@ -9439,7 +9337,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_TRAP_POLICER_GET, .doit = devlink_nl_cmd_trap_policer_get_doit, - .dumpit = devlink_nl_cmd_trap_policer_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { @@ -9450,7 +9348,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_SELFTESTS_GET, .doit = devlink_nl_cmd_selftests_get_doit, - .dumpit = devlink_nl_cmd_selftests_get_dumpit + .dumpit = devlink_nl_instance_iter_dump, /* can be retrieved by unprivileged users */ }, { diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 82ee5621bd9c..a552e723f4a6 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -179,7 +179,20 @@ static void devlink_nl_post_doit(const struct genl_split_ops *ops, } static const struct devlink_gen_cmd *devl_gen_cmds[] = { + [DEVLINK_CMD_GET] = &devl_gen_inst, + [DEVLINK_CMD_PORT_GET] = &devl_gen_port, + [DEVLINK_CMD_SB_GET] = &devl_gen_sb, + [DEVLINK_CMD_SB_POOL_GET] = &devl_gen_sb_pool, + [DEVLINK_CMD_SB_PORT_POOL_GET] = &devl_gen_sb_port_pool, + [DEVLINK_CMD_SB_TC_POOL_BIND_GET] = &devl_gen_sb_tc_pool_bind, + [DEVLINK_CMD_PARAM_GET] = &devl_gen_param, + [DEVLINK_CMD_REGION_GET] = &devl_gen_region, + [DEVLINK_CMD_INFO_GET] = &devl_gen_info, [DEVLINK_CMD_RATE_GET] = &devl_gen_rate_get, + [DEVLINK_CMD_TRAP_GET] = &devl_gen_trap, + [DEVLINK_CMD_TRAP_GROUP_GET] = &devl_gen_trap_group, + [DEVLINK_CMD_TRAP_POLICER_GET] = &devl_gen_trap_policer, + [DEVLINK_CMD_SELFTESTS_GET] = &devl_gen_selftests, }; int devlink_nl_instance_iter_dump(struct sk_buff *msg, -- cgit v1.2.3 From 55307f51f48e192560252b70c8b0fb504b84bc3c Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Sat, 17 Dec 2022 12:13:24 +0100 Subject: batman-adv: Start new development cycle This version will contain all the (major or even only minor) changes for Linux 6.3. The version number isn't a semantic version number with major and minor information. It is just encoding the year of the expected publishing as Linux -rc1 and the number of published versions this year (starting at 0). Signed-off-by: Simon Wunderlich --- net/batman-adv/main.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h index c48803b32bb0..156ed39eded1 100644 --- a/net/batman-adv/main.h +++ b/net/batman-adv/main.h @@ -13,7 +13,7 @@ #define BATADV_DRIVER_DEVICE "batman-adv" #ifndef BATADV_SOURCE_VERSION -#define BATADV_SOURCE_VERSION "2022.3" +#define BATADV_SOURCE_VERSION "2023.1" #endif /* B.A.T.M.A.N. parameters */ -- cgit v1.2.3 From c4b40f80585c9349c401c2547ee11ecb451abb01 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Sat, 17 Dec 2022 12:40:08 +0100 Subject: batman-adv: Drop prandom.h includes The commit 8032bf1233a7 ("treewide: use get_random_u32_below() instead of deprecated function") replaced the prandom.h function prandom_u32_max with the random.h function get_random_u32_below. There is no need to still include prandom.h. Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich --- net/batman-adv/bat_iv_ogm.c | 1 - net/batman-adv/bat_v_elp.c | 1 - net/batman-adv/bat_v_ogm.c | 1 - net/batman-adv/network-coding.c | 2 +- 4 files changed, 1 insertion(+), 4 deletions(-) (limited to 'net') diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 114ee5da261f..828fb393ee94 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -27,7 +27,6 @@ #include #include #include -#include #include #include #include diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c index f9a58fb5442e..acff565849ae 100644 --- a/net/batman-adv/bat_v_elp.c +++ b/net/batman-adv/bat_v_elp.c @@ -21,7 +21,6 @@ #include #include #include -#include #include #include #include diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c index addfd8c4fe95..96e027364ddd 100644 --- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -21,7 +21,6 @@ #include #include #include -#include #include #include #include diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c index bf29fba4dde5..ecd871abda34 100644 --- a/net/batman-adv/network-coding.c +++ b/net/batman-adv/network-coding.c @@ -25,8 +25,8 @@ #include #include #include -#include #include +#include #include #include #include -- cgit v1.2.3 From 6b754d7bd007c5f68fbb2d9abd5c00d253b033d0 Mon Sep 17 00:00:00 2001 From: Mahesh Bandewar Date: Wed, 4 Jan 2023 18:28:42 -0800 Subject: sysctl: expose all net/core sysctls inside netns All were not visible to the non-priv users inside netns. However, with 4ecb90090c84 ("sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN"), these vars are protected from getting modified. A proc with capable(CAP_NET_ADMIN) can change the values so not having them visible inside netns is just causing nuisance to process that check certain values (e.g. net.core.somaxconn) and see different behavior in root-netns vs. other-netns Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Soheil Hassas Yeganeh Signed-off-by: Mahesh Bandewar Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller --- net/core/sysctl_net_core.c | 5 ----- 1 file changed, 5 deletions(-) (limited to 'net') diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 5b1ce656baa1..e7b98162c632 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -643,11 +643,6 @@ static __net_init int sysctl_core_net_init(struct net *net) for (tmp = tbl; tmp->procname; tmp++) tmp->data += (char *)net - (char *)&init_net; - - /* Don't export any sysctls to unprivileged users */ - if (net->user_ns != &init_user_ns) { - tbl[0].procname = NULL; - } } net->core.sysctl_hdr = register_net_sysctl(net, "net/core", tbl); -- cgit v1.2.3 From d772781964415c63759572b917e21c4f7ec08d9f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:33:54 -0800 Subject: devlink: bump the instance index directly when iterating xa_find_after() is designed to handle multi-index entries correctly. If a xarray has two entries one which spans indexes 0-3 and one at index 4 xa_find_after(0) will return the entry at index 4. Having to juggle the two callbacks, however, is unnecessary in case of the devlink xarray, as there is 1:1 relationship with indexes. Always use xa_find() and increment the index manually. Signed-off-by: Jakub Kicinski Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- net/devlink/core.c | 31 +++++++++---------------------- net/devlink/devl_internal.h | 17 ++++------------- 2 files changed, 13 insertions(+), 35 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index 371d6821315d..88c88b8053e2 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -91,16 +91,13 @@ void devlink_put(struct devlink *devlink) call_rcu(&devlink->rcu, __devlink_put_rcu); } -struct devlink * -devlinks_xa_find_get(struct net *net, unsigned long *indexp, - void * (*xa_find_fn)(struct xarray *, unsigned long *, - unsigned long, xa_mark_t)) +struct devlink *devlinks_xa_find_get(struct net *net, unsigned long *indexp) { - struct devlink *devlink; + struct devlink *devlink = NULL; rcu_read_lock(); retry: - devlink = xa_find_fn(&devlinks, indexp, ULONG_MAX, DEVLINK_REGISTERED); + devlink = xa_find(&devlinks, indexp, ULONG_MAX, DEVLINK_REGISTERED); if (!devlink) goto unlock; @@ -109,31 +106,21 @@ retry: * This prevents live-lock of devlink_unregister() wait for completion. */ if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING)) - goto retry; + goto next; - /* For a possible retry, the xa_find_after() should be always used */ - xa_find_fn = xa_find_after; if (!devlink_try_get(devlink)) - goto retry; + goto next; if (!net_eq(devlink_net(devlink), net)) { devlink_put(devlink); - goto retry; + goto next; } unlock: rcu_read_unlock(); return devlink; -} - -struct devlink * -devlinks_xa_find_get_first(struct net *net, unsigned long *indexp) -{ - return devlinks_xa_find_get(net, indexp, xa_find); -} -struct devlink * -devlinks_xa_find_get_next(struct net *net, unsigned long *indexp) -{ - return devlinks_xa_find_get(net, indexp, xa_find_after); +next: + (*indexp)++; + goto retry; } /** diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index adf9f6c177db..14767e809178 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -82,18 +82,9 @@ extern struct genl_family devlink_nl_family; * in loop body in order to release the reference. */ #define devlinks_xa_for_each_registered_get(net, index, devlink) \ - for (index = 0, \ - devlink = devlinks_xa_find_get_first(net, &index); \ - devlink; devlink = devlinks_xa_find_get_next(net, &index)) - -struct devlink * -devlinks_xa_find_get(struct net *net, unsigned long *indexp, - void * (*xa_find_fn)(struct xarray *, unsigned long *, - unsigned long, xa_mark_t)); -struct devlink * -devlinks_xa_find_get_first(struct net *net, unsigned long *indexp); -struct devlink * -devlinks_xa_find_get_next(struct net *net, unsigned long *indexp); + for (index = 0; (devlink = devlinks_xa_find_get(net, &index)); index++) + +struct devlink *devlinks_xa_find_get(struct net *net, unsigned long *indexp); /* Netlink */ #define DEVLINK_NL_FLAG_NEED_PORT BIT(0) @@ -135,7 +126,7 @@ struct devlink_gen_cmd { */ #define devlink_dump_for_each_instance_get(msg, state, devlink) \ for (; (devlink = devlinks_xa_find_get(sock_net(msg->sk), \ - &state->instance, xa_find)); \ + &state->instance)); \ state->instance++, state->idx = 0) extern const struct genl_small_ops devlink_nl_ops[56]; -- cgit v1.2.3 From 7a54a5195b2a877a972ec21a5ca415c1fc2aec61 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:33:55 -0800 Subject: devlink: update the code in netns move to latest helpers devlink_pernet_pre_exit() is the only obvious place which takes the instance lock without using the devl_ helpers. Update the code and move the error print after releasing the reference (having unlock and put together feels slightly idiomatic). Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/devlink/core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index 88c88b8053e2..d3b8336946fd 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -299,15 +299,16 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net) */ devlinks_xa_for_each_registered_get(net, index, devlink) { WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); - mutex_lock(&devlink->lock); + devl_lock(devlink); err = devlink_reload(devlink, &init_net, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, DEVLINK_RELOAD_LIMIT_UNSPEC, &actions_performed, NULL); - mutex_unlock(&devlink->lock); + devl_unlock(devlink); + devlink_put(devlink); + if (err && err != -EOPNOTSUPP) pr_warn("Failed to reload devlink instance into init_net\n"); - devlink_put(devlink); } } -- cgit v1.2.3 From 870c7ad4a52be2acff92d0355ca118654c7efece Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:33:56 -0800 Subject: devlink: protect devlink->dev by the instance lock devlink->dev is assumed to be always valid as long as any outstanding reference to the devlink instance exists. In prep for weakening of the references take the instance lock. Signed-off-by: Jakub Kicinski Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- net/devlink/devl_internal.h | 3 ++- net/devlink/leftover.c | 7 +++---- net/devlink/netlink.c | 9 ++++++--- 3 files changed, 11 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 14767e809178..6342552e5f99 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -131,7 +131,8 @@ struct devlink_gen_cmd { extern const struct genl_small_ops devlink_nl_ops[56]; -struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs); +struct devlink * +devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs); void devlink_notify_unregister(struct devlink *devlink); void devlink_notify_register(struct devlink *devlink); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index e6d6c7f74ae7..bec408da4dbe 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -6314,12 +6314,10 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, start_offset = state->start_offset; - devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); + devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs); if (IS_ERR(devlink)) return PTR_ERR(devlink); - devl_lock(devlink); - if (!attrs[DEVLINK_ATTR_REGION_NAME]) { NL_SET_ERR_MSG(cb->extack, "No region name provided"); err = -EINVAL; @@ -7735,9 +7733,10 @@ devlink_health_reporter_get_from_cb(struct netlink_callback *cb) struct nlattr **attrs = info->attrs; struct devlink *devlink; - devlink = devlink_get_from_attrs(sock_net(cb->skb->sk), attrs); + devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs); if (IS_ERR(devlink)) return NULL; + devl_unlock(devlink); reporter = devlink_health_reporter_get_from_attrs(devlink, attrs); devlink_put(devlink); diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index a552e723f4a6..69111746f5d9 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -82,7 +82,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = { [DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG }, }; -struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs) +struct devlink * +devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs) { struct devlink *devlink; unsigned long index; @@ -96,9 +97,11 @@ struct devlink *devlink_get_from_attrs(struct net *net, struct nlattr **attrs) devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]); devlinks_xa_for_each_registered_get(net, index, devlink) { + devl_lock(devlink); if (strcmp(devlink->dev->bus->name, busname) == 0 && strcmp(dev_name(devlink->dev), devname) == 0) return devlink; + devl_unlock(devlink); devlink_put(devlink); } @@ -113,10 +116,10 @@ static int devlink_nl_pre_doit(const struct genl_split_ops *ops, struct devlink *devlink; int err; - devlink = devlink_get_from_attrs(genl_info_net(info), info->attrs); + devlink = devlink_get_from_attrs_lock(genl_info_net(info), info->attrs); if (IS_ERR(devlink)) return PTR_ERR(devlink); - devl_lock(devlink); + info->user_ptr[0] = devlink; if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) { devlink_port = devlink_port_get_from_info(devlink, info); -- cgit v1.2.3 From ed539ba614a079ea696b92beef1eafec66f831a4 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:33:57 -0800 Subject: devlink: always check if the devlink instance is registered Always check under the instance lock whether the devlink instance is still / already registered. This is a no-op for the most part, as the unregistration path currently waits for all references. On the init path, however, we may temporarily open up a race with netdev code, if netdevs are registered before the devlink instance. This is temporary, the next change fixes it, and this commit has been split out for the ease of review. Note that in case of iterating over sub-objects which have their own lock (regions and line cards) we assume an implicit dependency between those objects existing and devlink unregistration. Signed-off-by: Jakub Kicinski Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- net/devlink/core.c | 19 +++++++++++++++---- net/devlink/devl_internal.h | 8 ++++++++ net/devlink/leftover.c | 35 +++++++++++++++++++++++++++++------ net/devlink/netlink.c | 10 ++++++++-- 4 files changed, 60 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index d3b8336946fd..c53c996edf1d 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -67,6 +67,15 @@ void devl_unlock(struct devlink *devlink) } EXPORT_SYMBOL_GPL(devl_unlock); +/** + * devlink_try_get() - try to obtain a reference on a devlink instance + * @devlink: instance to reference + * + * Obtain a reference on a devlink instance. A reference on a devlink instance + * only implies that it's safe to take the instance lock. It does not imply + * that the instance is registered, use devl_is_registered() after taking + * the instance lock to check registration status. + */ struct devlink *__must_check devlink_try_get(struct devlink *devlink) { if (refcount_inc_not_zero(&devlink->refcount)) @@ -300,10 +309,12 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net) devlinks_xa_for_each_registered_get(net, index, devlink) { WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); devl_lock(devlink); - err = devlink_reload(devlink, &init_net, - DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - DEVLINK_RELOAD_LIMIT_UNSPEC, - &actions_performed, NULL); + err = 0; + if (devl_is_registered(devlink)) + err = devlink_reload(devlink, &init_net, + DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + DEVLINK_RELOAD_LIMIT_UNSPEC, + &actions_performed, NULL); devl_unlock(devlink); devlink_put(devlink); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 6342552e5f99..01a00df81d0e 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -86,6 +86,14 @@ extern struct genl_family devlink_nl_family; struct devlink *devlinks_xa_find_get(struct net *net, unsigned long *indexp); +static inline bool devl_is_registered(struct devlink *devlink) +{ + /* To prevent races the caller must hold the instance lock + * or another lock taken during unregistration. + */ + return xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); +} + /* Netlink */ #define DEVLINK_NL_FLAG_NEED_PORT BIT(0) #define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index bec408da4dbe..491f821c8b77 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -2130,6 +2130,9 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, int idx = 0; mutex_lock(&devlink->linecards_lock); + if (!devl_is_registered(devlink)) + goto next_devlink; + list_for_each_entry(linecard, &devlink->linecard_list, list) { if (idx < state->idx) { idx++; @@ -2151,6 +2154,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, } idx++; } +next_devlink: mutex_unlock(&devlink->linecards_lock); devlink_put(devlink); } @@ -7809,6 +7813,12 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, int idx = 0; mutex_lock(&devlink->reporters_lock); + if (!devl_is_registered(devlink)) { + mutex_unlock(&devlink->reporters_lock); + devlink_put(devlink); + continue; + } + list_for_each_entry(reporter, &devlink->reporter_list, list) { if (idx < state->idx) { @@ -7830,6 +7840,9 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, mutex_unlock(&devlink->reporters_lock); devl_lock(devlink); + if (!devl_is_registered(devlink)) + goto next_devlink; + xa_for_each(&devlink->ports, port_index, port) { mutex_lock(&port->reporters_lock); list_for_each_entry(reporter, &port->reporter_list, list) { @@ -7853,6 +7866,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, } mutex_unlock(&port->reporters_lock); } +next_devlink: devl_unlock(devlink); devlink_put(devlink); } @@ -12218,7 +12232,8 @@ void devlink_compat_running_version(struct devlink *devlink, return; devl_lock(devlink); - __devlink_compat_running_version(devlink, buf, len); + if (devl_is_registered(devlink)) + __devlink_compat_running_version(devlink, buf, len); devl_unlock(devlink); } @@ -12227,20 +12242,28 @@ int devlink_compat_flash_update(struct devlink *devlink, const char *file_name) struct devlink_flash_update_params params = {}; int ret; - if (!devlink->ops->flash_update) - return -EOPNOTSUPP; + devl_lock(devlink); + if (!devl_is_registered(devlink)) { + ret = -ENODEV; + goto out_unlock; + } + + if (!devlink->ops->flash_update) { + ret = -EOPNOTSUPP; + goto out_unlock; + } ret = request_firmware(¶ms.fw, file_name, devlink->dev); if (ret) - return ret; + goto out_unlock; - devl_lock(devlink); devlink_flash_update_begin_notify(devlink); ret = devlink->ops->flash_update(devlink, ¶ms, NULL); devlink_flash_update_end_notify(devlink); - devl_unlock(devlink); release_firmware(params.fw); +out_unlock: + devl_unlock(devlink); return ret; } diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 69111746f5d9..b5b8ac6db2d1 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -98,7 +98,8 @@ devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs) devlinks_xa_for_each_registered_get(net, index, devlink) { devl_lock(devlink); - if (strcmp(devlink->dev->bus->name, busname) == 0 && + if (devl_is_registered(devlink) && + strcmp(devlink->dev->bus->name, busname) == 0 && strcmp(dev_name(devlink->dev), devname) == 0) return devlink; devl_unlock(devlink); @@ -211,7 +212,12 @@ int devlink_nl_instance_iter_dump(struct sk_buff *msg, devlink_dump_for_each_instance_get(msg, state, devlink) { devl_lock(devlink); - err = cmd->dump_one(msg, devlink, cb); + + if (devl_is_registered(devlink)) + err = cmd->dump_one(msg, devlink, cb); + else + err = 0; + devl_unlock(devlink); devlink_put(devlink); -- cgit v1.2.3 From 9053637e0da783efdb37bbfea6a27b856c0228d7 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:33:58 -0800 Subject: devlink: remove the registration guarantee of references The objective of exposing the devlink instance locks to drivers was to let them use these locks to prevent user space from accessing the device before it's fully initialized. This is difficult because devlink_unregister() waits for all references to be released, meaning that devlink_unregister() can't itself be called under the instance lock. To avoid this issue devlink_register() was moved after subobject registration a while ago. Unfortunately the netdev paths get a hold of the devlink instances _before_ they are registered. Ideally netdev should wait for devlink init to finish (synchronizing on the instance lock). This can't work because we don't know if the instance will _ever_ be registered (in case of failures it may not). The other option of returning an error until devlink_register() is called is unappealing (user space would get a notification netdev exist but would have to wait arbitrary amount of time before accessing some of its attributes). Weaken the guarantees of the devlink references. Holding a reference will now only guarantee that the memory of the object is around. Another way of looking at it is that the reference now protects the object not its "registered" status. Use devlink instance lock to synchronize unregistration. This implies that releasing of the "main" reference of the devlink instance moves from devlink_unregister() to devlink_free(). Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- include/net/devlink.h | 2 ++ net/devlink/core.c | 64 ++++++++++++++++++++------------------------- net/devlink/devl_internal.h | 2 -- 3 files changed, 30 insertions(+), 38 deletions(-) (limited to 'net') diff --git a/include/net/devlink.h b/include/net/devlink.h index 6a2e4f21779f..425ecef431b7 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1647,6 +1647,8 @@ static inline struct devlink *devlink_alloc(const struct devlink_ops *ops, return devlink_alloc_ns(ops, priv_size, &init_net, dev); } void devlink_set_features(struct devlink *devlink, u64 features); +int devl_register(struct devlink *devlink); +void devl_unregister(struct devlink *devlink); void devlink_register(struct devlink *devlink); void devlink_unregister(struct devlink *devlink); void devlink_free(struct devlink *devlink); diff --git a/net/devlink/core.c b/net/devlink/core.c index c53c996edf1d..7cf0b3efbb2f 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -83,21 +83,10 @@ struct devlink *__must_check devlink_try_get(struct devlink *devlink) return NULL; } -static void __devlink_put_rcu(struct rcu_head *head) -{ - struct devlink *devlink = container_of(head, struct devlink, rcu); - - complete(&devlink->comp); -} - void devlink_put(struct devlink *devlink) { if (refcount_dec_and_test(&devlink->refcount)) - /* Make sure unregister operation that may await the completion - * is unblocked only after all users are after the end of - * RCU grace period. - */ - call_rcu(&devlink->rcu, __devlink_put_rcu); + kfree_rcu(devlink, rcu); } struct devlink *devlinks_xa_find_get(struct net *net, unsigned long *indexp) @@ -110,13 +99,6 @@ retry: if (!devlink) goto unlock; - /* In case devlink_unregister() was already called and "unregistering" - * mark was set, do not allow to get a devlink reference here. - * This prevents live-lock of devlink_unregister() wait for completion. - */ - if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING)) - goto next; - if (!devlink_try_get(devlink)) goto next; if (!net_eq(devlink_net(devlink), net)) { @@ -152,37 +134,48 @@ void devlink_set_features(struct devlink *devlink, u64 features) EXPORT_SYMBOL_GPL(devlink_set_features); /** - * devlink_register - Register devlink instance - * - * @devlink: devlink + * devl_register - Register devlink instance + * @devlink: devlink */ -void devlink_register(struct devlink *devlink) +int devl_register(struct devlink *devlink) { ASSERT_DEVLINK_NOT_REGISTERED(devlink); - /* Make sure that we are in .probe() routine */ + devl_assert_locked(devlink); xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); devlink_notify_register(devlink); + + return 0; +} +EXPORT_SYMBOL_GPL(devl_register); + +void devlink_register(struct devlink *devlink) +{ + devl_lock(devlink); + devl_register(devlink); + devl_unlock(devlink); } EXPORT_SYMBOL_GPL(devlink_register); /** - * devlink_unregister - Unregister devlink instance - * - * @devlink: devlink + * devl_unregister - Unregister devlink instance + * @devlink: devlink */ -void devlink_unregister(struct devlink *devlink) +void devl_unregister(struct devlink *devlink) { ASSERT_DEVLINK_REGISTERED(devlink); - /* Make sure that we are in .remove() routine */ - - xa_set_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); - devlink_put(devlink); - wait_for_completion(&devlink->comp); + devl_assert_locked(devlink); devlink_notify_unregister(devlink); xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); - xa_clear_mark(&devlinks, devlink->index, DEVLINK_UNREGISTERING); +} +EXPORT_SYMBOL_GPL(devl_unregister); + +void devlink_unregister(struct devlink *devlink) +{ + devl_lock(devlink); + devl_unregister(devlink); + devl_unlock(devlink); } EXPORT_SYMBOL_GPL(devlink_unregister); @@ -246,7 +239,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, mutex_init(&devlink->reporters_lock); mutex_init(&devlink->linecards_lock); refcount_set(&devlink->refcount, 1); - init_completion(&devlink->comp); return devlink; @@ -292,7 +284,7 @@ void devlink_free(struct devlink *devlink) xa_erase(&devlinks, devlink->index); - kfree(devlink); + devlink_put(devlink); } EXPORT_SYMBOL_GPL(devlink_free); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 01a00df81d0e..5d2bbe295659 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -12,7 +12,6 @@ #include #define DEVLINK_REGISTERED XA_MARK_1 -#define DEVLINK_UNREGISTERING XA_MARK_2 #define DEVLINK_RELOAD_STATS_ARRAY_SIZE \ (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX) @@ -52,7 +51,6 @@ struct devlink { struct lock_class_key lock_key; u8 reload_failed:1; refcount_t refcount; - struct completion comp; struct rcu_head rcu; struct notifier_block netdevice_nb; char priv[] __aligned(NETDEV_ALIGN); -- cgit v1.2.3 From 6ef8f7da92750c3c25755fac39b561fff2d47378 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:33:59 -0800 Subject: devlink: don't require setting features before registration Requiring devlink_set_features() to be run before devlink is registered is overzealous. devlink_set_features() itself is a leftover from old workarounds which were trying to prevent initiating reload before probe was complete. Signed-off-by: Jakub Kicinski Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- net/devlink/core.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index 7cf0b3efbb2f..a31a317626d7 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -125,8 +125,6 @@ next: */ void devlink_set_features(struct devlink *devlink, u64 features) { - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - WARN_ON(features & DEVLINK_F_RELOAD && !devlink_reload_supported(devlink->ops)); devlink->features = features; -- cgit v1.2.3 From 1d18bb1a4ddde676de948c13bca04b23cebd028b Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 Jan 2023 22:34:00 -0800 Subject: devlink: allow registering parameters after the instance It's most natural to register the instance first and then its subobjects. Now that we can use the instance lock to protect the atomicity of all init - it should also be safe. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/devlink/leftover.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 491f821c8b77..1e23b2da78cc 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -5263,7 +5263,13 @@ static void devlink_param_notify(struct devlink *devlink, WARN_ON(cmd != DEVLINK_CMD_PARAM_NEW && cmd != DEVLINK_CMD_PARAM_DEL && cmd != DEVLINK_CMD_PORT_PARAM_NEW && cmd != DEVLINK_CMD_PORT_PARAM_DEL); - ASSERT_DEVLINK_REGISTERED(devlink); + + /* devlink_notify_register() / devlink_notify_unregister() + * will replay the notifications if the params are added/removed + * outside of the lifetime of the instance. + */ + if (!devl_is_registered(devlink)) + return; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) @@ -10915,8 +10921,6 @@ int devlink_params_register(struct devlink *devlink, const struct devlink_param *param = params; int i, err; - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - for (i = 0; i < params_count; i++, param++) { err = devlink_param_register(devlink, param); if (err) @@ -10947,8 +10951,6 @@ void devlink_params_unregister(struct devlink *devlink, const struct devlink_param *param = params; int i; - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - for (i = 0; i < params_count; i++, param++) devlink_param_unregister(devlink, param); } @@ -10968,8 +10970,6 @@ int devlink_param_register(struct devlink *devlink, { struct devlink_param_item *param_item; - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - WARN_ON(devlink_param_verify(param)); WARN_ON(devlink_param_find_by_name(&devlink->param_list, param->name)); @@ -10985,6 +10985,7 @@ int devlink_param_register(struct devlink *devlink, param_item->param = param; list_add_tail(¶m_item->list, &devlink->param_list); + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); return 0; } EXPORT_SYMBOL_GPL(devlink_param_register); @@ -10999,11 +11000,10 @@ void devlink_param_unregister(struct devlink *devlink, { struct devlink_param_item *param_item; - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - param_item = devlink_param_find_by_name(&devlink->param_list, param->name); WARN_ON(!param_item); + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_DEL); list_del(¶m_item->list); kfree(param_item); } @@ -11063,8 +11063,6 @@ int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, { struct devlink_param_item *param_item; - ASSERT_DEVLINK_NOT_REGISTERED(devlink); - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); if (!param_item) return -EINVAL; @@ -11078,6 +11076,8 @@ int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, else param_item->driverinit_value = init_val; param_item->driverinit_value_valid = true; + + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); return 0; } EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_set); -- cgit v1.2.3 From e8d283b6cf0e83d5fcb5345e037956eb3e9b2483 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 5 Jan 2023 14:15:37 -0800 Subject: net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arrays Zero-length arrays are deprecated[1]. Replace struct ipv6_rpl_sr_hdr's "segments" union of 0-length arrays with flexible arrays. Detected with GCC 13, using -fstrict-flex-arrays=3: In function 'rpl_validate_srh', inlined from 'rpl_build_state' at ../net/ipv6/rpl_iptunnel.c:96:7: ../net/ipv6/rpl_iptunnel.c:60:28: warning: array subscript is outside array bounds of 'struct in6_addr[0]' [-Warray-bounds=] 60 | if (ipv6_addr_type(&srh->rpl_segaddr[srh->segments_left - 1]) & | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../include/net/rpl.h:12, from ../net/ipv6/rpl_iptunnel.c:13: ../include/uapi/linux/rpl.h: In function 'rpl_build_state': ../include/uapi/linux/rpl.h:40:33: note: while referencing 'addr' 40 | struct in6_addr addr[0]; | ^~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Hideaki YOSHIFUJI Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. Silva Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20230105221533.never.711-kees@kernel.org Signed-off-by: Jakub Kicinski --- include/uapi/linux/rpl.h | 4 ++-- net/ipv6/rpl_iptunnel.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/rpl.h b/include/uapi/linux/rpl.h index 708adddf9f13..7c8970e5b84b 100644 --- a/include/uapi/linux/rpl.h +++ b/include/uapi/linux/rpl.h @@ -37,8 +37,8 @@ struct ipv6_rpl_sr_hdr { #endif union { - struct in6_addr addr[0]; - __u8 data[0]; + __DECLARE_FLEX_ARRAY(struct in6_addr, addr); + __DECLARE_FLEX_ARRAY(__u8, data); } segments; } __attribute__((packed)); diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c index ff691d9f4a04..b1c028df686e 100644 --- a/net/ipv6/rpl_iptunnel.c +++ b/net/ipv6/rpl_iptunnel.c @@ -13,7 +13,7 @@ #include struct rpl_iptunnel_encap { - struct ipv6_rpl_sr_hdr srh[0]; + DECLARE_FLEX_ARRAY(struct ipv6_rpl_sr_hdr, srh); }; struct rpl_lwt { -- cgit v1.2.3 From 109cdeb8dfa3ab18afc7509a5044e48bd9fc89f7 Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Fri, 6 Jan 2023 10:57:17 -0800 Subject: mptcp: use msk_owned_by_me helper The helper msk_owned_by_me() is defined in protocol.h, so use it instead of sock_owned_by_me(). Reviewed-by: Mat Martineau Signed-off-by: Geliang Tang Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/protocol.c | 9 ++++----- net/mptcp/sockopt.c | 2 +- 2 files changed, 5 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index b7ad030dfe89..c533b4647fbe 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -923,9 +923,8 @@ static void mptcp_check_for_eof(struct mptcp_sock *msk) static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk) { struct mptcp_subflow_context *subflow; - struct sock *sk = (struct sock *)msk; - sock_owned_by_me(sk); + msk_owned_by_me(msk); mptcp_for_each_subflow(msk, subflow) { if (READ_ONCE(subflow->data_avail)) @@ -1408,7 +1407,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) u64 linger_time; long tout = 0; - sock_owned_by_me(sk); + msk_owned_by_me(msk); if (__mptcp_check_fallback(msk)) { if (!msk->first) @@ -1890,7 +1889,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) u32 time, advmss = 1; u64 rtt_us, mstamp; - sock_owned_by_me(sk); + msk_owned_by_me(msk); if (copied <= 0) return; @@ -2217,7 +2216,7 @@ static struct sock *mptcp_subflow_get_retrans(struct mptcp_sock *msk) struct mptcp_subflow_context *subflow; int min_stale_count = INT_MAX; - sock_owned_by_me((const struct sock *)msk); + msk_owned_by_me(msk); if (__mptcp_check_fallback(msk)) return NULL; diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index d4b1e6ec1b36..582ed93bcc8a 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -18,7 +18,7 @@ static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk) { - sock_owned_by_me((const struct sock *)msk); + msk_owned_by_me(msk); if (likely(!__mptcp_check_fallback(msk))) return NULL; -- cgit v1.2.3 From a963853fd465e6ede985e71a320bc8a0ae7f878f Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Fri, 6 Jan 2023 10:57:18 -0800 Subject: mptcp: use net instead of sock_net Use the local variable 'net' instead of sock_net() in the functions where the variable 'struct net *net' has been defined. Reviewed-by: Mat Martineau Signed-off-by: Geliang Tang Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/pm_netlink.c | 5 ++--- net/mptcp/protocol.c | 4 ++-- 2 files changed, 4 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 2ea7eae43bdb..b5505b8167f9 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1143,7 +1143,7 @@ void mptcp_pm_nl_subflow_chk_stale(const struct mptcp_sock *msk, struct sock *ss if (!tcp_rtx_and_write_queues_empty(ssk)) { subflow->stale = 1; __mptcp_retransmit_pending_data(sk); - MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_SUBFLOWSTALE); + MPTCP_INC_STATS(net, MPTCP_MIB_SUBFLOWSTALE); } unlock_sock_fast(ssk, slow); @@ -1903,8 +1903,7 @@ static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info) } if (token) - return mptcp_userspace_pm_set_flags(sock_net(skb->sk), - token, &addr, &remote, bkup); + return mptcp_userspace_pm_set_flags(net, token, &addr, &remote, bkup); spin_lock_bh(&pernet->lock); entry = __lookup_addr(pernet, &addr.addr, lookup_by_id); diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index c533b4647fbe..62c140f96d77 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2723,8 +2723,8 @@ static int mptcp_init_sock(struct sock *sk) mptcp_ca_reset(sk); sk_sockets_allocated_inc(sk); - sk->sk_rcvbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[1]); - sk->sk_sndbuf = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_wmem[1]); + sk->sk_rcvbuf = READ_ONCE(net->ipv4.sysctl_tcp_rmem[1]); + sk->sk_sndbuf = READ_ONCE(net->ipv4.sysctl_tcp_wmem[1]); return 0; } -- cgit v1.2.3 From 3c976f4c99237bcdab918ad18e7bee3b77e6d316 Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Fri, 6 Jan 2023 10:57:19 -0800 Subject: mptcp: use local variable ssk in write_options The local variable 'ssk' has been defined at the beginning of the function mptcp_write_options(), use it instead of getting 'ssk' again. Reviewed-by: Mat Martineau Signed-off-by: Geliang Tang Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/options.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'net') diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 5ded85e2c374..b30cea2fbf3f 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -1594,8 +1594,7 @@ mp_rst: TCPOLEN_MPTCP_PRIO, opts->backup, TCPOPT_NOP); - MPTCP_INC_STATS(sock_net((const struct sock *)tp), - MPTCP_MIB_MPPRIOTX); + MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPPRIOTX); } mp_capable_done: -- cgit v1.2.3 From cfdcfeed6449d702825d249cb85346ecf56236fc Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 6 Jan 2023 10:57:20 -0800 Subject: mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen() 'sock->sk' is used frequently in mptcp_listen(). Therefore, we can introduce the 'sk' and replace 'sock->sk' with it. Acked-by: Paolo Abeni Signed-off-by: Menglong Dong Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/protocol.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 62c140f96d77..1ce003f15d70 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3638,12 +3638,13 @@ static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr, static int mptcp_listen(struct socket *sock, int backlog) { struct mptcp_sock *msk = mptcp_sk(sock->sk); + struct sock *sk = sock->sk; struct socket *ssock; int err; pr_debug("msk=%p", msk); - lock_sock(sock->sk); + lock_sock(sk); ssock = __mptcp_nmpc_socket(msk); if (!ssock) { err = -EINVAL; @@ -3651,18 +3652,18 @@ static int mptcp_listen(struct socket *sock, int backlog) } mptcp_token_destroy(msk); - inet_sk_state_store(sock->sk, TCP_LISTEN); - sock_set_flag(sock->sk, SOCK_RCU_FREE); + inet_sk_state_store(sk, TCP_LISTEN); + sock_set_flag(sk, SOCK_RCU_FREE); err = ssock->ops->listen(ssock, backlog); - inet_sk_state_store(sock->sk, inet_sk_state_load(ssock->sk)); + inet_sk_state_store(sk, inet_sk_state_load(ssock->sk)); if (!err) - mptcp_copy_inaddrs(sock->sk, ssock->sk); + mptcp_copy_inaddrs(sk, ssock->sk); mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED); unlock: - release_sock(sock->sk); + release_sock(sk); return err; } -- cgit v1.2.3 From ade4d754620f6d605705bf337aedef115158697e Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 6 Jan 2023 10:57:21 -0800 Subject: mptcp: init sk->sk_prot in build_msk() The 'sk_prot' field in token KUNIT self-tests will be dereferenced in mptcp_token_new_connect(). Therefore, init it with tcp_prot. Acked-by: Paolo Abeni Signed-off-by: Menglong Dong Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/token_test.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'net') diff --git a/net/mptcp/token_test.c b/net/mptcp/token_test.c index 5d984bec1cd8..0758865ab658 100644 --- a/net/mptcp/token_test.c +++ b/net/mptcp/token_test.c @@ -57,6 +57,9 @@ static struct mptcp_sock *build_msk(struct kunit *test) KUNIT_EXPECT_NOT_ERR_OR_NULL(test, msk); refcount_set(&((struct sock *)msk)->sk_refcnt, 1); sock_net_set((struct sock *)msk, &init_net); + + /* be sure the token helpers can dereference sk->sk_prot */ + ((struct sock *)msk)->sk_prot = &tcp_prot; return msk; } -- cgit v1.2.3 From 294de9090938a7959e2757573509abd9ea7bd254 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 6 Jan 2023 10:57:22 -0800 Subject: mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect() 'ssk' should be more appropriate to be the name of the first argument in mptcp_token_new_connect(). Acked-by: Paolo Abeni Signed-off-by: Menglong Dong Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/protocol.h | 2 +- net/mptcp/token.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a0d1658ce59e..5a8af5657796 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -754,7 +754,7 @@ static inline void mptcp_token_init_request(struct request_sock *req) int mptcp_token_new_request(struct request_sock *req); void mptcp_token_destroy_request(struct request_sock *req); -int mptcp_token_new_connect(struct sock *sk); +int mptcp_token_new_connect(struct sock *ssk); void mptcp_token_accept(struct mptcp_subflow_request_sock *r, struct mptcp_sock *msk); bool mptcp_token_exists(u32 token); diff --git a/net/mptcp/token.c b/net/mptcp/token.c index 65430f314a68..3af502a374bc 100644 --- a/net/mptcp/token.c +++ b/net/mptcp/token.c @@ -134,7 +134,7 @@ int mptcp_token_new_request(struct request_sock *req) /** * mptcp_token_new_connect - create new key/idsn/token for subflow - * @sk: the socket that will initiate a connection + * @ssk: the socket that will initiate a connection * * This function is called when a new outgoing mptcp connection is * initiated. @@ -148,9 +148,9 @@ int mptcp_token_new_request(struct request_sock *req) * * returns 0 on success. */ -int mptcp_token_new_connect(struct sock *sk) +int mptcp_token_new_connect(struct sock *ssk) { - struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); struct mptcp_sock *msk = mptcp_sk(subflow->conn); int retries = MPTCP_TOKEN_MAX_RETRIES; struct token_bucket *bucket; @@ -169,7 +169,7 @@ again: } pr_debug("ssk=%p, local_key=%llu, token=%u, idsn=%llu\n", - sk, subflow->local_key, subflow->token, subflow->idsn); + ssk, subflow->local_key, subflow->token, subflow->idsn); WRITE_ONCE(msk->token, subflow->token); __sk_nulls_add_node_rcu((struct sock *)msk, &bucket->msk_chain); -- cgit v1.2.3 From c558246ee73e6f623ca503fb1fda626f313393a6 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 6 Jan 2023 10:57:23 -0800 Subject: mptcp: add statistics for mptcp socket in use Do the statistics of mptcp socket in use with sock_prot_inuse_add(). Therefore, we can get the count of used mptcp socket from /proc/net/protocols: & cat /proc/net/protocols protocol size sockets memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em MPTCPv6 2048 0 0 no 0 yes kernel y n y y y y y y y y y y n n n y y y n MPTCP 1896 1 0 no 0 yes kernel y n y y y y y y y y y y n n n y y y n Acked-by: Paolo Abeni Signed-off-by: Menglong Dong Signed-off-by: Mat Martineau Signed-off-by: David S. Miller --- net/mptcp/protocol.c | 12 +++++++++++- net/mptcp/token.c | 6 ++++++ 2 files changed, 17 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 1ce003f15d70..13595d6dad8c 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2891,6 +2891,12 @@ static __poll_t mptcp_check_readable(struct mptcp_sock *msk) return EPOLLIN | EPOLLRDNORM; } +static void mptcp_listen_inuse_dec(struct sock *sk) +{ + if (inet_sk_state_load(sk) == TCP_LISTEN) + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); +} + bool __mptcp_close(struct sock *sk, long timeout) { struct mptcp_subflow_context *subflow; @@ -2900,6 +2906,7 @@ bool __mptcp_close(struct sock *sk, long timeout) sk->sk_shutdown = SHUTDOWN_MASK; if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) { + mptcp_listen_inuse_dec(sk); inet_sk_state_store(sk, TCP_CLOSE); goto cleanup; } @@ -3000,6 +3007,7 @@ static int mptcp_disconnect(struct sock *sk, int flags) if (msk->fastopening) return 0; + mptcp_listen_inuse_dec(sk); inet_sk_state_store(sk, TCP_CLOSE); mptcp_stop_timer(sk); @@ -3657,8 +3665,10 @@ static int mptcp_listen(struct socket *sock, int backlog) err = ssock->ops->listen(ssock, backlog); inet_sk_state_store(sk, inet_sk_state_load(ssock->sk)); - if (!err) + if (!err) { + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); mptcp_copy_inaddrs(sk, ssock->sk); + } mptcp_event_pm_listener(ssock->sk, MPTCP_EVENT_LISTENER_CREATED); diff --git a/net/mptcp/token.c b/net/mptcp/token.c index 3af502a374bc..5bb924534387 100644 --- a/net/mptcp/token.c +++ b/net/mptcp/token.c @@ -153,6 +153,7 @@ int mptcp_token_new_connect(struct sock *ssk) struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); struct mptcp_sock *msk = mptcp_sk(subflow->conn); int retries = MPTCP_TOKEN_MAX_RETRIES; + struct sock *sk = subflow->conn; struct token_bucket *bucket; again: @@ -175,6 +176,7 @@ again: __sk_nulls_add_node_rcu((struct sock *)msk, &bucket->msk_chain); bucket->chain_len++; spin_unlock_bh(&bucket->lock); + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); return 0; } @@ -190,8 +192,10 @@ void mptcp_token_accept(struct mptcp_subflow_request_sock *req, struct mptcp_sock *msk) { struct mptcp_subflow_request_sock *pos; + struct sock *sk = (struct sock *)msk; struct token_bucket *bucket; + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); bucket = token_bucket(req->token); spin_lock_bh(&bucket->lock); @@ -370,12 +374,14 @@ void mptcp_token_destroy_request(struct request_sock *req) */ void mptcp_token_destroy(struct mptcp_sock *msk) { + struct sock *sk = (struct sock *)msk; struct token_bucket *bucket; struct mptcp_sock *pos; if (sk_unhashed((struct sock *)msk)) return; + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); bucket = token_bucket(msk->token); spin_lock_bh(&bucket->lock); pos = __token_lookup_msk(bucket, msk->token); -- cgit v1.2.3 From 12c1604ae1a39bef87ac099f106594b4cb433b75 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 6 Jan 2023 18:29:04 -0800 Subject: net: skb: remove old comments about frag_size for build_skb() Since commit ce098da1497c ("skbuff: Introduce slab_build_skb()") drivers trying to build skb around slab-backed buffers should go via slab_build_skb() rather than passing frag_size = 0 to the main build_skb(). Remove the copy'n'pasted comments about 0 meaning slab. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/core/skbuff.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4a0eb5593275..3a10387f9434 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -386,8 +386,6 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size) /* build_skb() is wrapper over __build_skb(), that specifically * takes care of skb->head and skb->pfmemalloc - * This means that if @frag_size is not zero, then @data must be backed - * by a page fragment, not kmalloc() or vmalloc() */ struct sk_buff *build_skb(void *data, unsigned int frag_size) { @@ -406,7 +404,7 @@ EXPORT_SYMBOL(build_skb); * build_skb_around - build a network buffer around provided skb * @skb: sk_buff provide by caller, must be memset cleared * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data */ struct sk_buff *build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size) @@ -428,7 +426,7 @@ EXPORT_SYMBOL(build_skb_around); /** * __napi_build_skb - build a network buffer * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data * * Version of __build_skb() that uses NAPI percpu caches to obtain * skbuff_head instead of inplace allocation. @@ -452,7 +450,7 @@ static struct sk_buff *__napi_build_skb(void *data, unsigned int frag_size) /** * napi_build_skb - build a network buffer * @data: data buffer provided by caller - * @frag_size: size of data, or 0 if head was kmalloced + * @frag_size: size of data * * Version of __napi_build_skb() that takes care of skb->head_frag * and skb->pfmemalloc when the data is a page or page fragment. -- cgit v1.2.3 From 66cf99b55e587aa4c20d5e572142dcbf80b2684d Mon Sep 17 00:00:00 2001 From: Haiyue Wang Date: Sun, 8 Jan 2023 23:12:57 +0800 Subject: bpf: Remove the unnecessary insn buffer comparison The variable 'insn' is initialized to 'insn_buf' without being changed, only some helper macros are defined, so the insn buffer comparison is unnecessary. Just remove it. This missed removal back in 2377b81de527 ("bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation"). Signed-off-by: Haiyue Wang Signed-off-by: Daniel Borkmann Acked-by: Stanislav Fomichev Link: https://lore.kernel.org/bpf/20230108151258.96570-1-haiyue.wang@intel.com --- net/core/filter.c | 6 ------ 1 file changed, 6 deletions(-) (limited to 'net') diff --git a/net/core/filter.c b/net/core/filter.c index ab811293ae5d..d9befa6ba04e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6847,9 +6847,6 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type, FIELD)); \ } while (0) - if (insn > insn_buf) - return insn - insn_buf; - switch (si->off) { case offsetof(struct bpf_tcp_sock, rtt_min): BUILD_BUG_ON(sizeof_field(struct tcp_sock, rtt_min) != @@ -10147,9 +10144,6 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type, SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ); \ } while (0) - if (insn > insn_buf) - return insn - insn_buf; - switch (si->off) { case offsetof(struct bpf_sock_ops, op): *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, -- cgit v1.2.3 From 8580e16c28f3f1a1bee87de115157161577334b4 Mon Sep 17 00:00:00 2001 From: Piergiorgio Beruto Date: Mon, 9 Jan 2023 17:59:39 +0100 Subject: net/ethtool: add netlink interface for the PLCA RS Add support for configuring the PLCA Reconciliation Sublayer on multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g., 10BASE-T1S). This patch adds the appropriate netlink interface to ethtool. Signed-off-by: Piergiorgio Beruto Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- Documentation/networking/ethtool-netlink.rst | 138 +++++++++++++ MAINTAINERS | 6 + include/linux/ethtool.h | 12 ++ include/linux/phy.h | 57 ++++++ include/uapi/linux/ethtool_netlink.h | 25 +++ net/ethtool/Makefile | 2 +- net/ethtool/netlink.c | 29 +++ net/ethtool/netlink.h | 6 + net/ethtool/plca.c | 277 +++++++++++++++++++++++++++ 9 files changed, 551 insertions(+), 1 deletion(-) create mode 100644 net/ethtool/plca.c (limited to 'net') diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index f10f8eb44255..c59b542eb693 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -1716,6 +1716,141 @@ being used. Current supported options are toeplitz, xor or crc32. ETHTOOL_A_RSS_INDIR attribute returns RSS indrection table where each byte indicates queue number. +PLCA_GET_CFG +============ + +Gets the IEEE 802.3cg-2019 Clause 148 Physical Layer Collision Avoidance +(PLCA) Reconciliation Sublayer (RS) attributes. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_PLCA_HEADER`` nested request header + ===================================== ====== ========================== + +Kernel response contents: + + ====================================== ====== ============================= + ``ETHTOOL_A_PLCA_HEADER`` nested reply header + ``ETHTOOL_A_PLCA_VERSION`` u16 Supported PLCA management + interface standard/version + ``ETHTOOL_A_PLCA_ENABLED`` u8 PLCA Admin State + ``ETHTOOL_A_PLCA_NODE_ID`` u32 PLCA unique local node ID + ``ETHTOOL_A_PLCA_NODE_CNT`` u32 Number of PLCA nodes on the + network, including the + coordinator + ``ETHTOOL_A_PLCA_TO_TMR`` u32 Transmit Opportunity Timer + value in bit-times (BT) + ``ETHTOOL_A_PLCA_BURST_CNT`` u32 Number of additional packets + the node is allowed to send + within a single TO + ``ETHTOOL_A_PLCA_BURST_TMR`` u32 Time to wait for the MAC to + transmit a new frame before + terminating the burst + ====================================== ====== ============================= + +When set, the optional ``ETHTOOL_A_PLCA_VERSION`` attribute indicates which +standard and version the PLCA management interface complies to. When not set, +the interface is vendor-specific and (possibly) supplied by the driver. +The OPEN Alliance SIG specifies a standard register map for 10BASE-T1S PHYs +embedding the PLCA Reconcialiation Sublayer. See "10BASE-T1S PLCA Management +Registers" at https://www.opensig.org/about/specifications/. + +When set, the optional ``ETHTOOL_A_PLCA_ENABLED`` attribute indicates the +administrative state of the PLCA RS. When not set, the node operates in "plain" +CSMA/CD mode. This option is corresponding to ``IEEE 802.3cg-2019`` 30.16.1.1.1 +aPLCAAdminState / 30.16.1.2.1 acPLCAAdminControl. + +When set, the optional ``ETHTOOL_A_PLCA_NODE_ID`` attribute indicates the +configured local node ID of the PHY. This ID determines which transmit +opportunity (TO) is reserved for the node to transmit into. This option is +corresponding to ``IEEE 802.3cg-2019`` 30.16.1.1.4 aPLCALocalNodeID. The valid +range for this attribute is [0 .. 255] where 255 means "not configured". + +When set, the optional ``ETHTOOL_A_PLCA_NODE_CNT`` attribute indicates the +configured maximum number of PLCA nodes on the mixing-segment. This number +determines the total number of transmit opportunities generated during a +PLCA cycle. This attribute is relevant only for the PLCA coordinator, which is +the node with aPLCALocalNodeID set to 0. Follower nodes ignore this setting. +This option is corresponding to ``IEEE 802.3cg-2019`` 30.16.1.1.3 +aPLCANodeCount. The valid range for this attribute is [1 .. 255]. + +When set, the optional ``ETHTOOL_A_PLCA_TO_TMR`` attribute indicates the +configured value of the transmit opportunity timer in bit-times. This value +must be set equal across all nodes sharing the medium for PLCA to work +correctly. This option is corresponding to ``IEEE 802.3cg-2019`` 30.16.1.1.5 +aPLCATransmitOpportunityTimer. The valid range for this attribute is +[0 .. 255]. + +When set, the optional ``ETHTOOL_A_PLCA_BURST_CNT`` attribute indicates the +configured number of extra packets that the node is allowed to send during a +single transmit opportunity. By default, this attribute is 0, meaning that +the node can only send a sigle frame per TO. When greater than 0, the PLCA RS +keeps the TO after any transmission, waiting for the MAC to send a new frame +for up to aPLCABurstTimer BTs. This can only happen a number of times per PLCA +cycle up to the value of this parameter. After that, the burst is over and the +normal counting of TOs resumes. This option is corresponding to +``IEEE 802.3cg-2019`` 30.16.1.1.6 aPLCAMaxBurstCount. The valid range for this +attribute is [0 .. 255]. + +When set, the optional ``ETHTOOL_A_PLCA_BURST_TMR`` attribute indicates how +many bit-times the PLCA RS waits for the MAC to initiate a new transmission +when aPLCAMaxBurstCount is greater than 0. If the MAC fails to send a new +frame within this time, the burst ends and the counting of TOs resumes. +Otherwise, the new frame is sent as part of the current burst. This option +is corresponding to ``IEEE 802.3cg-2019`` 30.16.1.1.7 aPLCABurstTimer. The +valid range for this attribute is [0 .. 255]. Although, the value should be +set greater than the Inter-Frame-Gap (IFG) time of the MAC (plus some margin) +for PLCA burst mode to work as intended. + +PLCA_SET_CFG +============ + +Sets PLCA RS parameters. + +Request contents: + + ====================================== ====== ============================= + ``ETHTOOL_A_PLCA_HEADER`` nested request header + ``ETHTOOL_A_PLCA_ENABLED`` u8 PLCA Admin State + ``ETHTOOL_A_PLCA_NODE_ID`` u8 PLCA unique local node ID + ``ETHTOOL_A_PLCA_NODE_CNT`` u8 Number of PLCA nodes on the + netkork, including the + coordinator + ``ETHTOOL_A_PLCA_TO_TMR`` u8 Transmit Opportunity Timer + value in bit-times (BT) + ``ETHTOOL_A_PLCA_BURST_CNT`` u8 Number of additional packets + the node is allowed to send + within a single TO + ``ETHTOOL_A_PLCA_BURST_TMR`` u8 Time to wait for the MAC to + transmit a new frame before + terminating the burst + ====================================== ====== ============================= + +For a description of each attribute, see ``PLCA_GET_CFG``. + +PLCA_GET_STATUS +=============== + +Gets PLCA RS status information. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_PLCA_HEADER`` nested request header + ===================================== ====== ========================== + +Kernel response contents: + + ====================================== ====== ============================= + ``ETHTOOL_A_PLCA_HEADER`` nested reply header + ``ETHTOOL_A_PLCA_STATUS`` u8 PLCA RS operational status + ====================================== ====== ============================= + +When set, the ``ETHTOOL_A_PLCA_STATUS`` attribute indicates whether the node is +detecting the presence of the BEACON on the network. This flag is +corresponding to ``IEEE 802.3cg-2019`` 30.16.1.1.2 aPLCAStatus. + Request translation =================== @@ -1817,4 +1952,7 @@ are netlink only. n/a ``ETHTOOL_MSG_PHC_VCLOCKS_GET`` n/a ``ETHTOOL_MSG_MODULE_GET`` n/a ``ETHTOOL_MSG_MODULE_SET`` + n/a ``ETHTOOL_MSG_PLCA_GET_CFG`` + n/a ``ETHTOOL_MSG_PLCA_SET_CFG`` + n/a ``ETHTOOL_MSG_PLCA_GET_STATUS`` =================================== ===================================== diff --git a/MAINTAINERS b/MAINTAINERS index 0f61b5d64716..7b6e602b90d5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16616,6 +16616,12 @@ S: Maintained F: Documentation/devicetree/bindings/iio/chemical/plantower,pms7003.yaml F: drivers/iio/chemical/pms7003.c +PLCA RECONCILIATION SUBLAYER (IEEE802.3 Clause 148) +M: Piergiorgio Beruto +L: netdev@vger.kernel.org +S: Maintained +F: net/ethtool/plca.c + PLDMFW LIBRARY M: Jacob Keller S: Maintained diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 9e0a76fc7de9..d0da303f6634 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -802,12 +802,17 @@ int ethtool_virtdev_set_link_ksettings(struct net_device *dev, struct phy_device; struct phy_tdr_config; +struct phy_plca_cfg; +struct phy_plca_status; /** * struct ethtool_phy_ops - Optional PHY device options * @get_sset_count: Get number of strings that @get_strings will write. * @get_strings: Return a set of strings that describe the requested objects * @get_stats: Return extended statistics about the PHY device. + * @get_plca_cfg: Return PLCA configuration. + * @set_plca_cfg: Set PLCA configuration. + * @get_plca_status: Get PLCA configuration. * @start_cable_test: Start a cable test * @start_cable_test_tdr: Start a Time Domain Reflectometry cable test * @@ -819,6 +824,13 @@ struct ethtool_phy_ops { int (*get_strings)(struct phy_device *dev, u8 *data); int (*get_stats)(struct phy_device *dev, struct ethtool_stats *stats, u64 *data); + int (*get_plca_cfg)(struct phy_device *dev, + struct phy_plca_cfg *plca_cfg); + int (*set_plca_cfg)(struct phy_device *dev, + const struct phy_plca_cfg *plca_cfg, + struct netlink_ext_ack *extack); + int (*get_plca_status)(struct phy_device *dev, + struct phy_plca_status *plca_st); int (*start_cable_test)(struct phy_device *phydev, struct netlink_ext_ack *extack); int (*start_cable_test_tdr)(struct phy_device *phydev, diff --git a/include/linux/phy.h b/include/linux/phy.h index 89b43cda5bda..b82fdb0d82ed 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -773,6 +773,63 @@ struct phy_tdr_config { }; #define PHY_PAIR_ALL -1 +/** + * struct phy_plca_cfg - Configuration of the PLCA (Physical Layer Collision + * Avoidance) Reconciliation Sublayer. + * + * @version: read-only PLCA register map version. -1 = not available. Ignored + * when setting the configuration. Format is the same as reported by the PLCA + * IDVER register (31.CA00). -1 = not available. + * @enabled: PLCA configured mode (enabled/disabled). -1 = not available / don't + * set. 0 = disabled, anything else = enabled. + * @node_id: the PLCA local node identifier. -1 = not available / don't set. + * Allowed values [0 .. 254]. 255 = node disabled. + * @node_cnt: the PLCA node count (maximum number of nodes having a TO). Only + * meaningful for the coordinator (node_id = 0). -1 = not available / don't + * set. Allowed values [1 .. 255]. + * @to_tmr: The value of the PLCA to_timer in bit-times, which determines the + * PLCA transmit opportunity window opening. See IEEE802.3 Clause 148 for + * more details. The to_timer shall be set equal over all nodes. + * -1 = not available / don't set. Allowed values [0 .. 255]. + * @burst_cnt: controls how many additional frames a node is allowed to send in + * single transmit opportunity (TO). The default value of 0 means that the + * node is allowed exactly one frame per TO. A value of 1 allows two frames + * per TO, and so on. -1 = not available / don't set. + * Allowed values [0 .. 255]. + * @burst_tmr: controls how many bit times to wait for the MAC to send a new + * frame before interrupting the burst. This value should be set to a value + * greater than the MAC inter-packet gap (which is typically 96 bits). + * -1 = not available / don't set. Allowed values [0 .. 255]. + * + * A structure containing configuration parameters for setting/getting the PLCA + * RS configuration. The driver does not need to implement all the parameters, + * but should report what is actually used. + */ +struct phy_plca_cfg { + int version; + int enabled; + int node_id; + int node_cnt; + int to_tmr; + int burst_cnt; + int burst_tmr; +}; + +/** + * struct phy_plca_status - Status of the PLCA (Physical Layer Collision + * Avoidance) Reconciliation Sublayer. + * + * @pst: The PLCA status as reported by the PST bit in the PLCA STATUS + * register(31.CA03), indicating BEACON activity. + * + * A structure containing status information of the PLCA RS configuration. + * The driver does not need to implement all the parameters, but should report + * what is actually used. + */ +struct phy_plca_status { + bool pst; +}; + /** * struct phy_driver - Driver structure for a particular PHY type * diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 5799a9db034e..75b3d6d476ff 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -52,6 +52,9 @@ enum { ETHTOOL_MSG_PSE_GET, ETHTOOL_MSG_PSE_SET, ETHTOOL_MSG_RSS_GET, + ETHTOOL_MSG_PLCA_GET_CFG, + ETHTOOL_MSG_PLCA_SET_CFG, + ETHTOOL_MSG_PLCA_GET_STATUS, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -99,6 +102,9 @@ enum { ETHTOOL_MSG_MODULE_NTF, ETHTOOL_MSG_PSE_GET_REPLY, ETHTOOL_MSG_RSS_GET_REPLY, + ETHTOOL_MSG_PLCA_GET_CFG_REPLY, + ETHTOOL_MSG_PLCA_GET_STATUS_REPLY, + ETHTOOL_MSG_PLCA_NTF, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -894,6 +900,25 @@ enum { ETHTOOL_A_RSS_MAX = (__ETHTOOL_A_RSS_CNT - 1), }; +/* PLCA */ + +enum { + ETHTOOL_A_PLCA_UNSPEC, + ETHTOOL_A_PLCA_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_PLCA_VERSION, /* u16 */ + ETHTOOL_A_PLCA_ENABLED, /* u8 */ + ETHTOOL_A_PLCA_STATUS, /* u8 */ + ETHTOOL_A_PLCA_NODE_CNT, /* u32 */ + ETHTOOL_A_PLCA_NODE_ID, /* u32 */ + ETHTOOL_A_PLCA_TO_TMR, /* u32 */ + ETHTOOL_A_PLCA_BURST_CNT, /* u32 */ + ETHTOOL_A_PLCA_BURST_TMR, /* u32 */ + + /* add new constants above here */ + __ETHTOOL_A_PLCA_CNT, + ETHTOOL_A_PLCA_MAX = (__ETHTOOL_A_PLCA_CNT - 1) +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile index 228f13df2e18..563864c1bf5a 100644 --- a/net/ethtool/Makefile +++ b/net/ethtool/Makefile @@ -8,4 +8,4 @@ ethtool_nl-y := netlink.o bitset.o strset.o linkinfo.o linkmodes.o rss.o \ linkstate.o debug.o wol.o features.o privflags.o rings.o \ channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \ tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o \ - pse-pd.o + pse-pd.o plca.o diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index aee98be6237f..9f924875bba9 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -288,6 +288,8 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = { [ETHTOOL_MSG_MODULE_GET] = ðnl_module_request_ops, [ETHTOOL_MSG_PSE_GET] = ðnl_pse_request_ops, [ETHTOOL_MSG_RSS_GET] = ðnl_rss_request_ops, + [ETHTOOL_MSG_PLCA_GET_CFG] = ðnl_plca_cfg_request_ops, + [ETHTOOL_MSG_PLCA_GET_STATUS] = ðnl_plca_status_request_ops, }; static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb) @@ -603,6 +605,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = { [ETHTOOL_MSG_EEE_NTF] = ðnl_eee_request_ops, [ETHTOOL_MSG_FEC_NTF] = ðnl_fec_request_ops, [ETHTOOL_MSG_MODULE_NTF] = ðnl_module_request_ops, + [ETHTOOL_MSG_PLCA_NTF] = ðnl_plca_cfg_request_ops, }; /* default notification handler */ @@ -696,6 +699,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = { [ETHTOOL_MSG_EEE_NTF] = ethnl_default_notify, [ETHTOOL_MSG_FEC_NTF] = ethnl_default_notify, [ETHTOOL_MSG_MODULE_NTF] = ethnl_default_notify, + [ETHTOOL_MSG_PLCA_NTF] = ethnl_default_notify, }; void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data) @@ -1047,6 +1051,31 @@ static const struct genl_ops ethtool_genl_ops[] = { .policy = ethnl_rss_get_policy, .maxattr = ARRAY_SIZE(ethnl_rss_get_policy) - 1, }, + { + .cmd = ETHTOOL_MSG_PLCA_GET_CFG, + .doit = ethnl_default_doit, + .start = ethnl_default_start, + .dumpit = ethnl_default_dumpit, + .done = ethnl_default_done, + .policy = ethnl_plca_get_cfg_policy, + .maxattr = ARRAY_SIZE(ethnl_plca_get_cfg_policy) - 1, + }, + { + .cmd = ETHTOOL_MSG_PLCA_SET_CFG, + .flags = GENL_UNS_ADMIN_PERM, + .doit = ethnl_set_plca_cfg, + .policy = ethnl_plca_set_cfg_policy, + .maxattr = ARRAY_SIZE(ethnl_plca_set_cfg_policy) - 1, + }, + { + .cmd = ETHTOOL_MSG_PLCA_GET_STATUS, + .doit = ethnl_default_doit, + .start = ethnl_default_start, + .dumpit = ethnl_default_dumpit, + .done = ethnl_default_done, + .policy = ethnl_plca_get_status_policy, + .maxattr = ARRAY_SIZE(ethnl_plca_get_status_policy) - 1, + }, }; static const struct genl_multicast_group ethtool_nl_mcgrps[] = { diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index 3753787ba233..f271266f6e28 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -347,6 +347,8 @@ extern const struct ethnl_request_ops ethnl_phc_vclocks_request_ops; extern const struct ethnl_request_ops ethnl_module_request_ops; extern const struct ethnl_request_ops ethnl_pse_request_ops; extern const struct ethnl_request_ops ethnl_rss_request_ops; +extern const struct ethnl_request_ops ethnl_plca_cfg_request_ops; +extern const struct ethnl_request_ops ethnl_plca_status_request_ops; extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1]; extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1]; @@ -388,6 +390,9 @@ extern const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MO extern const struct nla_policy ethnl_pse_get_policy[ETHTOOL_A_PSE_HEADER + 1]; extern const struct nla_policy ethnl_pse_set_policy[ETHTOOL_A_PSE_MAX + 1]; extern const struct nla_policy ethnl_rss_get_policy[ETHTOOL_A_RSS_CONTEXT + 1]; +extern const struct nla_policy ethnl_plca_get_cfg_policy[ETHTOOL_A_PLCA_HEADER + 1]; +extern const struct nla_policy ethnl_plca_set_cfg_policy[ETHTOOL_A_PLCA_MAX + 1]; +extern const struct nla_policy ethnl_plca_get_status_policy[ETHTOOL_A_PLCA_HEADER + 1]; int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info); int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info); @@ -408,6 +413,7 @@ int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb); int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info); int ethnl_set_module(struct sk_buff *skb, struct genl_info *info); int ethnl_set_pse(struct sk_buff *skb, struct genl_info *info); +int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info); extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN]; extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN]; diff --git a/net/ethtool/plca.c b/net/ethtool/plca.c new file mode 100644 index 000000000000..d9bb13ffc654 --- /dev/null +++ b/net/ethtool/plca.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include + +#include "netlink.h" +#include "common.h" + +struct plca_req_info { + struct ethnl_req_info base; +}; + +struct plca_reply_data { + struct ethnl_reply_data base; + struct phy_plca_cfg plca_cfg; + struct phy_plca_status plca_st; +}; + +// Helpers ------------------------------------------------------------------ // + +#define PLCA_REPDATA(__reply_base) \ + container_of(__reply_base, struct plca_reply_data, base) + +static void plca_update_sint(int *dst, const struct nlattr *attr, + bool *mod) +{ + if (!attr) + return; + + *dst = nla_get_u32(attr); + *mod = true; +} + +// PLCA get configuration message ------------------------------------------- // + +const struct nla_policy ethnl_plca_get_cfg_policy[] = { + [ETHTOOL_A_PLCA_HEADER] = + NLA_POLICY_NESTED(ethnl_header_policy), +}; + +static int plca_get_cfg_prepare_data(const struct ethnl_req_info *req_base, + struct ethnl_reply_data *reply_base, + struct genl_info *info) +{ + struct plca_reply_data *data = PLCA_REPDATA(reply_base); + struct net_device *dev = reply_base->dev; + const struct ethtool_phy_ops *ops; + int ret; + + // check that the PHY device is available and connected + if (!dev->phydev) { + ret = -EOPNOTSUPP; + goto out; + } + + // note: rtnl_lock is held already by ethnl_default_doit + ops = ethtool_phy_ops; + if (!ops || !ops->get_plca_cfg) { + ret = -EOPNOTSUPP; + goto out; + } + + ret = ethnl_ops_begin(dev); + if (!ret) + goto out; + + memset(&data->plca_cfg, 0xff, + sizeof_field(struct plca_reply_data, plca_cfg)); + + ret = ops->get_plca_cfg(dev->phydev, &data->plca_cfg); + ethnl_ops_complete(dev); + +out: + return ret; +} + +static int plca_get_cfg_reply_size(const struct ethnl_req_info *req_base, + const struct ethnl_reply_data *reply_base) +{ + return nla_total_size(sizeof(u16)) + /* _VERSION */ + nla_total_size(sizeof(u8)) + /* _ENABLED */ + nla_total_size(sizeof(u32)) + /* _NODE_CNT */ + nla_total_size(sizeof(u32)) + /* _NODE_ID */ + nla_total_size(sizeof(u32)) + /* _TO_TIMER */ + nla_total_size(sizeof(u32)) + /* _BURST_COUNT */ + nla_total_size(sizeof(u32)); /* _BURST_TIMER */ +} + +static int plca_get_cfg_fill_reply(struct sk_buff *skb, + const struct ethnl_req_info *req_base, + const struct ethnl_reply_data *reply_base) +{ + const struct plca_reply_data *data = PLCA_REPDATA(reply_base); + const struct phy_plca_cfg *plca = &data->plca_cfg; + + if ((plca->version >= 0 && + nla_put_u16(skb, ETHTOOL_A_PLCA_VERSION, plca->version)) || + (plca->enabled >= 0 && + nla_put_u8(skb, ETHTOOL_A_PLCA_ENABLED, !!plca->enabled)) || + (plca->node_id >= 0 && + nla_put_u32(skb, ETHTOOL_A_PLCA_NODE_ID, plca->node_id)) || + (plca->node_cnt >= 0 && + nla_put_u32(skb, ETHTOOL_A_PLCA_NODE_CNT, plca->node_cnt)) || + (plca->to_tmr >= 0 && + nla_put_u32(skb, ETHTOOL_A_PLCA_TO_TMR, plca->to_tmr)) || + (plca->burst_cnt >= 0 && + nla_put_u32(skb, ETHTOOL_A_PLCA_BURST_CNT, plca->burst_cnt)) || + (plca->burst_tmr >= 0 && + nla_put_u32(skb, ETHTOOL_A_PLCA_BURST_TMR, plca->burst_tmr))) + return -EMSGSIZE; + + return 0; +}; + +const struct ethnl_request_ops ethnl_plca_cfg_request_ops = { + .request_cmd = ETHTOOL_MSG_PLCA_GET_CFG, + .reply_cmd = ETHTOOL_MSG_PLCA_GET_CFG_REPLY, + .hdr_attr = ETHTOOL_A_PLCA_HEADER, + .req_info_size = sizeof(struct plca_req_info), + .reply_data_size = sizeof(struct plca_reply_data), + + .prepare_data = plca_get_cfg_prepare_data, + .reply_size = plca_get_cfg_reply_size, + .fill_reply = plca_get_cfg_fill_reply, +}; + +// PLCA set configuration message ------------------------------------------- // + +const struct nla_policy ethnl_plca_set_cfg_policy[] = { + [ETHTOOL_A_PLCA_HEADER] = + NLA_POLICY_NESTED(ethnl_header_policy), + [ETHTOOL_A_PLCA_ENABLED] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_PLCA_NODE_ID] = NLA_POLICY_MAX(NLA_U32, 255), + [ETHTOOL_A_PLCA_NODE_CNT] = NLA_POLICY_RANGE(NLA_U32, 1, 255), + [ETHTOOL_A_PLCA_TO_TMR] = NLA_POLICY_MAX(NLA_U32, 255), + [ETHTOOL_A_PLCA_BURST_CNT] = NLA_POLICY_MAX(NLA_U32, 255), + [ETHTOOL_A_PLCA_BURST_TMR] = NLA_POLICY_MAX(NLA_U32, 255), +}; + +int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info) +{ + struct ethnl_req_info req_info = {}; + struct nlattr **tb = info->attrs; + const struct ethtool_phy_ops *ops; + struct phy_plca_cfg plca_cfg; + struct net_device *dev; + bool mod = false; + int ret; + + ret = ethnl_parse_header_dev_get(&req_info, + tb[ETHTOOL_A_PLCA_HEADER], + genl_info_net(info), info->extack, + true); + if (!ret) + return ret; + + dev = req_info.dev; + + rtnl_lock(); + + // check that the PHY device is available and connected + if (!dev->phydev) { + ret = -EOPNOTSUPP; + goto out_rtnl; + } + + ops = ethtool_phy_ops; + if (!ops || !ops->set_plca_cfg) { + ret = -EOPNOTSUPP; + goto out_rtnl; + } + + ret = ethnl_ops_begin(dev); + if (!ret) + goto out_rtnl; + + memset(&plca_cfg, 0xff, sizeof(plca_cfg)); + plca_update_sint(&plca_cfg.enabled, tb[ETHTOOL_A_PLCA_ENABLED], &mod); + plca_update_sint(&plca_cfg.node_id, tb[ETHTOOL_A_PLCA_NODE_ID], &mod); + plca_update_sint(&plca_cfg.node_cnt, tb[ETHTOOL_A_PLCA_NODE_CNT], &mod); + plca_update_sint(&plca_cfg.to_tmr, tb[ETHTOOL_A_PLCA_TO_TMR], &mod); + plca_update_sint(&plca_cfg.burst_cnt, tb[ETHTOOL_A_PLCA_BURST_CNT], + &mod); + plca_update_sint(&plca_cfg.burst_tmr, tb[ETHTOOL_A_PLCA_BURST_TMR], + &mod); + + ret = 0; + if (!mod) + goto out_ops; + + ret = ops->set_plca_cfg(dev->phydev, &plca_cfg, info->extack); + if (!ret) + goto out_ops; + + ethtool_notify(dev, ETHTOOL_MSG_PLCA_NTF, NULL); + +out_ops: + ethnl_ops_complete(dev); +out_rtnl: + rtnl_unlock(); + ethnl_parse_header_dev_put(&req_info); + + return ret; +} + +// PLCA get status message -------------------------------------------------- // + +const struct nla_policy ethnl_plca_get_status_policy[] = { + [ETHTOOL_A_PLCA_HEADER] = + NLA_POLICY_NESTED(ethnl_header_policy), +}; + +static int plca_get_status_prepare_data(const struct ethnl_req_info *req_base, + struct ethnl_reply_data *reply_base, + struct genl_info *info) +{ + struct plca_reply_data *data = PLCA_REPDATA(reply_base); + struct net_device *dev = reply_base->dev; + const struct ethtool_phy_ops *ops; + int ret; + + // check that the PHY device is available and connected + if (!dev->phydev) { + ret = -EOPNOTSUPP; + goto out; + } + + // note: rtnl_lock is held already by ethnl_default_doit + ops = ethtool_phy_ops; + if (!ops || !ops->get_plca_status) { + ret = -EOPNOTSUPP; + goto out; + } + + ret = ethnl_ops_begin(dev); + if (!ret) + goto out; + + memset(&data->plca_st, 0xff, + sizeof_field(struct plca_reply_data, plca_st)); + + ret = ops->get_plca_status(dev->phydev, &data->plca_st); + ethnl_ops_complete(dev); +out: + return ret; +} + +static int plca_get_status_reply_size(const struct ethnl_req_info *req_base, + const struct ethnl_reply_data *reply_base) +{ + return nla_total_size(sizeof(u8)); /* _STATUS */ +} + +static int plca_get_status_fill_reply(struct sk_buff *skb, + const struct ethnl_req_info *req_base, + const struct ethnl_reply_data *reply_base) +{ + const struct plca_reply_data *data = PLCA_REPDATA(reply_base); + const u8 status = data->plca_st.pst; + + if (nla_put_u8(skb, ETHTOOL_A_PLCA_STATUS, !!status)) + return -EMSGSIZE; + + return 0; +}; + +const struct ethnl_request_ops ethnl_plca_status_request_ops = { + .request_cmd = ETHTOOL_MSG_PLCA_GET_STATUS, + .reply_cmd = ETHTOOL_MSG_PLCA_GET_STATUS_REPLY, + .hdr_attr = ETHTOOL_A_PLCA_HEADER, + .req_info_size = sizeof(struct plca_req_info), + .reply_data_size = sizeof(struct plca_reply_data), + + .prepare_data = plca_get_status_prepare_data, + .reply_size = plca_get_status_reply_size, + .fill_reply = plca_get_status_fill_reply, +}; -- cgit v1.2.3 From 16178c8ef53dc9734302c4c07633696454579ee3 Mon Sep 17 00:00:00 2001 From: Piergiorgio Beruto Date: Mon, 9 Jan 2023 17:59:58 +0100 Subject: drivers/net/phy: add the link modes for the 10BASE-T1S Ethernet PHY This patch adds the link modes for the IEEE 802.3cg Clause 147 10BASE-T1S Ethernet PHY. According to the specifications, the 10BASE-T1S supports Point-To-Point Full-Duplex, Point-To-Point Half-Duplex and/or Point-To-Multipoint (AKA Multi-Drop) Half-Duplex operations. Signed-off-by: Piergiorgio Beruto Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- drivers/net/phy/phy-core.c | 5 ++++- drivers/net/phy/phy_device.c | 14 ++++++++++++++ drivers/net/phy/phylink.c | 6 +++++- include/linux/phy.h | 14 ++++++++++++++ include/uapi/linux/ethtool.h | 3 +++ net/ethtool/common.c | 8 ++++++++ 6 files changed, 48 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c index 5d08c627a516..a64186dc53f8 100644 --- a/drivers/net/phy/phy-core.c +++ b/drivers/net/phy/phy-core.c @@ -13,7 +13,7 @@ */ const char *phy_speed_to_str(int speed) { - BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 99, + BUILD_BUG_ON_MSG(__ETHTOOL_LINK_MODE_MASK_NBITS != 102, "Enum ethtool_link_mode_bit_indices and phylib are out of sync. " "If a speed or mode has been added please update phy_speed_to_str " "and the PHY settings array.\n"); @@ -260,6 +260,9 @@ static const struct phy_setting settings[] = { PHY_SETTING( 10, FULL, 10baseT_Full ), PHY_SETTING( 10, HALF, 10baseT_Half ), PHY_SETTING( 10, FULL, 10baseT1L_Full ), + PHY_SETTING( 10, FULL, 10baseT1S_Full ), + PHY_SETTING( 10, HALF, 10baseT1S_Half ), + PHY_SETTING( 10, HALF, 10baseT1S_P2MP_Half ), }; #undef PHY_SETTING diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index e4562859ac00..1cde41d39196 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -45,6 +45,9 @@ EXPORT_SYMBOL_GPL(phy_basic_features); __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1_features) __ro_after_init; EXPORT_SYMBOL_GPL(phy_basic_t1_features); +__ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1s_p2mp_features) __ro_after_init; +EXPORT_SYMBOL_GPL(phy_basic_t1s_p2mp_features); + __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_features) __ro_after_init; EXPORT_SYMBOL_GPL(phy_gbit_features); @@ -98,6 +101,12 @@ const int phy_basic_t1_features_array[3] = { }; EXPORT_SYMBOL_GPL(phy_basic_t1_features_array); +const int phy_basic_t1s_p2mp_features_array[2] = { + ETHTOOL_LINK_MODE_TP_BIT, + ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT, +}; +EXPORT_SYMBOL_GPL(phy_basic_t1s_p2mp_features_array); + const int phy_gbit_features_array[2] = { ETHTOOL_LINK_MODE_1000baseT_Half_BIT, ETHTOOL_LINK_MODE_1000baseT_Full_BIT, @@ -138,6 +147,11 @@ static void features_init(void) ARRAY_SIZE(phy_basic_t1_features_array), phy_basic_t1_features); + /* 10 half, P2MP, TP */ + linkmode_set_bit_array(phy_basic_t1s_p2mp_features_array, + ARRAY_SIZE(phy_basic_t1s_p2mp_features_array), + phy_basic_t1s_p2mp_features); + /* 10/100 half/full + 1000 half/full */ linkmode_set_bit_array(phy_basic_ports_array, ARRAY_SIZE(phy_basic_ports_array), diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 09cc65c0da93..319790221d7f 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -241,12 +241,16 @@ void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps) if (caps & MAC_ASYM_PAUSE) __set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, linkmodes); - if (caps & MAC_10HD) + if (caps & MAC_10HD) { __set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, linkmodes); + __set_bit(ETHTOOL_LINK_MODE_10baseT1S_Half_BIT, linkmodes); + __set_bit(ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT, linkmodes); + } if (caps & MAC_10FD) { __set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, linkmodes); __set_bit(ETHTOOL_LINK_MODE_10baseT1L_Full_BIT, linkmodes); + __set_bit(ETHTOOL_LINK_MODE_10baseT1S_Full_BIT, linkmodes); } if (caps & MAC_100HD) { diff --git a/include/linux/phy.h b/include/linux/phy.h index b82fdb0d82ed..63b199f574f7 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -45,6 +45,7 @@ extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1_features) __ro_after_init; +extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1s_p2mp_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_fibre_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_all_ports_features) __ro_after_init; @@ -54,6 +55,7 @@ extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_10gbit_full_features) __ro_after_ini #define PHY_BASIC_FEATURES ((unsigned long *)&phy_basic_features) #define PHY_BASIC_T1_FEATURES ((unsigned long *)&phy_basic_t1_features) +#define PHY_BASIC_T1S_P2MP_FEATURES ((unsigned long *)&phy_basic_t1s_p2mp_features) #define PHY_GBIT_FEATURES ((unsigned long *)&phy_gbit_features) #define PHY_GBIT_FIBRE_FEATURES ((unsigned long *)&phy_gbit_fibre_features) #define PHY_GBIT_ALL_PORTS_FEATURES ((unsigned long *)&phy_gbit_all_ports_features) @@ -66,6 +68,7 @@ extern const int phy_fibre_port_array[1]; extern const int phy_all_ports_features_array[7]; extern const int phy_10_100_features_array[4]; extern const int phy_basic_t1_features_array[3]; +extern const int phy_basic_t1s_p2mp_features_array[2]; extern const int phy_gbit_features_array[2]; extern const int phy_10gbit_features_array[1]; @@ -1041,6 +1044,17 @@ struct phy_driver { int (*get_sqi)(struct phy_device *dev); /** @get_sqi_max: Get the maximum signal quality indication */ int (*get_sqi_max)(struct phy_device *dev); + + /* PLCA RS interface */ + /** @get_plca_cfg: Return the current PLCA configuration */ + int (*get_plca_cfg)(struct phy_device *dev, + struct phy_plca_cfg *plca_cfg); + /** @set_plca_cfg: Set the PLCA configuration */ + int (*set_plca_cfg)(struct phy_device *dev, + const struct phy_plca_cfg *plca_cfg); + /** @get_plca_status: Return the current PLCA status info */ + int (*get_plca_status)(struct phy_device *dev, + struct phy_plca_status *plca_st); }; #define to_phy_driver(d) container_of(to_mdio_common_driver(d), \ struct phy_driver, mdiodrv) diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 3135fa0ba9a4..6389953c32cf 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1741,6 +1741,9 @@ enum ethtool_link_mode_bit_indices { ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT = 96, ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT = 97, ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT = 98, + ETHTOOL_LINK_MODE_10baseT1S_Full_BIT = 99, + ETHTOOL_LINK_MODE_10baseT1S_Half_BIT = 100, + ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT = 101, /* must be last entry */ __ETHTOOL_LINK_MODE_MASK_NBITS diff --git a/net/ethtool/common.c b/net/ethtool/common.c index 6f399afc2ff2..5fb19050991e 100644 --- a/net/ethtool/common.c +++ b/net/ethtool/common.c @@ -208,6 +208,9 @@ const char link_mode_names[][ETH_GSTRING_LEN] = { __DEFINE_LINK_MODE_NAME(800000, DR8_2, Full), __DEFINE_LINK_MODE_NAME(800000, SR8, Full), __DEFINE_LINK_MODE_NAME(800000, VR8, Full), + __DEFINE_LINK_MODE_NAME(10, T1S, Full), + __DEFINE_LINK_MODE_NAME(10, T1S, Half), + __DEFINE_LINK_MODE_NAME(10, T1S_P2MP, Half), }; static_assert(ARRAY_SIZE(link_mode_names) == __ETHTOOL_LINK_MODE_MASK_NBITS); @@ -244,6 +247,8 @@ static_assert(ARRAY_SIZE(link_mode_names) == __ETHTOOL_LINK_MODE_MASK_NBITS); #define __LINK_MODE_LANES_X 1 #define __LINK_MODE_LANES_FX 1 #define __LINK_MODE_LANES_T1L 1 +#define __LINK_MODE_LANES_T1S 1 +#define __LINK_MODE_LANES_T1S_P2MP 1 #define __LINK_MODE_LANES_VR8 8 #define __LINK_MODE_LANES_DR8_2 8 @@ -366,6 +371,9 @@ const struct link_mode_info link_mode_params[] = { __DEFINE_LINK_MODE_PARAMS(800000, DR8_2, Full), __DEFINE_LINK_MODE_PARAMS(800000, SR8, Full), __DEFINE_LINK_MODE_PARAMS(800000, VR8, Full), + __DEFINE_LINK_MODE_PARAMS(10, T1S, Full), + __DEFINE_LINK_MODE_PARAMS(10, T1S, Half), + __DEFINE_LINK_MODE_PARAMS(10, T1S_P2MP, Half), }; static_assert(ARRAY_SIZE(link_mode_params) == __ETHTOOL_LINK_MODE_MASK_NBITS); -- cgit v1.2.3 From 93e71edfd90ca7e07a3645167f1e8e4504d4e8ee Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 10 Jan 2023 20:29:08 -0800 Subject: devlink: keep the instance mutex alive until references are gone The reference needs to keep the instance memory around, but also the instance lock must remain valid. Users will take the lock, check registration status and release the lock. mutex_destroy() etc. belong in the same place as the freeing of the memory. Unfortunately lockdep_unregister_key() sleeps so we need to switch the an rcu_work. Note that the problem is a bit hard to repro, because devlink_pernet_pre_exit() iterates over registered instances. AFAIU the instances must get devlink_free()d concurrently with the namespace getting deleted for the problem to occur. Reported-by: syzbot+d94d214ea473e218fc89@syzkaller.appspotmail.com Reported-by: syzbot+9f0dd863b87113935acf@syzkaller.appspotmail.com Fixes: 9053637e0da7 ("devlink: remove the registration guarantee of references") Reviewed-by: Jiri Pirko Reviewed-by: Jacob Keller Link: https://lore.kernel.org/r/20230111042908.988199-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- net/devlink/core.c | 16 +++++++++++++--- net/devlink/devl_internal.h | 3 ++- 2 files changed, 15 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index a31a317626d7..60beca2df7cc 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -83,10 +83,21 @@ struct devlink *__must_check devlink_try_get(struct devlink *devlink) return NULL; } +static void devlink_release(struct work_struct *work) +{ + struct devlink *devlink; + + devlink = container_of(to_rcu_work(work), struct devlink, rwork); + + mutex_destroy(&devlink->lock); + lockdep_unregister_key(&devlink->lock_key); + kfree(devlink); +} + void devlink_put(struct devlink *devlink) { if (refcount_dec_and_test(&devlink->refcount)) - kfree_rcu(devlink, rcu); + queue_rcu_work(system_wq, &devlink->rwork); } struct devlink *devlinks_xa_find_get(struct net *net, unsigned long *indexp) @@ -231,6 +242,7 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, INIT_LIST_HEAD(&devlink->trap_list); INIT_LIST_HEAD(&devlink->trap_group_list); INIT_LIST_HEAD(&devlink->trap_policer_list); + INIT_RCU_WORK(&devlink->rwork, devlink_release); lockdep_register_key(&devlink->lock_key); mutex_init(&devlink->lock); lockdep_set_class(&devlink->lock, &devlink->lock_key); @@ -259,8 +271,6 @@ void devlink_free(struct devlink *devlink) mutex_destroy(&devlink->linecards_lock); mutex_destroy(&devlink->reporters_lock); - mutex_destroy(&devlink->lock); - lockdep_unregister_key(&devlink->lock_key); WARN_ON(!list_empty(&devlink->trap_policer_list)); WARN_ON(!list_empty(&devlink->trap_group_list)); WARN_ON(!list_empty(&devlink->trap_list)); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 5d2bbe295659..e724e4c2a4ff 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -51,7 +52,7 @@ struct devlink { struct lock_class_key lock_key; u8 reload_failed:1; refcount_t refcount; - struct rcu_head rcu; + struct rcu_work rwork; struct notifier_block netdevice_nb; char priv[] __aligned(NETDEV_ALIGN); }; -- cgit v1.2.3 From 952f6c9daf509f7919887c26753884fa530f8622 Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Mon, 26 Dec 2022 20:16:09 +0100 Subject: wifi: mac80211: Drop stations iterator where the iterator function may sleep This reverts commit acb99b9b2a08f ("mac80211: Add stations iterator where the iterator function may sleep"). A different approach was found for the rtw88 driver where most of the problematic locks were converted to a driver-local mutex. Drop ieee80211_iterate_stations() because there are no users of that function. Signed-off-by: Martin Blumenstingl Link: https://lore.kernel.org/r/20221226191609.2934234-1-martin.blumenstingl@googlemail.com Signed-off-by: Johannes Berg --- include/net/mac80211.h | 21 --------------------- net/mac80211/util.c | 13 ------------- 2 files changed, 34 deletions(-) (limited to 'net') diff --git a/include/net/mac80211.h b/include/net/mac80211.h index 689da327ce2e..b421a1bfc7c5 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -5888,9 +5888,6 @@ void ieee80211_iterate_active_interfaces_atomic(struct ieee80211_hw *hw, * This function iterates over the interfaces associated with a given * hardware that are currently active and calls the callback for them. * This version can only be used while holding the wiphy mutex. - * The driver must not call this with a lock held that it can also take in - * response to callbacks from mac80211, and it must not call this within - * callbacks made by mac80211 - both would result in deadlocks. * * @hw: the hardware struct of which the interfaces should be iterated over * @iter_flags: iteration flags, see &enum ieee80211_interface_iteration_flags @@ -5904,24 +5901,6 @@ void ieee80211_iterate_active_interfaces_mtx(struct ieee80211_hw *hw, struct ieee80211_vif *vif), void *data); -/** - * ieee80211_iterate_stations - iterate stations - * - * This function iterates over all stations associated with a given - * hardware that are currently uploaded to the driver and calls the callback - * function for them. - * This function allows the iterator function to sleep, when the iterator - * function is atomic @ieee80211_iterate_stations_atomic can be used. - * - * @hw: the hardware struct of which the interfaces should be iterated over - * @iterator: the iterator function to call, cannot sleep - * @data: first argument of the iterator function - */ -void ieee80211_iterate_stations(struct ieee80211_hw *hw, - void (*iterator)(void *data, - struct ieee80211_sta *sta), - void *data); - /** * ieee80211_iterate_stations_atomic - iterate stations * diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 6f5407038459..bc8c285355a1 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -868,19 +868,6 @@ static void __iterate_stations(struct ieee80211_local *local, } } -void ieee80211_iterate_stations(struct ieee80211_hw *hw, - void (*iterator)(void *data, - struct ieee80211_sta *sta), - void *data) -{ - struct ieee80211_local *local = hw_to_local(hw); - - mutex_lock(&local->sta_mtx); - __iterate_stations(local, iterator, data); - mutex_unlock(&local->sta_mtx); -} -EXPORT_SYMBOL_GPL(ieee80211_iterate_stations); - void ieee80211_iterate_stations_atomic(struct ieee80211_hw *hw, void (*iterator)(void *data, struct ieee80211_sta *sta), -- cgit v1.2.3 From 71a659bffeb9b70787b1e697c6110665af86ed2d Mon Sep 17 00:00:00 2001 From: Nick Hainke Date: Thu, 22 Dec 2022 10:29:57 +0100 Subject: wifi: mac80211: fix double space in comment Remove a space in "the frames". Signed-off-by: Nick Hainke Link: https://lore.kernel.org/r/20221222092957.870790-1-vincent@systemli.org Signed-off-by: Johannes Berg --- net/mac80211/cfg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 8f9a2ab502b3..c885076fae89 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -2727,7 +2727,7 @@ static int ieee80211_scan(struct wiphy *wiphy, * If the scan has been forced (and the driver supports * forcing), don't care about being beaconing already. * This will create problems to the attached stations (e.g. all - * the frames sent while scanning on other channel will be + * the frames sent while scanning on other channel will be * lost) */ if (sdata->deflink.u.ap.beacon && -- cgit v1.2.3 From d64c732dfc9edcd57feb693c23162117737e426b Mon Sep 17 00:00:00 2001 From: Philipp Zabel Date: Mon, 2 Jan 2023 18:29:34 +0100 Subject: net: rfkill: gpio: add DT support Allow probing rfkill-gpio via device tree. This hooks up the already existing support that was started in commit 262c91ee5e52 ("net: rfkill: gpio: prepare for DT and ACPI support") via the "rfkill-gpio" compatible, with the "name" and "type" properties renamed to "label" and "radio-type", respectively, in the device tree case. Signed-off-by: Philipp Zabel Link: https://lore.kernel.org/r/20230102-rfkill-gpio-dt-v2-2-d1b83758c16d@pengutronix.de Signed-off-by: Johannes Berg --- net/rfkill/rfkill-gpio.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c index f5afc9bcdee6..786dbfdad772 100644 --- a/net/rfkill/rfkill-gpio.c +++ b/net/rfkill/rfkill-gpio.c @@ -75,6 +75,8 @@ static int rfkill_gpio_probe(struct platform_device *pdev) { struct rfkill_gpio_data *rfkill; struct gpio_desc *gpio; + const char *name_property; + const char *type_property; const char *type_name; int ret; @@ -82,8 +84,15 @@ static int rfkill_gpio_probe(struct platform_device *pdev) if (!rfkill) return -ENOMEM; - device_property_read_string(&pdev->dev, "name", &rfkill->name); - device_property_read_string(&pdev->dev, "type", &type_name); + if (dev_of_node(&pdev->dev)) { + name_property = "label"; + type_property = "radio-type"; + } else { + name_property = "name"; + type_property = "type"; + } + device_property_read_string(&pdev->dev, name_property, &rfkill->name); + device_property_read_string(&pdev->dev, type_property, &type_name); if (!rfkill->name) rfkill->name = dev_name(&pdev->dev); @@ -157,12 +166,19 @@ static const struct acpi_device_id rfkill_acpi_match[] = { MODULE_DEVICE_TABLE(acpi, rfkill_acpi_match); #endif +static const struct of_device_id rfkill_of_match[] __maybe_unused = { + { .compatible = "rfkill-gpio", }, + { }, +}; +MODULE_DEVICE_TABLE(of, rfkill_of_match); + static struct platform_driver rfkill_gpio_driver = { .probe = rfkill_gpio_probe, .remove = rfkill_gpio_remove, .driver = { .name = "rfkill_gpio", .acpi_match_table = ACPI_PTR(rfkill_acpi_match), + .of_match_table = of_match_ptr(rfkill_of_match), }, }; -- cgit v1.2.3 From c43170b7e1571efc11c443fb8889842073b77fe5 Mon Sep 17 00:00:00 2001 From: Bobby Eshleman Date: Tue, 10 Jan 2023 10:13:23 +0000 Subject: vsock: return errors other than -ENOMEM to socket This removes behaviour, where error code returned from any transport was always switched to ENOMEM. For example when user tries to send too big message via SEQPACKET socket, transport layers return EMSGSIZE, but this error code was always replaced with ENOMEM and returned to user. Signed-off-by: Bobby Eshleman Signed-off-by: Arseniy Krasnov Reviewed-by: Stefano Garzarella Signed-off-by: Paolo Abeni --- net/vmw_vsock/af_vsock.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index d593d5b6d4b1..19aea7cba26e 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1861,8 +1861,9 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg, written = transport->stream_enqueue(vsk, msg, len - total_written); } + if (written < 0) { - err = -ENOMEM; + err = written; goto out_err; } -- cgit v1.2.3 From c2977c61f32e0d54319d158a04cad5fd82c2afcf Mon Sep 17 00:00:00 2001 From: Arun Ramadoss Date: Tue, 10 Jan 2023 14:19:20 +0530 Subject: net: dsa: microchip: ptp: add 4 bytes in tail tag when ptp enabled When the PTP is enabled in hardware bit 6 of PTP_MSG_CONF1 register, the transmit frame needs additional 4 bytes before the tail tag. It is needed for all the transmission packets irrespective of PTP packets or not. The 4-byte timestamp field is 0 for frames other than Pdelay_Resp. For the one-step Pdelay_Resp, the switch needs the receive timestamp of the Pdelay_Req message so that it can put the turnaround time in the correction field. Since PTP has to be enabled for both Transmission and reception timestamping, driver needs to track of the tx and rx setting of the all the user ports in the switch. Two flags hw_tx_en and hw_rx_en are added in ksz_port to track the timestampping setting of each port. When any one of ports has tx or rx timestampping enabled, bit 6 of PTP_MSG_CONF1 is set and it is indicated to tag_ksz.c through tagger bytes. This flag adds 4 additional bytes to the tail tag. When tx and rx timestamping of all the ports are disabled, then 4 bytes are not added. Tested using hwstamp -i Signed-off-by: Arun Ramadoss Reviewed-by: Vladimir Oltean # mostly api Signed-off-by: David S. Miller --- MAINTAINERS | 1 + drivers/net/dsa/microchip/ksz_common.h | 2 + drivers/net/dsa/microchip/ksz_ptp.c | 34 +++++++++++- include/linux/dsa/ksz_common.h | 22 ++++++++ net/dsa/tag_ksz.c | 95 +++++++++++++++++++++++++++++++--- 5 files changed, 145 insertions(+), 9 deletions(-) create mode 100644 include/linux/dsa/ksz_common.h (limited to 'net') diff --git a/MAINTAINERS b/MAINTAINERS index c3d796e12bb8..5d6a5d51fca0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13710,6 +13710,7 @@ S: Maintained F: Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml F: Documentation/devicetree/bindings/net/dsa/microchip,lan937x.yaml F: drivers/net/dsa/microchip/* +F: include/linux/dsa/ksz_common.h F: include/linux/platform_data/microchip-ksz.h F: net/dsa/tag_ksz.c diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index a5ce7ec30ba2..641aca78ef05 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -104,6 +104,8 @@ struct ksz_port { u8 num; #if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ_PTP) struct hwtstamp_config tstamp_config; + bool hwts_tx_en; + bool hwts_rx_en; #endif }; diff --git a/drivers/net/dsa/microchip/ksz_ptp.c b/drivers/net/dsa/microchip/ksz_ptp.c index 6f6747671610..5281aeb84db6 100644 --- a/drivers/net/dsa/microchip/ksz_ptp.c +++ b/drivers/net/dsa/microchip/ksz_ptp.c @@ -5,6 +5,7 @@ * Copyright (C) 2022 Microchip Technology Inc. */ +#include #include #include #include @@ -24,6 +25,27 @@ #define KSZ_PTP_INC_NS 40ULL /* HW clock is incremented every 40 ns (by 40) */ #define KSZ_PTP_SUBNS_BITS 32 +static int ksz_ptp_enable_mode(struct ksz_device *dev) +{ + struct ksz_tagger_data *tagger_data = ksz_tagger_data(dev->ds); + struct ksz_port *prt; + struct dsa_port *dp; + bool tag_en = false; + + dsa_switch_for_each_user_port(dp, dev->ds) { + prt = &dev->ports[dp->index]; + if (prt->hwts_tx_en || prt->hwts_rx_en) { + tag_en = true; + break; + } + } + + tagger_data->hwtstamp_set_state(dev->ds, tag_en); + + return ksz_rmw16(dev, REG_PTP_MSG_CONF1, PTP_ENABLE, + tag_en ? PTP_ENABLE : 0); +} + /* The function is return back the capability of timestamping feature when * requested through ethtool -T utility */ @@ -67,6 +89,7 @@ int ksz_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr) } static int ksz_set_hwtstamp_config(struct ksz_device *dev, + struct ksz_port *prt, struct hwtstamp_config *config) { if (config->flags) @@ -74,7 +97,10 @@ static int ksz_set_hwtstamp_config(struct ksz_device *dev, switch (config->tx_type) { case HWTSTAMP_TX_OFF: + prt->hwts_tx_en = false; + break; case HWTSTAMP_TX_ONESTEP_P2P: + prt->hwts_tx_en = true; break; default: return -ERANGE; @@ -82,25 +108,29 @@ static int ksz_set_hwtstamp_config(struct ksz_device *dev, switch (config->rx_filter) { case HWTSTAMP_FILTER_NONE: + prt->hwts_rx_en = false; break; case HWTSTAMP_FILTER_PTP_V2_L4_EVENT: case HWTSTAMP_FILTER_PTP_V2_L4_SYNC: config->rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT; + prt->hwts_rx_en = true; break; case HWTSTAMP_FILTER_PTP_V2_L2_EVENT: case HWTSTAMP_FILTER_PTP_V2_L2_SYNC: config->rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT; + prt->hwts_rx_en = true; break; case HWTSTAMP_FILTER_PTP_V2_EVENT: case HWTSTAMP_FILTER_PTP_V2_SYNC: config->rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT; + prt->hwts_rx_en = true; break; default: config->rx_filter = HWTSTAMP_FILTER_NONE; return -ERANGE; } - return 0; + return ksz_ptp_enable_mode(dev); } int ksz_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr) @@ -116,7 +146,7 @@ int ksz_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr) if (ret) return ret; - ret = ksz_set_hwtstamp_config(dev, &config); + ret = ksz_set_hwtstamp_config(dev, prt, &config); if (ret) return ret; diff --git a/include/linux/dsa/ksz_common.h b/include/linux/dsa/ksz_common.h new file mode 100644 index 000000000000..d2a54161be97 --- /dev/null +++ b/include/linux/dsa/ksz_common.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Microchip switch tag common header + * + * Copyright (C) 2022 Microchip Technology Inc. + */ + +#ifndef _NET_DSA_KSZ_COMMON_H_ +#define _NET_DSA_KSZ_COMMON_H_ + +#include + +struct ksz_tagger_data { + void (*hwtstamp_set_state)(struct dsa_switch *ds, bool on); +}; + +static inline struct ksz_tagger_data * +ksz_tagger_data(struct dsa_switch *ds) +{ + return ds->tagger_data; +} + +#endif /* _NET_DSA_KSZ_COMMON_H_ */ diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 080e5c369f5b..420a12853676 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -4,6 +4,7 @@ * Copyright (c) 2017 Microchip Technology */ +#include #include #include #include @@ -16,9 +17,58 @@ #define LAN937X_NAME "lan937x" /* Typically only one byte is used for tail tag. */ +#define KSZ_PTP_TAG_LEN 4 #define KSZ_EGRESS_TAG_LEN 1 #define KSZ_INGRESS_TAG_LEN 1 +#define KSZ_HWTS_EN 0 + +struct ksz_tagger_private { + struct ksz_tagger_data data; /* Must be first */ + unsigned long state; +}; + +static struct ksz_tagger_private * +ksz_tagger_private(struct dsa_switch *ds) +{ + return ds->tagger_data; +} + +static void ksz_hwtstamp_set_state(struct dsa_switch *ds, bool on) +{ + struct ksz_tagger_private *priv = ksz_tagger_private(ds); + + if (on) + set_bit(KSZ_HWTS_EN, &priv->state); + else + clear_bit(KSZ_HWTS_EN, &priv->state); +} + +static void ksz_disconnect(struct dsa_switch *ds) +{ + struct ksz_tagger_private *priv = ds->tagger_data; + + kfree(priv); + ds->tagger_data = NULL; +} + +static int ksz_connect(struct dsa_switch *ds) +{ + struct ksz_tagger_data *tagger_data; + struct ksz_tagger_private *priv; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + /* Export functions for switch driver use */ + tagger_data = &priv->data; + tagger_data->hwtstamp_set_state = ksz_hwtstamp_set_state; + ds->tagger_data = priv; + + return 0; +} + static struct sk_buff *ksz_common_rcv(struct sk_buff *skb, struct net_device *dev, unsigned int port, unsigned int len) @@ -92,10 +142,12 @@ DSA_TAG_DRIVER(ksz8795_netdev_ops); MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ8795, KSZ8795_NAME); /* - * For Ingress (Host -> KSZ9477), 2 bytes are added before FCS. + * For Ingress (Host -> KSZ9477), 2/6 bytes are added before FCS. * --------------------------------------------------------------------------- - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes) + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|ts(4bytes)|tag0(1byte)|tag1(1byte)| + * FCS(4bytes) * --------------------------------------------------------------------------- + * ts : time stamp (Present only if PTP is enabled in the Hardware) * tag0 : Prioritization (not used now) * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5) * @@ -114,6 +166,21 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ8795, KSZ8795_NAME); #define KSZ9477_TAIL_TAG_OVERRIDE BIT(9) #define KSZ9477_TAIL_TAG_LOOKUP BIT(10) +/* Time stamp tag *needs* to be inserted if PTP is enabled in hardware. + * Regardless of Whether it is a PTP frame or not. + */ +static void ksz_xmit_timestamp(struct dsa_port *dp, struct sk_buff *skb) +{ + struct ksz_tagger_private *priv; + + priv = ksz_tagger_private(dp->ds); + + if (!test_bit(KSZ_HWTS_EN, &priv->state)) + return; + + put_unaligned_be32(0, skb_put(skb, KSZ_PTP_TAG_LEN)); +} + static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, struct net_device *dev) { @@ -126,6 +193,8 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, return NULL; /* Tag encoding */ + ksz_xmit_timestamp(dp, skb); + tag = skb_put(skb, KSZ9477_INGRESS_TAG_LEN); addr = skb_mac_header(skb); @@ -158,7 +227,9 @@ static const struct dsa_device_ops ksz9477_netdev_ops = { .proto = DSA_TAG_PROTO_KSZ9477, .xmit = ksz9477_xmit, .rcv = ksz9477_rcv, - .needed_tailroom = KSZ9477_INGRESS_TAG_LEN, + .connect = ksz_connect, + .disconnect = ksz_disconnect, + .needed_tailroom = KSZ9477_INGRESS_TAG_LEN + KSZ_PTP_TAG_LEN, }; DSA_TAG_DRIVER(ksz9477_netdev_ops); @@ -178,6 +249,8 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, return NULL; /* Tag encoding */ + ksz_xmit_timestamp(dp, skb); + tag = skb_put(skb, KSZ_INGRESS_TAG_LEN); addr = skb_mac_header(skb); @@ -194,16 +267,20 @@ static const struct dsa_device_ops ksz9893_netdev_ops = { .proto = DSA_TAG_PROTO_KSZ9893, .xmit = ksz9893_xmit, .rcv = ksz9477_rcv, - .needed_tailroom = KSZ_INGRESS_TAG_LEN, + .connect = ksz_connect, + .disconnect = ksz_disconnect, + .needed_tailroom = KSZ_INGRESS_TAG_LEN + KSZ_PTP_TAG_LEN, }; DSA_TAG_DRIVER(ksz9893_netdev_ops); MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ9893, KSZ9893_NAME); -/* For xmit, 2 bytes are added before FCS. +/* For xmit, 2/6 bytes are added before FCS. * --------------------------------------------------------------------------- - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|tag1(1byte)|FCS(4bytes) + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|ts(4bytes)|tag0(1byte)|tag1(1byte)| + * FCS(4bytes) * --------------------------------------------------------------------------- + * ts : time stamp (Present only if PTP is enabled in the Hardware) * tag0 : represents tag override, lookup and valid * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x80=port8) * @@ -232,6 +309,8 @@ static struct sk_buff *lan937x_xmit(struct sk_buff *skb, if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) return NULL; + ksz_xmit_timestamp(dp, skb); + tag = skb_put(skb, LAN937X_EGRESS_TAG_LEN); val = BIT(dp->index); @@ -252,7 +331,9 @@ static const struct dsa_device_ops lan937x_netdev_ops = { .proto = DSA_TAG_PROTO_LAN937X, .xmit = lan937x_xmit, .rcv = ksz9477_rcv, - .needed_tailroom = LAN937X_EGRESS_TAG_LEN, + .connect = ksz_connect, + .disconnect = ksz_disconnect, + .needed_tailroom = LAN937X_EGRESS_TAG_LEN + KSZ_PTP_TAG_LEN, }; DSA_TAG_DRIVER(lan937x_netdev_ops); -- cgit v1.2.3 From 90188fff655d8efad79acdf60c3012ffcbbef716 Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Tue, 10 Jan 2023 14:19:24 +0530 Subject: net: dsa: microchip: ptp: add packet reception timestamping Rx Timestamping is done through 4 additional bytes in tail tag. Whenever the ptp packet is received, the 4 byte hardware time stamped value is added before 1 byte tail tag. Also, bit 7 in tail tag indicates it as PTP frame. This 4 byte value is extracted from the tail tag and reconstructed to absolute time and assigned to skb hwtstamp. If the packet received in PDelay_Resp, then partial ingress timestamp is subtracted from the correction field. Since user space tools expects to be done in hardware. Signed-off-by: Christian Eggers Co-developed-by: Arun Ramadoss Signed-off-by: Arun Ramadoss Reviewed-by: Vladimir Oltean Signed-off-by: David S. Miller --- drivers/net/dsa/microchip/ksz_common.c | 1 + drivers/net/dsa/microchip/ksz_ptp.c | 63 ++++++++++++++++++++++++++++++++++ drivers/net/dsa/microchip/ksz_ptp.h | 4 +++ include/linux/dsa/ksz_common.h | 21 ++++++++++++ net/dsa/tag_ksz.c | 25 ++++++++++---- 5 files changed, 108 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index bdd068322ca0..b4e7d579ac51 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -2991,6 +2991,7 @@ static const struct dsa_switch_ops ksz_switch_ops = { .get_ts_info = ksz_get_ts_info, .port_hwtstamp_get = ksz_hwtstamp_get, .port_hwtstamp_set = ksz_hwtstamp_set, + .port_rxtstamp = ksz_port_rxtstamp, }; struct ksz_device *ksz_switch_alloc(struct device *base, void *priv) diff --git a/drivers/net/dsa/microchip/ksz_ptp.c b/drivers/net/dsa/microchip/ksz_ptp.c index 6cf30bf50c7e..29413fb608ed 100644 --- a/drivers/net/dsa/microchip/ksz_ptp.c +++ b/drivers/net/dsa/microchip/ksz_ptp.c @@ -169,6 +169,69 @@ int ksz_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr) return copy_to_user(ifr->ifr_data, &config, sizeof(config)); } +static ktime_t ksz_tstamp_reconstruct(struct ksz_device *dev, ktime_t tstamp) +{ + struct timespec64 ptp_clock_time; + struct ksz_ptp_data *ptp_data; + struct timespec64 diff; + struct timespec64 ts; + + ptp_data = &dev->ptp_data; + ts = ktime_to_timespec64(tstamp); + + spin_lock_bh(&ptp_data->clock_lock); + ptp_clock_time = ptp_data->clock_time; + spin_unlock_bh(&ptp_data->clock_lock); + + /* calculate full time from partial time stamp */ + ts.tv_sec = (ptp_clock_time.tv_sec & ~3) | ts.tv_sec; + + /* find nearest possible point in time */ + diff = timespec64_sub(ts, ptp_clock_time); + if (diff.tv_sec > 2) + ts.tv_sec -= 4; + else if (diff.tv_sec < -2) + ts.tv_sec += 4; + + return timespec64_to_ktime(ts); +} + +bool ksz_port_rxtstamp(struct dsa_switch *ds, int port, struct sk_buff *skb, + unsigned int type) +{ + struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); + struct ksz_device *dev = ds->priv; + struct ptp_header *ptp_hdr; + u8 ptp_msg_type; + ktime_t tstamp; + s64 correction; + + tstamp = KSZ_SKB_CB(skb)->tstamp; + memset(hwtstamps, 0, sizeof(*hwtstamps)); + hwtstamps->hwtstamp = ksz_tstamp_reconstruct(dev, tstamp); + + ptp_hdr = ptp_parse_header(skb, type); + if (!ptp_hdr) + goto out; + + ptp_msg_type = ptp_get_msgtype(ptp_hdr, type); + if (ptp_msg_type != PTP_MSGTYPE_PDELAY_REQ) + goto out; + + /* Only subtract the partial time stamp from the correction field. When + * the hardware adds the egress time stamp to the correction field of + * the PDelay_Resp message on tx, also only the partial time stamp will + * be added. + */ + correction = (s64)get_unaligned_be64(&ptp_hdr->correction); + correction -= ktime_to_ns(tstamp) << 16; + + ptp_header_update_correction(skb, type, ptp_hdr, correction); + +out: + return false; +} + static int _ksz_ptp_gettime(struct ksz_device *dev, struct timespec64 *ts) { u32 nanoseconds; diff --git a/drivers/net/dsa/microchip/ksz_ptp.h b/drivers/net/dsa/microchip/ksz_ptp.h index 7c5679372705..9bb8fb059ac2 100644 --- a/drivers/net/dsa/microchip/ksz_ptp.h +++ b/drivers/net/dsa/microchip/ksz_ptp.h @@ -30,6 +30,8 @@ int ksz_get_ts_info(struct dsa_switch *ds, int port, struct ethtool_ts_info *ts); int ksz_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr); int ksz_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr); +bool ksz_port_rxtstamp(struct dsa_switch *ds, int port, struct sk_buff *skb, + unsigned int type); int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p); void ksz_ptp_irq_free(struct dsa_switch *ds, u8 p); @@ -60,6 +62,8 @@ static inline void ksz_ptp_irq_free(struct dsa_switch *ds, u8 p) {} #define ksz_hwtstamp_set NULL +#define ksz_port_rxtstamp NULL + #endif /* End of CONFIG_NET_DSA_MICROCHIP_KSZ_PTP */ #endif diff --git a/include/linux/dsa/ksz_common.h b/include/linux/dsa/ksz_common.h index d2a54161be97..a256b08d837d 100644 --- a/include/linux/dsa/ksz_common.h +++ b/include/linux/dsa/ksz_common.h @@ -9,10 +9,31 @@ #include +/* All time stamps from the KSZ consist of 2 bits for seconds and 30 bits for + * nanoseconds. This is NOT the same as 32 bits for nanoseconds. + */ +#define KSZ_TSTAMP_SEC_MASK GENMASK(31, 30) +#define KSZ_TSTAMP_NSEC_MASK GENMASK(29, 0) + +static inline ktime_t ksz_decode_tstamp(u32 tstamp) +{ + u64 ns = FIELD_GET(KSZ_TSTAMP_SEC_MASK, tstamp) * NSEC_PER_SEC + + FIELD_GET(KSZ_TSTAMP_NSEC_MASK, tstamp); + + return ns_to_ktime(ns); +} + struct ksz_tagger_data { void (*hwtstamp_set_state)(struct dsa_switch *ds, bool on); }; +struct ksz_skb_cb { + u32 tstamp; +}; + +#define KSZ_SKB_CB(skb) \ + ((struct ksz_skb_cb *)((skb)->cb)) + static inline struct ksz_tagger_data * ksz_tagger_data(struct dsa_switch *ds) { diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 420a12853676..6603eaa234d2 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -151,10 +151,11 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ8795, KSZ8795_NAME); * tag0 : Prioritization (not used now) * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x10=port5) * - * For Egress (KSZ9477 -> Host), 1 byte is added before FCS. + * For Egress (KSZ9477 -> Host), 1/5 bytes is added before FCS. * --------------------------------------------------------------------------- - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes) + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|ts(4bytes)|tag0(1byte)|FCS(4bytes) * --------------------------------------------------------------------------- + * ts : time stamp (Present only if bit 7 of tag0 is set) * tag0 : zero-based value represents port * (eg, 0x00=port1, 0x02=port3, 0x06=port7) */ @@ -166,6 +167,15 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ8795, KSZ8795_NAME); #define KSZ9477_TAIL_TAG_OVERRIDE BIT(9) #define KSZ9477_TAIL_TAG_LOOKUP BIT(10) +static void ksz_rcv_timestamp(struct sk_buff *skb, u8 *tag) +{ + u8 *tstamp_raw = tag - KSZ_PTP_TAG_LEN; + ktime_t tstamp; + + tstamp = ksz_decode_tstamp(get_unaligned_be32(tstamp_raw)); + KSZ_SKB_CB(skb)->tstamp = tstamp; +} + /* Time stamp tag *needs* to be inserted if PTP is enabled in hardware. * Regardless of Whether it is a PTP frame or not. */ @@ -216,8 +226,10 @@ static struct sk_buff *ksz9477_rcv(struct sk_buff *skb, struct net_device *dev) unsigned int len = KSZ_EGRESS_TAG_LEN; /* Extra 4-bytes PTP timestamp */ - if (tag[0] & KSZ9477_PTP_TAG_INDICATION) - len += KSZ9477_PTP_TAG_LEN; + if (tag[0] & KSZ9477_PTP_TAG_INDICATION) { + ksz_rcv_timestamp(skb, tag); + len += KSZ_PTP_TAG_LEN; + } return ksz_common_rcv(skb, dev, port, len); } @@ -284,10 +296,11 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ9893, KSZ9893_NAME); * tag0 : represents tag override, lookup and valid * tag1 : each bit represents port (eg, 0x01=port1, 0x02=port2, 0x80=port8) * - * For rcv, 1 byte is added before FCS. + * For rcv, 1/5 bytes is added before FCS. * --------------------------------------------------------------------------- - * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes) + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|ts(4bytes)|tag0(1byte)|FCS(4bytes) * --------------------------------------------------------------------------- + * ts : time stamp (Present only if bit 7 of tag0 is set) * tag0 : zero-based value represents port * (eg, 0x00=port1, 0x02=port3, 0x07=port8) */ -- cgit v1.2.3 From ab32f56a4100f879c5064b45c7657bb4be175ac3 Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Tue, 10 Jan 2023 14:19:25 +0530 Subject: net: dsa: microchip: ptp: add packet transmission timestamping This patch adds the routines for transmission of ptp packets. When the ptp pdelay_req packet to be transmitted, it uses the deferred xmit worker to schedule the packets. During irq_setup, interrupt for Sync, Pdelay_req and Pdelay_rsp are enabled. So interrupt is triggered for all three packets. But for p2p1step, we require only time stamp of Pdelay_req packet. Hence to avoid posting of the completion from ISR routine for Sync and Pdelay_resp packets, ts_en flag is introduced. This controls which packets need to processed for timestamp. After the packet is transmitted, ISR is triggered. The time at which packet transmitted is recorded to separate register. This value is reconstructed to absolute time and posted to the user application through socket error queue. Signed-off-by: Christian Eggers Co-developed-by: Arun Ramadoss Signed-off-by: Arun Ramadoss Reviewed-by: Vladimir Oltean Signed-off-by: David S. Miller --- drivers/net/dsa/microchip/ksz_common.c | 14 ++++ drivers/net/dsa/microchip/ksz_common.h | 3 + drivers/net/dsa/microchip/ksz_ptp.c | 115 ++++++++++++++++++++++++++++++++- drivers/net/dsa/microchip/ksz_ptp.h | 6 ++ include/linux/dsa/ksz_common.h | 8 +++ net/dsa/tag_ksz.c | 54 +++++++++++++++- 6 files changed, 196 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index b4e7d579ac51..5e1e5bd555d2 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -6,6 +6,7 @@ */ #include +#include #include #include #include @@ -2539,6 +2540,17 @@ static enum dsa_tag_protocol ksz_get_tag_protocol(struct dsa_switch *ds, return proto; } +static int ksz_connect_tag_protocol(struct dsa_switch *ds, + enum dsa_tag_protocol proto) +{ + struct ksz_tagger_data *tagger_data; + + tagger_data = ksz_tagger_data(ds); + tagger_data->xmit_work_fn = ksz_port_deferred_xmit; + + return 0; +} + static int ksz_port_vlan_filtering(struct dsa_switch *ds, int port, bool flag, struct netlink_ext_ack *extack) { @@ -2954,6 +2966,7 @@ static int ksz_switch_detect(struct ksz_device *dev) static const struct dsa_switch_ops ksz_switch_ops = { .get_tag_protocol = ksz_get_tag_protocol, + .connect_tag_protocol = ksz_connect_tag_protocol, .get_phy_flags = ksz_get_phy_flags, .setup = ksz_setup, .teardown = ksz_teardown, @@ -2991,6 +3004,7 @@ static const struct dsa_switch_ops ksz_switch_ops = { .get_ts_info = ksz_get_ts_info, .port_hwtstamp_get = ksz_hwtstamp_get, .port_hwtstamp_set = ksz_hwtstamp_set, + .port_txtstamp = ksz_port_txtstamp, .port_rxtstamp = ksz_port_rxtstamp, }; diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index ec1bceb4efcc..c8b49c86dfe1 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -87,6 +87,7 @@ struct ksz_irq { struct ksz_ptp_irq { struct ksz_port *port; u16 ts_reg; + bool ts_en; char name[16]; int num; }; @@ -116,6 +117,8 @@ struct ksz_port { bool hwts_rx_en; struct ksz_irq ptpirq; struct ksz_ptp_irq ptpmsg_irq[3]; + ktime_t tstamp_msg; + struct completion tstamp_msg_comp; #endif }; diff --git a/drivers/net/dsa/microchip/ksz_ptp.c b/drivers/net/dsa/microchip/ksz_ptp.c index 29413fb608ed..6edce141cbd7 100644 --- a/drivers/net/dsa/microchip/ksz_ptp.c +++ b/drivers/net/dsa/microchip/ksz_ptp.c @@ -18,6 +18,8 @@ #define ptp_caps_to_data(d) container_of((d), struct ksz_ptp_data, caps) #define ptp_data_to_ksz_dev(d) container_of((d), struct ksz_device, ptp_data) +#define work_to_xmit_work(w) \ + container_of((w), struct ksz_deferred_xmit_work, work) /* Sub-nanoseconds-adj,max * sub-nanoseconds / 40ns * 1ns * = (2^30-1) * (2 ^ 32) / 40 ns * 1 ns = 6249999 @@ -111,9 +113,15 @@ static int ksz_set_hwtstamp_config(struct ksz_device *dev, switch (config->tx_type) { case HWTSTAMP_TX_OFF: + prt->ptpmsg_irq[KSZ_SYNC_MSG].ts_en = false; + prt->ptpmsg_irq[KSZ_XDREQ_MSG].ts_en = false; + prt->ptpmsg_irq[KSZ_PDRES_MSG].ts_en = false; prt->hwts_tx_en = false; break; case HWTSTAMP_TX_ONESTEP_P2P: + prt->ptpmsg_irq[KSZ_SYNC_MSG].ts_en = false; + prt->ptpmsg_irq[KSZ_XDREQ_MSG].ts_en = true; + prt->ptpmsg_irq[KSZ_PDRES_MSG].ts_en = false; prt->hwts_tx_en = true; break; default: @@ -232,6 +240,87 @@ out: return false; } +void ksz_port_txtstamp(struct dsa_switch *ds, int port, struct sk_buff *skb) +{ + struct ksz_device *dev = ds->priv; + struct ptp_header *hdr; + struct sk_buff *clone; + struct ksz_port *prt; + unsigned int type; + u8 ptp_msg_type; + + prt = &dev->ports[port]; + + if (!prt->hwts_tx_en) + return; + + type = ptp_classify_raw(skb); + if (type == PTP_CLASS_NONE) + return; + + hdr = ptp_parse_header(skb, type); + if (!hdr) + return; + + ptp_msg_type = ptp_get_msgtype(hdr, type); + + switch (ptp_msg_type) { + case PTP_MSGTYPE_PDELAY_REQ: + break; + + default: + return; + } + + clone = skb_clone_sk(skb); + if (!clone) + return; + + /* caching the value to be used in tag_ksz.c */ + KSZ_SKB_CB(skb)->clone = clone; +} + +static void ksz_ptp_txtstamp_skb(struct ksz_device *dev, + struct ksz_port *prt, struct sk_buff *skb) +{ + struct skb_shared_hwtstamps hwtstamps = {}; + int ret; + + /* timeout must include DSA master to transmit data, tstamp latency, + * IRQ latency and time for reading the time stamp. + */ + ret = wait_for_completion_timeout(&prt->tstamp_msg_comp, + msecs_to_jiffies(100)); + if (!ret) + return; + + hwtstamps.hwtstamp = prt->tstamp_msg; + skb_complete_tx_timestamp(skb, &hwtstamps); +} + +void ksz_port_deferred_xmit(struct kthread_work *work) +{ + struct ksz_deferred_xmit_work *xmit_work = work_to_xmit_work(work); + struct sk_buff *clone, *skb = xmit_work->skb; + struct dsa_switch *ds = xmit_work->dp->ds; + struct ksz_device *dev = ds->priv; + struct ksz_port *prt; + + prt = &dev->ports[xmit_work->dp->index]; + + clone = KSZ_SKB_CB(skb)->clone; + + skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS; + + reinit_completion(&prt->tstamp_msg_comp); + + dsa_enqueue_skb(skb, skb->dev); + + ksz_ptp_txtstamp_skb(dev, prt, clone); + + kfree(xmit_work); +} + static int _ksz_ptp_gettime(struct ksz_device *dev, struct timespec64 *ts) { u32 nanoseconds; @@ -488,7 +577,29 @@ void ksz_ptp_clock_unregister(struct dsa_switch *ds) static irqreturn_t ksz_ptp_msg_thread_fn(int irq, void *dev_id) { - return IRQ_NONE; + struct ksz_ptp_irq *ptpmsg_irq = dev_id; + struct ksz_device *dev; + struct ksz_port *port; + u32 tstamp_raw; + ktime_t tstamp; + int ret; + + port = ptpmsg_irq->port; + dev = port->ksz_dev; + + if (ptpmsg_irq->ts_en) { + ret = ksz_read32(dev, ptpmsg_irq->ts_reg, &tstamp_raw); + if (ret) + return IRQ_NONE; + + tstamp = ksz_decode_tstamp(tstamp_raw); + + port->tstamp_msg = ksz_tstamp_reconstruct(dev, tstamp); + + complete(&port->tstamp_msg_comp); + } + + return IRQ_HANDLED; } static irqreturn_t ksz_ptp_irq_thread_fn(int irq, void *dev_id) @@ -633,6 +744,8 @@ int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p) REG_PTP_PORT_TX_INT_STATUS__2); snprintf(ptpirq->name, sizeof(ptpirq->name), "ptp-irq-%d", p); + init_completion(&port->tstamp_msg_comp); + ptpirq->domain = irq_domain_add_linear(dev->dev->of_node, ptpirq->nirqs, &ksz_ptp_irq_domain_ops, ptpirq); if (!ptpirq->domain) diff --git a/drivers/net/dsa/microchip/ksz_ptp.h b/drivers/net/dsa/microchip/ksz_ptp.h index 9bb8fb059ac2..0b14aed71ec2 100644 --- a/drivers/net/dsa/microchip/ksz_ptp.h +++ b/drivers/net/dsa/microchip/ksz_ptp.h @@ -30,6 +30,8 @@ int ksz_get_ts_info(struct dsa_switch *ds, int port, struct ethtool_ts_info *ts); int ksz_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr); int ksz_hwtstamp_set(struct dsa_switch *ds, int port, struct ifreq *ifr); +void ksz_port_txtstamp(struct dsa_switch *ds, int port, struct sk_buff *skb); +void ksz_port_deferred_xmit(struct kthread_work *work); bool ksz_port_rxtstamp(struct dsa_switch *ds, int port, struct sk_buff *skb, unsigned int type); int ksz_ptp_irq_setup(struct dsa_switch *ds, u8 p); @@ -64,6 +66,10 @@ static inline void ksz_ptp_irq_free(struct dsa_switch *ds, u8 p) {} #define ksz_port_rxtstamp NULL +#define ksz_port_txtstamp NULL + +#define ksz_port_deferred_xmit NULL + #endif /* End of CONFIG_NET_DSA_MICROCHIP_KSZ_PTP */ #endif diff --git a/include/linux/dsa/ksz_common.h b/include/linux/dsa/ksz_common.h index a256b08d837d..b91beab5e138 100644 --- a/include/linux/dsa/ksz_common.h +++ b/include/linux/dsa/ksz_common.h @@ -23,11 +23,19 @@ static inline ktime_t ksz_decode_tstamp(u32 tstamp) return ns_to_ktime(ns); } +struct ksz_deferred_xmit_work { + struct dsa_port *dp; + struct sk_buff *skb; + struct kthread_work work; +}; + struct ksz_tagger_data { + void (*xmit_work_fn)(struct kthread_work *work); void (*hwtstamp_set_state)(struct dsa_switch *ds, bool on); }; struct ksz_skb_cb { + struct sk_buff *clone; u32 tstamp; }; diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 6603eaa234d2..e14ee26bf6a0 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -26,6 +26,7 @@ struct ksz_tagger_private { struct ksz_tagger_data data; /* Must be first */ unsigned long state; + struct kthread_worker *xmit_worker; }; static struct ksz_tagger_private * @@ -48,6 +49,7 @@ static void ksz_disconnect(struct dsa_switch *ds) { struct ksz_tagger_private *priv = ds->tagger_data; + kthread_destroy_worker(priv->xmit_worker); kfree(priv); ds->tagger_data = NULL; } @@ -55,12 +57,23 @@ static void ksz_disconnect(struct dsa_switch *ds) static int ksz_connect(struct dsa_switch *ds) { struct ksz_tagger_data *tagger_data; + struct kthread_worker *xmit_worker; struct ksz_tagger_private *priv; + int ret; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; + xmit_worker = kthread_create_worker(0, "dsa%d:%d_xmit", + ds->dst->index, ds->index); + if (IS_ERR(xmit_worker)) { + ret = PTR_ERR(xmit_worker); + kfree(priv); + return ret; + } + + priv->xmit_worker = xmit_worker; /* Export functions for switch driver use */ tagger_data = &priv->data; tagger_data->hwtstamp_set_state = ksz_hwtstamp_set_state; @@ -191,6 +204,41 @@ static void ksz_xmit_timestamp(struct dsa_port *dp, struct sk_buff *skb) put_unaligned_be32(0, skb_put(skb, KSZ_PTP_TAG_LEN)); } +/* Defer transmit if waiting for egress time stamp is required. */ +static struct sk_buff *ksz_defer_xmit(struct dsa_port *dp, struct sk_buff *skb) +{ + struct ksz_tagger_data *tagger_data = ksz_tagger_data(dp->ds); + struct ksz_tagger_private *priv = ksz_tagger_private(dp->ds); + void (*xmit_work_fn)(struct kthread_work *work); + struct sk_buff *clone = KSZ_SKB_CB(skb)->clone; + struct ksz_deferred_xmit_work *xmit_work; + struct kthread_worker *xmit_worker; + + if (!clone) + return skb; /* no deferred xmit for this packet */ + + xmit_work_fn = tagger_data->xmit_work_fn; + xmit_worker = priv->xmit_worker; + + if (!xmit_work_fn || !xmit_worker) + return NULL; + + xmit_work = kzalloc(sizeof(*xmit_work), GFP_ATOMIC); + if (!xmit_work) + return NULL; + + kthread_init_work(&xmit_work->work, xmit_work_fn); + /* Increase refcount so the kfree_skb in dsa_slave_xmit + * won't really free the packet. + */ + xmit_work->dp = dp; + xmit_work->skb = skb_get(skb); + + kthread_queue_work(xmit_worker, &xmit_work->work); + + return NULL; +} + static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, struct net_device *dev) { @@ -215,7 +263,7 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, *tag = cpu_to_be16(val); - return skb; + return ksz_defer_xmit(dp, skb); } static struct sk_buff *ksz9477_rcv(struct sk_buff *skb, struct net_device *dev) @@ -271,7 +319,7 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, if (is_link_local_ether_addr(addr)) *tag |= KSZ9893_TAIL_TAG_OVERRIDE; - return skb; + return ksz_defer_xmit(dp, skb); } static const struct dsa_device_ops ksz9893_netdev_ops = { @@ -336,7 +384,7 @@ static struct sk_buff *lan937x_xmit(struct sk_buff *skb, put_unaligned_be16(val, tag); - return skb; + return ksz_defer_xmit(dp, skb); } static const struct dsa_device_ops lan937x_netdev_ops = { -- cgit v1.2.3 From a32190b154bde4a3bf2fddb9367aec49be09b15d Mon Sep 17 00:00:00 2001 From: Christian Eggers Date: Tue, 10 Jan 2023 14:19:26 +0530 Subject: net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tag For PDelay_Resp messages we will likely have a negative value in the correction field. The switch hardware cannot correctly update such values (produces an off by one error in the UDP checksum), so it must be moved to the time stamp field in the tail tag. Format of the correction field is 48 bit ns + 16 bit fractional ns. After updating the correction field, clone is no longer required hence it is freed. Signed-off-by: Christian Eggers Co-developed-by: Arun Ramadoss Signed-off-by: Arun Ramadoss Signed-off-by: David S. Miller --- drivers/net/dsa/microchip/ksz_ptp.c | 4 ++++ include/linux/dsa/ksz_common.h | 2 ++ net/dsa/tag_ksz.c | 29 ++++++++++++++++++++++++++++- 3 files changed, 34 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/drivers/net/dsa/microchip/ksz_ptp.c b/drivers/net/dsa/microchip/ksz_ptp.c index 6edce141cbd7..2a68649943d5 100644 --- a/drivers/net/dsa/microchip/ksz_ptp.c +++ b/drivers/net/dsa/microchip/ksz_ptp.c @@ -267,6 +267,10 @@ void ksz_port_txtstamp(struct dsa_switch *ds, int port, struct sk_buff *skb) switch (ptp_msg_type) { case PTP_MSGTYPE_PDELAY_REQ: break; + case PTP_MSGTYPE_PDELAY_RESP: + KSZ_SKB_CB(skb)->ptp_type = type; + KSZ_SKB_CB(skb)->update_correction = true; + return; default: return; diff --git a/include/linux/dsa/ksz_common.h b/include/linux/dsa/ksz_common.h index b91beab5e138..576a99ca698d 100644 --- a/include/linux/dsa/ksz_common.h +++ b/include/linux/dsa/ksz_common.h @@ -36,6 +36,8 @@ struct ksz_tagger_data { struct ksz_skb_cb { struct sk_buff *clone; + unsigned int ptp_type; + bool update_correction; u32 tstamp; }; diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index e14ee26bf6a0..694478fe07d6 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include "tag.h" @@ -195,13 +196,39 @@ static void ksz_rcv_timestamp(struct sk_buff *skb, u8 *tag) static void ksz_xmit_timestamp(struct dsa_port *dp, struct sk_buff *skb) { struct ksz_tagger_private *priv; + struct ptp_header *ptp_hdr; + unsigned int ptp_type; + u32 tstamp_raw = 0; + s64 correction; priv = ksz_tagger_private(dp->ds); if (!test_bit(KSZ_HWTS_EN, &priv->state)) return; - put_unaligned_be32(0, skb_put(skb, KSZ_PTP_TAG_LEN)); + if (!KSZ_SKB_CB(skb)->update_correction) + goto output_tag; + + ptp_type = KSZ_SKB_CB(skb)->ptp_type; + + ptp_hdr = ptp_parse_header(skb, ptp_type); + if (!ptp_hdr) + goto output_tag; + + correction = (s64)get_unaligned_be64(&ptp_hdr->correction); + + if (correction < 0) { + struct timespec64 ts; + + ts = ns_to_timespec64(-correction >> 16); + tstamp_raw = ((ts.tv_sec & 3) << 30) | ts.tv_nsec; + + /* Set correction field to 0 and update UDP checksum */ + ptp_header_update_correction(skb, ptp_type, ptp_hdr, 0); + } + +output_tag: + put_unaligned_be32(tstamp_raw, skb_put(skb, KSZ_PTP_TAG_LEN)); } /* Defer transmit if waiting for egress time stamp is required. */ -- cgit v1.2.3 From 31de2842399ae68c4b2887ffeaedebb7934f343e Mon Sep 17 00:00:00 2001 From: Daniele Palmas Date: Wed, 11 Jan 2023 14:05:18 +0100 Subject: ethtool: add tx aggregation parameters Add the following ethtool tx aggregation parameters: ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES Maximum size in bytes of a tx aggregated block of frames. ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES Maximum number of frames that can be aggregated into a block. ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS Time in usecs after the first packet arrival in an aggregated block for the block to be sent. Signed-off-by: Daniele Palmas Signed-off-by: David S. Miller --- Documentation/networking/ethtool-netlink.rst | 17 +++++++++++++++++ include/linux/ethtool.h | 12 +++++++++++- include/uapi/linux/ethtool_netlink.h | 3 +++ net/ethtool/coalesce.c | 22 ++++++++++++++++++++-- 4 files changed, 51 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index c59b542eb693..d345f5df248e 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -1004,6 +1004,9 @@ Kernel response contents: ``ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL`` u32 rate sampling interval ``ETHTOOL_A_COALESCE_USE_CQE_TX`` bool timer reset mode, Tx ``ETHTOOL_A_COALESCE_USE_CQE_RX`` bool timer reset mode, Rx + ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` u32 max aggr size, Tx + ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` u32 max aggr packets, Tx + ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx =========================================== ====== ======================= Attributes are only included in reply if their value is not zero or the @@ -1022,6 +1025,17 @@ each packet event resets the timer. In this mode timer is used to force the interrupt if queue goes idle, while busy queues depend on the packet limit to trigger interrupts. +Tx aggregation consists of copying frames into a contiguous buffer so that they +can be submitted as a single IO operation. ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` +describes the maximum size in bytes for the submitted buffer. +``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` describes the maximum number of frames +that can be aggregated into a single buffer. +``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` describes the amount of time in usecs, +counted since the first packet arrival in an aggregated block, after which the +block should be sent. +This feature is mainly of interest for specific USB devices which does not cope +well with frequent small-sized URBs transmissions. + COALESCE_SET ============ @@ -1055,6 +1069,9 @@ Request contents: ``ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL`` u32 rate sampling interval ``ETHTOOL_A_COALESCE_USE_CQE_TX`` bool timer reset mode, Tx ``ETHTOOL_A_COALESCE_USE_CQE_RX`` bool timer reset mode, Rx + ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` u32 max aggr size, Tx + ``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` u32 max aggr packets, Tx + ``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx =========================================== ====== ======================= Request is rejected if it attributes declared as unsupported by driver (i.e. diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index d0da303f6634..20d197693d34 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -217,6 +217,9 @@ __ethtool_get_link_ksettings(struct net_device *dev, struct kernel_ethtool_coalesce { u8 use_cqe_mode_tx; u8 use_cqe_mode_rx; + u32 tx_aggr_max_bytes; + u32 tx_aggr_max_frames; + u32 tx_aggr_time_usecs; }; /** @@ -260,7 +263,10 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32, #define ETHTOOL_COALESCE_RATE_SAMPLE_INTERVAL BIT(21) #define ETHTOOL_COALESCE_USE_CQE_RX BIT(22) #define ETHTOOL_COALESCE_USE_CQE_TX BIT(23) -#define ETHTOOL_COALESCE_ALL_PARAMS GENMASK(23, 0) +#define ETHTOOL_COALESCE_TX_AGGR_MAX_BYTES BIT(24) +#define ETHTOOL_COALESCE_TX_AGGR_MAX_FRAMES BIT(25) +#define ETHTOOL_COALESCE_TX_AGGR_TIME_USECS BIT(26) +#define ETHTOOL_COALESCE_ALL_PARAMS GENMASK(26, 0) #define ETHTOOL_COALESCE_USECS \ (ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_TX_USECS) @@ -288,6 +294,10 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32, ETHTOOL_COALESCE_RATE_SAMPLE_INTERVAL) #define ETHTOOL_COALESCE_USE_CQE \ (ETHTOOL_COALESCE_USE_CQE_RX | ETHTOOL_COALESCE_USE_CQE_TX) +#define ETHTOOL_COALESCE_TX_AGGR \ + (ETHTOOL_COALESCE_TX_AGGR_MAX_BYTES | \ + ETHTOOL_COALESCE_TX_AGGR_MAX_FRAMES | \ + ETHTOOL_COALESCE_TX_AGGR_TIME_USECS) #define ETHTOOL_STAT_NOT_SET (~0ULL) diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 75b3d6d476ff..83557cae0b87 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -406,6 +406,9 @@ enum { ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL, /* u32 */ ETHTOOL_A_COALESCE_USE_CQE_MODE_TX, /* u8 */ ETHTOOL_A_COALESCE_USE_CQE_MODE_RX, /* u8 */ + ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES, /* u32 */ + ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES, /* u32 */ + ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS, /* u32 */ /* add new constants above here */ __ETHTOOL_A_COALESCE_CNT, diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c index 487bdf345541..e405b47f7eed 100644 --- a/net/ethtool/coalesce.c +++ b/net/ethtool/coalesce.c @@ -105,7 +105,10 @@ static int coalesce_reply_size(const struct ethnl_req_info *req_base, nla_total_size(sizeof(u32)) + /* _TX_MAX_FRAMES_HIGH */ nla_total_size(sizeof(u32)) + /* _RATE_SAMPLE_INTERVAL */ nla_total_size(sizeof(u8)) + /* _USE_CQE_MODE_TX */ - nla_total_size(sizeof(u8)); /* _USE_CQE_MODE_RX */ + nla_total_size(sizeof(u8)) + /* _USE_CQE_MODE_RX */ + nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_BYTES */ + nla_total_size(sizeof(u32)) + /* _TX_AGGR_MAX_FRAMES */ + nla_total_size(sizeof(u32)); /* _TX_AGGR_TIME_USECS */ } static bool coalesce_put_u32(struct sk_buff *skb, u16 attr_type, u32 val, @@ -180,7 +183,13 @@ static int coalesce_fill_reply(struct sk_buff *skb, coalesce_put_bool(skb, ETHTOOL_A_COALESCE_USE_CQE_MODE_TX, kcoal->use_cqe_mode_tx, supported) || coalesce_put_bool(skb, ETHTOOL_A_COALESCE_USE_CQE_MODE_RX, - kcoal->use_cqe_mode_rx, supported)) + kcoal->use_cqe_mode_rx, supported) || + coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES, + kcoal->tx_aggr_max_bytes, supported) || + coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES, + kcoal->tx_aggr_max_frames, supported) || + coalesce_put_u32(skb, ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS, + kcoal->tx_aggr_time_usecs, supported)) return -EMSGSIZE; return 0; @@ -227,6 +236,9 @@ const struct nla_policy ethnl_coalesce_set_policy[] = { [ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL] = { .type = NLA_U32 }, [ETHTOOL_A_COALESCE_USE_CQE_MODE_TX] = NLA_POLICY_MAX(NLA_U8, 1), [ETHTOOL_A_COALESCE_USE_CQE_MODE_RX] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 }, + [ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 }, + [ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 }, }; int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info) @@ -321,6 +333,12 @@ int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info) tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_TX], &mod); ethnl_update_u8(&kernel_coalesce.use_cqe_mode_rx, tb[ETHTOOL_A_COALESCE_USE_CQE_MODE_RX], &mod); + ethnl_update_u32(&kernel_coalesce.tx_aggr_max_bytes, + tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES], &mod); + ethnl_update_u32(&kernel_coalesce.tx_aggr_max_frames, + tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES], &mod); + ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs, + tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod); ret = 0; if (!mod) goto out_ops; -- cgit v1.2.3 From 6e6eda44b939c0931533d6681d9f2ed41b44cde9 Mon Sep 17 00:00:00 2001 From: Yunhui Cui Date: Wed, 11 Jan 2023 14:59:30 +0800 Subject: sock: add tracepoint for send recv length Add 2 tracepoints to monitor the tcp/udp traffic of per process and per cgroup. Regarding monitoring the tcp/udp traffic of each process, there are two existing solutions, the first one is https://www.atoptool.nl/netatop.php. The second is via kprobe/kretprobe. Netatop solution is implemented by registering the hook function at the hook point provided by the netfilter framework. These hook functions may be in the soft interrupt context and cannot directly obtain the pid. Some data structures are added to bind packets and processes. For example, struct taskinfobucket, struct taskinfo ... Every time the process sends and receives packets it needs multiple hashmaps,resulting in low performance and it has the problem fo inaccurate tcp/udp traffic statistics(for example: multiple threads share sockets). We can obtain the information with kretprobe, but as we know, kprobe gets the result by trappig in an exception, which loses performance compared to tracepoint. We compared the performance of tracepoints with the above two methods, and the results are as follows: ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html without trace: Time per request: 39.660 [ms] (mean) Time per request: 0.040 [ms] (mean, across all concurrent requests) netatop: Time per request: 50.717 [ms] (mean) Time per request: 0.051 [ms] (mean, across all concurrent requests) kr: Time per request: 43.168 [ms] (mean) Time per request: 0.043 [ms] (mean, across all concurrent requests) tracepoint: Time per request: 41.004 [ms] (mean) Time per request: 0.041 [ms] (mean, across all concurrent requests It can be seen that tracepoint has better performance. Signed-off-by: Yunhui Cui Signed-off-by: Xiongchun Duan Reviewed-by: Steven Rostedt (Google) Signed-off-by: David S. Miller --- include/trace/events/sock.h | 45 +++++++++++++++++++++++++++++++++++++++++++++ net/socket.c | 33 +++++++++++++++++++++++++++++---- 2 files changed, 74 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h index 777ee6cbe933..71492e8276da 100644 --- a/include/trace/events/sock.h +++ b/include/trace/events/sock.h @@ -263,6 +263,51 @@ TRACE_EVENT(inet_sk_error_report, __entry->error) ); +/* + * sock send/recv msg length + */ +DECLARE_EVENT_CLASS(sock_msg_length, + + TP_PROTO(struct sock *sk, int ret, int flags), + + TP_ARGS(sk, ret, flags), + + TP_STRUCT__entry( + __field(void *, sk) + __field(__u16, family) + __field(__u16, protocol) + __field(int, ret) + __field(int, flags) + ), + + TP_fast_assign( + __entry->sk = sk; + __entry->family = sk->sk_family; + __entry->protocol = sk->sk_protocol; + __entry->ret = ret; + __entry->flags = flags; + ), + + TP_printk("sk address = %p, family = %s protocol = %s, length = %d, error = %d, flags = 0x%x", + __entry->sk, show_family_name(__entry->family), + show_inet_protocol_name(__entry->protocol), + !(__entry->flags & MSG_PEEK) ? + (__entry->ret > 0 ? __entry->ret : 0) : 0, + __entry->ret < 0 ? __entry->ret : 0, + __entry->flags) +); + +DEFINE_EVENT(sock_msg_length, sock_send_length, + TP_PROTO(struct sock *sk, int ret, int flags), + + TP_ARGS(sk, ret, flags) +); + +DEFINE_EVENT(sock_msg_length, sock_recv_length, + TP_PROTO(struct sock *sk, int ret, int flags), + + TP_ARGS(sk, ret, flags) +); #endif /* _TRACE_SOCK_H */ /* This part must be outside protection */ diff --git a/net/socket.c b/net/socket.c index 888cd618a968..77626e4d9690 100644 --- a/net/socket.c +++ b/net/socket.c @@ -106,6 +106,7 @@ #include #include #include +#include #ifdef CONFIG_NET_RX_BUSY_POLL unsigned int sysctl_net_busy_read __read_mostly; @@ -709,12 +710,22 @@ INDIRECT_CALLABLE_DECLARE(int inet_sendmsg(struct socket *, struct msghdr *, size_t)); INDIRECT_CALLABLE_DECLARE(int inet6_sendmsg(struct socket *, struct msghdr *, size_t)); + +static noinline void call_trace_sock_send_length(struct sock *sk, int ret, + int flags) +{ + trace_sock_send_length(sk, ret, 0); +} + static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg) { int ret = INDIRECT_CALL_INET(sock->ops->sendmsg, inet6_sendmsg, inet_sendmsg, sock, msg, msg_data_left(msg)); BUG_ON(ret == -EIOCBQUEUED); + + if (trace_sock_send_length_enabled()) + call_trace_sock_send_length(sock->sk, ret, 0); return ret; } @@ -989,12 +1000,21 @@ INDIRECT_CALLABLE_DECLARE(int inet_recvmsg(struct socket *, struct msghdr *, size_t, int)); INDIRECT_CALLABLE_DECLARE(int inet6_recvmsg(struct socket *, struct msghdr *, size_t, int)); + +static noinline void call_trace_sock_recv_length(struct sock *sk, int ret, int flags) +{ + trace_sock_recv_length(sk, ret, flags); +} + static inline int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg, int flags) { - return INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg, - inet_recvmsg, sock, msg, msg_data_left(msg), - flags); + int ret = INDIRECT_CALL_INET(sock->ops->recvmsg, inet6_recvmsg, + inet_recvmsg, sock, msg, + msg_data_left(msg), flags); + if (trace_sock_recv_length_enabled()) + call_trace_sock_recv_length(sock->sk, ret, flags); + return ret; } /** @@ -1044,6 +1064,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page, { struct socket *sock; int flags; + int ret; sock = file->private_data; @@ -1051,7 +1072,11 @@ static ssize_t sock_sendpage(struct file *file, struct page *page, /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */ flags |= more; - return kernel_sendpage(sock, page, offset, size, flags); + ret = kernel_sendpage(sock, page, offset, size, flags); + + if (trace_sock_send_length_enabled()) + call_trace_sock_send_length(sock->sk, ret, 0); + return ret; } static ssize_t sock_splice_read(struct file *file, loff_t *ppos, -- cgit v1.2.3 From c1917514107982794fd448ee2596878f396ee6a8 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Wed, 11 Jan 2023 10:42:45 -0800 Subject: caif: don't assume iov_iter type The details of the iov_iter types are appropriately abstracted, so there's no need to check for specific type fields. Just let the abstractions handle it. This is preparing for io_uring/net's io_send to utilize the more efficient ITER_UBUF. Signed-off-by: Keith Busch Reviewed-by: Jens Axboe Link: https://lore.kernel.org/r/20230111184245.3784393-1-kbusch@meta.com Signed-off-by: Jakub Kicinski --- net/caif/caif_socket.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'net') diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c index 748be7253248..1f2c1d7b90e2 100644 --- a/net/caif/caif_socket.c +++ b/net/caif/caif_socket.c @@ -533,10 +533,6 @@ static int caif_seqpkt_sendmsg(struct socket *sock, struct msghdr *msg, if (msg->msg_namelen) goto err; - ret = -EINVAL; - if (unlikely(msg->msg_iter.nr_segs == 0) || - unlikely(msg->msg_iter.iov->iov_base == NULL)) - goto err; noblock = msg->msg_flags & MSG_DONTWAIT; timeo = sock_sndtimeo(sk, noblock); -- cgit v1.2.3 From af6d10345ca76670c1b7c37799f0d5576ccef277 Mon Sep 17 00:00:00 2001 From: Jon Maxwell Date: Thu, 12 Jan 2023 12:25:32 +1100 Subject: ipv6: remove max_size check inline with ipv4 In ip6_dst_gc() replace: if (entries > gc_thresh) With: if (entries > ops->gc_thresh) Sending Ipv6 packets in a loop via a raw socket triggers an issue where a route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly consumes the Ipv6 max_size threshold which defaults to 4096 resulting in these warnings: [1] 99.187805] dst_alloc: 7728 callbacks suppressed [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. . . [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. When this happens the packet is dropped and sendto() gets a network is unreachable error: remaining pkt 200557 errno 101 remaining pkt 196462 errno 101 . . remaining pkt 126821 errno 101 Implement David Aherns suggestion to remove max_size check seeing that Ipv6 has a GC to manage memory usage. Ipv4 already does not check max_size. Here are some memory comparisons for Ipv4 vs Ipv6 with the patch: Test by running 5 instances of a program that sends UDP packets to a raw socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar program. Ipv4: Before test: MemFree: 29427108 kB Slab: 237612 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 2881 3990 192 42 2 : tunables 0 0 0 During test: MemFree: 29417608 kB Slab: 247712 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 44394 44394 192 42 2 : tunables 0 0 0 After test: MemFree: 29422308 kB Slab: 238104 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 Ipv6 with patch: Errno 101 errors are not observed anymore with the patch. Before test: MemFree: 29422308 kB Slab: 238104 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 During Test: MemFree: 29431516 kB Slab: 240940 kB ip6_dst_cache 11980 12064 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 After Test: MemFree: 29441816 kB Slab: 238132 kB ip6_dst_cache 1902 2432 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 Tested-by: Andrea Mayer Signed-off-by: Jon Maxwell Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com Signed-off-by: Jakub Kicinski --- include/net/dst_ops.h | 2 +- net/core/dst.c | 8 ++------ net/ipv6/route.c | 13 +++++-------- 3 files changed, 8 insertions(+), 15 deletions(-) (limited to 'net') diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h index 88ff7bb2bb9b..632086b2f644 100644 --- a/include/net/dst_ops.h +++ b/include/net/dst_ops.h @@ -16,7 +16,7 @@ struct dst_ops { unsigned short family; unsigned int gc_thresh; - int (*gc)(struct dst_ops *ops); + void (*gc)(struct dst_ops *ops); struct dst_entry * (*check)(struct dst_entry *, __u32 cookie); unsigned int (*default_advmss)(const struct dst_entry *); unsigned int (*mtu)(const struct dst_entry *); diff --git a/net/core/dst.c b/net/core/dst.c index 6d2dd03dafa8..31c08a3386d3 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev, if (ops->gc && !(flags & DST_NOCOUNT) && - dst_entries_get_fast(ops) > ops->gc_thresh) { - if (ops->gc(ops)) { - pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n"); - return NULL; - } - } + dst_entries_get_fast(ops) > ops->gc_thresh) + ops->gc(ops); dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC); if (!dst) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index e74e0361fd92..b643dda68d31 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *); static void ip6_dst_destroy(struct dst_entry *); static void ip6_dst_ifdown(struct dst_entry *, struct net_device *dev, int how); -static int ip6_dst_gc(struct dst_ops *ops); +static void ip6_dst_gc(struct dst_ops *ops); static int ip6_pkt_discard(struct sk_buff *skb); static int ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb); @@ -3284,11 +3284,10 @@ out: return dst; } -static int ip6_dst_gc(struct dst_ops *ops) +static void ip6_dst_gc(struct dst_ops *ops) { struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops); int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval; - int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size; int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity; int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout; unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc; @@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops) int entries; entries = dst_entries_get_fast(ops); - if (entries > rt_max_size) + if (entries > ops->gc_thresh) entries = dst_entries_get_slow(ops); - if (time_after(rt_last_gc + rt_min_interval, jiffies) && - entries <= rt_max_size) + if (time_after(rt_last_gc + rt_min_interval, jiffies)) goto out; fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true); @@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops) out: val = atomic_read(&net->ipv6.ip6_rt_gc_expire); atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity)); - return entries > rt_max_size; } static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, @@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net) #endif net->ipv6.sysctl.flush_delay = 0; - net->ipv6.sysctl.ip6_rt_max_size = 4096; + net->ipv6.sysctl.ip6_rt_max_size = INT_MAX; net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2; net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ; net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ; -- cgit v1.2.3 From 28dbf774bc879c1841d58cb711aaea6198955b95 Mon Sep 17 00:00:00 2001 From: Piergiorgio Beruto Date: Fri, 13 Jan 2023 14:26:35 +0100 Subject: plca.c: fix obvious mistake in checking retval Revert a wrong fix that was done during the review process. The intention was to substitute "if(ret < 0)" with "if(ret)". Unfortunately, the intended fix did not meet the code. Besides, after additional review, it was decided that "if(ret < 0)" was actually the right thing to do. Fixes: 8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS") Signed-off-by: Piergiorgio Beruto Reviewed-by: Andrew Lunn Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/f2277af8951a51cfee2fb905af8d7a812b7beaf4.1673616357.git.piergiorgio.beruto@gmail.com Signed-off-by: Jakub Kicinski --- net/ethtool/plca.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/ethtool/plca.c b/net/ethtool/plca.c index d9bb13ffc654..be7404dc9ef2 100644 --- a/net/ethtool/plca.c +++ b/net/ethtool/plca.c @@ -61,7 +61,7 @@ static int plca_get_cfg_prepare_data(const struct ethnl_req_info *req_base, } ret = ethnl_ops_begin(dev); - if (!ret) + if (ret < 0) goto out; memset(&data->plca_cfg, 0xff, @@ -151,7 +151,7 @@ int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info) tb[ETHTOOL_A_PLCA_HEADER], genl_info_net(info), info->extack, true); - if (!ret) + if (ret < 0) return ret; dev = req_info.dev; @@ -171,7 +171,7 @@ int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info) } ret = ethnl_ops_begin(dev); - if (!ret) + if (ret < 0) goto out_rtnl; memset(&plca_cfg, 0xff, sizeof(plca_cfg)); @@ -189,7 +189,7 @@ int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info) goto out_ops; ret = ops->set_plca_cfg(dev->phydev, &plca_cfg, info->extack); - if (!ret) + if (ret < 0) goto out_ops; ethtool_notify(dev, ETHTOOL_MSG_PLCA_NTF, NULL); @@ -233,7 +233,7 @@ static int plca_get_status_prepare_data(const struct ethnl_req_info *req_base, } ret = ethnl_ops_begin(dev); - if (!ret) + if (ret < 0) goto out; memset(&data->plca_st, 0xff, -- cgit v1.2.3 From d219df60a70ed0739aa5dd34b477763311fc5a7b Mon Sep 17 00:00:00 2001 From: Ziyang Xuan Date: Fri, 13 Jan 2023 17:24:51 +0800 Subject: bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room() Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: Willem de Bruijn Signed-off-by: Ziyang Xuan Reviewed-by: Willem de Bruijn Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau --- include/uapi/linux/bpf.h | 7 +++++++ net/core/filter.c | 31 ++++++++++++++++++++++++++++++- tools/include/uapi/linux/bpf.h | 7 +++++++ 3 files changed, 44 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index bc1a3d232ae4..adae5b168f9d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2647,6 +2647,11 @@ union bpf_attr { * Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the * L2 type as Ethernet. * + * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, + * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: + * Indicate the new IP header version after decapsulating the outer + * IP header. Used when the inner and outer IP versions are different. + * * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be @@ -5807,6 +5812,8 @@ enum { BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4), BPF_F_ADJ_ROOM_NO_CSUM_RESET = (1ULL << 5), BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6), + BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7), + BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8), }; enum { diff --git a/net/core/filter.c b/net/core/filter.c index d9befa6ba04e..b4547a2c02f4 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3381,13 +3381,17 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb) #define BPF_F_ADJ_ROOM_ENCAP_L3_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | \ BPF_F_ADJ_ROOM_ENCAP_L3_IPV6) +#define BPF_F_ADJ_ROOM_DECAP_L3_MASK (BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \ + BPF_F_ADJ_ROOM_DECAP_L3_IPV6) + #define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \ BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \ BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \ BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \ BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \ BPF_F_ADJ_ROOM_ENCAP_L2( \ - BPF_ADJ_ROOM_ENCAP_L2_MASK)) + BPF_ADJ_ROOM_ENCAP_L2_MASK) | \ + BPF_F_ADJ_ROOM_DECAP_L3_MASK) static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, u64 flags) @@ -3501,6 +3505,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, int ret; if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO | + BPF_F_ADJ_ROOM_DECAP_L3_MASK | BPF_F_ADJ_ROOM_NO_CSUM_RESET))) return -EINVAL; @@ -3519,6 +3524,14 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, if (unlikely(ret < 0)) return ret; + /* Match skb->protocol to new outer l3 protocol */ + if (skb->protocol == htons(ETH_P_IP) && + flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6) + skb->protocol = htons(ETH_P_IPV6); + else if (skb->protocol == htons(ETH_P_IPV6) && + flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4) + skb->protocol = htons(ETH_P_IP); + if (skb_is_gso(skb)) { struct skb_shared_info *shinfo = skb_shinfo(skb); @@ -3608,6 +3621,22 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, return -ENOTSUPP; } + if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) { + if (!shrink) + return -EINVAL; + + switch (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) { + case BPF_F_ADJ_ROOM_DECAP_L3_IPV4: + len_min = sizeof(struct iphdr); + break; + case BPF_F_ADJ_ROOM_DECAP_L3_IPV6: + len_min = sizeof(struct ipv6hdr); + break; + default: + return -EINVAL; + } + } + len_cur = skb->len - skb_network_offset(skb); if ((shrink && (len_diff_abs >= len_cur || len_cur - len_diff_abs < len_min)) || diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index bc1a3d232ae4..142b81bcbb2e 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2647,6 +2647,11 @@ union bpf_attr { * Use with BPF_F_ADJ_ROOM_ENCAP_L2 flag to further specify the * L2 type as Ethernet. * + * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, + * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: + * Indicate the new IP header version after decapsulating the outer + * IP header. Used when the inner and outer IP versions are different. + * * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be @@ -5807,6 +5812,8 @@ enum { BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4), BPF_F_ADJ_ROOM_NO_CSUM_RESET = (1ULL << 5), BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6), + BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7), + BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8), }; enum { -- cgit v1.2.3 From b27401a30ee466c4476b035487be0580767ba1fb Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Sat, 14 Jan 2023 12:35:02 +0300 Subject: unix: Improve locking scheme in unix_show_fdinfo() After switching to TCP_ESTABLISHED or TCP_LISTEN sk_state, alive SOCK_STREAM and SOCK_SEQPACKET sockets can't change it anymore (since commit 3ff8bff704f4 "unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()"). Thus, we do not need to take lock here. Signed-off-by: Kirill Tkhai Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/unix/af_unix.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f0c2293f1d3b..009616fa0256 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -807,23 +807,23 @@ static int unix_count_nr_fds(struct sock *sk) static void unix_show_fdinfo(struct seq_file *m, struct socket *sock) { struct sock *sk = sock->sk; + unsigned char s_state; struct unix_sock *u; - int nr_fds; + int nr_fds = 0; if (sk) { + s_state = READ_ONCE(sk->sk_state); u = unix_sk(sk); - if (sock->type == SOCK_DGRAM) { - nr_fds = atomic_read(&u->scm_stat.nr_fds); - goto out_print; - } - unix_state_lock(sk); - if (sk->sk_state != TCP_LISTEN) + /* SOCK_STREAM and SOCK_SEQPACKET sockets never change their + * sk_state after switching to TCP_ESTABLISHED or TCP_LISTEN. + * SOCK_DGRAM is ordinary. So, no lock is needed. + */ + if (sock->type == SOCK_DGRAM || s_state == TCP_ESTABLISHED) nr_fds = atomic_read(&u->scm_stat.nr_fds); - else + else if (s_state == TCP_LISTEN) nr_fds = unix_count_nr_fds(sk); - unix_state_unlock(sk); -out_print: + seq_printf(m, "scm_fds: %u\n", nr_fds); } } -- cgit v1.2.3 From 71dc9ec9ac7d3eee785cdc986c3daeb821381e20 Mon Sep 17 00:00:00 2001 From: Bobby Eshleman Date: Fri, 13 Jan 2023 22:21:37 +0000 Subject: virtio/vsock: replace virtio_vsock_pkt with sk_buff This commit changes virtio/vsock to use sk_buff instead of virtio_vsock_pkt. Beyond better conforming to other net code, using sk_buff allows vsock to use sk_buff-dependent features in the future (such as sockmap) and improves throughput. This patch introduces the following performance changes: Tool: Uperf Env: Phys Host + L1 Guest Payload: 64k Threads: 16 Test Runs: 10 Type: SOCK_STREAM Before: commit b7bfaa761d760 ("Linux 6.2-rc3") Before ------ g2h: 16.77Gb/s h2g: 10.56Gb/s After ----- g2h: 21.04Gb/s h2g: 10.76Gb/s Signed-off-by: Bobby Eshleman Reviewed-by: Stefano Garzarella Signed-off-by: David S. Miller --- drivers/vhost/vsock.c | 214 +++++++--------- include/linux/virtio_vsock.h | 129 ++++++++-- net/vmw_vsock/virtio_transport.c | 149 ++++------- net/vmw_vsock/virtio_transport_common.c | 422 ++++++++++++++++++-------------- net/vmw_vsock/vsock_loopback.c | 51 ++-- 5 files changed, 498 insertions(+), 467 deletions(-) (limited to 'net') diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index a2b374372363..1f3b89c885cc 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -51,8 +51,7 @@ struct vhost_vsock { struct hlist_node hash; struct vhost_work send_pkt_work; - spinlock_t send_pkt_list_lock; - struct list_head send_pkt_list; /* host->guest pending packets */ + struct sk_buff_head send_pkt_queue; /* host->guest pending packets */ atomic_t queued_replies; @@ -108,40 +107,31 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, vhost_disable_notify(&vsock->dev, vq); do { - struct virtio_vsock_pkt *pkt; + struct virtio_vsock_hdr *hdr; + size_t iov_len, payload_len; struct iov_iter iov_iter; + u32 flags_to_restore = 0; + struct sk_buff *skb; unsigned out, in; size_t nbytes; - size_t iov_len, payload_len; int head; - u32 flags_to_restore = 0; - spin_lock_bh(&vsock->send_pkt_list_lock); - if (list_empty(&vsock->send_pkt_list)) { - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb = virtio_vsock_skb_dequeue(&vsock->send_pkt_queue); + + if (!skb) { vhost_enable_notify(&vsock->dev, vq); break; } - pkt = list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); if (head < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); break; } if (head == vq->num) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); /* We cannot finish yet if more buffers snuck in while * re-enabling notify. */ @@ -153,26 +143,27 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, } if (out) { - virtio_transport_free_pkt(pkt); + kfree_skb(skb); vq_err(vq, "Expected 0 output buffers, got %u\n", out); break; } iov_len = iov_length(&vq->iov[out], in); - if (iov_len < sizeof(pkt->hdr)) { - virtio_transport_free_pkt(pkt); + if (iov_len < sizeof(*hdr)) { + kfree_skb(skb); vq_err(vq, "Buffer len [%zu] too small\n", iov_len); break; } iov_iter_init(&iov_iter, ITER_DEST, &vq->iov[out], in, iov_len); - payload_len = pkt->len - pkt->off; + payload_len = skb->len; + hdr = virtio_vsock_hdr(skb); /* If the packet is greater than the space available in the * buffer, we split it using multiple buffers. */ - if (payload_len > iov_len - sizeof(pkt->hdr)) { - payload_len = iov_len - sizeof(pkt->hdr); + if (payload_len > iov_len - sizeof(*hdr)) { + payload_len = iov_len - sizeof(*hdr); /* As we are copying pieces of large packet's buffer to * small rx buffers, headers of packets in rx queue are @@ -185,31 +176,30 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, * bits set. After initialized header will be copied to * rx buffer, these required bits will be restored. */ - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) { - pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { + hdr->flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); flags_to_restore |= VIRTIO_VSOCK_SEQ_EOM; - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) { - pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) { + hdr->flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); flags_to_restore |= VIRTIO_VSOCK_SEQ_EOR; } } } /* Set the correct length in the header */ - pkt->hdr.len = cpu_to_le32(payload_len); + hdr->len = cpu_to_le32(payload_len); - nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); - if (nbytes != sizeof(pkt->hdr)) { - virtio_transport_free_pkt(pkt); + nbytes = copy_to_iter(hdr, sizeof(*hdr), &iov_iter); + if (nbytes != sizeof(*hdr)) { + kfree_skb(skb); vq_err(vq, "Faulted on copying pkt hdr\n"); break; } - nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len, - &iov_iter); + nbytes = copy_to_iter(skb->data, payload_len, &iov_iter); if (nbytes != payload_len) { - virtio_transport_free_pkt(pkt); + kfree_skb(skb); vq_err(vq, "Faulted on copying pkt buf\n"); break; } @@ -217,31 +207,28 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, /* Deliver to monitoring devices all packets that we * will transmit. */ - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); - vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len); + vhost_add_used(vq, head, sizeof(*hdr) + payload_len); added = true; - pkt->off += payload_len; + skb_pull(skb, payload_len); total_len += payload_len; /* If we didn't send all the payload we can requeue the packet * to send it with the next available buffer. */ - if (pkt->off < pkt->len) { - pkt->hdr.flags |= cpu_to_le32(flags_to_restore); + if (skb->len > 0) { + hdr->flags |= cpu_to_le32(flags_to_restore); - /* We are queueing the same virtio_vsock_pkt to handle + /* We are queueing the same skb to handle * the remaining bytes, and we want to deliver it * to monitoring devices in the next iteration. */ - pkt->tap_delivered = false; - - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_clear_tap_delivered(skb); + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); } else { - if (pkt->reply) { + if (virtio_vsock_skb_reply(skb)) { int val; val = atomic_dec_return(&vsock->queued_replies); @@ -253,7 +240,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, restart_tx = true; } - virtio_transport_free_pkt(pkt); + consume_skb(skb); } } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); if (added) @@ -278,28 +265,26 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work) } static int -vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt) +vhost_transport_send_pkt(struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vhost_vsock *vsock; - int len = pkt->len; + int len = skb->len; rcu_read_lock(); /* Find the vhost_vsock according to guest context id */ - vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid)); + vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid)); if (!vsock) { rcu_read_unlock(); - virtio_transport_free_pkt(pkt); + kfree_skb(skb); return -ENODEV; } - if (pkt->reply) + if (virtio_vsock_skb_reply(skb)) atomic_inc(&vsock->queued_replies); - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add_tail(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb); vhost_work_queue(&vsock->dev, &vsock->send_pkt_work); rcu_read_unlock(); @@ -310,10 +295,8 @@ static int vhost_transport_cancel_pkt(struct vsock_sock *vsk) { struct vhost_vsock *vsock; - struct virtio_vsock_pkt *pkt, *n; int cnt = 0; int ret = -ENODEV; - LIST_HEAD(freeme); rcu_read_lock(); @@ -322,20 +305,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) if (!vsock) goto out; - spin_lock_bh(&vsock->send_pkt_list_lock); - list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) { - if (pkt->vsk != vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - if (pkt->reply) - cnt++; - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue); if (cnt) { struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX]; @@ -352,12 +322,14 @@ out: return ret; } -static struct virtio_vsock_pkt * -vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, +static struct sk_buff * +vhost_vsock_alloc_skb(struct vhost_virtqueue *vq, unsigned int out, unsigned int in) { - struct virtio_vsock_pkt *pkt; + struct virtio_vsock_hdr *hdr; struct iov_iter iov_iter; + struct sk_buff *skb; + size_t payload_len; size_t nbytes; size_t len; @@ -366,50 +338,48 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } - pkt = kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + len = iov_length(vq->iov, out); + + /* len contains both payload and hdr */ + skb = virtio_vsock_alloc_skb(len, GFP_KERNEL); + if (!skb) return NULL; - len = iov_length(vq->iov, out); iov_iter_init(&iov_iter, ITER_SOURCE, vq->iov, out, len); - nbytes = copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter); - if (nbytes != sizeof(pkt->hdr)) { + hdr = virtio_vsock_hdr(skb); + nbytes = copy_from_iter(hdr, sizeof(*hdr), &iov_iter); + if (nbytes != sizeof(*hdr)) { vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n", - sizeof(pkt->hdr), nbytes); - kfree(pkt); + sizeof(*hdr), nbytes); + kfree_skb(skb); return NULL; } - pkt->len = le32_to_cpu(pkt->hdr.len); + payload_len = le32_to_cpu(hdr->len); /* No payload */ - if (!pkt->len) - return pkt; + if (!payload_len) + return skb; - /* The pkt is too big */ - if (pkt->len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) { - kfree(pkt); + /* The pkt is too big or the length in the header is invalid */ + if (payload_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || + payload_len + sizeof(*hdr) > len) { + kfree_skb(skb); return NULL; } - pkt->buf = kvmalloc(pkt->len, GFP_KERNEL); - if (!pkt->buf) { - kfree(pkt); - return NULL; - } + virtio_vsock_skb_rx_put(skb); - pkt->buf_len = pkt->len; - - nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter); - if (nbytes != pkt->len) { - vq_err(vq, "Expected %u byte payload, got %zu bytes\n", - pkt->len, nbytes); - virtio_transport_free_pkt(pkt); + nbytes = copy_from_iter(skb->data, payload_len, &iov_iter); + if (nbytes != payload_len) { + vq_err(vq, "Expected %zu byte payload, got %zu bytes\n", + payload_len, nbytes); + kfree_skb(skb); return NULL; } - return pkt; + return skb; } /* Is there space left for replies to rx packets? */ @@ -496,9 +466,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) poll.work); struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock, dev); - struct virtio_vsock_pkt *pkt; int head, pkts = 0, total_len = 0; unsigned int out, in; + struct sk_buff *skb; bool added = false; mutex_lock(&vq->mutex); @@ -511,6 +481,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) vhost_disable_notify(&vsock->dev, vq); do { + struct virtio_vsock_hdr *hdr; + if (!vhost_vsock_more_replies(vsock)) { /* Stop tx until the device processes already * pending replies. Leave tx virtqueue @@ -532,24 +504,26 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) break; } - pkt = vhost_vsock_alloc_pkt(vq, out, in); - if (!pkt) { + skb = vhost_vsock_alloc_skb(vq, out, in); + if (!skb) { vq_err(vq, "Faulted on pkt\n"); continue; } - total_len += sizeof(pkt->hdr) + pkt->len; + total_len += sizeof(*hdr) + skb->len; /* Deliver to monitoring devices all received packets */ - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); + + hdr = virtio_vsock_hdr(skb); /* Only accept correctly addressed packets */ - if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid && - le64_to_cpu(pkt->hdr.dst_cid) == + if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid && + le64_to_cpu(hdr->dst_cid) == vhost_transport_get_local_cid()) - virtio_transport_recv_pkt(&vhost_transport, pkt); + virtio_transport_recv_pkt(&vhost_transport, skb); else - virtio_transport_free_pkt(pkt); + kfree_skb(skb); vhost_add_used(vq, head, 0); added = true; @@ -693,8 +667,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) VHOST_VSOCK_WEIGHT, true, NULL); file->private_data = vsock; - spin_lock_init(&vsock->send_pkt_list_lock); - INIT_LIST_HEAD(&vsock->send_pkt_list); + skb_queue_head_init(&vsock->send_pkt_queue); vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work); return 0; @@ -760,16 +733,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file) vhost_vsock_flush(vsock); vhost_dev_stop(&vsock->dev); - spin_lock_bh(&vsock->send_pkt_list_lock); - while (!list_empty(&vsock->send_pkt_list)) { - struct virtio_vsock_pkt *pkt; - - pkt = list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - virtio_transport_free_pkt(pkt); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue); vhost_dev_cleanup(&vsock->dev); kfree(vsock->dev.vqs); diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 35d7eedb5e8e..3f9c16611306 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -7,6 +7,109 @@ #include #include +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr)) + +struct virtio_vsock_skb_cb { + bool reply; + bool tap_delivered; +}; + +#define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb)) + +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *skb) +{ + return (struct virtio_vsock_hdr *)skb->head; +} + +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb) +{ + return VIRTIO_VSOCK_SKB_CB(skb)->reply; +} + +static inline void virtio_vsock_skb_set_reply(struct sk_buff *skb) +{ + VIRTIO_VSOCK_SKB_CB(skb)->reply = true; +} + +static inline bool virtio_vsock_skb_tap_delivered(struct sk_buff *skb) +{ + return VIRTIO_VSOCK_SKB_CB(skb)->tap_delivered; +} + +static inline void virtio_vsock_skb_set_tap_delivered(struct sk_buff *skb) +{ + VIRTIO_VSOCK_SKB_CB(skb)->tap_delivered = true; +} + +static inline void virtio_vsock_skb_clear_tap_delivered(struct sk_buff *skb) +{ + VIRTIO_VSOCK_SKB_CB(skb)->tap_delivered = false; +} + +static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb) +{ + u32 len; + + len = le32_to_cpu(virtio_vsock_hdr(skb)->len); + + if (len > 0) + skb_put(skb, len); +} + +static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask) +{ + struct sk_buff *skb; + + if (size < VIRTIO_VSOCK_SKB_HEADROOM) + return NULL; + + skb = alloc_skb(size, mask); + if (!skb) + return NULL; + + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM); + return skb; +} + +static inline void +virtio_vsock_skb_queue_head(struct sk_buff_head *list, struct sk_buff *skb) +{ + spin_lock_bh(&list->lock); + __skb_queue_head(list, skb); + spin_unlock_bh(&list->lock); +} + +static inline void +virtio_vsock_skb_queue_tail(struct sk_buff_head *list, struct sk_buff *skb) +{ + spin_lock_bh(&list->lock); + __skb_queue_tail(list, skb); + spin_unlock_bh(&list->lock); +} + +static inline struct sk_buff *virtio_vsock_skb_dequeue(struct sk_buff_head *list) +{ + struct sk_buff *skb; + + spin_lock_bh(&list->lock); + skb = __skb_dequeue(list); + spin_unlock_bh(&list->lock); + + return skb; +} + +static inline void virtio_vsock_skb_queue_purge(struct sk_buff_head *list) +{ + spin_lock_bh(&list->lock); + __skb_queue_purge(list); + spin_unlock_bh(&list->lock); +} + +static inline size_t virtio_vsock_skb_len(struct sk_buff *skb) +{ + return (size_t)(skb_end_pointer(skb) - skb->head); +} + #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4) #define VIRTIO_VSOCK_MAX_BUF_SIZE 0xFFFFFFFFUL #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64) @@ -35,23 +138,10 @@ struct virtio_vsock_sock { u32 last_fwd_cnt; u32 rx_bytes; u32 buf_alloc; - struct list_head rx_queue; + struct sk_buff_head rx_queue; u32 msg_count; }; -struct virtio_vsock_pkt { - struct virtio_vsock_hdr hdr; - struct list_head list; - /* socket refcnt not held, only use for cancellation */ - struct vsock_sock *vsk; - void *buf; - u32 buf_len; - u32 len; - u32 off; - bool reply; - bool tap_delivered; -}; - struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; @@ -68,7 +158,7 @@ struct virtio_transport { struct vsock_transport transport; /* Takes ownership of the packet */ - int (*send_pkt)(struct virtio_vsock_pkt *pkt); + int (*send_pkt)(struct sk_buff *skb); }; ssize_t @@ -149,11 +239,10 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk); void virtio_transport_recv_pkt(struct virtio_transport *t, - struct virtio_vsock_pkt *pkt); -void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt); -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt); + struct sk_buff *skb); +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt); - +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb); +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list); #endif /* _LINUX_VIRTIO_VSOCK_H */ diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index ad64f403536a..28b5a8e8e094 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -42,8 +42,7 @@ struct virtio_vsock { bool tx_run; struct work_struct send_pkt_work; - spinlock_t send_pkt_list_lock; - struct list_head send_pkt_list; + struct sk_buff_head send_pkt_queue; atomic_t queued_replies; @@ -101,41 +100,31 @@ virtio_transport_send_pkt_work(struct work_struct *work) vq = vsock->vqs[VSOCK_VQ_TX]; for (;;) { - struct virtio_vsock_pkt *pkt; struct scatterlist hdr, buf, *sgs[2]; int ret, in_sg = 0, out_sg = 0; + struct sk_buff *skb; bool reply; - spin_lock_bh(&vsock->send_pkt_list_lock); - if (list_empty(&vsock->send_pkt_list)) { - spin_unlock_bh(&vsock->send_pkt_list_lock); + skb = virtio_vsock_skb_dequeue(&vsock->send_pkt_queue); + if (!skb) break; - } - - pkt = list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - virtio_transport_deliver_tap_pkt(pkt); + virtio_transport_deliver_tap_pkt(skb); + reply = virtio_vsock_skb_reply(skb); - reply = pkt->reply; - - sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); + sg_init_one(&hdr, virtio_vsock_hdr(skb), sizeof(*virtio_vsock_hdr(skb))); sgs[out_sg++] = &hdr; - if (pkt->buf) { - sg_init_one(&buf, pkt->buf, pkt->len); + if (skb->len > 0) { + sg_init_one(&buf, skb->data, skb->len); sgs[out_sg++] = &buf; } - ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL); + ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, GFP_KERNEL); /* Usually this means that there is no more space available in * the vq */ if (ret < 0) { - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); break; } @@ -164,32 +153,32 @@ out: } static int -virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt) +virtio_transport_send_pkt(struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr; struct virtio_vsock *vsock; - int len = pkt->len; + int len = skb->len; + + hdr = virtio_vsock_hdr(skb); rcu_read_lock(); vsock = rcu_dereference(the_virtio_vsock); if (!vsock) { - virtio_transport_free_pkt(pkt); + kfree_skb(skb); len = -ENODEV; goto out_rcu; } - if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) { - virtio_transport_free_pkt(pkt); + if (le64_to_cpu(hdr->dst_cid) == vsock->guest_cid) { + kfree_skb(skb); len = -ENODEV; goto out_rcu; } - if (pkt->reply) + if (virtio_vsock_skb_reply(skb)) atomic_inc(&vsock->queued_replies); - spin_lock_bh(&vsock->send_pkt_list_lock); - list_add_tail(&pkt->list, &vsock->send_pkt_list); - spin_unlock_bh(&vsock->send_pkt_list_lock); - + virtio_vsock_skb_queue_tail(&vsock->send_pkt_queue, skb); queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work); out_rcu: @@ -201,9 +190,7 @@ static int virtio_transport_cancel_pkt(struct vsock_sock *vsk) { struct virtio_vsock *vsock; - struct virtio_vsock_pkt *pkt, *n; int cnt = 0, ret; - LIST_HEAD(freeme); rcu_read_lock(); vsock = rcu_dereference(the_virtio_vsock); @@ -212,20 +199,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk) goto out_rcu; } - spin_lock_bh(&vsock->send_pkt_list_lock); - list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) { - if (pkt->vsk != vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - if (pkt->reply) - cnt++; - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue); if (cnt) { struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX]; @@ -246,38 +220,28 @@ out_rcu: static void virtio_vsock_rx_fill(struct virtio_vsock *vsock) { - int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE; - struct virtio_vsock_pkt *pkt; - struct scatterlist hdr, buf, *sgs[2]; + int total_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM; + struct scatterlist pkt, *p; struct virtqueue *vq; + struct sk_buff *skb; int ret; vq = vsock->vqs[VSOCK_VQ_RX]; do { - pkt = kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + skb = virtio_vsock_alloc_skb(total_len, GFP_KERNEL); + if (!skb) break; - pkt->buf = kmalloc(buf_len, GFP_KERNEL); - if (!pkt->buf) { - virtio_transport_free_pkt(pkt); + memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM); + sg_init_one(&pkt, virtio_vsock_hdr(skb), total_len); + p = &pkt; + ret = virtqueue_add_sgs(vq, &p, 0, 1, skb, GFP_KERNEL); + if (ret < 0) { + kfree_skb(skb); break; } - pkt->buf_len = buf_len; - pkt->len = buf_len; - - sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr)); - sgs[0] = &hdr; - - sg_init_one(&buf, pkt->buf, buf_len); - sgs[1] = &buf; - ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL); - if (ret) { - virtio_transport_free_pkt(pkt); - break; - } vsock->rx_buf_nr++; } while (vq->num_free); if (vsock->rx_buf_nr > vsock->rx_buf_max_nr) @@ -299,12 +263,12 @@ static void virtio_transport_tx_work(struct work_struct *work) goto out; do { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; unsigned int len; virtqueue_disable_cb(vq); - while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) { - virtio_transport_free_pkt(pkt); + while ((skb = virtqueue_get_buf(vq, &len)) != NULL) { + consume_skb(skb); added = true; } } while (!virtqueue_enable_cb(vq)); @@ -529,7 +493,7 @@ static void virtio_transport_rx_work(struct work_struct *work) do { virtqueue_disable_cb(vq); for (;;) { - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; unsigned int len; if (!virtio_transport_more_replies(vsock)) { @@ -540,23 +504,22 @@ static void virtio_transport_rx_work(struct work_struct *work) goto out; } - pkt = virtqueue_get_buf(vq, &len); - if (!pkt) { + skb = virtqueue_get_buf(vq, &len); + if (!skb) break; - } vsock->rx_buf_nr--; /* Drop short/long packets */ - if (unlikely(len < sizeof(pkt->hdr) || - len > sizeof(pkt->hdr) + pkt->len)) { - virtio_transport_free_pkt(pkt); + if (unlikely(len < sizeof(struct virtio_vsock_hdr) || + len > virtio_vsock_skb_len(skb))) { + kfree_skb(skb); continue; } - pkt->len = len - sizeof(pkt->hdr); - virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&virtio_transport, pkt); + virtio_vsock_skb_rx_put(skb); + virtio_transport_deliver_tap_pkt(skb); + virtio_transport_recv_pkt(&virtio_transport, skb); } } while (!virtqueue_enable_cb(vq)); @@ -610,7 +573,7 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock) static void virtio_vsock_vqs_del(struct virtio_vsock *vsock) { struct virtio_device *vdev = vsock->vdev; - struct virtio_vsock_pkt *pkt; + struct sk_buff *skb; /* Reset all connected sockets when the VQs disappear */ vsock_for_each_connected_socket(&virtio_transport.transport, @@ -637,23 +600,16 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock) virtio_reset_device(vdev); mutex_lock(&vsock->rx_lock); - while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) - virtio_transport_free_pkt(pkt); + while ((skb = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) + kfree_skb(skb); mutex_unlock(&vsock->rx_lock); mutex_lock(&vsock->tx_lock); - while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX]))) - virtio_transport_free_pkt(pkt); + while ((skb = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX]))) + kfree_skb(skb); mutex_unlock(&vsock->tx_lock); - spin_lock_bh(&vsock->send_pkt_list_lock); - while (!list_empty(&vsock->send_pkt_list)) { - pkt = list_first_entry(&vsock->send_pkt_list, - struct virtio_vsock_pkt, list); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } - spin_unlock_bh(&vsock->send_pkt_list_lock); + virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue); /* Delete virtqueues and flush outstanding callbacks if any */ vdev->config->del_vqs(vdev); @@ -690,8 +646,7 @@ static int virtio_vsock_probe(struct virtio_device *vdev) mutex_init(&vsock->tx_lock); mutex_init(&vsock->rx_lock); mutex_init(&vsock->event_lock); - spin_lock_init(&vsock->send_pkt_list_lock); - INIT_LIST_HEAD(&vsock->send_pkt_list); + skb_queue_head_init(&vsock->send_pkt_queue); INIT_WORK(&vsock->rx_work, virtio_transport_rx_work); INIT_WORK(&vsock->tx_work, virtio_transport_tx_work); INIT_WORK(&vsock->event_work, virtio_transport_event_work); diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index a9980e9b9304..a1581c77cf84 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -37,53 +37,56 @@ virtio_transport_get_ops(struct vsock_sock *vsk) return container_of(t, struct virtio_transport, transport); } -static struct virtio_vsock_pkt * -virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, +/* Returns a new packet on success, otherwise returns NULL. + * + * If NULL is returned, errp is set to a negative errno. + */ +static struct sk_buff * +virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, size_t len, u32 src_cid, u32 src_port, u32 dst_cid, u32 dst_port) { - struct virtio_vsock_pkt *pkt; + const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len; + struct virtio_vsock_hdr *hdr; + struct sk_buff *skb; + void *payload; int err; - pkt = kzalloc(sizeof(*pkt), GFP_KERNEL); - if (!pkt) + skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL); + if (!skb) return NULL; - pkt->hdr.type = cpu_to_le16(info->type); - pkt->hdr.op = cpu_to_le16(info->op); - pkt->hdr.src_cid = cpu_to_le64(src_cid); - pkt->hdr.dst_cid = cpu_to_le64(dst_cid); - pkt->hdr.src_port = cpu_to_le32(src_port); - pkt->hdr.dst_port = cpu_to_le32(dst_port); - pkt->hdr.flags = cpu_to_le32(info->flags); - pkt->len = len; - pkt->hdr.len = cpu_to_le32(len); - pkt->reply = info->reply; - pkt->vsk = info->vsk; + hdr = virtio_vsock_hdr(skb); + hdr->type = cpu_to_le16(info->type); + hdr->op = cpu_to_le16(info->op); + hdr->src_cid = cpu_to_le64(src_cid); + hdr->dst_cid = cpu_to_le64(dst_cid); + hdr->src_port = cpu_to_le32(src_port); + hdr->dst_port = cpu_to_le32(dst_port); + hdr->flags = cpu_to_le32(info->flags); + hdr->len = cpu_to_le32(len); if (info->msg && len > 0) { - pkt->buf = kmalloc(len, GFP_KERNEL); - if (!pkt->buf) - goto out_pkt; - - pkt->buf_len = len; - - err = memcpy_from_msg(pkt->buf, info->msg, len); + payload = skb_put(skb, len); + err = memcpy_from_msg(payload, info->msg, len); if (err) goto out; if (msg_data_left(info->msg) == 0 && info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) { - pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); if (info->msg->msg_flags & MSG_EOR) - pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); + hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); } } + if (info->reply) + virtio_vsock_skb_set_reply(skb); + trace_virtio_transport_alloc_pkt(src_cid, src_port, dst_cid, dst_port, len, @@ -91,19 +94,18 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, info->op, info->flags); - return pkt; + return skb; out: - kfree(pkt->buf); -out_pkt: - kfree(pkt); + kfree_skb(skb); return NULL; } /* Packet capture */ static struct sk_buff *virtio_transport_build_skb(void *opaque) { - struct virtio_vsock_pkt *pkt = opaque; + struct virtio_vsock_hdr *pkt_hdr; + struct sk_buff *pkt = opaque; struct af_vsockmon_hdr *hdr; struct sk_buff *skb; size_t payload_len; @@ -113,10 +115,11 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) * the payload length from the header and the buffer pointer taking * care of the offset in the original packet. */ - payload_len = le32_to_cpu(pkt->hdr.len); - payload_buf = pkt->buf + pkt->off; + pkt_hdr = virtio_vsock_hdr(pkt); + payload_len = pkt->len; + payload_buf = pkt->data; - skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len, + skb = alloc_skb(sizeof(*hdr) + sizeof(*pkt_hdr) + payload_len, GFP_ATOMIC); if (!skb) return NULL; @@ -124,16 +127,16 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) hdr = skb_put(skb, sizeof(*hdr)); /* pkt->hdr is little-endian so no need to byteswap here */ - hdr->src_cid = pkt->hdr.src_cid; - hdr->src_port = pkt->hdr.src_port; - hdr->dst_cid = pkt->hdr.dst_cid; - hdr->dst_port = pkt->hdr.dst_port; + hdr->src_cid = pkt_hdr->src_cid; + hdr->src_port = pkt_hdr->src_port; + hdr->dst_cid = pkt_hdr->dst_cid; + hdr->dst_port = pkt_hdr->dst_port; hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO); - hdr->len = cpu_to_le16(sizeof(pkt->hdr)); + hdr->len = cpu_to_le16(sizeof(*pkt_hdr)); memset(hdr->reserved, 0, sizeof(hdr->reserved)); - switch (le16_to_cpu(pkt->hdr.op)) { + switch (le16_to_cpu(pkt_hdr->op)) { case VIRTIO_VSOCK_OP_REQUEST: case VIRTIO_VSOCK_OP_RESPONSE: hdr->op = cpu_to_le16(AF_VSOCK_OP_CONNECT); @@ -154,7 +157,7 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) break; } - skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr)); + skb_put_data(skb, pkt_hdr, sizeof(*pkt_hdr)); if (payload_len) { skb_put_data(skb, payload_buf, payload_len); @@ -163,13 +166,13 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) return skb; } -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt) +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb) { - if (pkt->tap_delivered) + if (virtio_vsock_skb_tap_delivered(skb)) return; - vsock_deliver_tap(virtio_transport_build_skb, pkt); - pkt->tap_delivered = true; + vsock_deliver_tap(virtio_transport_build_skb, skb); + virtio_vsock_skb_set_tap_delivered(skb); } EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt); @@ -192,8 +195,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, u32 src_cid, src_port, dst_cid, dst_port; const struct virtio_transport *t_ops; struct virtio_vsock_sock *vvs; - struct virtio_vsock_pkt *pkt; u32 pkt_len = info->pkt_len; + struct sk_buff *skb; info->type = virtio_transport_get_type(sk_vsock(vsk)); @@ -224,42 +227,47 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW) return pkt_len; - pkt = virtio_transport_alloc_pkt(info, pkt_len, + skb = virtio_transport_alloc_skb(info, pkt_len, src_cid, src_port, dst_cid, dst_port); - if (!pkt) { + if (!skb) { virtio_transport_put_credit(vvs, pkt_len); return -ENOMEM; } - virtio_transport_inc_tx_pkt(vvs, pkt); + virtio_transport_inc_tx_pkt(vvs, skb); - return t_ops->send_pkt(pkt); + return t_ops->send_pkt(skb); } static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { - if (vvs->rx_bytes + pkt->len > vvs->buf_alloc) + if (vvs->rx_bytes + skb->len > vvs->buf_alloc) return false; - vvs->rx_bytes += pkt->len; + vvs->rx_bytes += skb->len; return true; } static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { - vvs->rx_bytes -= pkt->len; - vvs->fwd_cnt += pkt->len; + int len; + + len = skb_headroom(skb) - sizeof(struct virtio_vsock_hdr) - skb->len; + vvs->rx_bytes -= len; + vvs->fwd_cnt += len; } -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt) +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); + spin_lock_bh(&vvs->rx_lock); vvs->last_fwd_cnt = vvs->fwd_cnt; - pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt); - pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc); + hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt); + hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc); spin_unlock_bh(&vvs->rx_lock); } EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt); @@ -303,29 +311,29 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk, size_t len) { struct virtio_vsock_sock *vvs = vsk->trans; - struct virtio_vsock_pkt *pkt; size_t bytes, total = 0, off; + struct sk_buff *skb, *tmp; int err = -EFAULT; spin_lock_bh(&vvs->rx_lock); - list_for_each_entry(pkt, &vvs->rx_queue, list) { - off = pkt->off; + skb_queue_walk_safe(&vvs->rx_queue, skb, tmp) { + off = 0; if (total == len) break; - while (total < len && off < pkt->len) { + while (total < len && off < skb->len) { bytes = len - total; - if (bytes > pkt->len - off) - bytes = pkt->len - off; + if (bytes > skb->len - off) + bytes = skb->len - off; /* sk_lock is held by caller so no one else can dequeue. * Unlock rx_lock since memcpy_to_msg() may sleep. */ spin_unlock_bh(&vvs->rx_lock); - err = memcpy_to_msg(msg, pkt->buf + off, bytes); + err = memcpy_to_msg(msg, skb->data + off, bytes); if (err) goto out; @@ -352,37 +360,38 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, size_t len) { struct virtio_vsock_sock *vvs = vsk->trans; - struct virtio_vsock_pkt *pkt; size_t bytes, total = 0; - u32 free_space; + struct sk_buff *skb; int err = -EFAULT; + u32 free_space; spin_lock_bh(&vvs->rx_lock); - while (total < len && !list_empty(&vvs->rx_queue)) { - pkt = list_first_entry(&vvs->rx_queue, - struct virtio_vsock_pkt, list); + while (total < len && !skb_queue_empty(&vvs->rx_queue)) { + skb = __skb_dequeue(&vvs->rx_queue); bytes = len - total; - if (bytes > pkt->len - pkt->off) - bytes = pkt->len - pkt->off; + if (bytes > skb->len) + bytes = skb->len; /* sk_lock is held by caller so no one else can dequeue. * Unlock rx_lock since memcpy_to_msg() may sleep. */ spin_unlock_bh(&vvs->rx_lock); - err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes); + err = memcpy_to_msg(msg, skb->data, bytes); if (err) goto out; spin_lock_bh(&vvs->rx_lock); total += bytes; - pkt->off += bytes; - if (pkt->off == pkt->len) { - virtio_transport_dec_rx_pkt(vvs, pkt); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); + skb_pull(skb, bytes); + + if (skb->len == 0) { + virtio_transport_dec_rx_pkt(vvs, skb); + consume_skb(skb); + } else { + __skb_queue_head(&vvs->rx_queue, skb); } } @@ -414,10 +423,10 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, int flags) { struct virtio_vsock_sock *vvs = vsk->trans; - struct virtio_vsock_pkt *pkt; int dequeued_len = 0; size_t user_buf_len = msg_data_left(msg); bool msg_ready = false; + struct sk_buff *skb; spin_lock_bh(&vvs->rx_lock); @@ -427,13 +436,18 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, } while (!msg_ready) { - pkt = list_first_entry(&vvs->rx_queue, struct virtio_vsock_pkt, list); + struct virtio_vsock_hdr *hdr; + + skb = __skb_dequeue(&vvs->rx_queue); + if (!skb) + break; + hdr = virtio_vsock_hdr(skb); if (dequeued_len >= 0) { size_t pkt_len; size_t bytes_to_copy; - pkt_len = (size_t)le32_to_cpu(pkt->hdr.len); + pkt_len = (size_t)le32_to_cpu(hdr->len); bytes_to_copy = min(user_buf_len, pkt_len); if (bytes_to_copy) { @@ -444,7 +458,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, */ spin_unlock_bh(&vvs->rx_lock); - err = memcpy_to_msg(msg, pkt->buf, bytes_to_copy); + err = memcpy_to_msg(msg, skb->data, bytes_to_copy); if (err) { /* Copy of message failed. Rest of * fragments will be freed without copy. @@ -452,6 +466,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, dequeued_len = err; } else { user_buf_len -= bytes_to_copy; + skb_pull(skb, bytes_to_copy); } spin_lock_bh(&vvs->rx_lock); @@ -461,17 +476,16 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, dequeued_len += pkt_len; } - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) { + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) { msg_ready = true; vvs->msg_count--; - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) msg->msg_flags |= MSG_EOR; } - virtio_transport_dec_rx_pkt(vvs, pkt); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); + virtio_transport_dec_rx_pkt(vvs, skb); + kfree_skb(skb); } spin_unlock_bh(&vvs->rx_lock); @@ -609,7 +623,7 @@ int virtio_transport_do_socket_init(struct vsock_sock *vsk, spin_lock_init(&vvs->rx_lock); spin_lock_init(&vvs->tx_lock); - INIT_LIST_HEAD(&vvs->rx_queue); + skb_queue_head_init(&vvs->rx_queue); return 0; } @@ -806,16 +820,16 @@ void virtio_transport_destruct(struct vsock_sock *vsk) EXPORT_SYMBOL_GPL(virtio_transport_destruct); static int virtio_transport_reset(struct vsock_sock *vsk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_RST, - .reply = !!pkt, + .reply = !!skb, .vsk = vsk, }; /* Send RST only if the original pkt is not a RST pkt */ - if (pkt && le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST) + if (skb && le16_to_cpu(virtio_vsock_hdr(skb)->op) == VIRTIO_VSOCK_OP_RST) return 0; return virtio_transport_send_pkt_info(vsk, &info); @@ -825,29 +839,30 @@ static int virtio_transport_reset(struct vsock_sock *vsk, * attempt was made to connect to a socket that does not exist. */ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { - struct virtio_vsock_pkt *reply; + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_RST, - .type = le16_to_cpu(pkt->hdr.type), + .type = le16_to_cpu(hdr->type), .reply = true, }; + struct sk_buff *reply; /* Send RST only if the original pkt is not a RST pkt */ - if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST) + if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST) return 0; - reply = virtio_transport_alloc_pkt(&info, 0, - le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port), - le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); + reply = virtio_transport_alloc_skb(&info, 0, + le64_to_cpu(hdr->dst_cid), + le32_to_cpu(hdr->dst_port), + le64_to_cpu(hdr->src_cid), + le32_to_cpu(hdr->src_port)); if (!reply) return -ENOMEM; if (!t) { - virtio_transport_free_pkt(reply); + kfree_skb(reply); return -ENOTCONN; } @@ -858,16 +873,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, static void virtio_transport_remove_sock(struct vsock_sock *vsk) { struct virtio_vsock_sock *vvs = vsk->trans; - struct virtio_vsock_pkt *pkt, *tmp; /* We don't need to take rx_lock, as the socket is closing and we are * removing it. */ - list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) { - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } - + __skb_queue_purge(&vvs->rx_queue); vsock_remove_sock(vsk); } @@ -981,13 +991,14 @@ EXPORT_SYMBOL_GPL(virtio_transport_release); static int virtio_transport_recv_connecting(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vsock_sock *vsk = vsock_sk(sk); - int err; int skerr; + int err; - switch (le16_to_cpu(pkt->hdr.op)) { + switch (le16_to_cpu(hdr->op)) { case VIRTIO_VSOCK_OP_RESPONSE: sk->sk_state = TCP_ESTABLISHED; sk->sk_socket->state = SS_CONNECTED; @@ -1008,7 +1019,7 @@ virtio_transport_recv_connecting(struct sock *sk, return 0; destroy: - virtio_transport_reset(vsk, pkt); + virtio_transport_reset(vsk, skb); sk->sk_state = TCP_CLOSE; sk->sk_err = skerr; sk_error_report(sk); @@ -1017,34 +1028,37 @@ destroy: static void virtio_transport_recv_enqueue(struct vsock_sock *vsk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { struct virtio_vsock_sock *vvs = vsk->trans; bool can_enqueue, free_pkt = false; + struct virtio_vsock_hdr *hdr; + u32 len; - pkt->len = le32_to_cpu(pkt->hdr.len); - pkt->off = 0; + hdr = virtio_vsock_hdr(skb); + len = le32_to_cpu(hdr->len); spin_lock_bh(&vvs->rx_lock); - can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt); + can_enqueue = virtio_transport_inc_rx_pkt(vvs, skb); if (!can_enqueue) { free_pkt = true; goto out; } - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) vvs->msg_count++; /* Try to copy small packets into the buffer of last packet queued, * to avoid wasting memory queueing the entire buffer with a small * payload. */ - if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) { - struct virtio_vsock_pkt *last_pkt; + if (len <= GOOD_COPY_LEN && !skb_queue_empty(&vvs->rx_queue)) { + struct virtio_vsock_hdr *last_hdr; + struct sk_buff *last_skb; - last_pkt = list_last_entry(&vvs->rx_queue, - struct virtio_vsock_pkt, list); + last_skb = skb_peek_tail(&vvs->rx_queue); + last_hdr = virtio_vsock_hdr(last_skb); /* If there is space in the last packet queued, we copy the * new packet in its buffer. We avoid this if the last packet @@ -1052,35 +1066,35 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk, * delimiter of SEQPACKET message, so 'pkt' is the first packet * of a new message. */ - if ((pkt->len <= last_pkt->buf_len - last_pkt->len) && - !(le32_to_cpu(last_pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)) { - memcpy(last_pkt->buf + last_pkt->len, pkt->buf, - pkt->len); - last_pkt->len += pkt->len; + if (skb->len < skb_tailroom(last_skb) && + !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)) { + memcpy(skb_put(last_skb, skb->len), skb->data, skb->len); free_pkt = true; - last_pkt->hdr.flags |= pkt->hdr.flags; + last_hdr->flags |= hdr->flags; + last_hdr->len = cpu_to_le32(last_skb->len); goto out; } } - list_add_tail(&pkt->list, &vvs->rx_queue); + __skb_queue_tail(&vvs->rx_queue, skb); out: spin_unlock_bh(&vvs->rx_lock); if (free_pkt) - virtio_transport_free_pkt(pkt); + kfree_skb(skb); } static int virtio_transport_recv_connected(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vsock_sock *vsk = vsock_sk(sk); int err = 0; - switch (le16_to_cpu(pkt->hdr.op)) { + switch (le16_to_cpu(hdr->op)) { case VIRTIO_VSOCK_OP_RW: - virtio_transport_recv_enqueue(vsk, pkt); + virtio_transport_recv_enqueue(vsk, skb); vsock_data_ready(sk); return err; case VIRTIO_VSOCK_OP_CREDIT_REQUEST: @@ -1090,18 +1104,17 @@ virtio_transport_recv_connected(struct sock *sk, sk->sk_write_space(sk); break; case VIRTIO_VSOCK_OP_SHUTDOWN: - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_RCV) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_RCV) vsk->peer_shutdown |= RCV_SHUTDOWN; - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND) + if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_SEND) vsk->peer_shutdown |= SEND_SHUTDOWN; if (vsk->peer_shutdown == SHUTDOWN_MASK && vsock_stream_has_data(vsk) <= 0 && !sock_flag(sk, SOCK_DONE)) { (void)virtio_transport_reset(vsk, NULL); - virtio_transport_do_close(vsk, true); } - if (le32_to_cpu(pkt->hdr.flags)) + if (le32_to_cpu(virtio_vsock_hdr(skb)->flags)) sk->sk_state_change(sk); break; case VIRTIO_VSOCK_OP_RST: @@ -1112,28 +1125,30 @@ virtio_transport_recv_connected(struct sock *sk, break; } - virtio_transport_free_pkt(pkt); + kfree_skb(skb); return err; } static void virtio_transport_recv_disconnecting(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vsock_sock *vsk = vsock_sk(sk); - if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST) + if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST) virtio_transport_do_close(vsk, true); } static int virtio_transport_send_response(struct vsock_sock *vsk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_RESPONSE, - .remote_cid = le64_to_cpu(pkt->hdr.src_cid), - .remote_port = le32_to_cpu(pkt->hdr.src_port), + .remote_cid = le64_to_cpu(hdr->src_cid), + .remote_port = le32_to_cpu(hdr->src_port), .reply = true, .vsk = vsk, }; @@ -1142,8 +1157,9 @@ virtio_transport_send_response(struct vsock_sock *vsk, } static bool virtio_transport_space_update(struct sock *sk, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vsock_sock *vsk = vsock_sk(sk); struct virtio_vsock_sock *vvs = vsk->trans; bool space_available; @@ -1158,8 +1174,8 @@ static bool virtio_transport_space_update(struct sock *sk, /* buf_alloc and fwd_cnt is always included in the hdr */ spin_lock_bh(&vvs->tx_lock); - vvs->peer_buf_alloc = le32_to_cpu(pkt->hdr.buf_alloc); - vvs->peer_fwd_cnt = le32_to_cpu(pkt->hdr.fwd_cnt); + vvs->peer_buf_alloc = le32_to_cpu(hdr->buf_alloc); + vvs->peer_fwd_cnt = le32_to_cpu(hdr->fwd_cnt); space_available = virtio_transport_has_space(vsk); spin_unlock_bh(&vvs->tx_lock); return space_available; @@ -1167,27 +1183,28 @@ static bool virtio_transport_space_update(struct sock *sk, /* Handle server socket */ static int -virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt, +virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, struct virtio_transport *t) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vsock_sock *vsk = vsock_sk(sk); struct vsock_sock *vchild; struct sock *child; int ret; - if (le16_to_cpu(pkt->hdr.op) != VIRTIO_VSOCK_OP_REQUEST) { - virtio_transport_reset_no_sock(t, pkt); + if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) { + virtio_transport_reset_no_sock(t, skb); return -EINVAL; } if (sk_acceptq_is_full(sk)) { - virtio_transport_reset_no_sock(t, pkt); + virtio_transport_reset_no_sock(t, skb); return -ENOMEM; } child = vsock_create_connected(sk); if (!child) { - virtio_transport_reset_no_sock(t, pkt); + virtio_transport_reset_no_sock(t, skb); return -ENOMEM; } @@ -1198,10 +1215,10 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt, child->sk_state = TCP_ESTABLISHED; vchild = vsock_sk(child); - vsock_addr_init(&vchild->local_addr, le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port)); - vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); + vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid), + le32_to_cpu(hdr->dst_port)); + vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid), + le32_to_cpu(hdr->src_port)); ret = vsock_assign_transport(vchild, vsk); /* Transport assigned (looking at remote_addr) must be the same @@ -1209,17 +1226,17 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt, */ if (ret || vchild->transport != &t->transport) { release_sock(child); - virtio_transport_reset_no_sock(t, pkt); + virtio_transport_reset_no_sock(t, skb); sock_put(child); return ret; } - if (virtio_transport_space_update(child, pkt)) + if (virtio_transport_space_update(child, skb)) child->sk_write_space(child); vsock_insert_connected(vchild); vsock_enqueue_accept(sk, child); - virtio_transport_send_response(vchild, pkt); + virtio_transport_send_response(vchild, skb); release_sock(child); @@ -1237,29 +1254,30 @@ static bool virtio_transport_valid_type(u16 type) * lock. */ void virtio_transport_recv_pkt(struct virtio_transport *t, - struct virtio_vsock_pkt *pkt) + struct sk_buff *skb) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct sockaddr_vm src, dst; struct vsock_sock *vsk; struct sock *sk; bool space_available; - vsock_addr_init(&src, le64_to_cpu(pkt->hdr.src_cid), - le32_to_cpu(pkt->hdr.src_port)); - vsock_addr_init(&dst, le64_to_cpu(pkt->hdr.dst_cid), - le32_to_cpu(pkt->hdr.dst_port)); + vsock_addr_init(&src, le64_to_cpu(hdr->src_cid), + le32_to_cpu(hdr->src_port)); + vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid), + le32_to_cpu(hdr->dst_port)); trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port, dst.svm_cid, dst.svm_port, - le32_to_cpu(pkt->hdr.len), - le16_to_cpu(pkt->hdr.type), - le16_to_cpu(pkt->hdr.op), - le32_to_cpu(pkt->hdr.flags), - le32_to_cpu(pkt->hdr.buf_alloc), - le32_to_cpu(pkt->hdr.fwd_cnt)); - - if (!virtio_transport_valid_type(le16_to_cpu(pkt->hdr.type))) { - (void)virtio_transport_reset_no_sock(t, pkt); + le32_to_cpu(hdr->len), + le16_to_cpu(hdr->type), + le16_to_cpu(hdr->op), + le32_to_cpu(hdr->flags), + le32_to_cpu(hdr->buf_alloc), + le32_to_cpu(hdr->fwd_cnt)); + + if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) { + (void)virtio_transport_reset_no_sock(t, skb); goto free_pkt; } @@ -1270,13 +1288,13 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, if (!sk) { sk = vsock_find_bound_socket(&dst); if (!sk) { - (void)virtio_transport_reset_no_sock(t, pkt); + (void)virtio_transport_reset_no_sock(t, skb); goto free_pkt; } } - if (virtio_transport_get_type(sk) != le16_to_cpu(pkt->hdr.type)) { - (void)virtio_transport_reset_no_sock(t, pkt); + if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) { + (void)virtio_transport_reset_no_sock(t, skb); sock_put(sk); goto free_pkt; } @@ -1287,13 +1305,13 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, /* Check if sk has been closed before lock_sock */ if (sock_flag(sk, SOCK_DONE)) { - (void)virtio_transport_reset_no_sock(t, pkt); + (void)virtio_transport_reset_no_sock(t, skb); release_sock(sk); sock_put(sk); goto free_pkt; } - space_available = virtio_transport_space_update(sk, pkt); + space_available = virtio_transport_space_update(sk, skb); /* Update CID in case it has changed after a transport reset event */ if (vsk->local_addr.svm_cid != VMADDR_CID_ANY) @@ -1304,23 +1322,23 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, switch (sk->sk_state) { case TCP_LISTEN: - virtio_transport_recv_listen(sk, pkt, t); - virtio_transport_free_pkt(pkt); + virtio_transport_recv_listen(sk, skb, t); + kfree_skb(skb); break; case TCP_SYN_SENT: - virtio_transport_recv_connecting(sk, pkt); - virtio_transport_free_pkt(pkt); + virtio_transport_recv_connecting(sk, skb); + kfree_skb(skb); break; case TCP_ESTABLISHED: - virtio_transport_recv_connected(sk, pkt); + virtio_transport_recv_connected(sk, skb); break; case TCP_CLOSING: - virtio_transport_recv_disconnecting(sk, pkt); - virtio_transport_free_pkt(pkt); + virtio_transport_recv_disconnecting(sk, skb); + kfree_skb(skb); break; default: - (void)virtio_transport_reset_no_sock(t, pkt); - virtio_transport_free_pkt(pkt); + (void)virtio_transport_reset_no_sock(t, skb); + kfree_skb(skb); break; } @@ -1333,16 +1351,42 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, return; free_pkt: - virtio_transport_free_pkt(pkt); + kfree_skb(skb); } EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt); -void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt) +/* Remove skbs found in a queue that have a vsk that matches. + * + * Each skb is freed. + * + * Returns the count of skbs that were reply packets. + */ +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue) { - kvfree(pkt->buf); - kfree(pkt); + struct sk_buff_head freeme; + struct sk_buff *skb, *tmp; + int cnt = 0; + + skb_queue_head_init(&freeme); + + spin_lock_bh(&queue->lock); + skb_queue_walk_safe(queue, skb, tmp) { + if (vsock_sk(skb->sk) != vsk) + continue; + + __skb_unlink(skb, queue); + __skb_queue_tail(&freeme, skb); + + if (virtio_vsock_skb_reply(skb)) + cnt++; + } + spin_unlock_bh(&queue->lock); + + __skb_queue_purge(&freeme); + + return cnt; } -EXPORT_SYMBOL_GPL(virtio_transport_free_pkt); +EXPORT_SYMBOL_GPL(virtio_transport_purge_skbs); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Asias He"); diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 169a8cf65b39..671e03240fc5 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -16,7 +16,7 @@ struct vsock_loopback { struct workqueue_struct *workqueue; spinlock_t pkt_list_lock; /* protects pkt_list */ - struct list_head pkt_list; + struct sk_buff_head pkt_queue; struct work_struct pkt_work; }; @@ -27,13 +27,13 @@ static u32 vsock_loopback_get_local_cid(void) return VMADDR_CID_LOCAL; } -static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt) +static int vsock_loopback_send_pkt(struct sk_buff *skb) { struct vsock_loopback *vsock = &the_vsock_loopback; - int len = pkt->len; + int len = skb->len; spin_lock_bh(&vsock->pkt_list_lock); - list_add_tail(&pkt->list, &vsock->pkt_list); + skb_queue_tail(&vsock->pkt_queue, skb); spin_unlock_bh(&vsock->pkt_list_lock); queue_work(vsock->workqueue, &vsock->pkt_work); @@ -44,21 +44,8 @@ static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt) static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk) { struct vsock_loopback *vsock = &the_vsock_loopback; - struct virtio_vsock_pkt *pkt, *n; - LIST_HEAD(freeme); - spin_lock_bh(&vsock->pkt_list_lock); - list_for_each_entry_safe(pkt, n, &vsock->pkt_list, list) { - if (pkt->vsk != vsk) - continue; - list_move(&pkt->list, &freeme); - } - spin_unlock_bh(&vsock->pkt_list_lock); - - list_for_each_entry_safe(pkt, n, &freeme, list) { - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + virtio_transport_purge_skbs(vsk, &vsock->pkt_queue); return 0; } @@ -121,20 +108,18 @@ static void vsock_loopback_work(struct work_struct *work) { struct vsock_loopback *vsock = container_of(work, struct vsock_loopback, pkt_work); - LIST_HEAD(pkts); + struct sk_buff_head pkts; + struct sk_buff *skb; + + skb_queue_head_init(&pkts); spin_lock_bh(&vsock->pkt_list_lock); - list_splice_init(&vsock->pkt_list, &pkts); + skb_queue_splice_init(&vsock->pkt_queue, &pkts); spin_unlock_bh(&vsock->pkt_list_lock); - while (!list_empty(&pkts)) { - struct virtio_vsock_pkt *pkt; - - pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list); - list_del_init(&pkt->list); - - virtio_transport_deliver_tap_pkt(pkt); - virtio_transport_recv_pkt(&loopback_transport, pkt); + while ((skb = __skb_dequeue(&pkts))) { + virtio_transport_deliver_tap_pkt(skb); + virtio_transport_recv_pkt(&loopback_transport, skb); } } @@ -148,7 +133,7 @@ static int __init vsock_loopback_init(void) return -ENOMEM; spin_lock_init(&vsock->pkt_list_lock); - INIT_LIST_HEAD(&vsock->pkt_list); + skb_queue_head_init(&vsock->pkt_queue); INIT_WORK(&vsock->pkt_work, vsock_loopback_work); ret = vsock_core_register(&loopback_transport.transport, @@ -166,19 +151,13 @@ out_wq: static void __exit vsock_loopback_exit(void) { struct vsock_loopback *vsock = &the_vsock_loopback; - struct virtio_vsock_pkt *pkt; vsock_core_unregister(&loopback_transport.transport); flush_work(&vsock->pkt_work); spin_lock_bh(&vsock->pkt_list_lock); - while (!list_empty(&vsock->pkt_list)) { - pkt = list_first_entry(&vsock->pkt_list, - struct virtio_vsock_pkt, list); - list_del(&pkt->list); - virtio_transport_free_pkt(pkt); - } + virtio_vsock_skb_queue_purge(&vsock->pkt_queue); spin_unlock_bh(&vsock->pkt_list_lock); destroy_workqueue(vsock->workqueue); -- cgit v1.2.3 From 0349b8779cc949ad9e6aced32672ee48cf79b497 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Fri, 13 Jan 2023 11:43:53 +0800 Subject: sched: add new attr TCA_EXT_WARN_MSG to report tc extact message We will report extack message if there is an error via netlink_ack(). But if the rule is not to be exclusively executed by the hardware, extack is not passed along and offloading failures don't get logged. In commit 81c7288b170a ("sched: cls: enable verbose logging") Marcelo made cls could log verbose info for offloading failures, which helps improving Open vSwitch debuggability when using flower offloading. It would also be helpful if userspace monitor tools, like "tc monitor", could log this kind of message, as it doesn't require vswitchd log level adjusment. Let's add a new tc attributes to report the extack message so the monitor program could receive the failures. e.g. # tc monitor added chain dev enp3s0f1np1 parent ffff: chain 0 added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1 ct_state +trk+new not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 Warning: mlx5_core: matching on ct_state +new isn't supported. In this patch I only report the extack message on add/del operations. It doesn't look like we need to report the extack message on get/dump operations. Note this message not only reporte to multicast groups, it could also be reported unicast, which may affect the current usersapce tool's behaivor. Suggested-by: Marcelo Ricardo Leitner Signed-off-by: Hangbin Liu Acked-by: Jakub Kicinski Acked-by: Jamal Hadi Salim Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni --- include/uapi/linux/rtnetlink.h | 1 + net/sched/act_api.c | 15 ++++++---- net/sched/cls_api.c | 62 ++++++++++++++++++++++++++---------------- net/sched/sch_api.c | 55 +++++++++++++++++++++++-------------- 4 files changed, 84 insertions(+), 49 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index eb2747d58a81..25a0af57dd5e 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -635,6 +635,7 @@ enum { TCA_INGRESS_BLOCK, TCA_EGRESS_BLOCK, TCA_DUMP_FLAGS, + TCA_EXT_WARN_MSG, __TCA_MAX }; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 5b3c0ac495be..cd09ef49df22 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -1582,7 +1582,7 @@ errout: static int tca_get_fill(struct sk_buff *skb, struct tc_action *actions[], u32 portid, u32 seq, u16 flags, int event, int bind, - int ref) + int ref, struct netlink_ext_ack *extack) { struct tcamsg *t; struct nlmsghdr *nlh; @@ -1606,7 +1606,12 @@ static int tca_get_fill(struct sk_buff *skb, struct tc_action *actions[], nla_nest_end(skb, nest); + if (extack && extack->_msg && + nla_put_string(skb, TCA_EXT_WARN_MSG, extack->_msg)) + goto out_nlmsg_trim; + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + return skb->len; out_nlmsg_trim: @@ -1625,7 +1630,7 @@ tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n, if (!skb) return -ENOBUFS; if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, event, - 0, 1) <= 0) { + 0, 1, NULL) <= 0) { NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while adding TC action"); kfree_skb(skb); return -EINVAL; @@ -1799,7 +1804,7 @@ tcf_reoffload_del_notify(struct net *net, struct tc_action *action) if (!skb) return -ENOBUFS; - if (tca_get_fill(skb, actions, 0, 0, 0, RTM_DELACTION, 0, 1) <= 0) { + if (tca_get_fill(skb, actions, 0, 0, 0, RTM_DELACTION, 0, 1, NULL) <= 0) { kfree_skb(skb); return -EINVAL; } @@ -1886,7 +1891,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], return -ENOBUFS; if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, RTM_DELACTION, - 0, 2) <= 0) { + 0, 2, extack) <= 0) { NL_SET_ERR_MSG(extack, "Failed to fill netlink TC action attributes"); kfree_skb(skb); return -EINVAL; @@ -1965,7 +1970,7 @@ tcf_add_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[], return -ENOBUFS; if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, n->nlmsg_flags, - RTM_NEWACTION, 0, 0) <= 0) { + RTM_NEWACTION, 0, 0, extack) <= 0) { NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while adding TC action"); kfree_skb(skb); return -EINVAL; diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 668130f08903..5b4a95e8a1ee 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -488,7 +488,8 @@ static struct tcf_chain *tcf_chain_lookup_rcu(const struct tcf_block *block, #endif static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb, - u32 seq, u16 flags, int event, bool unicast); + u32 seq, u16 flags, int event, bool unicast, + struct netlink_ext_ack *extack); static struct tcf_chain *__tcf_chain_get(struct tcf_block *block, u32 chain_index, bool create, @@ -521,7 +522,7 @@ static struct tcf_chain *__tcf_chain_get(struct tcf_block *block, */ if (is_first_reference && !by_act) tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL, - RTM_NEWCHAIN, false); + RTM_NEWCHAIN, false, NULL); return chain; @@ -1817,7 +1818,8 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb, struct tcf_proto *tp, struct tcf_block *block, struct Qdisc *q, u32 parent, void *fh, u32 portid, u32 seq, u16 flags, int event, - bool terse_dump, bool rtnl_held) + bool terse_dump, bool rtnl_held, + struct netlink_ext_ack *extack) { struct tcmsg *tcm; struct nlmsghdr *nlh; @@ -1857,7 +1859,13 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb, tp->ops->dump(net, tp, fh, skb, tcm, rtnl_held) < 0) goto nla_put_failure; } + + if (extack && extack->_msg && + nla_put_string(skb, TCA_EXT_WARN_MSG, extack->_msg)) + goto nla_put_failure; + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + return skb->len; out_nlmsg_trim: @@ -1871,7 +1879,7 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, struct tcf_proto *tp, struct tcf_block *block, struct Qdisc *q, u32 parent, void *fh, int event, bool unicast, - bool rtnl_held) + bool rtnl_held, struct netlink_ext_ack *extack) { struct sk_buff *skb; u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; @@ -1883,7 +1891,7 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb, if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid, n->nlmsg_seq, n->nlmsg_flags, event, - false, rtnl_held) <= 0) { + false, rtnl_held, extack) <= 0) { kfree_skb(skb); return -EINVAL; } @@ -1912,7 +1920,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid, n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER, - false, rtnl_held) <= 0) { + false, rtnl_held, extack) <= 0) { NL_SET_ERR_MSG(extack, "Failed to build del event notification"); kfree_skb(skb); return -EINVAL; @@ -1938,14 +1946,15 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, struct tcf_block *block, struct Qdisc *q, u32 parent, struct nlmsghdr *n, - struct tcf_chain *chain, int event) + struct tcf_chain *chain, int event, + struct netlink_ext_ack *extack) { struct tcf_proto *tp; for (tp = tcf_get_next_proto(chain, NULL); tp; tp = tcf_get_next_proto(chain, tp)) - tfilter_notify(net, oskb, n, tp, block, - q, parent, NULL, event, false, true); + tfilter_notify(net, oskb, n, tp, block, q, parent, NULL, + event, false, true, extack); } static void tfilter_put(struct tcf_proto *tp, void *fh) @@ -2156,7 +2165,7 @@ replay: flags, extack); if (err == 0) { tfilter_notify(net, skb, n, tp, block, q, parent, fh, - RTM_NEWTFILTER, false, rtnl_held); + RTM_NEWTFILTER, false, rtnl_held, extack); tfilter_put(tp, fh); /* q pointer is NULL for shared blocks */ if (q) @@ -2284,7 +2293,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n, if (prio == 0) { tfilter_notify_chain(net, skb, block, q, parent, n, - chain, RTM_DELTFILTER); + chain, RTM_DELTFILTER, extack); tcf_chain_flush(chain, rtnl_held); err = 0; goto errout; @@ -2308,7 +2317,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n, tcf_proto_put(tp, rtnl_held, NULL); tfilter_notify(net, skb, n, tp, block, q, parent, fh, - RTM_DELTFILTER, false, rtnl_held); + RTM_DELTFILTER, false, rtnl_held, extack); err = 0; goto errout; } @@ -2452,7 +2461,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n, err = -ENOENT; } else { err = tfilter_notify(net, skb, n, tp, block, q, parent, - fh, RTM_NEWTFILTER, true, rtnl_held); + fh, RTM_NEWTFILTER, true, rtnl_held, NULL); if (err < 0) NL_SET_ERR_MSG(extack, "Failed to send filter notify message"); } @@ -2490,7 +2499,7 @@ static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg) return tcf_fill_node(net, a->skb, tp, a->block, a->q, a->parent, n, NETLINK_CB(a->cb->skb).portid, a->cb->nlh->nlmsg_seq, NLM_F_MULTI, - RTM_NEWTFILTER, a->terse_dump, true); + RTM_NEWTFILTER, a->terse_dump, true, NULL); } static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent, @@ -2524,7 +2533,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent, if (tcf_fill_node(net, skb, tp, block, q, parent, NULL, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, - RTM_NEWTFILTER, false, true) <= 0) + RTM_NEWTFILTER, false, true, NULL) <= 0) goto errout; cb->args[1] = 1; } @@ -2667,7 +2676,8 @@ static int tc_chain_fill_node(const struct tcf_proto_ops *tmplt_ops, void *tmplt_priv, u32 chain_index, struct net *net, struct sk_buff *skb, struct tcf_block *block, - u32 portid, u32 seq, u16 flags, int event) + u32 portid, u32 seq, u16 flags, int event, + struct netlink_ext_ack *extack) { unsigned char *b = skb_tail_pointer(skb); const struct tcf_proto_ops *ops; @@ -2704,7 +2714,12 @@ static int tc_chain_fill_node(const struct tcf_proto_ops *tmplt_ops, goto nla_put_failure; } + if (extack && extack->_msg && + nla_put_string(skb, TCA_EXT_WARN_MSG, extack->_msg)) + goto out_nlmsg_trim; + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + return skb->len; out_nlmsg_trim: @@ -2714,7 +2729,8 @@ nla_put_failure: } static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb, - u32 seq, u16 flags, int event, bool unicast) + u32 seq, u16 flags, int event, bool unicast, + struct netlink_ext_ack *extack) { u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; struct tcf_block *block = chain->block; @@ -2728,7 +2744,7 @@ static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb, if (tc_chain_fill_node(chain->tmplt_ops, chain->tmplt_priv, chain->index, net, skb, block, portid, - seq, flags, event) <= 0) { + seq, flags, event, extack) <= 0) { kfree_skb(skb); return -EINVAL; } @@ -2756,7 +2772,7 @@ static int tc_chain_notify_delete(const struct tcf_proto_ops *tmplt_ops, return -ENOBUFS; if (tc_chain_fill_node(tmplt_ops, tmplt_priv, chain_index, net, skb, - block, portid, seq, flags, RTM_DELCHAIN) <= 0) { + block, portid, seq, flags, RTM_DELCHAIN, NULL) <= 0) { kfree_skb(skb); return -EINVAL; } @@ -2908,11 +2924,11 @@ replay: } tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL, - RTM_NEWCHAIN, false); + RTM_NEWCHAIN, false, extack); break; case RTM_DELCHAIN: tfilter_notify_chain(net, skb, block, q, parent, n, - chain, RTM_DELTFILTER); + chain, RTM_DELTFILTER, extack); /* Flush the chain first as the user requested chain removal. */ tcf_chain_flush(chain, true); /* In case the chain was successfully deleted, put a reference @@ -2922,7 +2938,7 @@ replay: break; case RTM_GETCHAIN: err = tc_chain_notify(chain, skb, n->nlmsg_seq, - n->nlmsg_flags, n->nlmsg_type, true); + n->nlmsg_flags, n->nlmsg_type, true, extack); if (err < 0) NL_SET_ERR_MSG(extack, "Failed to send chain notify message"); break; @@ -3022,7 +3038,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb) chain->index, net, skb, block, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, - RTM_NEWCHAIN); + RTM_NEWCHAIN, NULL); if (err <= 0) break; index++; diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 72d2c204d5f3..c14018a8052c 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -902,7 +902,8 @@ static void qdisc_offload_graft_root(struct net_device *dev, } static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, - u32 portid, u32 seq, u16 flags, int event) + u32 portid, u32 seq, u16 flags, int event, + struct netlink_ext_ack *extack) { struct gnet_stats_basic_sync __percpu *cpu_bstats = NULL; struct gnet_stats_queue __percpu *cpu_qstats = NULL; @@ -970,7 +971,12 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, if (gnet_stats_finish_copy(&d) < 0) goto nla_put_failure; + if (extack && extack->_msg && + nla_put_string(skb, TCA_EXT_WARN_MSG, extack->_msg)) + goto out_nlmsg_trim; + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + return skb->len; out_nlmsg_trim: @@ -991,7 +997,8 @@ static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool dump_invisible) static int qdisc_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, u32 clid, - struct Qdisc *old, struct Qdisc *new) + struct Qdisc *old, struct Qdisc *new, + struct netlink_ext_ack *extack) { struct sk_buff *skb; u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; @@ -1002,12 +1009,12 @@ static int qdisc_notify(struct net *net, struct sk_buff *oskb, if (old && !tc_qdisc_dump_ignore(old, false)) { if (tc_fill_qdisc(skb, old, clid, portid, n->nlmsg_seq, - 0, RTM_DELQDISC) < 0) + 0, RTM_DELQDISC, extack) < 0) goto err_out; } if (new && !tc_qdisc_dump_ignore(new, false)) { if (tc_fill_qdisc(skb, new, clid, portid, n->nlmsg_seq, - old ? NLM_F_REPLACE : 0, RTM_NEWQDISC) < 0) + old ? NLM_F_REPLACE : 0, RTM_NEWQDISC, extack) < 0) goto err_out; } @@ -1022,10 +1029,11 @@ err_out: static void notify_and_destroy(struct net *net, struct sk_buff *skb, struct nlmsghdr *n, u32 clid, - struct Qdisc *old, struct Qdisc *new) + struct Qdisc *old, struct Qdisc *new, + struct netlink_ext_ack *extack) { if (new || old) - qdisc_notify(net, skb, n, clid, old, new); + qdisc_notify(net, skb, n, clid, old, new, extack); if (old) qdisc_put(old); @@ -1105,12 +1113,12 @@ skip: qdisc_refcount_inc(new); rcu_assign_pointer(dev->qdisc, new ? : &noop_qdisc); - notify_and_destroy(net, skb, n, classid, old, new); + notify_and_destroy(net, skb, n, classid, old, new, extack); if (new && new->ops->attach) new->ops->attach(new); } else { - notify_and_destroy(net, skb, n, classid, old, new); + notify_and_destroy(net, skb, n, classid, old, new, extack); } if (dev->flags & IFF_UP) @@ -1141,7 +1149,7 @@ skip: err = cops->graft(parent, cl, new, &old, extack); if (err) return err; - notify_and_destroy(net, skb, n, classid, old, new); + notify_and_destroy(net, skb, n, classid, old, new, extack); } return 0; } @@ -1509,7 +1517,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, if (err != 0) return err; } else { - qdisc_notify(net, skb, n, clid, NULL, q); + qdisc_notify(net, skb, n, clid, NULL, q, NULL); } return 0; } @@ -1648,7 +1656,7 @@ replay: } err = qdisc_change(q, tca, extack); if (err == 0) - qdisc_notify(net, skb, n, clid, NULL, q); + qdisc_notify(net, skb, n, clid, NULL, q, extack); return err; create_n_graft: @@ -1715,7 +1723,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, if (!tc_qdisc_dump_ignore(q, dump_invisible) && tc_fill_qdisc(skb, q, q->parent, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, - RTM_NEWQDISC) <= 0) + RTM_NEWQDISC, NULL) <= 0) goto done; q_idx++; } @@ -1737,7 +1745,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, if (!tc_qdisc_dump_ignore(q, dump_invisible) && tc_fill_qdisc(skb, q, q->parent, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, - RTM_NEWQDISC) <= 0) + RTM_NEWQDISC, NULL) <= 0) goto done; q_idx++; } @@ -1810,8 +1818,8 @@ done: ************************************************/ static int tc_fill_tclass(struct sk_buff *skb, struct Qdisc *q, - unsigned long cl, - u32 portid, u32 seq, u16 flags, int event) + unsigned long cl, u32 portid, u32 seq, u16 flags, + int event, struct netlink_ext_ack *extack) { struct tcmsg *tcm; struct nlmsghdr *nlh; @@ -1846,7 +1854,12 @@ static int tc_fill_tclass(struct sk_buff *skb, struct Qdisc *q, if (gnet_stats_finish_copy(&d) < 0) goto nla_put_failure; + if (extack && extack->_msg && + nla_put_string(skb, TCA_EXT_WARN_MSG, extack->_msg)) + goto out_nlmsg_trim; + nlh->nlmsg_len = skb_tail_pointer(skb) - b; + return skb->len; out_nlmsg_trim: @@ -1857,7 +1870,7 @@ nla_put_failure: static int tclass_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, struct Qdisc *q, - unsigned long cl, int event) + unsigned long cl, int event, struct netlink_ext_ack *extack) { struct sk_buff *skb; u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; @@ -1866,7 +1879,7 @@ static int tclass_notify(struct net *net, struct sk_buff *oskb, if (!skb) return -ENOBUFS; - if (tc_fill_tclass(skb, q, cl, portid, n->nlmsg_seq, 0, event) < 0) { + if (tc_fill_tclass(skb, q, cl, portid, n->nlmsg_seq, 0, event, extack) < 0) { kfree_skb(skb); return -EINVAL; } @@ -1893,7 +1906,7 @@ static int tclass_del_notify(struct net *net, return -ENOBUFS; if (tc_fill_tclass(skb, q, cl, portid, n->nlmsg_seq, 0, - RTM_DELTCLASS) < 0) { + RTM_DELTCLASS, extack) < 0) { kfree_skb(skb); return -EINVAL; } @@ -2100,7 +2113,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, tc_bind_tclass(q, portid, clid, 0); goto out; case RTM_GETTCLASS: - err = tclass_notify(net, skb, n, q, cl, RTM_NEWTCLASS); + err = tclass_notify(net, skb, n, q, cl, RTM_NEWTCLASS, extack); goto out; default: err = -EINVAL; @@ -2118,7 +2131,7 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, if (cops->change) err = cops->change(q, clid, portid, tca, &new_cl, extack); if (err == 0) { - tclass_notify(net, skb, n, q, new_cl, RTM_NEWTCLASS); + tclass_notify(net, skb, n, q, new_cl, RTM_NEWTCLASS, extack); /* We just create a new class, need to do reverse binding. */ if (cl != new_cl) tc_bind_tclass(q, portid, clid, new_cl); @@ -2140,7 +2153,7 @@ static int qdisc_class_dump(struct Qdisc *q, unsigned long cl, return tc_fill_tclass(a->skb, q, cl, NETLINK_CB(a->cb->skb).portid, a->cb->nlh->nlmsg_seq, NLM_F_MULTI, - RTM_NEWTCLASS); + RTM_NEWTCLASS, NULL); } static int tc_dump_tclass_qdisc(struct Qdisc *q, struct sk_buff *skb, -- cgit v1.2.3 From 501543b4fff0ff70bde28a829eb8835081ccef2f Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 13 Jan 2023 10:35:43 +0300 Subject: devlink: remove some unnecessary code This code checks if (attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) twice. Once at the start of the function and then a couple lines later. Delete the second check since that one must be true. Because the second condition is always true, it means the: policer_item = group_item->policer_item; assignment is immediately over-written. Delete that as well. Signed-off-by: Dan Carpenter Reviewed-by: Jiri Pirko Acked-by: Jakub Kicinski Reviewed-by: Ido Schimmel Link: https://lore.kernel.org/r/Y8EJz8oxpMhfiPUb@kili Signed-off-by: Paolo Abeni --- net/devlink/leftover.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 1e23b2da78cc..bf5e0b1c0422 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -8719,6 +8719,7 @@ static int devlink_trap_group_set(struct devlink *devlink, struct netlink_ext_ack *extack = info->extack; const struct devlink_trap_policer *policer; struct nlattr **attrs = info->attrs; + u32 policer_id; int err; if (!attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) @@ -8727,17 +8728,11 @@ static int devlink_trap_group_set(struct devlink *devlink, if (!devlink->ops->trap_group_set) return -EOPNOTSUPP; - policer_item = group_item->policer_item; - if (attrs[DEVLINK_ATTR_TRAP_POLICER_ID]) { - u32 policer_id; - - policer_id = nla_get_u32(attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); - policer_item = devlink_trap_policer_item_lookup(devlink, - policer_id); - if (policer_id && !policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); - return -ENOENT; - } + policer_id = nla_get_u32(attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); + policer_item = devlink_trap_policer_item_lookup(devlink, policer_id); + if (policer_id && !policer_item) { + NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + return -ENOENT; } policer = policer_item ? policer_item->policer : NULL; -- cgit v1.2.3 From a4650da2a2d6150a8ff1ea36fde9f6a26cf5fda3 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Fri, 13 Jan 2023 14:51:59 +0100 Subject: net: fix call location in kfree_skb_list_reason The SKB drop reason uses __builtin_return_address(0) to give the call "location" to trace_kfree_skb() tracepoint skb:kfree_skb. To keep this stable for compilers kfree_skb_reason() is annotated with __fix_address (noinline __noclone) as fixed in commit c205cc7534a9 ("net: skb: prevent the split of kfree_skb_reason() by gcc"). The function kfree_skb_list_reason() invoke kfree_skb_reason(), which cause the __builtin_return_address(0) "location" to report the unexpected address of kfree_skb_list_reason. Example output from 'perf script': kpktgend_0 1337 [000] 81.002597: skb:kfree_skb: skbaddr=0xffff888144824700 protocol=2048 location=kfree_skb_list_reason+0x1e reason: QDISC_DROP Patch creates an __always_inline __kfree_skb_reason() helper call that is called from both kfree_skb_list() and kfree_skb_list_reason(). Suggestions for solutions that shares code better are welcome. As preparation for next patch move __kfree_skb() invocation out of this helper function. Reviewed-by: Saeed Mahameed Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Paolo Abeni --- net/core/skbuff.c | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 3a10387f9434..ca2b4d6f9c9a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -930,6 +930,21 @@ void __kfree_skb(struct sk_buff *skb) } EXPORT_SYMBOL(__kfree_skb); +static __always_inline +bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) +{ + if (unlikely(!skb_unref(skb))) + return false; + + DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX); + + if (reason == SKB_CONSUMED) + trace_consume_skb(skb); + else + trace_kfree_skb(skb, __builtin_return_address(0), reason); + return true; +} + /** * kfree_skb_reason - free an sk_buff with special reason * @skb: buffer to free @@ -942,26 +957,19 @@ EXPORT_SYMBOL(__kfree_skb); void __fix_address kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) { - if (unlikely(!skb_unref(skb))) - return; - - DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX); - - if (reason == SKB_CONSUMED) - trace_consume_skb(skb); - else - trace_kfree_skb(skb, __builtin_return_address(0), reason); - __kfree_skb(skb); + if (__kfree_skb_reason(skb, reason)) + __kfree_skb(skb); } EXPORT_SYMBOL(kfree_skb_reason); -void kfree_skb_list_reason(struct sk_buff *segs, - enum skb_drop_reason reason) +void __fix_address +kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) { while (segs) { struct sk_buff *next = segs->next; - kfree_skb_reason(segs, reason); + if (__kfree_skb_reason(segs, reason)) + __kfree_skb(segs); segs = next; } } -- cgit v1.2.3 From eedade12f4cb7284555c4c0314485e9575c70ab7 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Fri, 13 Jan 2023 14:52:04 +0100 Subject: net: kfree_skb_list use kmem_cache_free_bulk The kfree_skb_list function walks SKB (via skb->next) and frees them individually to the SLUB/SLAB allocator (kmem_cache). It is more efficient to bulk free them via the kmem_cache_free_bulk API. This patches create a stack local array with SKBs to bulk free while walking the list. Bulk array size is limited to 16 SKBs to trade off stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" uses objsize 256 bytes usually in an order-1 page 8192 bytes that is 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, for SLUB the optimal bulk free case is 32 objects belonging to same slab, but runtime this isn't likely to occur. The expected gain from using kmem_cache bulk alloc and free API have been assessed via a microbencmark kernel module[1]. The module 'slab_bulk_test01' results at bulk 16 element: kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) More detailed description of benchmarks avail in [2]. [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org V2: rename function to kfree_skb_add_bulk. Reviewed-by: Saeed Mahameed Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Paolo Abeni --- net/core/skbuff.c | 40 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index ca2b4d6f9c9a..4e73ab3482b8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -962,16 +962,54 @@ kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) } EXPORT_SYMBOL(kfree_skb_reason); +#define KFREE_SKB_BULK_SIZE 16 + +struct skb_free_array { + unsigned int skb_count; + void *skb_array[KFREE_SKB_BULK_SIZE]; +}; + +static void kfree_skb_add_bulk(struct sk_buff *skb, + struct skb_free_array *sa, + enum skb_drop_reason reason) +{ + /* if SKB is a clone, don't handle this case */ + if (unlikely(skb->fclone != SKB_FCLONE_UNAVAILABLE)) { + __kfree_skb(skb); + return; + } + + skb_release_all(skb, reason); + sa->skb_array[sa->skb_count++] = skb; + + if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) { + kmem_cache_free_bulk(skbuff_head_cache, KFREE_SKB_BULK_SIZE, + sa->skb_array); + sa->skb_count = 0; + } +} + void __fix_address kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) { + struct skb_free_array sa; + + sa.skb_count = 0; + while (segs) { struct sk_buff *next = segs->next; + skb_mark_not_on_list(segs); + if (__kfree_skb_reason(segs, reason)) - __kfree_skb(segs); + kfree_skb_add_bulk(segs, &sa, reason); + segs = next; } + + if (sa.skb_count) + kmem_cache_free_bulk(skbuff_head_cache, sa.skb_count, + sa.skb_array); } EXPORT_SYMBOL(kfree_skb_list_reason); -- cgit v1.2.3 From 21cbd90a6fab7123905386985e3e4a80236b8714 Mon Sep 17 00:00:00 2001 From: Pietro Borrello Date: Sat, 14 Jan 2023 13:11:41 +0000 Subject: inet: fix fast path in __inet_hash_connect() __inet_hash_connect() has a fast path taken if sk_head(&tb->owners) is equal to the sk parameter. sk_head() returns the hlist_entry() with respect to the sk_node field. However entries in the tb->owners list are inserted with respect to the sk_bind_node field with sk_add_bind_node(). Thus the check would never pass and the fast path never execute. This fast path has never been executed or tested as this bug seems to be present since commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), thus remove it to reduce code complexity. Signed-off-by: Pietro Borrello Reviewed-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20230112-inet_hash_connect_bind_head-v3-1-b591fd212b93@diag.uniroma1.it Signed-off-by: Paolo Abeni --- net/ipv4/inet_hashtables.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) (limited to 'net') diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 24a38b56fab9..b97dd10e02a7 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -995,17 +995,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, u32 index; if (port) { - head = &hinfo->bhash[inet_bhashfn(net, port, - hinfo->bhash_size)]; - tb = inet_csk(sk)->icsk_bind_hash; - spin_lock_bh(&head->lock); - if (sk_head(&tb->owners) == sk && !sk->sk_bind_node.next) { - inet_ehash_nolisten(sk, NULL, NULL); - spin_unlock_bh(&head->lock); - return 0; - } - spin_unlock(&head->lock); - /* No definite answer... Walk to established hash table */ + local_bh_disable(); ret = check_established(death_row, sk, port, NULL); local_bh_enable(); return ret; -- cgit v1.2.3 From f71cb8f45d092a1805e9ad474b6c6a17219cede5 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 2 Jan 2023 12:46:10 +0100 Subject: netfilter: conntrack: sctp: use nf log infrastructure for invalid packets The conntrack logging facilities include useful info such as in/out interface names and packet headers. Use those in more places instead of pr_debug calls. Furthermore, several pr_debug calls can be removed, they are useless on production machines due to the sheer volume of log messages. Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_proto_sctp.c | 46 +++++++++++---------------------- 1 file changed, 15 insertions(+), 31 deletions(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c index d88b92a8ffca..dbdfcc6cd2aa 100644 --- a/net/netfilter/nf_conntrack_proto_sctp.c +++ b/net/netfilter/nf_conntrack_proto_sctp.c @@ -168,7 +168,8 @@ for ((offset) = (dataoff) + sizeof(struct sctphdr), (count) = 0; \ static int do_basic_checks(struct nf_conn *ct, const struct sk_buff *skb, unsigned int dataoff, - unsigned long *map) + unsigned long *map, + const struct nf_hook_state *state) { u_int32_t offset, count; struct sctp_chunkhdr _sch, *sch; @@ -177,8 +178,6 @@ static int do_basic_checks(struct nf_conn *ct, flag = 0; for_each_sctp_chunk (skb, sch, _sch, offset, dataoff, count) { - pr_debug("Chunk Num: %d Type: %d\n", count, sch->type); - if (sch->type == SCTP_CID_INIT || sch->type == SCTP_CID_INIT_ACK || sch->type == SCTP_CID_SHUTDOWN_COMPLETE) @@ -193,7 +192,9 @@ static int do_basic_checks(struct nf_conn *ct, sch->type == SCTP_CID_COOKIE_ECHO || flag) && count != 0) || !sch->length) { - pr_debug("Basic checks failed\n"); + nf_ct_l4proto_log_invalid(skb, ct, state, + "%s failed. chunk num %d, type %d, len %d flag %d\n", + __func__, count, sch->type, sch->length, flag); return 1; } @@ -201,7 +202,6 @@ static int do_basic_checks(struct nf_conn *ct, set_bit(sch->type, map); } - pr_debug("Basic checks passed\n"); return count == 0; } @@ -211,69 +211,51 @@ static int sctp_new_state(enum ip_conntrack_dir dir, { int i; - pr_debug("Chunk type: %d\n", chunk_type); - switch (chunk_type) { case SCTP_CID_INIT: - pr_debug("SCTP_CID_INIT\n"); i = 0; break; case SCTP_CID_INIT_ACK: - pr_debug("SCTP_CID_INIT_ACK\n"); i = 1; break; case SCTP_CID_ABORT: - pr_debug("SCTP_CID_ABORT\n"); i = 2; break; case SCTP_CID_SHUTDOWN: - pr_debug("SCTP_CID_SHUTDOWN\n"); i = 3; break; case SCTP_CID_SHUTDOWN_ACK: - pr_debug("SCTP_CID_SHUTDOWN_ACK\n"); i = 4; break; case SCTP_CID_ERROR: - pr_debug("SCTP_CID_ERROR\n"); i = 5; break; case SCTP_CID_COOKIE_ECHO: - pr_debug("SCTP_CID_COOKIE_ECHO\n"); i = 6; break; case SCTP_CID_COOKIE_ACK: - pr_debug("SCTP_CID_COOKIE_ACK\n"); i = 7; break; case SCTP_CID_SHUTDOWN_COMPLETE: - pr_debug("SCTP_CID_SHUTDOWN_COMPLETE\n"); i = 8; break; case SCTP_CID_HEARTBEAT: - pr_debug("SCTP_CID_HEARTBEAT"); i = 9; break; case SCTP_CID_HEARTBEAT_ACK: - pr_debug("SCTP_CID_HEARTBEAT_ACK"); i = 10; break; case SCTP_CID_DATA: case SCTP_CID_SACK: - pr_debug("SCTP_CID_DATA/SACK"); i = 11; break; default: /* Other chunks like DATA or SACK do not change the state */ - pr_debug("Unknown chunk type, Will stay in %s\n", - sctp_conntrack_names[cur_state]); + pr_debug("Unknown chunk type %d, Will stay in %s\n", + chunk_type, sctp_conntrack_names[cur_state]); return cur_state; } - pr_debug("dir: %d cur_state: %s chunk_type: %d new_state: %s\n", - dir, sctp_conntrack_names[cur_state], chunk_type, - sctp_conntrack_names[sctp_conntracks[dir][i][cur_state]]); - return sctp_conntracks[dir][i][cur_state]; } @@ -392,7 +374,7 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct, if (sh == NULL) goto out; - if (do_basic_checks(ct, skb, dataoff, map) != 0) + if (do_basic_checks(ct, skb, dataoff, map, state) != 0) goto out; if (!nf_ct_is_confirmed(ct)) { @@ -414,7 +396,9 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct, !test_bit(SCTP_CID_HEARTBEAT, map) && !test_bit(SCTP_CID_HEARTBEAT_ACK, map) && sh->vtag != ct->proto.sctp.vtag[dir]) { - pr_debug("Verification tag check failed\n"); + nf_ct_l4proto_log_invalid(skb, ct, state, + "verification tag check failed %x vs %x for dir %d", + sh->vtag, ct->proto.sctp.vtag[dir], dir); goto out; } } @@ -488,9 +472,10 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct, /* Invalid */ if (new_state == SCTP_CONNTRACK_MAX) { - pr_debug("nf_conntrack_sctp: Invalid dir=%i ctype=%u " - "conntrack=%u\n", - dir, sch->type, old_state); + nf_ct_l4proto_log_invalid(skb, ct, state, + "Invalid, old_state %d, dir %d, type %d", + old_state, dir, sch->type); + goto out_unlock; } @@ -536,7 +521,6 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct, if (old_state == SCTP_CONNTRACK_COOKIE_ECHOED && dir == IP_CT_DIR_REPLY && new_state == SCTP_CONNTRACK_ESTABLISHED) { - pr_debug("Setting assured bit\n"); set_bit(IPS_ASSURED_BIT, &ct->status); nf_conntrack_event_cache(IPCT_ASSURED, ct); } -- cgit v1.2.3 From 50bfbb8957abebc2359220d7c1e4663994461b36 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 2 Jan 2023 12:46:11 +0100 Subject: netfilter: conntrack: remove pr_debug calls Those are all useless or dubious. getorigdst() is called via setsockopt, so return value/errno will already indicate an appropriate error. For other pr_debug calls there are better replacements, such as slab/slub debugging or 'conntrack -E' (ctnetlink events). Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_core.c | 20 ++------------------ net/netfilter/nf_conntrack_proto.c | 20 +++----------------- net/netfilter/nf_conntrack_proto_tcp.c | 9 --------- 3 files changed, 5 insertions(+), 44 deletions(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 496c4920505b..81ece117033a 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -514,7 +514,6 @@ EXPORT_SYMBOL_GPL(nf_ct_get_id); static void clean_from_lists(struct nf_conn *ct) { - pr_debug("clean_from_lists(%p)\n", ct); hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode); hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode); @@ -582,7 +581,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct) { struct nf_conn *ct = (struct nf_conn *)nfct; - pr_debug("%s(%p)\n", __func__, ct); WARN_ON(refcount_read(&nfct->use) != 0); if (unlikely(nf_ct_is_template(ct))) { @@ -603,7 +601,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct) if (ct->master) nf_ct_put(ct->master); - pr_debug("%s: returning ct=%p to slab\n", __func__, ct); nf_conntrack_free(ct); } EXPORT_SYMBOL(nf_ct_destroy); @@ -1210,7 +1207,6 @@ __nf_conntrack_confirm(struct sk_buff *skb) goto dying; } - pr_debug("Confirming conntrack %p\n", ct); /* We have to check the DYING flag after unlink to prevent * a race against nf_ct_get_next_corpse() possibly called from * user context, else we insert an already 'dead' hash, blocking @@ -1721,10 +1717,8 @@ init_conntrack(struct net *net, struct nf_conn *tmpl, struct nf_conntrack_zone tmp; struct nf_conntrack_net *cnet; - if (!nf_ct_invert_tuple(&repl_tuple, tuple)) { - pr_debug("Can't invert tuple.\n"); + if (!nf_ct_invert_tuple(&repl_tuple, tuple)) return NULL; - } zone = nf_ct_zone_tmpl(tmpl, skb, &tmp); ct = __nf_conntrack_alloc(net, zone, tuple, &repl_tuple, GFP_ATOMIC, @@ -1764,8 +1758,6 @@ init_conntrack(struct net *net, struct nf_conn *tmpl, spin_lock_bh(&nf_conntrack_expect_lock); exp = nf_ct_find_expectation(net, zone, tuple); if (exp) { - pr_debug("expectation arrives ct=%p exp=%p\n", - ct, exp); /* Welcome, Mr. Bond. We've been expecting you... */ __set_bit(IPS_EXPECTED_BIT, &ct->status); /* exp->master safe, refcnt bumped in nf_ct_find_expectation */ @@ -1829,10 +1821,8 @@ resolve_normal_ct(struct nf_conn *tmpl, if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, state->pf, protonum, state->net, - &tuple)) { - pr_debug("Can't get tuple\n"); + &tuple)) return 0; - } /* look for tuple match */ zone = nf_ct_zone_tmpl(tmpl, skb, &tmp); @@ -1866,13 +1856,10 @@ resolve_normal_ct(struct nf_conn *tmpl, } else { /* Once we've had two way comms, always ESTABLISHED. */ if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { - pr_debug("normal packet for %p\n", ct); ctinfo = IP_CT_ESTABLISHED; } else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) { - pr_debug("related packet for %p\n", ct); ctinfo = IP_CT_RELATED; } else { - pr_debug("new packet for %p\n", ct); ctinfo = IP_CT_NEW; } } @@ -1988,7 +1975,6 @@ nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state) /* rcu_read_lock()ed by nf_hook_thresh */ dataoff = get_l4proto(skb, skb_network_offset(skb), state->pf, &protonum); if (dataoff <= 0) { - pr_debug("not prepared to track yet or error occurred\n"); NF_CT_STAT_INC_ATOMIC(state->net, invalid); ret = NF_ACCEPT; goto out; @@ -2027,7 +2013,6 @@ repeat: if (ret <= 0) { /* Invalid: inverse of the return code tells * the netfilter core what to do */ - pr_debug("nf_conntrack_in: Can't track with proto module\n"); nf_ct_put(ct); skb->_nfct = 0; /* Special case: TCP tracker reports an attempt to reopen a @@ -2066,7 +2051,6 @@ void nf_conntrack_alter_reply(struct nf_conn *ct, /* Should be unconfirmed, so not in hash table yet */ WARN_ON(nf_ct_is_confirmed(ct)); - pr_debug("Altering reply tuple of %p to ", ct); nf_ct_dump_tuple(newreply); ct->tuplehash[IP_CT_DIR_REPLY].tuple = *newreply; diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c index ccef340be575..c928ff63b10e 100644 --- a/net/netfilter/nf_conntrack_proto.c +++ b/net/netfilter/nf_conntrack_proto.c @@ -284,16 +284,11 @@ getorigdst(struct sock *sk, int optval, void __user *user, int *len) /* We only do TCP and SCTP at the moment: is there a better way? */ if (tuple.dst.protonum != IPPROTO_TCP && - tuple.dst.protonum != IPPROTO_SCTP) { - pr_debug("SO_ORIGINAL_DST: Not a TCP/SCTP socket\n"); + tuple.dst.protonum != IPPROTO_SCTP) return -ENOPROTOOPT; - } - if ((unsigned int)*len < sizeof(struct sockaddr_in)) { - pr_debug("SO_ORIGINAL_DST: len %d not %zu\n", - *len, sizeof(struct sockaddr_in)); + if ((unsigned int)*len < sizeof(struct sockaddr_in)) return -EINVAL; - } h = nf_conntrack_find_get(sock_net(sk), &nf_ct_zone_dflt, &tuple); if (h) { @@ -307,17 +302,12 @@ getorigdst(struct sock *sk, int optval, void __user *user, int *len) .tuple.dst.u3.ip; memset(sin.sin_zero, 0, sizeof(sin.sin_zero)); - pr_debug("SO_ORIGINAL_DST: %pI4 %u\n", - &sin.sin_addr.s_addr, ntohs(sin.sin_port)); nf_ct_put(ct); if (copy_to_user(user, &sin, sizeof(sin)) != 0) return -EFAULT; else return 0; } - pr_debug("SO_ORIGINAL_DST: Can't find %pI4/%u-%pI4/%u.\n", - &tuple.src.u3.ip, ntohs(tuple.src.u.tcp.port), - &tuple.dst.u3.ip, ntohs(tuple.dst.u.tcp.port)); return -ENOENT; } @@ -360,12 +350,8 @@ ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int *len) return -EINVAL; h = nf_conntrack_find_get(sock_net(sk), &nf_ct_zone_dflt, &tuple); - if (!h) { - pr_debug("IP6T_SO_ORIGINAL_DST: Can't find %pI6c/%u-%pI6c/%u.\n", - &tuple.src.u3.ip6, ntohs(tuple.src.u.tcp.port), - &tuple.dst.u3.ip6, ntohs(tuple.dst.u.tcp.port)); + if (!h) return -ENOENT; - } ct = nf_ct_tuplehash_to_ctrack(h); diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 656631083177..21a3741162ba 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -930,7 +930,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, { struct net *net = nf_ct_net(ct); struct nf_tcp_net *tn = nf_tcp_pernet(net); - struct nf_conntrack_tuple *tuple; enum tcp_conntrack new_state, old_state; unsigned int index, *timeouts; enum nf_ct_tcp_action res; @@ -954,7 +953,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, dir = CTINFO2DIR(ctinfo); index = get_conntrack_index(th); new_state = tcp_conntracks[dir][index][old_state]; - tuple = &ct->tuplehash[dir].tuple; switch (new_state) { case TCP_CONNTRACK_SYN_SENT: @@ -1217,13 +1215,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, ct->proto.tcp.last_index = index; ct->proto.tcp.last_dir = dir; - pr_debug("tcp_conntracks: "); - nf_ct_dump_tuple(tuple); - pr_debug("syn=%i ack=%i fin=%i rst=%i old=%i new=%i\n", - (th->syn ? 1 : 0), (th->ack ? 1 : 0), - (th->fin ? 1 : 0), (th->rst ? 1 : 0), - old_state, new_state); - ct->proto.tcp.state = new_state; if (old_state != new_state && new_state == TCP_CONNTRACK_FIN_WAIT) -- cgit v1.2.3 From 4883ec512c1715fc827557f0e2bfce76c6530757 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 2 Jan 2023 12:46:12 +0100 Subject: netfilter: conntrack: avoid reload of ct->status Compiler can't merge the two test_bit() calls, so load ct->status once and use non-atomic accesses. This is fine because IPS_EXPECTED or NAT_CLASH are either set at ct creation time or not at all, but compiler can't know that. Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_core.c | 9 +++++---- net/netfilter/nf_conntrack_proto_udp.c | 10 ++++++---- 2 files changed, 11 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 81ece117033a..9e12cade4e0f 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1854,14 +1854,15 @@ resolve_normal_ct(struct nf_conn *tmpl, if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) { ctinfo = IP_CT_ESTABLISHED_REPLY; } else { + unsigned long status = READ_ONCE(ct->status); + /* Once we've had two way comms, always ESTABLISHED. */ - if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { + if (likely(status & IPS_SEEN_REPLY)) ctinfo = IP_CT_ESTABLISHED; - } else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) { + else if (status & IPS_EXPECTED) ctinfo = IP_CT_RELATED; - } else { + else ctinfo = IP_CT_NEW; - } } nf_ct_set(skb, ct, ctinfo); return 0; diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c index 3b516cffc779..6b9206635b24 100644 --- a/net/netfilter/nf_conntrack_proto_udp.c +++ b/net/netfilter/nf_conntrack_proto_udp.c @@ -88,6 +88,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, const struct nf_hook_state *state) { unsigned int *timeouts; + unsigned long status; if (udp_error(skb, dataoff, state)) return -NF_ACCEPT; @@ -96,26 +97,27 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, if (!timeouts) timeouts = udp_get_timeouts(nf_ct_net(ct)); - if (!nf_ct_is_confirmed(ct)) + status = READ_ONCE(ct->status); + if ((status & IPS_CONFIRMED) == 0) ct->proto.udp.stream_ts = 2 * HZ + jiffies; /* If we've seen traffic both ways, this is some kind of UDP * stream. Set Assured. */ - if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { + if (status & IPS_SEEN_REPLY_BIT) { unsigned long extra = timeouts[UDP_CT_UNREPLIED]; bool stream = false; /* Still active after two seconds? Extend timeout. */ if (time_after(jiffies, ct->proto.udp.stream_ts)) { extra = timeouts[UDP_CT_REPLIED]; - stream = true; + stream = (status & IPS_ASSURED) == 0; } nf_ct_refresh_acct(ct, ctinfo, skb, extra); /* never set ASSURED for IPS_NAT_CLASH, they time out soon */ - if (unlikely((ct->status & IPS_NAT_CLASH))) + if (unlikely((status & IPS_NAT_CLASH))) return NF_ACCEPT; /* Also, more likely to be important, and not a probe */ -- cgit v1.2.3 From 2a2fa2efc65fb4d96bdcde819473943fb90ec28b Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 16 Dec 2022 02:46:28 +0100 Subject: netfilter: conntrack: move rcu read lock to nf_conntrack_find_get Move rcu_read_lock/unlock to nf_conntrack_find_get(), this avoids nested rcu_read_lock call from resolve_normal_ct(). Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_core.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 9e12cade4e0f..c00858344f02 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -783,8 +783,6 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, struct nf_conntrack_tuple_hash *h; struct nf_conn *ct; - rcu_read_lock(); - h = ____nf_conntrack_find(net, zone, tuple, hash); if (h) { /* We have a candidate that matches the tuple we're interested @@ -796,7 +794,7 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, smp_acquire__after_ctrl_dep(); if (likely(nf_ct_key_equal(h, tuple, zone, net))) - goto found; + return h; /* TYPESAFE_BY_RCU recycled the candidate */ nf_ct_put(ct); @@ -804,8 +802,6 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, h = NULL; } -found: - rcu_read_unlock(); return h; } @@ -817,16 +813,21 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone, unsigned int rid, zone_id = nf_ct_zone_id(zone, IP_CT_DIR_ORIGINAL); struct nf_conntrack_tuple_hash *thash; + rcu_read_lock(); + thash = __nf_conntrack_find_get(net, zone, tuple, hash_conntrack_raw(tuple, zone_id, net)); if (thash) - return thash; + goto out_unlock; rid = nf_ct_zone_id(zone, IP_CT_DIR_REPLY); if (rid != zone_id) - return __nf_conntrack_find_get(net, zone, tuple, - hash_conntrack_raw(tuple, rid, net)); + thash = __nf_conntrack_find_get(net, zone, tuple, + hash_conntrack_raw(tuple, rid, net)); + +out_unlock: + rcu_read_unlock(); return thash; } EXPORT_SYMBOL_GPL(nf_conntrack_find_get); -- cgit v1.2.3 From 9db5d918e2c07fa09fab18bc7addf3408da0c76f Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 5 Jan 2023 20:22:02 +0100 Subject: netfilter: ip_tables: remove clusterip target Marked as 'to be removed soon' since kernel 4.1 (2015). Functionality was superseded by the 'cluster' match, added in kernel 2.6.30 (2009). clusterip_tg_check still has races that can give proc_dir_entry 'ipt_CLUSTERIP/10.1.1.2' already registered followed by a WARN splat. Remove it instead of trying to fix this up again. clusterip uapi header is left as-is for now. Signed-off-by: Florian Westphal --- net/ipv4/netfilter/Kconfig | 14 - net/ipv4/netfilter/Makefile | 1 - net/ipv4/netfilter/ipt_CLUSTERIP.c | 929 ------------------------------------- 3 files changed, 944 deletions(-) delete mode 100644 net/ipv4/netfilter/ipt_CLUSTERIP.c (limited to 'net') diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index aab384126f61..f71a7e9a7de6 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -259,20 +259,6 @@ config IP_NF_MANGLE To compile it as a module, choose M here. If unsure, say N. -config IP_NF_TARGET_CLUSTERIP - tristate "CLUSTERIP target support" - depends on IP_NF_MANGLE - depends on NF_CONNTRACK - depends on NETFILTER_ADVANCED - select NF_CONNTRACK_MARK - select NETFILTER_FAMILY_ARP - help - The CLUSTERIP target allows you to build load-balancing clusters of - network servers without having a dedicated load-balancing - router/server/switch. - - To compile it as a module, choose M here. If unsure, say N. - config IP_NF_TARGET_ECN tristate "ECN target support" depends on IP_NF_MANGLE diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index 93bad1184251..5a26f9de1ab9 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -39,7 +39,6 @@ obj-$(CONFIG_IP_NF_MATCH_AH) += ipt_ah.o obj-$(CONFIG_IP_NF_MATCH_RPFILTER) += ipt_rpfilter.o # targets -obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o obj-$(CONFIG_IP_NF_TARGET_ECN) += ipt_ECN.o obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o obj-$(CONFIG_IP_NF_TARGET_SYNPROXY) += ipt_SYNPROXY.o diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c deleted file mode 100644 index b3cc416ed292..000000000000 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ /dev/null @@ -1,929 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* Cluster IP hashmark target - * (C) 2003-2004 by Harald Welte - * based on ideas of Fabio Olive Leite - * - * Development of this code funded by SuSE Linux AG, https://www.suse.com/ - */ -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define CLUSTERIP_VERSION "0.8" - -MODULE_LICENSE("GPL"); -MODULE_AUTHOR("Harald Welte "); -MODULE_DESCRIPTION("Xtables: CLUSTERIP target"); - -struct clusterip_config { - struct list_head list; /* list of all configs */ - refcount_t refcount; /* reference count */ - refcount_t entries; /* number of entries/rules - * referencing us */ - - __be32 clusterip; /* the IP address */ - u_int8_t clustermac[ETH_ALEN]; /* the MAC address */ - int ifindex; /* device ifindex */ - u_int16_t num_total_nodes; /* total number of nodes */ - unsigned long local_nodes; /* node number array */ - -#ifdef CONFIG_PROC_FS - struct proc_dir_entry *pde; /* proc dir entry */ -#endif - enum clusterip_hashmode hash_mode; /* which hashing mode */ - u_int32_t hash_initval; /* hash initialization */ - struct rcu_head rcu; /* for call_rcu */ - struct net *net; /* netns for pernet list */ - char ifname[IFNAMSIZ]; /* device ifname */ -}; - -#ifdef CONFIG_PROC_FS -static const struct proc_ops clusterip_proc_ops; -#endif - -struct clusterip_net { - struct list_head configs; - /* lock protects the configs list */ - spinlock_t lock; - - bool clusterip_deprecated_warning; -#ifdef CONFIG_PROC_FS - struct proc_dir_entry *procdir; - /* mutex protects the config->pde*/ - struct mutex mutex; -#endif - unsigned int hook_users; -}; - -static unsigned int clusterip_arp_mangle(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); - -static const struct nf_hook_ops cip_arp_ops = { - .hook = clusterip_arp_mangle, - .pf = NFPROTO_ARP, - .hooknum = NF_ARP_OUT, - .priority = -1 -}; - -static unsigned int clusterip_net_id __read_mostly; -static inline struct clusterip_net *clusterip_pernet(struct net *net) -{ - return net_generic(net, clusterip_net_id); -} - -static inline void -clusterip_config_get(struct clusterip_config *c) -{ - refcount_inc(&c->refcount); -} - -static void clusterip_config_rcu_free(struct rcu_head *head) -{ - struct clusterip_config *config; - struct net_device *dev; - - config = container_of(head, struct clusterip_config, rcu); - dev = dev_get_by_name(config->net, config->ifname); - if (dev) { - dev_mc_del(dev, config->clustermac); - dev_put(dev); - } - kfree(config); -} - -static inline void -clusterip_config_put(struct clusterip_config *c) -{ - if (refcount_dec_and_test(&c->refcount)) - call_rcu(&c->rcu, clusterip_config_rcu_free); -} - -/* decrease the count of entries using/referencing this config. If last - * entry(rule) is removed, remove the config from lists, but don't free it - * yet, since proc-files could still be holding references */ -static inline void -clusterip_config_entry_put(struct clusterip_config *c) -{ - struct clusterip_net *cn = clusterip_pernet(c->net); - - local_bh_disable(); - if (refcount_dec_and_lock(&c->entries, &cn->lock)) { - list_del_rcu(&c->list); - spin_unlock(&cn->lock); - local_bh_enable(); - /* In case anyone still accesses the file, the open/close - * functions are also incrementing the refcount on their own, - * so it's safe to remove the entry even if it's in use. */ -#ifdef CONFIG_PROC_FS - mutex_lock(&cn->mutex); - if (cn->procdir) - proc_remove(c->pde); - mutex_unlock(&cn->mutex); -#endif - return; - } - local_bh_enable(); -} - -static struct clusterip_config * -__clusterip_config_find(struct net *net, __be32 clusterip) -{ - struct clusterip_config *c; - struct clusterip_net *cn = clusterip_pernet(net); - - list_for_each_entry_rcu(c, &cn->configs, list) { - if (c->clusterip == clusterip) - return c; - } - - return NULL; -} - -static inline struct clusterip_config * -clusterip_config_find_get(struct net *net, __be32 clusterip, int entry) -{ - struct clusterip_config *c; - - rcu_read_lock_bh(); - c = __clusterip_config_find(net, clusterip); - if (c) { -#ifdef CONFIG_PROC_FS - if (!c->pde) - c = NULL; - else -#endif - if (unlikely(!refcount_inc_not_zero(&c->refcount))) - c = NULL; - else if (entry) { - if (unlikely(!refcount_inc_not_zero(&c->entries))) { - clusterip_config_put(c); - c = NULL; - } - } - } - rcu_read_unlock_bh(); - - return c; -} - -static void -clusterip_config_init_nodelist(struct clusterip_config *c, - const struct ipt_clusterip_tgt_info *i) -{ - int n; - - for (n = 0; n < i->num_local_nodes; n++) - set_bit(i->local_nodes[n] - 1, &c->local_nodes); -} - -static int -clusterip_netdev_event(struct notifier_block *this, unsigned long event, - void *ptr) -{ - struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct net *net = dev_net(dev); - struct clusterip_net *cn = clusterip_pernet(net); - struct clusterip_config *c; - - spin_lock_bh(&cn->lock); - list_for_each_entry_rcu(c, &cn->configs, list) { - switch (event) { - case NETDEV_REGISTER: - if (!strcmp(dev->name, c->ifname)) { - c->ifindex = dev->ifindex; - dev_mc_add(dev, c->clustermac); - } - break; - case NETDEV_UNREGISTER: - if (dev->ifindex == c->ifindex) { - dev_mc_del(dev, c->clustermac); - c->ifindex = -1; - } - break; - case NETDEV_CHANGENAME: - if (!strcmp(dev->name, c->ifname)) { - c->ifindex = dev->ifindex; - dev_mc_add(dev, c->clustermac); - } else if (dev->ifindex == c->ifindex) { - dev_mc_del(dev, c->clustermac); - c->ifindex = -1; - } - break; - } - } - spin_unlock_bh(&cn->lock); - - return NOTIFY_DONE; -} - -static struct clusterip_config * -clusterip_config_init(struct net *net, const struct ipt_clusterip_tgt_info *i, - __be32 ip, const char *iniface) -{ - struct clusterip_net *cn = clusterip_pernet(net); - struct clusterip_config *c; - struct net_device *dev; - int err; - - if (iniface[0] == '\0') { - pr_info("Please specify an interface name\n"); - return ERR_PTR(-EINVAL); - } - - c = kzalloc(sizeof(*c), GFP_ATOMIC); - if (!c) - return ERR_PTR(-ENOMEM); - - dev = dev_get_by_name(net, iniface); - if (!dev) { - pr_info("no such interface %s\n", iniface); - kfree(c); - return ERR_PTR(-ENOENT); - } - c->ifindex = dev->ifindex; - strcpy(c->ifname, dev->name); - memcpy(&c->clustermac, &i->clustermac, ETH_ALEN); - dev_mc_add(dev, c->clustermac); - dev_put(dev); - - c->clusterip = ip; - c->num_total_nodes = i->num_total_nodes; - clusterip_config_init_nodelist(c, i); - c->hash_mode = i->hash_mode; - c->hash_initval = i->hash_initval; - c->net = net; - refcount_set(&c->refcount, 1); - - spin_lock_bh(&cn->lock); - if (__clusterip_config_find(net, ip)) { - err = -EBUSY; - goto out_config_put; - } - - list_add_rcu(&c->list, &cn->configs); - spin_unlock_bh(&cn->lock); - -#ifdef CONFIG_PROC_FS - { - char buffer[16]; - - /* create proc dir entry */ - sprintf(buffer, "%pI4", &ip); - mutex_lock(&cn->mutex); - c->pde = proc_create_data(buffer, 0600, - cn->procdir, - &clusterip_proc_ops, c); - mutex_unlock(&cn->mutex); - if (!c->pde) { - err = -ENOMEM; - goto err; - } - } -#endif - - refcount_set(&c->entries, 1); - return c; - -#ifdef CONFIG_PROC_FS -err: -#endif - spin_lock_bh(&cn->lock); - list_del_rcu(&c->list); -out_config_put: - spin_unlock_bh(&cn->lock); - clusterip_config_put(c); - return ERR_PTR(err); -} - -#ifdef CONFIG_PROC_FS -static int -clusterip_add_node(struct clusterip_config *c, u_int16_t nodenum) -{ - - if (nodenum == 0 || - nodenum > c->num_total_nodes) - return 1; - - /* check if we already have this number in our bitfield */ - if (test_and_set_bit(nodenum - 1, &c->local_nodes)) - return 1; - - return 0; -} - -static bool -clusterip_del_node(struct clusterip_config *c, u_int16_t nodenum) -{ - if (nodenum == 0 || - nodenum > c->num_total_nodes) - return true; - - if (test_and_clear_bit(nodenum - 1, &c->local_nodes)) - return false; - - return true; -} -#endif - -static inline u_int32_t -clusterip_hashfn(const struct sk_buff *skb, - const struct clusterip_config *config) -{ - const struct iphdr *iph = ip_hdr(skb); - unsigned long hashval; - u_int16_t sport = 0, dport = 0; - int poff; - - poff = proto_ports_offset(iph->protocol); - if (poff >= 0) { - const u_int16_t *ports; - u16 _ports[2]; - - ports = skb_header_pointer(skb, iph->ihl * 4 + poff, 4, _ports); - if (ports) { - sport = ports[0]; - dport = ports[1]; - } - } else { - net_info_ratelimited("unknown protocol %u\n", iph->protocol); - } - - switch (config->hash_mode) { - case CLUSTERIP_HASHMODE_SIP: - hashval = jhash_1word(ntohl(iph->saddr), - config->hash_initval); - break; - case CLUSTERIP_HASHMODE_SIP_SPT: - hashval = jhash_2words(ntohl(iph->saddr), sport, - config->hash_initval); - break; - case CLUSTERIP_HASHMODE_SIP_SPT_DPT: - hashval = jhash_3words(ntohl(iph->saddr), sport, dport, - config->hash_initval); - break; - default: - /* to make gcc happy */ - hashval = 0; - /* This cannot happen, unless the check function wasn't called - * at rule load time */ - pr_info("unknown mode %u\n", config->hash_mode); - BUG(); - break; - } - - /* node numbers are 1..n, not 0..n */ - return reciprocal_scale(hashval, config->num_total_nodes) + 1; -} - -static inline int -clusterip_responsible(const struct clusterip_config *config, u_int32_t hash) -{ - return test_bit(hash - 1, &config->local_nodes); -} - -/*********************************************************************** - * IPTABLES TARGET - ***********************************************************************/ - -static unsigned int -clusterip_tg(struct sk_buff *skb, const struct xt_action_param *par) -{ - const struct ipt_clusterip_tgt_info *cipinfo = par->targinfo; - struct nf_conn *ct; - enum ip_conntrack_info ctinfo; - u_int32_t hash; - - /* don't need to clusterip_config_get() here, since refcount - * is only decremented by destroy() - and ip_tables guarantees - * that the ->target() function isn't called after ->destroy() */ - - ct = nf_ct_get(skb, &ctinfo); - if (ct == NULL) - return NF_DROP; - - /* special case: ICMP error handling. conntrack distinguishes between - * error messages (RELATED) and information requests (see below) */ - if (ip_hdr(skb)->protocol == IPPROTO_ICMP && - (ctinfo == IP_CT_RELATED || - ctinfo == IP_CT_RELATED_REPLY)) - return XT_CONTINUE; - - /* nf_conntrack_proto_icmp guarantees us that we only have ICMP_ECHO, - * TIMESTAMP, INFO_REQUEST or ICMP_ADDRESS type icmp packets from here - * on, which all have an ID field [relevant for hashing]. */ - - hash = clusterip_hashfn(skb, cipinfo->config); - - switch (ctinfo) { - case IP_CT_NEW: - WRITE_ONCE(ct->mark, hash); - break; - case IP_CT_RELATED: - case IP_CT_RELATED_REPLY: - /* FIXME: we don't handle expectations at the moment. - * They can arrive on a different node than - * the master connection (e.g. FTP passive mode) */ - case IP_CT_ESTABLISHED: - case IP_CT_ESTABLISHED_REPLY: - break; - default: /* Prevent gcc warnings */ - break; - } - -#ifdef DEBUG - nf_ct_dump_tuple_ip(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); -#endif - pr_debug("hash=%u ct_hash=%u ", hash, READ_ONCE(ct->mark)); - if (!clusterip_responsible(cipinfo->config, hash)) { - pr_debug("not responsible\n"); - return NF_DROP; - } - pr_debug("responsible\n"); - - /* despite being received via linklayer multicast, this is - * actually a unicast IP packet. TCP doesn't like PACKET_MULTICAST */ - skb->pkt_type = PACKET_HOST; - - return XT_CONTINUE; -} - -static int clusterip_tg_check(const struct xt_tgchk_param *par) -{ - struct ipt_clusterip_tgt_info *cipinfo = par->targinfo; - struct clusterip_net *cn = clusterip_pernet(par->net); - const struct ipt_entry *e = par->entryinfo; - struct clusterip_config *config; - int ret, i; - - if (par->nft_compat) { - pr_err("cannot use CLUSTERIP target from nftables compat\n"); - return -EOPNOTSUPP; - } - - if (cn->hook_users == UINT_MAX) - return -EOVERFLOW; - - if (cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP && - cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT && - cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT_DPT) { - pr_info("unknown mode %u\n", cipinfo->hash_mode); - return -EINVAL; - - } - if (e->ip.dmsk.s_addr != htonl(0xffffffff) || - e->ip.dst.s_addr == 0) { - pr_info("Please specify destination IP\n"); - return -EINVAL; - } - if (cipinfo->num_local_nodes > ARRAY_SIZE(cipinfo->local_nodes)) { - pr_info("bad num_local_nodes %u\n", cipinfo->num_local_nodes); - return -EINVAL; - } - for (i = 0; i < cipinfo->num_local_nodes; i++) { - if (cipinfo->local_nodes[i] - 1 >= - sizeof(config->local_nodes) * 8) { - pr_info("bad local_nodes[%d] %u\n", - i, cipinfo->local_nodes[i]); - return -EINVAL; - } - } - - config = clusterip_config_find_get(par->net, e->ip.dst.s_addr, 1); - if (!config) { - if (!(cipinfo->flags & CLUSTERIP_FLAG_NEW)) { - pr_info("no config found for %pI4, need 'new'\n", - &e->ip.dst.s_addr); - return -EINVAL; - } else { - config = clusterip_config_init(par->net, cipinfo, - e->ip.dst.s_addr, - e->ip.iniface); - if (IS_ERR(config)) - return PTR_ERR(config); - } - } else if (memcmp(&config->clustermac, &cipinfo->clustermac, ETH_ALEN)) { - clusterip_config_entry_put(config); - clusterip_config_put(config); - return -EINVAL; - } - - ret = nf_ct_netns_get(par->net, par->family); - if (ret < 0) { - pr_info("cannot load conntrack support for proto=%u\n", - par->family); - clusterip_config_entry_put(config); - clusterip_config_put(config); - return ret; - } - - if (cn->hook_users == 0) { - ret = nf_register_net_hook(par->net, &cip_arp_ops); - - if (ret < 0) { - clusterip_config_entry_put(config); - clusterip_config_put(config); - nf_ct_netns_put(par->net, par->family); - return ret; - } - } - - cn->hook_users++; - - if (!cn->clusterip_deprecated_warning) { - pr_info("ipt_CLUSTERIP is deprecated and it will removed soon, " - "use xt_cluster instead\n"); - cn->clusterip_deprecated_warning = true; - } - - cipinfo->config = config; - return ret; -} - -/* drop reference count of cluster config when rule is deleted */ -static void clusterip_tg_destroy(const struct xt_tgdtor_param *par) -{ - const struct ipt_clusterip_tgt_info *cipinfo = par->targinfo; - struct clusterip_net *cn = clusterip_pernet(par->net); - - /* if no more entries are referencing the config, remove it - * from the list and destroy the proc entry */ - clusterip_config_entry_put(cipinfo->config); - - clusterip_config_put(cipinfo->config); - - nf_ct_netns_put(par->net, par->family); - cn->hook_users--; - - if (cn->hook_users == 0) - nf_unregister_net_hook(par->net, &cip_arp_ops); -} - -#ifdef CONFIG_NETFILTER_XTABLES_COMPAT -struct compat_ipt_clusterip_tgt_info -{ - u_int32_t flags; - u_int8_t clustermac[6]; - u_int16_t num_total_nodes; - u_int16_t num_local_nodes; - u_int16_t local_nodes[CLUSTERIP_MAX_NODES]; - u_int32_t hash_mode; - u_int32_t hash_initval; - compat_uptr_t config; -}; -#endif /* CONFIG_NETFILTER_XTABLES_COMPAT */ - -static struct xt_target clusterip_tg_reg __read_mostly = { - .name = "CLUSTERIP", - .family = NFPROTO_IPV4, - .target = clusterip_tg, - .checkentry = clusterip_tg_check, - .destroy = clusterip_tg_destroy, - .targetsize = sizeof(struct ipt_clusterip_tgt_info), - .usersize = offsetof(struct ipt_clusterip_tgt_info, config), -#ifdef CONFIG_NETFILTER_XTABLES_COMPAT - .compatsize = sizeof(struct compat_ipt_clusterip_tgt_info), -#endif /* CONFIG_NETFILTER_XTABLES_COMPAT */ - .me = THIS_MODULE -}; - - -/*********************************************************************** - * ARP MANGLING CODE - ***********************************************************************/ - -/* hardcoded for 48bit ethernet and 32bit ipv4 addresses */ -struct arp_payload { - u_int8_t src_hw[ETH_ALEN]; - __be32 src_ip; - u_int8_t dst_hw[ETH_ALEN]; - __be32 dst_ip; -} __packed; - -#ifdef DEBUG -static void arp_print(struct arp_payload *payload) -{ -#define HBUFFERLEN 30 - char hbuffer[HBUFFERLEN]; - int j, k; - - for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < ETH_ALEN; j++) { - hbuffer[k++] = hex_asc_hi(payload->src_hw[j]); - hbuffer[k++] = hex_asc_lo(payload->src_hw[j]); - hbuffer[k++] = ':'; - } - hbuffer[--k] = '\0'; - - pr_debug("src %pI4@%s, dst %pI4\n", - &payload->src_ip, hbuffer, &payload->dst_ip); -} -#endif - -static unsigned int -clusterip_arp_mangle(void *priv, struct sk_buff *skb, - const struct nf_hook_state *state) -{ - struct arphdr *arp = arp_hdr(skb); - struct arp_payload *payload; - struct clusterip_config *c; - struct net *net = state->net; - - /* we don't care about non-ethernet and non-ipv4 ARP */ - if (arp->ar_hrd != htons(ARPHRD_ETHER) || - arp->ar_pro != htons(ETH_P_IP) || - arp->ar_pln != 4 || arp->ar_hln != ETH_ALEN) - return NF_ACCEPT; - - /* we only want to mangle arp requests and replies */ - if (arp->ar_op != htons(ARPOP_REPLY) && - arp->ar_op != htons(ARPOP_REQUEST)) - return NF_ACCEPT; - - payload = (void *)(arp+1); - - /* if there is no clusterip configuration for the arp reply's - * source ip, we don't want to mangle it */ - c = clusterip_config_find_get(net, payload->src_ip, 0); - if (!c) - return NF_ACCEPT; - - /* normally the linux kernel always replies to arp queries of - * addresses on different interfacs. However, in the CLUSTERIP case - * this wouldn't work, since we didn't subscribe the mcast group on - * other interfaces */ - if (c->ifindex != state->out->ifindex) { - pr_debug("not mangling arp reply on different interface: cip'%d'-skb'%d'\n", - c->ifindex, state->out->ifindex); - clusterip_config_put(c); - return NF_ACCEPT; - } - - /* mangle reply hardware address */ - memcpy(payload->src_hw, c->clustermac, arp->ar_hln); - -#ifdef DEBUG - pr_debug("mangled arp reply: "); - arp_print(payload); -#endif - - clusterip_config_put(c); - - return NF_ACCEPT; -} - -/*********************************************************************** - * PROC DIR HANDLING - ***********************************************************************/ - -#ifdef CONFIG_PROC_FS - -struct clusterip_seq_position { - unsigned int pos; /* position */ - unsigned int weight; /* number of bits set == size */ - unsigned int bit; /* current bit */ - unsigned long val; /* current value */ -}; - -static void *clusterip_seq_start(struct seq_file *s, loff_t *pos) -{ - struct clusterip_config *c = s->private; - unsigned int weight; - u_int32_t local_nodes; - struct clusterip_seq_position *idx; - - /* FIXME: possible race */ - local_nodes = c->local_nodes; - weight = hweight32(local_nodes); - if (*pos >= weight) - return NULL; - - idx = kmalloc(sizeof(struct clusterip_seq_position), GFP_KERNEL); - if (!idx) - return ERR_PTR(-ENOMEM); - - idx->pos = *pos; - idx->weight = weight; - idx->bit = ffs(local_nodes); - idx->val = local_nodes; - clear_bit(idx->bit - 1, &idx->val); - - return idx; -} - -static void *clusterip_seq_next(struct seq_file *s, void *v, loff_t *pos) -{ - struct clusterip_seq_position *idx = v; - - *pos = ++idx->pos; - if (*pos >= idx->weight) { - kfree(v); - return NULL; - } - idx->bit = ffs(idx->val); - clear_bit(idx->bit - 1, &idx->val); - return idx; -} - -static void clusterip_seq_stop(struct seq_file *s, void *v) -{ - if (!IS_ERR(v)) - kfree(v); -} - -static int clusterip_seq_show(struct seq_file *s, void *v) -{ - struct clusterip_seq_position *idx = v; - - if (idx->pos != 0) - seq_putc(s, ','); - - seq_printf(s, "%u", idx->bit); - - if (idx->pos == idx->weight - 1) - seq_putc(s, '\n'); - - return 0; -} - -static const struct seq_operations clusterip_seq_ops = { - .start = clusterip_seq_start, - .next = clusterip_seq_next, - .stop = clusterip_seq_stop, - .show = clusterip_seq_show, -}; - -static int clusterip_proc_open(struct inode *inode, struct file *file) -{ - int ret = seq_open(file, &clusterip_seq_ops); - - if (!ret) { - struct seq_file *sf = file->private_data; - struct clusterip_config *c = pde_data(inode); - - sf->private = c; - - clusterip_config_get(c); - } - - return ret; -} - -static int clusterip_proc_release(struct inode *inode, struct file *file) -{ - struct clusterip_config *c = pde_data(inode); - int ret; - - ret = seq_release(inode, file); - - if (!ret) - clusterip_config_put(c); - - return ret; -} - -static ssize_t clusterip_proc_write(struct file *file, const char __user *input, - size_t size, loff_t *ofs) -{ - struct clusterip_config *c = pde_data(file_inode(file)); -#define PROC_WRITELEN 10 - char buffer[PROC_WRITELEN+1]; - unsigned long nodenum; - int rc; - - if (size > PROC_WRITELEN) - return -EIO; - if (copy_from_user(buffer, input, size)) - return -EFAULT; - buffer[size] = 0; - - if (*buffer == '+') { - rc = kstrtoul(buffer+1, 10, &nodenum); - if (rc) - return rc; - if (clusterip_add_node(c, nodenum)) - return -ENOMEM; - } else if (*buffer == '-') { - rc = kstrtoul(buffer+1, 10, &nodenum); - if (rc) - return rc; - if (clusterip_del_node(c, nodenum)) - return -ENOENT; - } else - return -EIO; - - return size; -} - -static const struct proc_ops clusterip_proc_ops = { - .proc_open = clusterip_proc_open, - .proc_read = seq_read, - .proc_write = clusterip_proc_write, - .proc_lseek = seq_lseek, - .proc_release = clusterip_proc_release, -}; - -#endif /* CONFIG_PROC_FS */ - -static int clusterip_net_init(struct net *net) -{ - struct clusterip_net *cn = clusterip_pernet(net); - - INIT_LIST_HEAD(&cn->configs); - - spin_lock_init(&cn->lock); - -#ifdef CONFIG_PROC_FS - cn->procdir = proc_mkdir("ipt_CLUSTERIP", net->proc_net); - if (!cn->procdir) { - pr_err("Unable to proc dir entry\n"); - return -ENOMEM; - } - mutex_init(&cn->mutex); -#endif /* CONFIG_PROC_FS */ - - return 0; -} - -static void clusterip_net_exit(struct net *net) -{ -#ifdef CONFIG_PROC_FS - struct clusterip_net *cn = clusterip_pernet(net); - - mutex_lock(&cn->mutex); - proc_remove(cn->procdir); - cn->procdir = NULL; - mutex_unlock(&cn->mutex); -#endif -} - -static struct pernet_operations clusterip_net_ops = { - .init = clusterip_net_init, - .exit = clusterip_net_exit, - .id = &clusterip_net_id, - .size = sizeof(struct clusterip_net), -}; - -static struct notifier_block cip_netdev_notifier = { - .notifier_call = clusterip_netdev_event -}; - -static int __init clusterip_tg_init(void) -{ - int ret; - - ret = register_pernet_subsys(&clusterip_net_ops); - if (ret < 0) - return ret; - - ret = xt_register_target(&clusterip_tg_reg); - if (ret < 0) - goto cleanup_subsys; - - ret = register_netdevice_notifier(&cip_netdev_notifier); - if (ret < 0) - goto unregister_target; - - pr_info("ClusterIP Version %s loaded successfully\n", - CLUSTERIP_VERSION); - - return 0; - -unregister_target: - xt_unregister_target(&clusterip_tg_reg); -cleanup_subsys: - unregister_pernet_subsys(&clusterip_net_ops); - return ret; -} - -static void __exit clusterip_tg_exit(void) -{ - pr_info("ClusterIP Version %s unloading\n", CLUSTERIP_VERSION); - - unregister_netdevice_notifier(&cip_netdev_notifier); - xt_unregister_target(&clusterip_tg_reg); - unregister_pernet_subsys(&clusterip_net_ops); - - /* Wait for completion of call_rcu()'s (clusterip_config_rcu_free) */ - rcu_barrier(); -} - -module_init(clusterip_tg_init); -module_exit(clusterip_tg_exit); -- cgit v1.2.3 From d8d76062785548167cbc01eb5aaae2ae0665b5da Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 3 Jan 2023 13:47:15 +0100 Subject: netfilter: nf_tables: add static key to skip retpoline workarounds If CONFIG_RETPOLINE is enabled nf_tables avoids indirect calls for builtin expressions. On newer cpus indirect calls do not go through the retpoline thunk anymore, even for RETPOLINE=y builds. Just like with the new tc retpoline wrappers: Add a static key to skip the if / else if cascade if the cpu does not require retpolines. Suggested-by: Eric Dumazet Signed-off-by: Florian Westphal --- net/netfilter/nf_tables_core.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 709a736c301c..0f26d002d8b3 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -21,6 +21,26 @@ #include #include +#if defined(CONFIG_RETPOLINE) && defined(CONFIG_X86) + +static struct static_key_false nf_tables_skip_direct_calls; + +static bool nf_skip_indirect_calls(void) +{ + return static_branch_likely(&nf_tables_skip_direct_calls); +} + +static void __init nf_skip_indirect_calls_enable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_RETPOLINE)) + static_branch_enable(&nf_tables_skip_direct_calls); +} +#else +static inline bool nf_skip_indirect_calls(void) { return false; } + +static inline void nf_skip_indirect_calls_enable(void) { } +#endif + static noinline void __nft_trace_packet(struct nft_traceinfo *info, const struct nft_chain *chain, enum nft_trace_types type) @@ -193,7 +213,12 @@ static void expr_call_ops_eval(const struct nft_expr *expr, struct nft_pktinfo *pkt) { #ifdef CONFIG_RETPOLINE - unsigned long e = (unsigned long)expr->ops->eval; + unsigned long e; + + if (nf_skip_indirect_calls()) + goto indirect_call; + + e = (unsigned long)expr->ops->eval; #define X(e, fun) \ do { if ((e) == (unsigned long)(fun)) \ return fun(expr, regs, pkt); } while (0) @@ -210,6 +235,7 @@ static void expr_call_ops_eval(const struct nft_expr *expr, X(e, nft_rt_get_eval); X(e, nft_bitwise_eval); #undef X +indirect_call: #endif /* CONFIG_RETPOLINE */ expr->ops->eval(expr, regs, pkt); } @@ -369,6 +395,8 @@ int __init nf_tables_core_module_init(void) goto err; } + nf_skip_indirect_calls_enable(); + return 0; err: -- cgit v1.2.3 From 2032e907d8d498fcabfe24b43550c50947817c6d Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 3 Jan 2023 13:47:16 +0100 Subject: netfilter: nf_tables: avoid retpoline overhead for objref calls objref expression is builtin, so avoid calls to it for RETOLINE=y builds. Signed-off-by: Florian Westphal --- include/net/netfilter/nf_tables_core.h | 4 ++++ net/netfilter/nf_tables_core.c | 2 ++ net/netfilter/nft_objref.c | 12 ++++++------ 3 files changed, 12 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index 3e825381ac5c..bedef373ec21 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -164,4 +164,8 @@ void nft_payload_inner_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt, struct nft_inner_tun_ctx *ctx); +void nft_objref_eval(const struct nft_expr *expr, struct nft_regs *regs, + const struct nft_pktinfo *pkt); +void nft_objref_map_eval(const struct nft_expr *expr, struct nft_regs *regs, + const struct nft_pktinfo *pkt); #endif /* _NET_NF_TABLES_CORE_H */ diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 0f26d002d8b3..d9992906199f 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -234,6 +234,8 @@ static void expr_call_ops_eval(const struct nft_expr *expr, X(e, nft_dynset_eval); X(e, nft_rt_get_eval); X(e, nft_bitwise_eval); + X(e, nft_objref_eval); + X(e, nft_objref_map_eval); #undef X indirect_call: #endif /* CONFIG_RETPOLINE */ diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c index 7b01aa2ef653..cb37169608ba 100644 --- a/net/netfilter/nft_objref.c +++ b/net/netfilter/nft_objref.c @@ -13,9 +13,9 @@ #define nft_objref_priv(expr) *((struct nft_object **)nft_expr_priv(expr)) -static void nft_objref_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) +void nft_objref_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { struct nft_object *obj = nft_objref_priv(expr); @@ -100,9 +100,9 @@ struct nft_objref_map { struct nft_set_binding binding; }; -static void nft_objref_map_eval(const struct nft_expr *expr, - struct nft_regs *regs, - const struct nft_pktinfo *pkt) +void nft_objref_map_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) { struct nft_objref_map *priv = nft_expr_priv(expr); const struct nft_set *set = priv->set; -- cgit v1.2.3 From d9e7891476057b24a1acbf10a491e5b9a1c4ae77 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 3 Jan 2023 13:47:17 +0100 Subject: netfilter: nf_tables: avoid retpoline overhead for some ct expression calls nft_ct expression cannot be made builtin to nf_tables without also forcing the conntrack itself to be builtin. However, this can be avoided by splitting retrieval of a few selector keys that only need to access the nf_conn structure, i.e. no function calls to nf_conntrack code. Many rulesets start with something like "ct status established,related accept" With this change, this no longer requires an indirect call, which gives about 1.8% more throughput with a simple conntrack-enabled forwarding test (retpoline thunk used). Signed-off-by: Florian Westphal --- include/net/netfilter/nf_tables_core.h | 12 ++++++++ net/netfilter/Makefile | 6 ++++ net/netfilter/nf_tables_core.c | 3 ++ net/netfilter/nft_ct.c | 39 +++++++++++++++-------- net/netfilter/nft_ct_fast.c | 56 ++++++++++++++++++++++++++++++++++ 5 files changed, 104 insertions(+), 12 deletions(-) create mode 100644 net/netfilter/nft_ct_fast.c (limited to 'net') diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h index bedef373ec21..780a5f6ad4a6 100644 --- a/include/net/netfilter/nf_tables_core.h +++ b/include/net/netfilter/nf_tables_core.h @@ -61,6 +61,16 @@ struct nft_immediate_expr { extern const struct nft_expr_ops nft_cmp_fast_ops; extern const struct nft_expr_ops nft_cmp16_fast_ops; +struct nft_ct { + enum nft_ct_keys key:8; + enum ip_conntrack_dir dir:8; + u8 len; + union { + u8 dreg; + u8 sreg; + }; +}; + struct nft_payload { enum nft_payload_bases base:8; u8 offset; @@ -140,6 +150,8 @@ void nft_rt_get_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt); void nft_counter_eval(const struct nft_expr *expr, struct nft_regs *regs, const struct nft_pktinfo *pkt); +void nft_ct_get_fast_eval(const struct nft_expr *expr, + struct nft_regs *regs, const struct nft_pktinfo *pkt); enum { NFT_PAYLOAD_CTX_INNER_TUN = (1 << 0), diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 3754eb06fb41..ba2a6b5e93d9 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -98,6 +98,12 @@ nf_tables-objs += nft_set_pipapo_avx2.o endif endif +ifdef CONFIG_NFT_CT +ifdef CONFIG_RETPOLINE +nf_tables-objs += nft_ct_fast.o +endif +endif + obj-$(CONFIG_NF_TABLES) += nf_tables.o obj-$(CONFIG_NFT_COMPAT) += nft_compat.o obj-$(CONFIG_NFT_CONNLIMIT) += nft_connlimit.o diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index d9992906199f..6ecd0ba2e546 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -228,6 +228,9 @@ static void expr_call_ops_eval(const struct nft_expr *expr, X(e, nft_counter_eval); X(e, nft_meta_get_eval); X(e, nft_lookup_eval); +#if IS_ENABLED(CONFIG_NFT_CT) + X(e, nft_ct_get_fast_eval); +#endif X(e, nft_range_eval); X(e, nft_immediate_eval); X(e, nft_byteorder_eval); diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c index c68e2151defe..b9c84499438b 100644 --- a/net/netfilter/nft_ct.c +++ b/net/netfilter/nft_ct.c @@ -12,7 +12,7 @@ #include #include #include -#include +#include #include #include #include @@ -23,16 +23,6 @@ #include #include -struct nft_ct { - enum nft_ct_keys key:8; - enum ip_conntrack_dir dir:8; - u8 len; - union { - u8 dreg; - u8 sreg; - }; -}; - struct nft_ct_helper_obj { struct nf_conntrack_helper *helper4; struct nf_conntrack_helper *helper6; @@ -759,6 +749,18 @@ static bool nft_ct_set_reduce(struct nft_regs_track *track, return false; } +#ifdef CONFIG_RETPOLINE +static const struct nft_expr_ops nft_ct_get_fast_ops = { + .type = &nft_ct_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_ct)), + .eval = nft_ct_get_fast_eval, + .init = nft_ct_get_init, + .destroy = nft_ct_get_destroy, + .dump = nft_ct_get_dump, + .reduce = nft_ct_set_reduce, +}; +#endif + static const struct nft_expr_ops nft_ct_set_ops = { .type = &nft_ct_type, .size = NFT_EXPR_SIZE(sizeof(struct nft_ct)), @@ -791,8 +793,21 @@ nft_ct_select_ops(const struct nft_ctx *ctx, if (tb[NFTA_CT_DREG] && tb[NFTA_CT_SREG]) return ERR_PTR(-EINVAL); - if (tb[NFTA_CT_DREG]) + if (tb[NFTA_CT_DREG]) { +#ifdef CONFIG_RETPOLINE + u32 k = ntohl(nla_get_be32(tb[NFTA_CT_KEY])); + + switch (k) { + case NFT_CT_STATE: + case NFT_CT_DIRECTION: + case NFT_CT_STATUS: + case NFT_CT_MARK: + case NFT_CT_SECMARK: + return &nft_ct_get_fast_ops; + } +#endif return &nft_ct_get_ops; + } if (tb[NFTA_CT_SREG]) { #ifdef CONFIG_NF_CONNTRACK_ZONES diff --git a/net/netfilter/nft_ct_fast.c b/net/netfilter/nft_ct_fast.c new file mode 100644 index 000000000000..89983b0613fa --- /dev/null +++ b/net/netfilter/nft_ct_fast.c @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-2.0-only +#if IS_ENABLED(CONFIG_NFT_CT) +#include +#include +#include + +void nft_ct_get_fast_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + const struct nft_ct *priv = nft_expr_priv(expr); + u32 *dest = ®s->data[priv->dreg]; + enum ip_conntrack_info ctinfo; + const struct nf_conn *ct; + unsigned int state; + + ct = nf_ct_get(pkt->skb, &ctinfo); + if (!ct) { + regs->verdict.code = NFT_BREAK; + return; + } + + switch (priv->key) { + case NFT_CT_STATE: + if (ct) + state = NF_CT_STATE_BIT(ctinfo); + else if (ctinfo == IP_CT_UNTRACKED) + state = NF_CT_STATE_UNTRACKED_BIT; + else + state = NF_CT_STATE_INVALID_BIT; + *dest = state; + return; + case NFT_CT_DIRECTION: + nft_reg_store8(dest, CTINFO2DIR(ctinfo)); + return; + case NFT_CT_STATUS: + *dest = ct->status; + return; +#ifdef CONFIG_NF_CONNTRACK_MARK + case NFT_CT_MARK: + *dest = ct->mark; + return; +#endif +#ifdef CONFIG_NF_CONNTRACK_SECMARK + case NFT_CT_SECMARK: + *dest = ct->secmark; + return; +#endif + default: + WARN_ON_ONCE(1); + regs->verdict.code = NFT_BREAK; + break; + } +} +EXPORT_SYMBOL_GPL(nft_ct_get_fast_eval); +#endif -- cgit v1.2.3 From f80a612dd77c4585171e44a06b490466bdeec1ae Mon Sep 17 00:00:00 2001 From: Fernando Fernandez Mancera Date: Mon, 2 Jan 2023 15:42:34 +0100 Subject: netfilter: nf_tables: add support to destroy operation Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a delete operation but ignoring the ENOENT errors. This is useful for the transaction semantics, where failing to delete an object which does not exist results in aborting the transaction. This new command allows the transaction to proceed in case the object does not exist. Signed-off-by: Fernando Fernandez Mancera Signed-off-by: Pablo Neira Ayuso Signed-off-by: Florian Westphal --- include/uapi/linux/netfilter/nf_tables.h | 14 ++++ net/netfilter/nf_tables_api.c | 111 ++++++++++++++++++++++++++++--- 2 files changed, 117 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index cfa844da1ce6..ff677f3a6cad 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -98,6 +98,13 @@ enum nft_verdicts { * @NFT_MSG_GETFLOWTABLE: get flow table (enum nft_flowtable_attributes) * @NFT_MSG_DELFLOWTABLE: delete flow table (enum nft_flowtable_attributes) * @NFT_MSG_GETRULE_RESET: get rules and reset stateful expressions (enum nft_obj_attributes) + * @NFT_MSG_DESTROYTABLE: destroy a table (enum nft_table_attributes) + * @NFT_MSG_DESTROYCHAIN: destroy a chain (enum nft_chain_attributes) + * @NFT_MSG_DESTROYRULE: destroy a rule (enum nft_rule_attributes) + * @NFT_MSG_DESTROYSET: destroy a set (enum nft_set_attributes) + * @NFT_MSG_DESTROYSETELEM: destroy a set element (enum nft_set_elem_attributes) + * @NFT_MSG_DESTROYOBJ: destroy a stateful object (enum nft_object_attributes) + * @NFT_MSG_DESTROYFLOWTABLE: destroy flow table (enum nft_flowtable_attributes) */ enum nf_tables_msg_types { NFT_MSG_NEWTABLE, @@ -126,6 +133,13 @@ enum nf_tables_msg_types { NFT_MSG_GETFLOWTABLE, NFT_MSG_DELFLOWTABLE, NFT_MSG_GETRULE_RESET, + NFT_MSG_DESTROYTABLE, + NFT_MSG_DESTROYCHAIN, + NFT_MSG_DESTROYRULE, + NFT_MSG_DESTROYSET, + NFT_MSG_DESTROYSETELEM, + NFT_MSG_DESTROYOBJ, + NFT_MSG_DESTROYFLOWTABLE, NFT_MSG_MAX, }; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 8c09e4d12ac1..974b95dece1d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1401,6 +1401,10 @@ static int nf_tables_deltable(struct sk_buff *skb, const struct nfnl_info *info, } if (IS_ERR(table)) { + if (PTR_ERR(table) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYTABLE) + return 0; + NL_SET_BAD_ATTR(extack, attr); return PTR_ERR(table); } @@ -2639,6 +2643,10 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info, chain = nft_chain_lookup(net, table, attr, genmask); } if (IS_ERR(chain)) { + if (PTR_ERR(chain) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYCHAIN) + return 0; + NL_SET_BAD_ATTR(extack, attr); return PTR_ERR(chain); } @@ -3716,6 +3724,10 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info, chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], genmask); if (IS_ERR(chain)) { + if (PTR_ERR(rule) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYRULE) + return 0; + NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]); return PTR_ERR(chain); } @@ -3729,6 +3741,10 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info, if (nla[NFTA_RULE_HANDLE]) { rule = nft_rule_lookup(chain, nla[NFTA_RULE_HANDLE]); if (IS_ERR(rule)) { + if (PTR_ERR(rule) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYRULE) + return 0; + NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_HANDLE]); return PTR_ERR(rule); } @@ -4808,6 +4824,10 @@ static int nf_tables_delset(struct sk_buff *skb, const struct nfnl_info *info, } if (IS_ERR(set)) { + if (PTR_ERR(set) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYSET) + return 0; + NL_SET_BAD_ATTR(extack, attr); return PTR_ERR(set); } @@ -6690,6 +6710,10 @@ static int nf_tables_delsetelem(struct sk_buff *skb, nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) { err = nft_del_setelem(&ctx, set, attr); + if (err == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYSETELEM) + continue; + if (err < 0) { NL_SET_BAD_ATTR(extack, attr); break; @@ -7334,6 +7358,10 @@ static int nf_tables_delobj(struct sk_buff *skb, const struct nfnl_info *info, } if (IS_ERR(obj)) { + if (PTR_ERR(obj) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYOBJ) + return 0; + NL_SET_BAD_ATTR(extack, attr); return PTR_ERR(obj); } @@ -7964,6 +7992,10 @@ static int nf_tables_delflowtable(struct sk_buff *skb, } if (IS_ERR(flowtable)) { + if (PTR_ERR(flowtable) == -ENOENT && + NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYFLOWTABLE) + return 0; + NL_SET_BAD_ATTR(extack, attr); return PTR_ERR(flowtable); } @@ -8373,6 +8405,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_TABLE_MAX, .policy = nft_table_policy, }, + [NFT_MSG_DESTROYTABLE] = { + .call = nf_tables_deltable, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_TABLE_MAX, + .policy = nft_table_policy, + }, [NFT_MSG_NEWCHAIN] = { .call = nf_tables_newchain, .type = NFNL_CB_BATCH, @@ -8391,6 +8429,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_CHAIN_MAX, .policy = nft_chain_policy, }, + [NFT_MSG_DESTROYCHAIN] = { + .call = nf_tables_delchain, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_CHAIN_MAX, + .policy = nft_chain_policy, + }, [NFT_MSG_NEWRULE] = { .call = nf_tables_newrule, .type = NFNL_CB_BATCH, @@ -8415,6 +8459,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_RULE_MAX, .policy = nft_rule_policy, }, + [NFT_MSG_DESTROYRULE] = { + .call = nf_tables_delrule, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_RULE_MAX, + .policy = nft_rule_policy, + }, [NFT_MSG_NEWSET] = { .call = nf_tables_newset, .type = NFNL_CB_BATCH, @@ -8433,6 +8483,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_SET_MAX, .policy = nft_set_policy, }, + [NFT_MSG_DESTROYSET] = { + .call = nf_tables_delset, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_SET_MAX, + .policy = nft_set_policy, + }, [NFT_MSG_NEWSETELEM] = { .call = nf_tables_newsetelem, .type = NFNL_CB_BATCH, @@ -8451,6 +8507,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_SET_ELEM_LIST_MAX, .policy = nft_set_elem_list_policy, }, + [NFT_MSG_DESTROYSETELEM] = { + .call = nf_tables_delsetelem, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_SET_ELEM_LIST_MAX, + .policy = nft_set_elem_list_policy, + }, [NFT_MSG_GETGEN] = { .call = nf_tables_getgen, .type = NFNL_CB_RCU, @@ -8473,6 +8535,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_OBJ_MAX, .policy = nft_obj_policy, }, + [NFT_MSG_DESTROYOBJ] = { + .call = nf_tables_delobj, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_OBJ_MAX, + .policy = nft_obj_policy, + }, [NFT_MSG_GETOBJ_RESET] = { .call = nf_tables_getobj, .type = NFNL_CB_RCU, @@ -8497,6 +8565,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_FLOWTABLE_MAX, .policy = nft_flowtable_policy, }, + [NFT_MSG_DESTROYFLOWTABLE] = { + .call = nf_tables_delflowtable, + .type = NFNL_CB_BATCH, + .attr_count = NFTA_FLOWTABLE_MAX, + .policy = nft_flowtable_policy, + }, }; static int nf_tables_validate(struct net *net) @@ -8590,6 +8664,7 @@ static void nft_commit_release(struct nft_trans *trans) { switch (trans->msg_type) { case NFT_MSG_DELTABLE: + case NFT_MSG_DESTROYTABLE: nf_tables_table_destroy(&trans->ctx); break; case NFT_MSG_NEWCHAIN: @@ -8597,23 +8672,29 @@ static void nft_commit_release(struct nft_trans *trans) kfree(nft_trans_chain_name(trans)); break; case NFT_MSG_DELCHAIN: + case NFT_MSG_DESTROYCHAIN: nf_tables_chain_destroy(&trans->ctx); break; case NFT_MSG_DELRULE: + case NFT_MSG_DESTROYRULE: nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans)); break; case NFT_MSG_DELSET: + case NFT_MSG_DESTROYSET: nft_set_destroy(&trans->ctx, nft_trans_set(trans)); break; case NFT_MSG_DELSETELEM: + case NFT_MSG_DESTROYSETELEM: nf_tables_set_elem_destroy(&trans->ctx, nft_trans_elem_set(trans), nft_trans_elem(trans).priv); break; case NFT_MSG_DELOBJ: + case NFT_MSG_DESTROYOBJ: nft_obj_destroy(&trans->ctx, nft_trans_obj(trans)); break; case NFT_MSG_DELFLOWTABLE: + case NFT_MSG_DESTROYFLOWTABLE: if (nft_trans_flowtable_update(trans)) nft_flowtable_hooks_destroy(&nft_trans_flowtable_hooks(trans)); else @@ -9065,8 +9146,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_trans_destroy(trans); break; case NFT_MSG_DELTABLE: + case NFT_MSG_DESTROYTABLE: list_del_rcu(&trans->ctx.table->list); - nf_tables_table_notify(&trans->ctx, NFT_MSG_DELTABLE); + nf_tables_table_notify(&trans->ctx, trans->msg_type); break; case NFT_MSG_NEWCHAIN: if (nft_trans_chain_update(trans)) { @@ -9081,8 +9163,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) } break; case NFT_MSG_DELCHAIN: + case NFT_MSG_DESTROYCHAIN: nft_chain_del(trans->ctx.chain); - nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN); + nf_tables_chain_notify(&trans->ctx, trans->msg_type); nf_tables_unregister_hook(trans->ctx.net, trans->ctx.table, trans->ctx.chain); @@ -9098,10 +9181,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_trans_destroy(trans); break; case NFT_MSG_DELRULE: + case NFT_MSG_DESTROYRULE: list_del_rcu(&nft_trans_rule(trans)->list); nf_tables_rule_notify(&trans->ctx, nft_trans_rule(trans), - NFT_MSG_DELRULE); + trans->msg_type); nft_rule_expr_deactivate(&trans->ctx, nft_trans_rule(trans), NFT_TRANS_COMMIT); @@ -9129,9 +9213,10 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_trans_destroy(trans); break; case NFT_MSG_DELSET: + case NFT_MSG_DESTROYSET: list_del_rcu(&nft_trans_set(trans)->list); nf_tables_set_notify(&trans->ctx, nft_trans_set(trans), - NFT_MSG_DELSET, GFP_KERNEL); + trans->msg_type, GFP_KERNEL); break; case NFT_MSG_NEWSETELEM: te = (struct nft_trans_elem *)trans->data; @@ -9143,11 +9228,12 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_trans_destroy(trans); break; case NFT_MSG_DELSETELEM: + case NFT_MSG_DESTROYSETELEM: te = (struct nft_trans_elem *)trans->data; nf_tables_setelem_notify(&trans->ctx, te->set, &te->elem, - NFT_MSG_DELSETELEM); + trans->msg_type); nft_setelem_remove(net, te->set, &te->elem); if (!nft_setelem_is_catchall(te->set, &te->elem)) { atomic_dec(&te->set->nelems); @@ -9169,9 +9255,10 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) } break; case NFT_MSG_DELOBJ: + case NFT_MSG_DESTROYOBJ: nft_obj_del(nft_trans_obj(trans)); nf_tables_obj_notify(&trans->ctx, nft_trans_obj(trans), - NFT_MSG_DELOBJ); + trans->msg_type); break; case NFT_MSG_NEWFLOWTABLE: if (nft_trans_flowtable_update(trans)) { @@ -9193,11 +9280,12 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_trans_destroy(trans); break; case NFT_MSG_DELFLOWTABLE: + case NFT_MSG_DESTROYFLOWTABLE: if (nft_trans_flowtable_update(trans)) { nf_tables_flowtable_notify(&trans->ctx, nft_trans_flowtable(trans), &nft_trans_flowtable_hooks(trans), - NFT_MSG_DELFLOWTABLE); + trans->msg_type); nft_unregister_flowtable_net_hooks(net, &nft_trans_flowtable_hooks(trans)); } else { @@ -9205,7 +9293,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nf_tables_flowtable_notify(&trans->ctx, nft_trans_flowtable(trans), &nft_trans_flowtable(trans)->hook_list, - NFT_MSG_DELFLOWTABLE); + trans->msg_type); nft_unregister_flowtable_net_hooks(net, &nft_trans_flowtable(trans)->hook_list); } @@ -9301,6 +9389,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) } break; case NFT_MSG_DELTABLE: + case NFT_MSG_DESTROYTABLE: nft_clear(trans->ctx.net, trans->ctx.table); nft_trans_destroy(trans); break; @@ -9322,6 +9411,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) } break; case NFT_MSG_DELCHAIN: + case NFT_MSG_DESTROYCHAIN: trans->ctx.table->use++; nft_clear(trans->ctx.net, trans->ctx.chain); nft_trans_destroy(trans); @@ -9336,6 +9426,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) nft_flow_rule_destroy(nft_trans_flow_rule(trans)); break; case NFT_MSG_DELRULE: + case NFT_MSG_DESTROYRULE: trans->ctx.chain->use++; nft_clear(trans->ctx.net, nft_trans_rule(trans)); nft_rule_expr_activate(&trans->ctx, nft_trans_rule(trans)); @@ -9357,6 +9448,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) list_del_rcu(&nft_trans_set(trans)->list); break; case NFT_MSG_DELSET: + case NFT_MSG_DESTROYSET: trans->ctx.table->use++; nft_clear(trans->ctx.net, nft_trans_set(trans)); nft_trans_destroy(trans); @@ -9372,6 +9464,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) atomic_dec(&te->set->nelems); break; case NFT_MSG_DELSETELEM: + case NFT_MSG_DESTROYSETELEM: te = (struct nft_trans_elem *)trans->data; nft_setelem_data_activate(net, te->set, &te->elem); @@ -9391,6 +9484,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) } break; case NFT_MSG_DELOBJ: + case NFT_MSG_DESTROYOBJ: trans->ctx.table->use++; nft_clear(trans->ctx.net, nft_trans_obj(trans)); nft_trans_destroy(trans); @@ -9407,6 +9501,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) } break; case NFT_MSG_DELFLOWTABLE: + case NFT_MSG_DESTROYFLOWTABLE: if (nft_trans_flowtable_update(trans)) { list_splice(&nft_trans_flowtable_hooks(trans), &nft_trans_flowtable(trans)->hook_list); -- cgit v1.2.3 From 9259f6b573cf17c00f50c4b626983a5347b1abe9 Mon Sep 17 00:00:00 2001 From: Tanmay Bhushan <007047221b@gmail.com> Date: Mon, 16 Jan 2023 15:55:00 +0100 Subject: ipv6: Remove extra counter pull before gc Per cpu entries are no longer used in consideration for doing gc or not. Remove the extra per cpu entries pull to directly check for time and perform gc. Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/route.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'net') diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b643dda68d31..76889ceeead9 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3294,10 +3294,6 @@ static void ip6_dst_gc(struct dst_ops *ops) unsigned int val; int entries; - entries = dst_entries_get_fast(ops); - if (entries > ops->gc_thresh) - entries = dst_entries_get_slow(ops); - if (time_after(rt_last_gc + rt_min_interval, jiffies)) goto out; -- cgit v1.2.3 From 68e5b6aa2795fd05c6ff58616cb16f2f216e4123 Mon Sep 17 00:00:00 2001 From: Magnus Karlsson Date: Tue, 17 Jan 2023 10:43:05 +0100 Subject: xdp: document xdp_do_flush() before napi_complete_done() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document in the XDP_REDIRECT manual section that drivers must call xdp_do_flush() before napi_complete_done(). The two reasons behind this can be found following the links below. Signed-off-by: Magnus Karlsson Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/ Acked-by: Toke Høiland-Jørgensen Signed-off-by: David S. Miller --- net/core/filter.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/core/filter.c b/net/core/filter.c index ab811293ae5d..7a2b67893afd 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4128,9 +4128,13 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = { * bpf_redirect_info to actually enqueue the frame into a map type-specific * bulk queue structure. * - * 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(), - * which will flush all the different bulk queues, thus completing the - * redirect. + * 3. Before exiting its NAPI poll loop, the driver will call + * xdp_do_flush(), which will flush all the different bulk queues, + * thus completing the redirect. Note that xdp_do_flush() must be + * called before napi_complete_done() in the driver, as the + * XDP_REDIRECT logic relies on being inside a single NAPI instance + * through to the xdp_do_flush() call for RCU protection of all + * in-kernel data structures. */ /* * Pointers to the map entries will be kept around for this whole sequence of -- cgit v1.2.3 From 585b6e1304dcc46e65dc1aaca5973b33abd0c48d Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 16 Jan 2023 15:24:11 +0100 Subject: wifi: cfg80211: remove support for static WEP This reverts commit b8676221f00d ("cfg80211: Add support for static WEP in the driver") since no driver ever ended up using it. Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 9 --------- net/wireless/core.h | 4 ++-- net/wireless/ibss.c | 5 ++--- net/wireless/sme.c | 6 +----- net/wireless/util.c | 2 +- net/wireless/wext-compat.c | 2 +- net/wireless/wext-sme.c | 2 +- 7 files changed, 8 insertions(+), 22 deletions(-) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 03d4f4deadae..1f8f827290a2 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1075,7 +1075,6 @@ struct survey_info { s8 noise; }; -#define CFG80211_MAX_WEP_KEYS 4 #define CFG80211_MAX_NUM_AKM_SUITES 10 /** @@ -1099,9 +1098,6 @@ struct survey_info { * port frames over NL80211 instead of the network interface. * @control_port_no_preauth: disables pre-auth rx over the nl80211 control * port for mac80211 - * @wep_keys: static WEP keys, if not NULL points to an array of - * CFG80211_MAX_WEP_KEYS WEP keys - * @wep_tx_key: key index (0..3) of the default TX static WEP key * @psk: PSK (for devices supporting 4-way-handshake offload) * @sae_pwd: password for SAE authentication (for devices supporting SAE * offload) @@ -1134,8 +1130,6 @@ struct cfg80211_crypto_settings { bool control_port_no_encrypt; bool control_port_over_nl80211; bool control_port_no_preauth; - struct key_params *wep_keys; - int wep_tx_key; const u8 *psk; const u8 *sae_pwd; u8 sae_pwd_len; @@ -4683,8 +4677,6 @@ struct cfg80211_ops { * @WIPHY_FLAG_SUPPORTS_5_10_MHZ: Device supports 5 MHz and 10 MHz channels. * @WIPHY_FLAG_HAS_CHANNEL_SWITCH: Device supports channel switch in * beaconing mode (AP, IBSS, Mesh, ...). - * @WIPHY_FLAG_HAS_STATIC_WEP: The device supports static WEP key installation - * before connection. * @WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK: The device supports bigger kek and kck keys * @WIPHY_FLAG_SUPPORTS_MLO: This is a temporary flag gating the MLO APIs, * in order to not have them reachable in normal drivers, until we have @@ -4715,7 +4707,6 @@ enum wiphy_flags { WIPHY_FLAG_HAS_REMAIN_ON_CHANNEL = BIT(21), WIPHY_FLAG_SUPPORTS_5_10_MHZ = BIT(22), WIPHY_FLAG_HAS_CHANNEL_SWITCH = BIT(23), - WIPHY_FLAG_HAS_STATIC_WEP = BIT(24), }; /** diff --git a/net/wireless/core.h b/net/wireless/core.h index af85d8909935..7c61752f6d83 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -278,8 +278,8 @@ struct cfg80211_event { }; struct cfg80211_cached_keys { - struct key_params params[CFG80211_MAX_WEP_KEYS]; - u8 data[CFG80211_MAX_WEP_KEYS][WLAN_KEY_LEN_WEP104]; + struct key_params params[4]; + u8 data[4][WLAN_KEY_LEN_WEP104]; int def; }; diff --git a/net/wireless/ibss.c b/net/wireless/ibss.c index edd062f104f4..e6fdb0b8187d 100644 --- a/net/wireless/ibss.c +++ b/net/wireless/ibss.c @@ -45,8 +45,7 @@ void __cfg80211_ibss_joined(struct net_device *dev, const u8 *bssid, cfg80211_hold_bss(bss_from_pub(bss)); wdev->u.ibss.current_bss = bss_from_pub(bss); - if (!(wdev->wiphy->flags & WIPHY_FLAG_HAS_STATIC_WEP)) - cfg80211_upload_connect_keys(wdev); + cfg80211_upload_connect_keys(wdev); nl80211_send_ibss_bssid(wiphy_to_rdev(wdev->wiphy), dev, bssid, GFP_KERNEL); @@ -294,7 +293,7 @@ int cfg80211_ibss_wext_join(struct cfg80211_registered_device *rdev, ck = kmemdup(wdev->wext.keys, sizeof(*ck), GFP_KERNEL); if (!ck) return -ENOMEM; - for (i = 0; i < CFG80211_MAX_WEP_KEYS; i++) + for (i = 0; i < 4; i++) ck->params[i].key = ck->data[i]; } err = __cfg80211_join_ibss(rdev, wdev->netdev, diff --git a/net/wireless/sme.c b/net/wireless/sme.c index 4b5b6ee0fe01..123248b2c0be 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -855,8 +855,7 @@ void __cfg80211_connect_result(struct net_device *dev, ETH_ALEN); } - if (!(wdev->wiphy->flags & WIPHY_FLAG_HAS_STATIC_WEP)) - cfg80211_upload_connect_keys(wdev); + cfg80211_upload_connect_keys(wdev); rcu_read_lock(); for_each_valid_link(cr, link) { @@ -1462,9 +1461,6 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev, connect->crypto.ciphers_pairwise[0] = cipher; } } - - connect->crypto.wep_keys = connkeys->params; - connect->crypto.wep_tx_key = connkeys->def; } else { if (WARN_ON(connkeys)) return -EINVAL; diff --git a/net/wireless/util.c b/net/wireless/util.c index 8f403f9fe816..38d3b434c18c 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -934,7 +934,7 @@ void cfg80211_upload_connect_keys(struct wireless_dev *wdev) if (!wdev->connect_keys) return; - for (i = 0; i < CFG80211_MAX_WEP_KEYS; i++) { + for (i = 0; i < 4; i++) { if (!wdev->connect_keys->params[i].cipher) continue; if (rdev_add_key(rdev, dev, -1, i, false, NULL, diff --git a/net/wireless/wext-compat.c b/net/wireless/wext-compat.c index 8a24dfca75af..e3acfac7430a 100644 --- a/net/wireless/wext-compat.c +++ b/net/wireless/wext-compat.c @@ -439,7 +439,7 @@ static int __cfg80211_set_encryption(struct cfg80211_registered_device *rdev, GFP_KERNEL); if (!wdev->wext.keys) return -ENOMEM; - for (i = 0; i < CFG80211_MAX_WEP_KEYS; i++) + for (i = 0; i < 4; i++) wdev->wext.keys->params[i].key = wdev->wext.keys->data[i]; } diff --git a/net/wireless/wext-sme.c b/net/wireless/wext-sme.c index 191c6d98c700..f231207ca210 100644 --- a/net/wireless/wext-sme.c +++ b/net/wireless/wext-sme.c @@ -47,7 +47,7 @@ int cfg80211_mgd_wext_connect(struct cfg80211_registered_device *rdev, ck = kmemdup(wdev->wext.keys, sizeof(*ck), GFP_KERNEL); if (!ck) return -ENOMEM; - for (i = 0; i < CFG80211_MAX_WEP_KEYS; i++) + for (i = 0; i < 4; i++) ck->params[i].key = ck->data[i]; } -- cgit v1.2.3 From df4969ca135b9b3b2c38c07514aaa775112ac835 Mon Sep 17 00:00:00 2001 From: Shivani Baranwal Date: Tue, 6 Dec 2022 20:07:14 +0530 Subject: wifi: cfg80211: Fix extended KCK key length check in nl80211_set_rekey_data() The extended KCK key length check wrongly using the KEK key attribute for validation. Due to this GTK rekey offload is failing when the KCK key length is 24 bytes even though the driver advertising WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK flag. Use correct attribute to fix the same. Fixes: 093a48d2aa4b ("cfg80211: support bigger kek/kck key length") Signed-off-by: Shivani Baranwal Signed-off-by: Veerendranath Jakkam Link: https://lore.kernel.org/r/20221206143715.1802987-2-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 33a82ecab9d5..02b9a0280896 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -13809,7 +13809,7 @@ static int nl80211_set_rekey_data(struct sk_buff *skb, struct genl_info *info) return -ERANGE; if (nla_len(tb[NL80211_REKEY_DATA_KCK]) != NL80211_KCK_LEN && !(rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK && - nla_len(tb[NL80211_REKEY_DATA_KEK]) == NL80211_KCK_EXT_LEN)) + nla_len(tb[NL80211_REKEY_DATA_KCK]) == NL80211_KCK_EXT_LEN)) return -ERANGE; rekey_data.kek = nla_data(tb[NL80211_REKEY_DATA_KEK]); -- cgit v1.2.3 From 648fba791cb0f5ef6166449d056f82e6639fe268 Mon Sep 17 00:00:00 2001 From: Shivani Baranwal Date: Tue, 6 Dec 2022 20:07:15 +0530 Subject: wifi: cfg80211: Support 32 bytes KCK key in GTK rekey offload Currently, maximum KCK key length supported for GTK rekey offload is 24 bytes but with some newer AKMs the KCK key length can be 32 bytes. e.g., 00-0F-AC:24 AKM suite with SAE finite cyclic group 21. Add support to allow 32 bytes KCK keys in GTK rekey offload. Signed-off-by: Shivani Baranwal Signed-off-by: Veerendranath Jakkam Link: https://lore.kernel.org/r/20221206143715.1802987-3-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 3 ++- include/uapi/linux/nl80211.h | 1 + net/wireless/nl80211.c | 6 ++++-- 3 files changed, 7 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 1f8f827290a2..f96db7ad64f1 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -4682,6 +4682,7 @@ struct cfg80211_ops { * in order to not have them reachable in normal drivers, until we have * complete feature/interface combinations/etc. advertisement. No driver * should set this flag for now. + * @WIPHY_FLAG_SUPPORTS_EXT_KCK_32: The device supports 32-byte KCK keys. */ enum wiphy_flags { WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK = BIT(0), @@ -4694,7 +4695,7 @@ enum wiphy_flags { WIPHY_FLAG_CONTROL_PORT_PROTOCOL = BIT(7), WIPHY_FLAG_IBSS_RSN = BIT(8), WIPHY_FLAG_MESH_AUTH = BIT(10), - /* use hole at 11 */ + WIPHY_FLAG_SUPPORTS_EXT_KCK_32 = BIT(11), /* use hole at 12 */ WIPHY_FLAG_SUPPORTS_FW_ROAM = BIT(13), WIPHY_FLAG_AP_UAPSD = BIT(14), diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h index c14a91bbca7c..429bdc399962 100644 --- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -5869,6 +5869,7 @@ enum plink_actions { #define NL80211_KEK_LEN 16 #define NL80211_KCK_EXT_LEN 24 #define NL80211_KEK_EXT_LEN 32 +#define NL80211_KCK_EXT_LEN_32 32 #define NL80211_REPLAY_CTR_LEN 8 /** diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 02b9a0280896..64cf6110ce9d 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -883,7 +883,7 @@ nl80211_rekey_policy[NUM_NL80211_REKEY_DATA] = { }, [NL80211_REKEY_DATA_KCK] = { .type = NLA_BINARY, - .len = NL80211_KCK_EXT_LEN + .len = NL80211_KCK_EXT_LEN_32 }, [NL80211_REKEY_DATA_REPLAY_CTR] = NLA_POLICY_EXACT_LEN(NL80211_REPLAY_CTR_LEN), [NL80211_REKEY_DATA_AKM] = { .type = NLA_U32 }, @@ -13809,7 +13809,9 @@ static int nl80211_set_rekey_data(struct sk_buff *skb, struct genl_info *info) return -ERANGE; if (nla_len(tb[NL80211_REKEY_DATA_KCK]) != NL80211_KCK_LEN && !(rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK && - nla_len(tb[NL80211_REKEY_DATA_KCK]) == NL80211_KCK_EXT_LEN)) + nla_len(tb[NL80211_REKEY_DATA_KCK]) == NL80211_KCK_EXT_LEN) && + !(rdev->wiphy.flags & WIPHY_FLAG_SUPPORTS_EXT_KCK_32 && + nla_len(tb[NL80211_REKEY_DATA_KCK]) == NL80211_KCK_EXT_LEN_32)) return -ERANGE; rekey_data.kek = nla_data(tb[NL80211_REKEY_DATA_KEK]); -- cgit v1.2.3 From 42470fa093248807f668825ff14de9bc623c0d53 Mon Sep 17 00:00:00 2001 From: Muna Sinada Date: Wed, 5 Oct 2022 14:54:45 -0700 Subject: wifi: mac80211: Add VHT MU-MIMO related flags in ieee80211_bss_conf Adding flags for SU Beamformer, SU Beamformee, MU Beamformer and MU Beamformee for VHT. This is utilized to pass MU-MIMO configurations from user space to driver in AP mode. Signed-off-by: Muna Sinada Link: https://lore.kernel.org/r/1665006886-23874-1-git-send-email-quic_msinada@quicinc.com [fixed indentation, removed redundant !!] Signed-off-by: Johannes Berg --- include/net/mac80211.h | 13 +++++++++++++ net/mac80211/cfg.c | 15 +++++++++++++++ 2 files changed, 28 insertions(+) (limited to 'net') diff --git a/include/net/mac80211.h b/include/net/mac80211.h index b5b80f943e82..a0f67d49be05 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -653,6 +653,14 @@ struct ieee80211_fils_discovery { * write-protected by sdata_lock and local->mtx so holding either is fine * for read access. * @color_change_color: the bss color that will be used after the change. + * @vht_su_beamformer: in AP mode, does this BSS support operation as an VHT SU + * beamformer + * @vht_su_beamformee: in AP mode, does this BSS support operation as an VHT SU + * beamformee + * @vht_mu_beamformer: in AP mode, does this BSS support operation as an VHT MU + * beamformer + * @vht_mu_beamformee: in AP mode, does this BSS support operation as an VHT MU + * beamformee */ struct ieee80211_bss_conf { const u8 *bssid; @@ -726,6 +734,11 @@ struct ieee80211_bss_conf { bool color_change_active; u8 color_change_color; + + bool vht_su_beamformer; + bool vht_su_beamformee; + bool vht_mu_beamformer; + bool vht_mu_beamformee; }; /** diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 7f549862c83e..284d19d68521 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1252,6 +1252,21 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, prev_beacon_int = link_conf->beacon_int; link_conf->beacon_int = params->beacon_interval; + if (params->vht_cap) { + link_conf->vht_su_beamformer = + params->vht_cap->vht_cap_info & + cpu_to_le32(IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE); + link_conf->vht_su_beamformee = + params->vht_cap->vht_cap_info & + cpu_to_le32(IEEE80211_VHT_CAP_SU_BEAMFORMEE_CAPABLE); + link_conf->vht_mu_beamformer = + params->vht_cap->vht_cap_info & + cpu_to_le32(IEEE80211_VHT_CAP_MU_BEAMFORMER_CAPABLE); + link_conf->vht_mu_beamformee = + params->vht_cap->vht_cap_info & + cpu_to_le32(IEEE80211_VHT_CAP_MU_BEAMFORMEE_CAPABLE); + } + if (params->he_cap && params->he_oper) { link_conf->he_support = true; link_conf->htc_trig_based_pkt_ext = -- cgit v1.2.3 From b1b3297df7db7065476666ddbca5a61d081347ef Mon Sep 17 00:00:00 2001 From: Muna Sinada Date: Wed, 5 Oct 2022 14:54:46 -0700 Subject: wifi: mac80211: Add HE MU-MIMO related flags in ieee80211_bss_conf Adding flags for SU Beamformer, SU Beamformee, MU Beamformer and Full Bandwidth UL MU-MIMO for HE. This is utilized to pass MU-MIMO configurations from user space to driver in AP mode. Signed-off-by: Muna Sinada Link: https://lore.kernel.org/r/1665006886-23874-2-git-send-email-quic_msinada@quicinc.com [fixed indentation, removed redundant !!] Signed-off-by: Johannes Berg --- include/net/mac80211.h | 13 +++++++++++++ net/mac80211/cfg.c | 15 +++++++++++++++ 2 files changed, 28 insertions(+) (limited to 'net') diff --git a/include/net/mac80211.h b/include/net/mac80211.h index a0f67d49be05..65dd3982391f 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -661,6 +661,15 @@ struct ieee80211_fils_discovery { * beamformer * @vht_mu_beamformee: in AP mode, does this BSS support operation as an VHT MU * beamformee + * @he_su_beamformer: in AP-mode, does this BSS support operation as an HE SU + * beamformer + * @he_su_beamformee: in AP-mode, does this BSS support operation as an HE SU + * beamformee + * @he_mu_beamformer: in AP-mode, does this BSS support operation as an HE MU + * beamformer + * @he_full_ul_mumimo: does this BSS support the reception (AP) or transmission + * (non-AP STA) of an HE TB PPDU on an RU that spans the entire PPDU + * bandwidth */ struct ieee80211_bss_conf { const u8 *bssid; @@ -739,6 +748,10 @@ struct ieee80211_bss_conf { bool vht_su_beamformee; bool vht_mu_beamformer; bool vht_mu_beamformee; + bool he_su_beamformer; + bool he_su_beamformee; + bool he_mu_beamformer; + bool he_full_ul_mumimo; }; /** diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 284d19d68521..99116485eab1 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1281,6 +1281,21 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, changed |= BSS_CHANGED_HE_BSS_COLOR; } + if (params->he_cap) { + link_conf->he_su_beamformer = + params->he_cap->phy_cap_info[3] & + IEEE80211_HE_PHY_CAP3_SU_BEAMFORMER; + link_conf->he_su_beamformee = + params->he_cap->phy_cap_info[4] & + IEEE80211_HE_PHY_CAP4_SU_BEAMFORMEE; + link_conf->he_mu_beamformer = + params->he_cap->phy_cap_info[4] & + IEEE80211_HE_PHY_CAP4_MU_BEAMFORMER; + link_conf->he_full_ul_mumimo = + params->he_cap->phy_cap_info[2] & + IEEE80211_HE_PHY_CAP2_UL_MU_FULL_MU_MIMO; + } + if (sdata->vif.type == NL80211_IFTYPE_AP && params->mbssid_config.tx_wdev) { err = ieee80211_set_ap_mbssid_options(sdata, -- cgit v1.2.3 From f66c48af7a110c0d694c4ac4a1257affb272a2ea Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 9 Jan 2023 13:07:21 +0200 Subject: mac80211: support minimal EHT rate reporting on RX Add minimal support for RX EHT rate reporting, not yet adding (modifying) any radiotap headers, just statistics for cfg80211. Signed-off-by: Johannes Berg --- include/net/mac80211.h | 19 ++++++++++++++++--- net/mac80211/rx.c | 9 +++++++++ net/mac80211/sta_info.c | 9 ++++++++- net/mac80211/sta_info.h | 24 ++++++++++++++++++------ net/mac80211/util.c | 13 +++++++++++++ 5 files changed, 64 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/include/net/mac80211.h b/include/net/mac80211.h index 65dd3982391f..e83cb9519e31 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -1462,6 +1462,7 @@ enum mac80211_rx_encoding { RX_ENC_HT, RX_ENC_VHT, RX_ENC_HE, + RX_ENC_EHT, }; /** @@ -1495,7 +1496,7 @@ enum mac80211_rx_encoding { * @antenna: antenna used * @rate_idx: index of data rate into band's supported rates or MCS index if * HT or VHT is used (%RX_FLAG_HT/%RX_FLAG_VHT) - * @nss: number of streams (VHT and HE only) + * @nss: number of streams (VHT, HE and EHT only) * @flag: %RX_FLAG_\* * @encoding: &enum mac80211_rx_encoding * @bw: &enum rate_info_bw @@ -1503,6 +1504,8 @@ enum mac80211_rx_encoding { * @he_ru: HE RU, from &enum nl80211_he_ru_alloc * @he_gi: HE GI, from &enum nl80211_he_gi * @he_dcm: HE DCM value + * @eht.ru: EHT RU, from &enum nl80211_eht_ru_alloc + * @eht.gi: EHT GI, from &enum nl80211_eht_gi * @rx_flags: internal RX flags for mac80211 * @ampdu_reference: A-MPDU reference number, must be a different value for * each A-MPDU but the same for each subframe within one A-MPDU @@ -1524,8 +1527,18 @@ struct ieee80211_rx_status { u32 flag; u16 freq: 13, freq_offset: 1; u8 enc_flags; - u8 encoding:2, bw:3, he_ru:3; - u8 he_gi:2, he_dcm:1; + u8 encoding:3, bw:4; + union { + struct { + u8 he_ru:3; + u8 he_gi:2; + u8 he_dcm:1; + }; + struct { + u8 ru:4; + u8 gi:2; + } eht; + }; u8 rate_idx; u8 nss; u8 rx_flags; diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index c6562a6d2503..e17e51abe050 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -5194,6 +5194,15 @@ void ieee80211_rx_list(struct ieee80211_hw *hw, struct ieee80211_sta *pubsta, status->rate_idx, status->nss)) goto drop; break; + case RX_ENC_EHT: + if (WARN_ONCE(status->rate_idx > 15 || + !status->nss || + status->nss > 8 || + status->eht.gi > NL80211_RATE_INFO_EHT_GI_3_2, + "Rate marked as an EHT rate but data is invalid: MCS:%d, NSS:%d, GI:%d\n", + status->rate_idx, status->nss, status->eht.gi)) + goto drop; + break; default: WARN_ON_ONCE(1); fallthrough; diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 04e0f132b1d9..27c737fe7fb8 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -4,7 +4,7 @@ * Copyright 2006-2007 Jiri Benc * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2015 - 2017 Intel Deutschland GmbH - * Copyright (C) 2018-2021 Intel Corporation + * Copyright (C) 2018-2022 Intel Corporation */ #include @@ -2406,6 +2406,13 @@ static void sta_stats_decode_rate(struct ieee80211_local *local, u32 rate, rinfo->he_ru_alloc = STA_STATS_GET(HE_RU, rate); rinfo->he_dcm = STA_STATS_GET(HE_DCM, rate); break; + case STA_STATS_RATE_TYPE_EHT: + rinfo->flags = RATE_INFO_FLAGS_EHT_MCS; + rinfo->mcs = STA_STATS_GET(EHT_MCS, rate); + rinfo->nss = STA_STATS_GET(EHT_NSS, rate); + rinfo->eht_gi = STA_STATS_GET(EHT_GI, rate); + rinfo->eht_ru_alloc = STA_STATS_GET(EHT_RU, rate); + break; } } diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index 69820b551668..c30f02874fb1 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -936,6 +936,7 @@ enum sta_stats_type { STA_STATS_RATE_TYPE_VHT, STA_STATS_RATE_TYPE_HE, STA_STATS_RATE_TYPE_S1G, + STA_STATS_RATE_TYPE_EHT, }; #define STA_STATS_FIELD_HT_MCS GENMASK( 7, 0) @@ -945,12 +946,16 @@ enum sta_stats_type { #define STA_STATS_FIELD_VHT_NSS GENMASK( 7, 4) #define STA_STATS_FIELD_HE_MCS GENMASK( 3, 0) #define STA_STATS_FIELD_HE_NSS GENMASK( 7, 4) -#define STA_STATS_FIELD_BW GENMASK(11, 8) -#define STA_STATS_FIELD_SGI GENMASK(12, 12) -#define STA_STATS_FIELD_TYPE GENMASK(15, 13) -#define STA_STATS_FIELD_HE_RU GENMASK(18, 16) -#define STA_STATS_FIELD_HE_GI GENMASK(20, 19) -#define STA_STATS_FIELD_HE_DCM GENMASK(21, 21) +#define STA_STATS_FIELD_EHT_MCS GENMASK( 3, 0) +#define STA_STATS_FIELD_EHT_NSS GENMASK( 7, 4) +#define STA_STATS_FIELD_BW GENMASK(12, 8) +#define STA_STATS_FIELD_SGI GENMASK(13, 13) +#define STA_STATS_FIELD_TYPE GENMASK(16, 14) +#define STA_STATS_FIELD_HE_RU GENMASK(19, 17) +#define STA_STATS_FIELD_HE_GI GENMASK(21, 20) +#define STA_STATS_FIELD_HE_DCM GENMASK(22, 22) +#define STA_STATS_FIELD_EHT_RU GENMASK(20, 17) +#define STA_STATS_FIELD_EHT_GI GENMASK(22, 21) #define STA_STATS_FIELD(_n, _v) FIELD_PREP(STA_STATS_FIELD_ ## _n, _v) #define STA_STATS_GET(_n, _v) FIELD_GET(STA_STATS_FIELD_ ## _n, _v) @@ -989,6 +994,13 @@ static inline u32 sta_stats_encode_rate(struct ieee80211_rx_status *s) r |= STA_STATS_FIELD(HE_RU, s->he_ru); r |= STA_STATS_FIELD(HE_DCM, s->he_dcm); break; + case RX_ENC_EHT: + r |= STA_STATS_FIELD(TYPE, STA_STATS_RATE_TYPE_EHT); + r |= STA_STATS_FIELD(EHT_NSS, s->nss); + r |= STA_STATS_FIELD(EHT_MCS, s->rate_idx); + r |= STA_STATS_FIELD(EHT_GI, s->eht.gi); + r |= STA_STATS_FIELD(EHT_RU, s->eht.ru); + break; default: WARN_ON(1); return STA_STATS_RATE_INVALID; diff --git a/net/mac80211/util.c b/net/mac80211/util.c index d84215bd5f8c..1a28fe5cb614 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -4020,6 +4020,19 @@ u64 ieee80211_calculate_rx_timestamp(struct ieee80211_local *local, /* Fill cfg80211 rate info */ switch (status->encoding) { + case RX_ENC_EHT: + ri.flags |= RATE_INFO_FLAGS_EHT_MCS; + ri.mcs = status->rate_idx; + ri.nss = status->nss; + ri.eht_ru_alloc = status->eht.ru; + if (status->enc_flags & RX_ENC_FLAG_SHORT_GI) + ri.flags |= RATE_INFO_FLAGS_SHORT_GI; + /* TODO/FIXME: is this right? handle other PPDUs */ + if (status->flag & RX_FLAG_MACTIME_PLCP_START) { + mpdu_offset += 2; + ts += 36; + } + break; case RX_ENC_HE: ri.flags |= RATE_INFO_FLAGS_HE_MCS; ri.mcs = status->rate_idx; -- cgit v1.2.3 From 3609ff6401c3660e859cda0dd944782ec8300e7e Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Sun, 8 Jan 2023 18:08:08 +0100 Subject: wifi: cfg80211: Deduplicate certificate loading load_keys_from_buffer() in net/wireless/reg.c duplicates x509_load_certificate_list() in crypto/asymmetric_keys/x509_loader.c for no apparent reason. Deduplicate it. No functional change intended. Signed-off-by: Lukas Wunner Acked-by: David Howells Link: https://lore.kernel.org/r/e7280be84acda02634bc7cb52c97656182b9c700.1673197326.git.lukas@wunner.de Signed-off-by: Johannes Berg --- crypto/asymmetric_keys/x509_loader.c | 1 + net/wireless/reg.c | 54 ++++++------------------------------ 2 files changed, 9 insertions(+), 46 deletions(-) (limited to 'net') diff --git a/crypto/asymmetric_keys/x509_loader.c b/crypto/asymmetric_keys/x509_loader.c index 1bc169dee22e..a41741326998 100644 --- a/crypto/asymmetric_keys/x509_loader.c +++ b/crypto/asymmetric_keys/x509_loader.c @@ -55,3 +55,4 @@ dodgy_cert: pr_err("Problem parsing in-kernel X.509 certificate list\n"); return 0; } +EXPORT_SYMBOL_GPL(x509_load_certificate_list); diff --git a/net/wireless/reg.c b/net/wireless/reg.c index 4f3f31244e8b..af65196c916e 100644 --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -737,51 +737,9 @@ static bool valid_country(const u8 *data, unsigned int size, } #ifdef CONFIG_CFG80211_REQUIRE_SIGNED_REGDB -static struct key *builtin_regdb_keys; - -static void __init load_keys_from_buffer(const u8 *p, unsigned int buflen) -{ - const u8 *end = p + buflen; - size_t plen; - key_ref_t key; - - while (p < end) { - /* Each cert begins with an ASN.1 SEQUENCE tag and must be more - * than 256 bytes in size. - */ - if (end - p < 4) - goto dodgy_cert; - if (p[0] != 0x30 && - p[1] != 0x82) - goto dodgy_cert; - plen = (p[2] << 8) | p[3]; - plen += 4; - if (plen > end - p) - goto dodgy_cert; - - key = key_create_or_update(make_key_ref(builtin_regdb_keys, 1), - "asymmetric", NULL, p, plen, - ((KEY_POS_ALL & ~KEY_POS_SETATTR) | - KEY_USR_VIEW | KEY_USR_READ), - KEY_ALLOC_NOT_IN_QUOTA | - KEY_ALLOC_BUILT_IN | - KEY_ALLOC_BYPASS_RESTRICTION); - if (IS_ERR(key)) { - pr_err("Problem loading in-kernel X.509 certificate (%ld)\n", - PTR_ERR(key)); - } else { - pr_notice("Loaded X.509 cert '%s'\n", - key_ref_to_ptr(key)->description); - key_ref_put(key); - } - p += plen; - } - - return; +#include -dodgy_cert: - pr_err("Problem parsing in-kernel X.509 certificate list\n"); -} +static struct key *builtin_regdb_keys; static int __init load_builtin_regdb_keys(void) { @@ -797,11 +755,15 @@ static int __init load_builtin_regdb_keys(void) pr_notice("Loading compiled-in X.509 certificates for regulatory database\n"); #ifdef CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS - load_keys_from_buffer(shipped_regdb_certs, shipped_regdb_certs_len); + x509_load_certificate_list(shipped_regdb_certs, + shipped_regdb_certs_len, + builtin_regdb_keys); #endif #ifdef CONFIG_CFG80211_EXTRA_REGDB_KEYDIR if (CONFIG_CFG80211_EXTRA_REGDB_KEYDIR[0] != '\0') - load_keys_from_buffer(extra_regdb_certs, extra_regdb_certs_len); + x509_load_certificate_list(extra_regdb_certs, + extra_regdb_certs_len, + builtin_regdb_keys); #endif return 0; -- cgit v1.2.3 From 82253ddaff582147cd3fd0e629c4e65d62b1d015 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 19 Jan 2023 14:57:11 +0100 Subject: wifi: mac80211: drop extra 'e' from ieeee80211... name Somehow an extra 'e' slipped in there without anyone noticing, drop that from ieeee80211_obss_color_collision_notify(). Signed-off-by: Johannes Berg --- drivers/net/wireless/ath/ath11k/wmi.c | 4 ++-- include/net/mac80211.h | 6 +++--- net/mac80211/cfg.c | 4 ++-- net/mac80211/rx.c | 6 +++--- 4 files changed, 10 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/drivers/net/wireless/ath/ath11k/wmi.c b/drivers/net/wireless/ath/ath11k/wmi.c index 2a8a3e3dcff6..b3a7d7bfe17c 100644 --- a/drivers/net/wireless/ath/ath11k/wmi.c +++ b/drivers/net/wireless/ath/ath11k/wmi.c @@ -3853,8 +3853,8 @@ ath11k_wmi_obss_color_collision_event(struct ath11k_base *ab, struct sk_buff *sk switch (ev->evt_type) { case WMI_BSS_COLOR_COLLISION_DETECTION: - ieeee80211_obss_color_collision_notify(arvif->vif, ev->obss_color_bitmap, - GFP_KERNEL); + ieee80211_obss_color_collision_notify(arvif->vif, ev->obss_color_bitmap, + GFP_KERNEL); ath11k_dbg(ab, ATH11K_DBG_WMI, "OBSS color collision detected vdev:%d, event:%d, bitmap:%08llx\n", ev->vdev_id, ev->evt_type, ev->obss_color_bitmap); diff --git a/include/net/mac80211.h b/include/net/mac80211.h index a945f1b1b4d8..2635e6de8101 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -7210,7 +7210,7 @@ ieee80211_get_unsol_bcast_probe_resp_tmpl(struct ieee80211_hw *hw, struct ieee80211_vif *vif); /** - * ieeee80211_obss_color_collision_notify - notify userland about a BSS color + * ieee80211_obss_color_collision_notify - notify userland about a BSS color * collision. * * @vif: &struct ieee80211_vif pointer from the add_interface callback. @@ -7219,8 +7219,8 @@ ieee80211_get_unsol_bcast_probe_resp_tmpl(struct ieee80211_hw *hw, * @gfp: allocation flags */ void -ieeee80211_obss_color_collision_notify(struct ieee80211_vif *vif, - u64 color_bitmap, gfp_t gfp); +ieee80211_obss_color_collision_notify(struct ieee80211_vif *vif, + u64 color_bitmap, gfp_t gfp); /** * ieee80211_is_tx_data - check if frame is a data frame diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 99116485eab1..f5d43f42f6d8 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -4662,7 +4662,7 @@ void ieee80211_color_change_finish(struct ieee80211_vif *vif) EXPORT_SYMBOL_GPL(ieee80211_color_change_finish); void -ieeee80211_obss_color_collision_notify(struct ieee80211_vif *vif, +ieee80211_obss_color_collision_notify(struct ieee80211_vif *vif, u64 color_bitmap, gfp_t gfp) { struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); @@ -4672,7 +4672,7 @@ ieeee80211_obss_color_collision_notify(struct ieee80211_vif *vif, cfg80211_obss_color_collision_notify(sdata->dev, color_bitmap, gfp); } -EXPORT_SYMBOL_GPL(ieeee80211_obss_color_collision_notify); +EXPORT_SYMBOL_GPL(ieee80211_obss_color_collision_notify); static int ieee80211_color_change(struct wiphy *wiphy, struct net_device *dev, diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index e17e51abe050..e284897ba5e9 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -3219,9 +3219,9 @@ ieee80211_rx_check_bss_color_collision(struct ieee80211_rx_data *rx) color = le32_get_bits(he_oper->he_oper_params, IEEE80211_HE_OPERATION_BSS_COLOR_MASK); if (color == bss_conf->he_bss_color.color) - ieeee80211_obss_color_collision_notify(&rx->sdata->vif, - BIT_ULL(color), - GFP_ATOMIC); + ieee80211_obss_color_collision_notify(&rx->sdata->vif, + BIT_ULL(color), + GFP_ATOMIC); } } -- cgit v1.2.3 From dc09766c755c5b06d71fe5d2f4f3d431540eaae0 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 18 Jan 2023 10:51:52 +0100 Subject: wifi: wireless: warn on most wireless extension usage With WiFi 7 (802.11ax, MLO/EHT) around the corner, we're going to remove support for wireless extensions with new devices since MLO (multi-link operation) cannot be properly indicated using them. Add a warning to indicate which processes are still using wireless extensions, if being used with modern (i.e. cfg80211) drivers. Signed-off-by: Johannes Berg Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230118105152.a7158a929a6f.Ifcf30eeeb8fc7019e4dcf2782b04515254d165e1@changeid --- net/wireless/wext-core.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c index fe8765c4075d..6e5f5ea92ddb 100644 --- a/net/wireless/wext-core.c +++ b/net/wireless/wext-core.c @@ -636,7 +636,15 @@ void wireless_send_event(struct net_device * dev, } EXPORT_SYMBOL(wireless_send_event); +#ifdef CONFIG_CFG80211_WEXT +static void wireless_warn_cfg80211_wext(void) +{ + char name[sizeof(current->comm)]; + pr_warn_ratelimited("warning: `%s' uses wireless extensions that are deprecated for modern drivers; use nl80211\n", + get_task_comm(name, current)); +} +#endif /* IW handlers */ @@ -652,8 +660,10 @@ struct iw_statistics *get_wireless_stats(struct net_device *dev) if (dev->ieee80211_ptr && dev->ieee80211_ptr->wiphy && dev->ieee80211_ptr->wiphy->wext && - dev->ieee80211_ptr->wiphy->wext->get_wireless_stats) + dev->ieee80211_ptr->wiphy->wext->get_wireless_stats) { + wireless_warn_cfg80211_wext(); return dev->ieee80211_ptr->wiphy->wext->get_wireless_stats(dev); + } #endif /* not found */ @@ -690,8 +700,10 @@ static iw_handler get_handler(struct net_device *dev, unsigned int cmd) const struct iw_handler_def *handlers = NULL; #ifdef CONFIG_CFG80211_WEXT - if (dev->ieee80211_ptr && dev->ieee80211_ptr->wiphy) + if (dev->ieee80211_ptr && dev->ieee80211_ptr->wiphy) { + wireless_warn_cfg80211_wext(); handlers = dev->ieee80211_ptr->wiphy->wext; + } #endif #ifdef CONFIG_WIRELESS_EXT if (dev->wireless_handlers) -- cgit v1.2.3 From 4ca69027691a0039279b64cfa0aa511d9c9fde59 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 18 Jan 2023 10:51:53 +0100 Subject: wifi: wireless: deny wireless extensions on MLO-capable devices These are WiFi 7 devices that will be introduced into the market in 2023, with new drivers. Wireless extensions haven't been in real development since 2006. Since wireless has evolved a lot, and continues to evolve significantly with Multi-Link Operation, there's really no good way to still support wireless extensions for devices that do MLO. Stop supporting wireless extensions for new devices. We don't consider this a regression since no such devices (apart from hwsim) exist yet. Signed-off-by: Johannes Berg Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230118105152.45f85078a1e0.Ib9eabc2ec5bf6b0244e4d973e93baaa3d8c91bd8@changeid --- net/wireless/wext-core.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'net') diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c index 6e5f5ea92ddb..13a72b17248e 100644 --- a/net/wireless/wext-core.c +++ b/net/wireless/wext-core.c @@ -662,6 +662,8 @@ struct iw_statistics *get_wireless_stats(struct net_device *dev) dev->ieee80211_ptr->wiphy->wext && dev->ieee80211_ptr->wiphy->wext->get_wireless_stats) { wireless_warn_cfg80211_wext(); + if (dev->ieee80211_ptr->wiphy->flags & WIPHY_FLAG_SUPPORTS_MLO) + return NULL; return dev->ieee80211_ptr->wiphy->wext->get_wireless_stats(dev); } #endif @@ -702,6 +704,8 @@ static iw_handler get_handler(struct net_device *dev, unsigned int cmd) #ifdef CONFIG_CFG80211_WEXT if (dev->ieee80211_ptr && dev->ieee80211_ptr->wiphy) { wireless_warn_cfg80211_wext(); + if (dev->ieee80211_ptr->wiphy->flags & WIPHY_FLAG_SUPPORTS_MLO) + return NULL; handlers = dev->ieee80211_ptr->wiphy->wext; } #endif -- cgit v1.2.3 From 5cc9049cb9021a46ad5711a946eb3ded47eed0de Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:04 +0100 Subject: devlink: remove linecards lock Similar to other devlink objects, convert the linecards list to be protected by devlink instance lock. Alongside with that rename the create/destroy() functions to devl_* to indicate the devlink instance lock needs to be held while calling them. Signed-off-by: Jiri Pirko Reviewed-by: Ido Schimmel Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- .../net/ethernet/mellanox/mlxsw/core_linecards.c | 8 ++--- include/net/devlink.h | 6 ++-- net/devlink/core.c | 2 -- net/devlink/devl_internal.h | 1 - net/devlink/leftover.c | 40 +++++++--------------- 5 files changed, 20 insertions(+), 37 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c index 83d2dc91ba2c..025e0db983fe 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_linecards.c @@ -1259,9 +1259,9 @@ static int mlxsw_linecard_init(struct mlxsw_core *mlxsw_core, linecard->linecards = linecards; mutex_init(&linecard->lock); - devlink_linecard = devlink_linecard_create(priv_to_devlink(mlxsw_core), - slot_index, &mlxsw_linecard_ops, - linecard); + devlink_linecard = devl_linecard_create(priv_to_devlink(mlxsw_core), + slot_index, &mlxsw_linecard_ops, + linecard); if (IS_ERR(devlink_linecard)) return PTR_ERR(devlink_linecard); @@ -1285,7 +1285,7 @@ static void mlxsw_linecard_fini(struct mlxsw_core *mlxsw_core, if (linecard->active) mlxsw_linecard_active_clear(linecard); mlxsw_linecard_bdev_del(linecard); - devlink_linecard_destroy(linecard->devlink_linecard); + devl_linecard_destroy(linecard->devlink_linecard); mutex_destroy(&linecard->lock); } diff --git a/include/net/devlink.h b/include/net/devlink.h index 425ecef431b7..d7c9572e5bea 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1687,9 +1687,9 @@ void devl_rate_nodes_destroy(struct devlink *devlink); void devlink_port_linecard_set(struct devlink_port *devlink_port, struct devlink_linecard *linecard); struct devlink_linecard * -devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index, - const struct devlink_linecard_ops *ops, void *priv); -void devlink_linecard_destroy(struct devlink_linecard *linecard); +devl_linecard_create(struct devlink *devlink, unsigned int linecard_index, + const struct devlink_linecard_ops *ops, void *priv); +void devl_linecard_destroy(struct devlink_linecard *linecard); void devlink_linecard_provision_set(struct devlink_linecard *linecard, const char *type); void devlink_linecard_provision_clear(struct devlink_linecard *linecard); diff --git a/net/devlink/core.c b/net/devlink/core.c index 60beca2df7cc..dfc5b58c0464 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -247,7 +247,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, mutex_init(&devlink->lock); lockdep_set_class(&devlink->lock, &devlink->lock_key); mutex_init(&devlink->reporters_lock); - mutex_init(&devlink->linecards_lock); refcount_set(&devlink->refcount, 1); return devlink; @@ -269,7 +268,6 @@ void devlink_free(struct devlink *devlink) { ASSERT_DEVLINK_NOT_REGISTERED(devlink); - mutex_destroy(&devlink->linecards_lock); mutex_destroy(&devlink->reporters_lock); WARN_ON(!list_empty(&devlink->trap_policer_list)); WARN_ON(!list_empty(&devlink->trap_group_list)); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index e724e4c2a4ff..32f0adc40c18 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -38,7 +38,6 @@ struct devlink { struct list_head trap_group_list; struct list_head trap_policer_list; struct list_head linecard_list; - struct mutex linecards_lock; /* protects linecard_list */ const struct devlink_ops *ops; u64 features; struct xarray snapshot_ids; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index bf5e0b1c0422..44f308d45697 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -282,13 +282,10 @@ devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) u32 linecard_index = nla_get_u32(attrs[DEVLINK_ATTR_LINECARD_INDEX]); struct devlink_linecard *linecard; - mutex_lock(&devlink->linecards_lock); linecard = devlink_linecard_get_by_index(devlink, linecard_index); - if (linecard) - refcount_inc(&linecard->refcount); - mutex_unlock(&devlink->linecards_lock); if (!linecard) return ERR_PTR(-ENODEV); + refcount_inc(&linecard->refcount); return linecard; } return ERR_PTR(-EINVAL); @@ -2129,7 +2126,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, devlink_dump_for_each_instance_get(msg, state, devlink) { int idx = 0; - mutex_lock(&devlink->linecards_lock); + devl_lock(devlink); if (!devl_is_registered(devlink)) goto next_devlink; @@ -2147,7 +2144,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, cb->extack); mutex_unlock(&linecard->state_lock); if (err) { - mutex_unlock(&devlink->linecards_lock); + devl_unlock(devlink); devlink_put(devlink); state->idx = idx; goto out; @@ -2155,7 +2152,7 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, idx++; } next_devlink: - mutex_unlock(&devlink->linecards_lock); + devl_unlock(devlink); devlink_put(devlink); } out: @@ -10223,7 +10220,7 @@ static void devlink_linecard_types_fini(struct devlink_linecard *linecard) } /** - * devlink_linecard_create - Create devlink linecard + * devl_linecard_create - Create devlink linecard * * @devlink: devlink * @linecard_index: driver-specific numerical identifier of the linecard @@ -10236,8 +10233,8 @@ static void devlink_linecard_types_fini(struct devlink_linecard *linecard) * Return: Line card structure or an ERR_PTR() encoded error code. */ struct devlink_linecard * -devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index, - const struct devlink_linecard_ops *ops, void *priv) +devl_linecard_create(struct devlink *devlink, unsigned int linecard_index, + const struct devlink_linecard_ops *ops, void *priv) { struct devlink_linecard *linecard; int err; @@ -10246,17 +10243,12 @@ devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index, !ops->types_count || !ops->types_get)) return ERR_PTR(-EINVAL); - mutex_lock(&devlink->linecards_lock); - if (devlink_linecard_index_exists(devlink, linecard_index)) { - mutex_unlock(&devlink->linecards_lock); + if (devlink_linecard_index_exists(devlink, linecard_index)) return ERR_PTR(-EEXIST); - } linecard = kzalloc(sizeof(*linecard), GFP_KERNEL); - if (!linecard) { - mutex_unlock(&devlink->linecards_lock); + if (!linecard) return ERR_PTR(-ENOMEM); - } linecard->devlink = devlink; linecard->index = linecard_index; @@ -10269,35 +10261,29 @@ devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index, if (err) { mutex_destroy(&linecard->state_lock); kfree(linecard); - mutex_unlock(&devlink->linecards_lock); return ERR_PTR(err); } list_add_tail(&linecard->list, &devlink->linecard_list); refcount_set(&linecard->refcount, 1); - mutex_unlock(&devlink->linecards_lock); devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); return linecard; } -EXPORT_SYMBOL_GPL(devlink_linecard_create); +EXPORT_SYMBOL_GPL(devl_linecard_create); /** - * devlink_linecard_destroy - Destroy devlink linecard + * devl_linecard_destroy - Destroy devlink linecard * * @linecard: devlink linecard */ -void devlink_linecard_destroy(struct devlink_linecard *linecard) +void devl_linecard_destroy(struct devlink_linecard *linecard) { - struct devlink *devlink = linecard->devlink; - devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); - mutex_lock(&devlink->linecards_lock); list_del(&linecard->list); devlink_linecard_types_fini(linecard); - mutex_unlock(&devlink->linecards_lock); devlink_linecard_put(linecard); } -EXPORT_SYMBOL_GPL(devlink_linecard_destroy); +EXPORT_SYMBOL_GPL(devl_linecard_destroy); /** * devlink_linecard_provision_set - Set provisioning on linecard -- cgit v1.2.3 From 3a10173f48aa1bae64baf0fea228c6045996b117 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:05 +0100 Subject: devlink: remove linecard reference counting As long as the linecard life time is protected by devlink instance lock, the reference counting is no longer needed. Remove it. Signed-off-by: Jiri Pirko Reviewed-by: Ido Schimmel Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 1 - net/devlink/leftover.c | 14 ++------------ net/devlink/netlink.c | 5 ----- 3 files changed, 2 insertions(+), 18 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 32f0adc40c18..dd4e2c37cf07 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -193,7 +193,6 @@ struct devlink_linecard; struct devlink_linecard * devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info); -void devlink_linecard_put(struct devlink_linecard *linecard); /* Rates */ extern const struct devlink_gen_cmd devl_gen_rate_get; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 44f308d45697..f029e1bc68f6 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -37,7 +37,6 @@ struct devlink_linecard { struct list_head list; struct devlink *devlink; unsigned int index; - refcount_t refcount; const struct devlink_linecard_ops *ops; void *priv; enum devlink_linecard_state state; @@ -285,7 +284,6 @@ devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) linecard = devlink_linecard_get_by_index(devlink, linecard_index); if (!linecard) return ERR_PTR(-ENODEV); - refcount_inc(&linecard->refcount); return linecard; } return ERR_PTR(-EINVAL); @@ -297,14 +295,6 @@ devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info) return devlink_linecard_get_from_attrs(devlink, info->attrs); } -void devlink_linecard_put(struct devlink_linecard *linecard) -{ - if (refcount_dec_and_test(&linecard->refcount)) { - mutex_destroy(&linecard->state_lock); - kfree(linecard); - } -} - struct devlink_sb { struct list_head list; unsigned int index; @@ -10265,7 +10255,6 @@ devl_linecard_create(struct devlink *devlink, unsigned int linecard_index, } list_add_tail(&linecard->list, &devlink->linecard_list); - refcount_set(&linecard->refcount, 1); devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW); return linecard; } @@ -10281,7 +10270,8 @@ void devl_linecard_destroy(struct devlink_linecard *linecard) devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL); list_del(&linecard->list); devlink_linecard_types_fini(linecard); - devlink_linecard_put(linecard); + mutex_destroy(&linecard->state_lock); + kfree(linecard); } EXPORT_SYMBOL_GPL(devl_linecard_destroy); diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index b5b8ac6db2d1..3f2ab4360f11 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -170,14 +170,9 @@ unlock: static void devlink_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info) { - struct devlink_linecard *linecard; struct devlink *devlink; devlink = info->user_ptr[0]; - if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) { - linecard = info->user_ptr[1]; - devlink_linecard_put(linecard); - } devl_unlock(devlink); devlink_put(devlink); } -- cgit v1.2.3 From dfdfd1305ddecb990566193f2ba8a11bccba4cde Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:08 +0100 Subject: devlink: protect health reporter operation with instance lock Similar to other devlink objects, protect the reporters list by devlink instance lock. Alongside add unlocked versions of health reporter create/destroy functions and use them in drivers on call paths where the instance lock is held. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlxsw/core.c | 8 +-- drivers/net/netdevsim/health.c | 20 +++--- include/net/devlink.h | 22 ++++++- net/devlink/leftover.c | 99 +++++++++++++++++++++++------- 4 files changed, 110 insertions(+), 39 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index a0a06e2eff82..33ef726e4d54 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -2051,8 +2051,8 @@ static int mlxsw_core_health_init(struct mlxsw_core *mlxsw_core) if (!(mlxsw_core->bus->features & MLXSW_BUS_F_TXRX)) return 0; - fw_fatal = devlink_health_reporter_create(devlink, &mlxsw_core_health_fw_fatal_ops, - 0, mlxsw_core); + fw_fatal = devl_health_reporter_create(devlink, &mlxsw_core_health_fw_fatal_ops, + 0, mlxsw_core); if (IS_ERR(fw_fatal)) { dev_err(mlxsw_core->bus_info->dev, "Failed to create fw fatal reporter"); return PTR_ERR(fw_fatal); @@ -2072,7 +2072,7 @@ static int mlxsw_core_health_init(struct mlxsw_core *mlxsw_core) err_fw_fatal_config: mlxsw_core_trap_unregister(mlxsw_core, &mlxsw_core_health_listener, mlxsw_core); err_trap_register: - devlink_health_reporter_destroy(mlxsw_core->health.fw_fatal); + devl_health_reporter_destroy(mlxsw_core->health.fw_fatal); return err; } @@ -2085,7 +2085,7 @@ static void mlxsw_core_health_fini(struct mlxsw_core *mlxsw_core) mlxsw_core_trap_unregister(mlxsw_core, &mlxsw_core_health_listener, mlxsw_core); /* Make sure there is no more event work scheduled */ mlxsw_core_flush_owq(); - devlink_health_reporter_destroy(mlxsw_core->health.fw_fatal); + devl_health_reporter_destroy(mlxsw_core->health.fw_fatal); } static void mlxsw_core_irq_event_handler_init(struct mlxsw_core *mlxsw_core) diff --git a/drivers/net/netdevsim/health.c b/drivers/net/netdevsim/health.c index aa77af4a68df..eb04ed715d2d 100644 --- a/drivers/net/netdevsim/health.c +++ b/drivers/net/netdevsim/health.c @@ -233,16 +233,16 @@ int nsim_dev_health_init(struct nsim_dev *nsim_dev, struct devlink *devlink) int err; health->empty_reporter = - devlink_health_reporter_create(devlink, - &nsim_dev_empty_reporter_ops, - 0, health); + devl_health_reporter_create(devlink, + &nsim_dev_empty_reporter_ops, + 0, health); if (IS_ERR(health->empty_reporter)) return PTR_ERR(health->empty_reporter); health->dummy_reporter = - devlink_health_reporter_create(devlink, - &nsim_dev_dummy_reporter_ops, - 0, health); + devl_health_reporter_create(devlink, + &nsim_dev_dummy_reporter_ops, + 0, health); if (IS_ERR(health->dummy_reporter)) { err = PTR_ERR(health->dummy_reporter); goto err_empty_reporter_destroy; @@ -266,9 +266,9 @@ int nsim_dev_health_init(struct nsim_dev *nsim_dev, struct devlink *devlink) return 0; err_dummy_reporter_destroy: - devlink_health_reporter_destroy(health->dummy_reporter); + devl_health_reporter_destroy(health->dummy_reporter); err_empty_reporter_destroy: - devlink_health_reporter_destroy(health->empty_reporter); + devl_health_reporter_destroy(health->empty_reporter); return err; } @@ -278,6 +278,6 @@ void nsim_dev_health_exit(struct nsim_dev *nsim_dev) debugfs_remove_recursive(health->ddir); kfree(health->recovered_break_msg); - devlink_health_reporter_destroy(health->dummy_reporter); - devlink_health_reporter_destroy(health->empty_reporter); + devl_health_reporter_destroy(health->dummy_reporter); + devl_health_reporter_destroy(health->empty_reporter); } diff --git a/include/net/devlink.h b/include/net/devlink.h index d7c9572e5bea..0d64feaef7cb 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1865,18 +1865,34 @@ int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name, const void *value, u32 value_len); struct devlink_health_reporter * -devlink_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv); +devl_port_health_reporter_create(struct devlink_port *port, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv); struct devlink_health_reporter * devlink_port_health_reporter_create(struct devlink_port *port, const struct devlink_health_reporter_ops *ops, u64 graceful_period, void *priv); +struct devlink_health_reporter * +devl_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv); + +struct devlink_health_reporter * +devlink_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv); + +void +devl_health_reporter_destroy(struct devlink_health_reporter *reporter); + void devlink_health_reporter_destroy(struct devlink_health_reporter *reporter); +void +devl_port_health_reporter_destroy(struct devlink_health_reporter *reporter); + void devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index f029e1bc68f6..0263dc18e72d 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7337,8 +7337,8 @@ __devlink_health_reporter_create(struct devlink *devlink, } /** - * devlink_port_health_reporter_create - create devlink health reporter for - * specified port instance + * devl_port_health_reporter_create - create devlink health reporter for + * specified port instance * * @port: devlink_port which should contain the new reporter * @ops: ops @@ -7346,12 +7346,13 @@ __devlink_health_reporter_create(struct devlink *devlink, * @priv: priv */ struct devlink_health_reporter * -devlink_port_health_reporter_create(struct devlink_port *port, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) +devl_port_health_reporter_create(struct devlink_port *port, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) { struct devlink_health_reporter *reporter; + devl_assert_locked(port->devlink); mutex_lock(&port->reporters_lock); if (__devlink_health_reporter_find_by_name(&port->reporter_list, &port->reporters_lock, ops->name)) { @@ -7370,10 +7371,26 @@ unlock: mutex_unlock(&port->reporters_lock); return reporter; } +EXPORT_SYMBOL_GPL(devl_port_health_reporter_create); + +struct devlink_health_reporter * +devlink_port_health_reporter_create(struct devlink_port *port, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + struct devlink *devlink = port->devlink; + + devl_lock(devlink); + reporter = devl_port_health_reporter_create(port, ops, + graceful_period, priv); + devl_unlock(devlink); + return reporter; +} EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create); /** - * devlink_health_reporter_create - create devlink health reporter + * devl_health_reporter_create - create devlink health reporter * * @devlink: devlink * @ops: ops @@ -7381,12 +7398,13 @@ EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create); * @priv: priv */ struct devlink_health_reporter * -devlink_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) +devl_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) { struct devlink_health_reporter *reporter; + devl_assert_locked(devlink); mutex_lock(&devlink->reporters_lock); if (devlink_health_reporter_find_by_name(devlink, ops->name)) { reporter = ERR_PTR(-EEXIST); @@ -7403,6 +7421,21 @@ unlock: mutex_unlock(&devlink->reporters_lock); return reporter; } +EXPORT_SYMBOL_GPL(devl_health_reporter_create); + +struct devlink_health_reporter * +devlink_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + devl_lock(devlink); + reporter = devl_health_reporter_create(devlink, ops, + graceful_period, priv); + devl_unlock(devlink); + return reporter; +} EXPORT_SYMBOL_GPL(devlink_health_reporter_create); static void @@ -7429,35 +7462,61 @@ __devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) } /** - * devlink_health_reporter_destroy - destroy devlink health reporter + * devl_health_reporter_destroy - destroy devlink health reporter * * @reporter: devlink health reporter to destroy */ void -devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) +devl_health_reporter_destroy(struct devlink_health_reporter *reporter) { struct mutex *lock = &reporter->devlink->reporters_lock; + devl_assert_locked(reporter->devlink); + mutex_lock(lock); __devlink_health_reporter_destroy(reporter); mutex_unlock(lock); } +EXPORT_SYMBOL_GPL(devl_health_reporter_destroy); + +void +devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + struct devlink *devlink = reporter->devlink; + + devl_lock(devlink); + devl_health_reporter_destroy(reporter); + devl_unlock(devlink); +} EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); /** - * devlink_port_health_reporter_destroy - destroy devlink port health reporter + * devl_port_health_reporter_destroy - destroy devlink port health reporter * * @reporter: devlink health reporter to destroy */ void -devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter) +devl_port_health_reporter_destroy(struct devlink_health_reporter *reporter) { struct mutex *lock = &reporter->devlink_port->reporters_lock; + devl_assert_locked(reporter->devlink); + mutex_lock(lock); __devlink_health_reporter_destroy(reporter); mutex_unlock(lock); } +EXPORT_SYMBOL_GPL(devl_port_health_reporter_destroy); + +void +devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + struct devlink *devlink = reporter->devlink; + + devl_lock(devlink); + devl_port_health_reporter_destroy(reporter); + devl_unlock(devlink); +} EXPORT_SYMBOL_GPL(devlink_port_health_reporter_destroy); static int @@ -7805,12 +7864,11 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, unsigned long port_index; int idx = 0; + devl_lock(devlink); + if (!devl_is_registered(devlink)) + goto next_devlink; + mutex_lock(&devlink->reporters_lock); - if (!devl_is_registered(devlink)) { - mutex_unlock(&devlink->reporters_lock); - devlink_put(devlink); - continue; - } list_for_each_entry(reporter, &devlink->reporter_list, list) { @@ -7824,6 +7882,7 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, NLM_F_MULTI); if (err) { mutex_unlock(&devlink->reporters_lock); + devl_unlock(devlink); devlink_put(devlink); state->idx = idx; goto out; @@ -7832,10 +7891,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, } mutex_unlock(&devlink->reporters_lock); - devl_lock(devlink); - if (!devl_is_registered(devlink)) - goto next_devlink; - xa_for_each(&devlink->ports, port_index, port) { mutex_lock(&port->reporters_lock); list_for_each_entry(reporter, &port->reporter_list, list) { -- cgit v1.2.3 From 1dea3b4e4c52f4bed64d1c527d548e82ccaea15a Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:09 +0100 Subject: devlink: remove reporters_lock Similar to other devlink objects, rely on devlink instance lock and remove object specific reporters_lock. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 1 - net/devlink/core.c | 2 -- net/devlink/devl_internal.h | 1 - net/devlink/leftover.c | 53 ++++++++------------------------------------- 4 files changed, 9 insertions(+), 48 deletions(-) (limited to 'net') diff --git a/include/net/devlink.h b/include/net/devlink.h index 0d64feaef7cb..d9ea76bea36e 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -146,7 +146,6 @@ struct devlink_port { initialized:1; struct delayed_work type_warn_dw; struct list_head reporter_list; - struct mutex reporters_lock; /* Protects reporter_list */ struct devlink_rate *devlink_rate; struct devlink_linecard *linecard; diff --git a/net/devlink/core.c b/net/devlink/core.c index dfc5b58c0464..6c0e2fc57e45 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -246,7 +246,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, lockdep_register_key(&devlink->lock_key); mutex_init(&devlink->lock); lockdep_set_class(&devlink->lock, &devlink->lock_key); - mutex_init(&devlink->reporters_lock); refcount_set(&devlink->refcount, 1); return devlink; @@ -268,7 +267,6 @@ void devlink_free(struct devlink *devlink) { ASSERT_DEVLINK_NOT_REGISTERED(devlink); - mutex_destroy(&devlink->reporters_lock); WARN_ON(!list_empty(&devlink->trap_policer_list)); WARN_ON(!list_empty(&devlink->trap_group_list)); WARN_ON(!list_empty(&devlink->trap_list)); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index dd4e2c37cf07..bdb83014b4b5 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -32,7 +32,6 @@ struct devlink { struct list_head param_list; struct list_head region_list; struct list_head reporter_list; - struct mutex reporters_lock; /* protects reporter_list */ struct devlink_dpipe_headers *dpipe_headers; struct list_head trap_list; struct list_head trap_group_list; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 0263dc18e72d..548e4a379e48 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7281,12 +7281,10 @@ EXPORT_SYMBOL_GPL(devlink_health_reporter_priv); static struct devlink_health_reporter * __devlink_health_reporter_find_by_name(struct list_head *reporter_list, - struct mutex *list_lock, const char *reporter_name) { struct devlink_health_reporter *reporter; - lockdep_assert_held(list_lock); list_for_each_entry(reporter, reporter_list, list) if (!strcmp(reporter->ops->name, reporter_name)) return reporter; @@ -7298,7 +7296,6 @@ devlink_health_reporter_find_by_name(struct devlink *devlink, const char *reporter_name) { return __devlink_health_reporter_find_by_name(&devlink->reporter_list, - &devlink->reporters_lock, reporter_name); } @@ -7307,7 +7304,6 @@ devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, const char *reporter_name) { return __devlink_health_reporter_find_by_name(&devlink_port->reporter_list, - &devlink_port->reporters_lock, reporter_name); } @@ -7353,22 +7349,18 @@ devl_port_health_reporter_create(struct devlink_port *port, struct devlink_health_reporter *reporter; devl_assert_locked(port->devlink); - mutex_lock(&port->reporters_lock); + if (__devlink_health_reporter_find_by_name(&port->reporter_list, - &port->reporters_lock, ops->name)) { - reporter = ERR_PTR(-EEXIST); - goto unlock; - } + ops->name)) + return ERR_PTR(-EEXIST); reporter = __devlink_health_reporter_create(port->devlink, ops, graceful_period, priv); if (IS_ERR(reporter)) - goto unlock; + return reporter; reporter->devlink_port = port; list_add_tail(&reporter->list, &port->reporter_list); -unlock: - mutex_unlock(&port->reporters_lock); return reporter; } EXPORT_SYMBOL_GPL(devl_port_health_reporter_create); @@ -7405,20 +7397,16 @@ devl_health_reporter_create(struct devlink *devlink, struct devlink_health_reporter *reporter; devl_assert_locked(devlink); - mutex_lock(&devlink->reporters_lock); - if (devlink_health_reporter_find_by_name(devlink, ops->name)) { - reporter = ERR_PTR(-EEXIST); - goto unlock; - } + + if (devlink_health_reporter_find_by_name(devlink, ops->name)) + return ERR_PTR(-EEXIST); reporter = __devlink_health_reporter_create(devlink, ops, graceful_period, priv); if (IS_ERR(reporter)) - goto unlock; + return reporter; list_add_tail(&reporter->list, &devlink->reporter_list); -unlock: - mutex_unlock(&devlink->reporters_lock); return reporter; } EXPORT_SYMBOL_GPL(devl_health_reporter_create); @@ -7469,13 +7457,9 @@ __devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) void devl_health_reporter_destroy(struct devlink_health_reporter *reporter) { - struct mutex *lock = &reporter->devlink->reporters_lock; - devl_assert_locked(reporter->devlink); - mutex_lock(lock); __devlink_health_reporter_destroy(reporter); - mutex_unlock(lock); } EXPORT_SYMBOL_GPL(devl_health_reporter_destroy); @@ -7498,13 +7482,9 @@ EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); void devl_port_health_reporter_destroy(struct devlink_health_reporter *reporter) { - struct mutex *lock = &reporter->devlink_port->reporters_lock; - devl_assert_locked(reporter->devlink); - mutex_lock(lock); __devlink_health_reporter_destroy(reporter); - mutex_unlock(lock); } EXPORT_SYMBOL_GPL(devl_port_health_reporter_destroy); @@ -7758,17 +7738,13 @@ devlink_health_reporter_get_from_attrs(struct devlink *devlink, reporter_name = nla_data(attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]); devlink_port = devlink_port_get_from_attrs(devlink, attrs); if (IS_ERR(devlink_port)) { - mutex_lock(&devlink->reporters_lock); reporter = devlink_health_reporter_find_by_name(devlink, reporter_name); if (reporter) refcount_inc(&reporter->refcount); - mutex_unlock(&devlink->reporters_lock); } else { - mutex_lock(&devlink_port->reporters_lock); reporter = devlink_port_health_reporter_find_by_name(devlink_port, reporter_name); if (reporter) refcount_inc(&reporter->refcount); - mutex_unlock(&devlink_port->reporters_lock); } return reporter; @@ -7868,8 +7844,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, if (!devl_is_registered(devlink)) goto next_devlink; - mutex_lock(&devlink->reporters_lock); - list_for_each_entry(reporter, &devlink->reporter_list, list) { if (idx < state->idx) { @@ -7881,7 +7855,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI); if (err) { - mutex_unlock(&devlink->reporters_lock); devl_unlock(devlink); devlink_put(devlink); state->idx = idx; @@ -7889,10 +7862,8 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, } idx++; } - mutex_unlock(&devlink->reporters_lock); xa_for_each(&devlink->ports, port_index, port) { - mutex_lock(&port->reporters_lock); list_for_each_entry(reporter, &port->reporter_list, list) { if (idx < state->idx) { idx++; @@ -7904,7 +7875,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI); if (err) { - mutex_unlock(&port->reporters_lock); devl_unlock(devlink); devlink_put(devlink); state->idx = idx; @@ -7912,7 +7882,6 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, } idx++; } - mutex_unlock(&port->reporters_lock); } next_devlink: devl_unlock(devlink); @@ -9633,12 +9602,9 @@ int devl_port_register(struct devlink *devlink, devlink_port->index = port_index; spin_lock_init(&devlink_port->type_lock); INIT_LIST_HEAD(&devlink_port->reporter_list); - mutex_init(&devlink_port->reporters_lock); err = xa_insert(&devlink->ports, port_index, devlink_port, GFP_KERNEL); - if (err) { - mutex_destroy(&devlink_port->reporters_lock); + if (err) return err; - } INIT_DELAYED_WORK(&devlink_port->type_warn_dw, &devlink_port_type_warn); devlink_port_type_warn_schedule(devlink_port); @@ -9689,7 +9655,6 @@ void devl_port_unregister(struct devlink_port *devlink_port) devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL); xa_erase(&devlink_port->devlink->ports, devlink_port->index); WARN_ON(!list_empty(&devlink_port->reporter_list)); - mutex_destroy(&devlink_port->reporters_lock); devlink_port->registered = false; } EXPORT_SYMBOL_GPL(devl_port_unregister); -- cgit v1.2.3 From 9f167327efecc3977deff0c852760e3759b0c2a7 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:10 +0100 Subject: devlink: remove devl*_port_health_reporter_destroy() Remove port-specific health reporter destroy function as it is currently the same as the instance one so no longer needed. Inline __devlink_health_reporter_destroy() as it is no longer called from multiple places. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- .../ethernet/mellanox/mlx5/core/en/reporter_rx.c | 2 +- .../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 +- include/net/devlink.h | 6 ---- net/devlink/leftover.c | 35 ++-------------------- 4 files changed, 4 insertions(+), 41 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c index 1ae15b8536a8..95edab4a1732 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c @@ -754,6 +754,6 @@ void mlx5e_reporter_rx_destroy(struct mlx5e_priv *priv) if (!priv->rx_reporter) return; - devlink_port_health_reporter_destroy(priv->rx_reporter); + devlink_health_reporter_destroy(priv->rx_reporter); priv->rx_reporter = NULL; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c index 60bc5b577ab9..b195dbbf6c90 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c @@ -609,6 +609,6 @@ void mlx5e_reporter_tx_destroy(struct mlx5e_priv *priv) if (!priv->tx_reporter) return; - devlink_port_health_reporter_destroy(priv->tx_reporter); + devlink_health_reporter_destroy(priv->tx_reporter); priv->tx_reporter = NULL; } diff --git a/include/net/devlink.h b/include/net/devlink.h index d9ea76bea36e..608a0c198be8 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1889,12 +1889,6 @@ devl_health_reporter_destroy(struct devlink_health_reporter *reporter); void devlink_health_reporter_destroy(struct devlink_health_reporter *reporter); -void -devl_port_health_reporter_destroy(struct devlink_health_reporter *reporter); - -void -devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter); - void * devlink_health_reporter_priv(struct devlink_health_reporter *reporter); int devlink_health_report(struct devlink_health_reporter *reporter, diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 548e4a379e48..d4e8c285b255 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7442,13 +7442,6 @@ devlink_health_reporter_put(struct devlink_health_reporter *reporter) devlink_health_reporter_free(reporter); } -static void -__devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - list_del(&reporter->list); - devlink_health_reporter_put(reporter); -} - /** * devl_health_reporter_destroy - destroy devlink health reporter * @@ -7459,7 +7452,8 @@ devl_health_reporter_destroy(struct devlink_health_reporter *reporter) { devl_assert_locked(reporter->devlink); - __devlink_health_reporter_destroy(reporter); + list_del(&reporter->list); + devlink_health_reporter_put(reporter); } EXPORT_SYMBOL_GPL(devl_health_reporter_destroy); @@ -7474,31 +7468,6 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) } EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); -/** - * devl_port_health_reporter_destroy - destroy devlink port health reporter - * - * @reporter: devlink health reporter to destroy - */ -void -devl_port_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - devl_assert_locked(reporter->devlink); - - __devlink_health_reporter_destroy(reporter); -} -EXPORT_SYMBOL_GPL(devl_port_health_reporter_destroy); - -void -devlink_port_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - struct devlink *devlink = reporter->devlink; - - devl_lock(devlink); - devl_port_health_reporter_destroy(reporter); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_port_health_reporter_destroy); - static int devlink_nl_health_reporter_fill(struct sk_buff *msg, struct devlink_health_reporter *reporter, -- cgit v1.2.3 From e994a75fb7f9dcdb53ff5c33859531141de30899 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:11 +0100 Subject: devlink: remove reporter reference counting As long as the reporter life time is protected by devlink instance lock, the reference counting is no longer needed. Remove it. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 113 +++++++++++++------------------------------------ 1 file changed, 30 insertions(+), 83 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index d4e8c285b255..114baecebaeb 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7269,7 +7269,6 @@ struct devlink_health_reporter { u64 error_count; u64 recovery_count; u64 last_recovery_ts; - refcount_t refcount; }; void * @@ -7328,7 +7327,6 @@ __devlink_health_reporter_create(struct devlink *devlink, reporter->auto_recover = !!ops->recover; reporter->auto_dump = !!ops->dump; mutex_init(&reporter->dump_lock); - refcount_set(&reporter->refcount, 1); return reporter; } @@ -7435,13 +7433,6 @@ devlink_health_reporter_free(struct devlink_health_reporter *reporter) kfree(reporter); } -static void -devlink_health_reporter_put(struct devlink_health_reporter *reporter) -{ - if (refcount_dec_and_test(&reporter->refcount)) - devlink_health_reporter_free(reporter); -} - /** * devl_health_reporter_destroy - destroy devlink health reporter * @@ -7453,7 +7444,7 @@ devl_health_reporter_destroy(struct devlink_health_reporter *reporter) devl_assert_locked(reporter->devlink); list_del(&reporter->list); - devlink_health_reporter_put(reporter); + devlink_health_reporter_free(reporter); } EXPORT_SYMBOL_GPL(devl_health_reporter_destroy); @@ -7697,7 +7688,6 @@ static struct devlink_health_reporter * devlink_health_reporter_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) { - struct devlink_health_reporter *reporter; struct devlink_port *devlink_port; char *reporter_name; @@ -7706,17 +7696,12 @@ devlink_health_reporter_get_from_attrs(struct devlink *devlink, reporter_name = nla_data(attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]); devlink_port = devlink_port_get_from_attrs(devlink, attrs); - if (IS_ERR(devlink_port)) { - reporter = devlink_health_reporter_find_by_name(devlink, reporter_name); - if (reporter) - refcount_inc(&reporter->refcount); - } else { - reporter = devlink_port_health_reporter_find_by_name(devlink_port, reporter_name); - if (reporter) - refcount_inc(&reporter->refcount); - } - - return reporter; + if (IS_ERR(devlink_port)) + return devlink_health_reporter_find_by_name(devlink, + reporter_name); + else + return devlink_port_health_reporter_find_by_name(devlink_port, + reporter_name); } static struct devlink_health_reporter * @@ -7775,10 +7760,8 @@ static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, return -EINVAL; msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) { - err = -ENOMEM; - goto out; - } + if (!msg) + return -ENOMEM; err = devlink_nl_health_reporter_fill(msg, reporter, DEVLINK_CMD_HEALTH_REPORTER_GET, @@ -7786,13 +7769,10 @@ static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, 0); if (err) { nlmsg_free(msg); - goto out; + return err; } - err = genlmsg_reply(msg, info); -out: - devlink_health_reporter_put(reporter); - return err; + return genlmsg_reply(msg, info); } static int @@ -7866,7 +7846,6 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, { struct devlink *devlink = info->user_ptr[0]; struct devlink_health_reporter *reporter; - int err; reporter = devlink_health_reporter_get_from_info(devlink, info); if (!reporter) @@ -7874,15 +7853,12 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, if (!reporter->ops->recover && (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] || - info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])) { - err = -EOPNOTSUPP; - goto out; - } + info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])) + return -EOPNOTSUPP; + if (!reporter->ops->dump && - info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) { - err = -EOPNOTSUPP; - goto out; - } + info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) + return -EOPNOTSUPP; if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) reporter->graceful_period = @@ -7896,11 +7872,7 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, reporter->auto_dump = nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]); - devlink_health_reporter_put(reporter); return 0; -out: - devlink_health_reporter_put(reporter); - return err; } static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, @@ -7908,16 +7880,12 @@ static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, { struct devlink *devlink = info->user_ptr[0]; struct devlink_health_reporter *reporter; - int err; reporter = devlink_health_reporter_get_from_info(devlink, info); if (!reporter) return -EINVAL; - err = devlink_health_reporter_recover(reporter, NULL, info->extack); - - devlink_health_reporter_put(reporter); - return err; + return devlink_health_reporter_recover(reporter, NULL, info->extack); } static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, @@ -7932,36 +7900,27 @@ static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, if (!reporter) return -EINVAL; - if (!reporter->ops->diagnose) { - devlink_health_reporter_put(reporter); + if (!reporter->ops->diagnose) return -EOPNOTSUPP; - } fmsg = devlink_fmsg_alloc(); - if (!fmsg) { - devlink_health_reporter_put(reporter); + if (!fmsg) return -ENOMEM; - } err = devlink_fmsg_obj_nest_start(fmsg); if (err) - goto out; + return err; err = reporter->ops->diagnose(reporter, fmsg, info->extack); if (err) - goto out; + return err; err = devlink_fmsg_obj_nest_end(fmsg); if (err) - goto out; - - err = devlink_fmsg_snd(fmsg, info, - DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); + return err; -out: - devlink_fmsg_free(fmsg); - devlink_health_reporter_put(reporter); - return err; + return devlink_fmsg_snd(fmsg, info, + DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); } static int @@ -7976,10 +7935,9 @@ devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, if (!reporter) return -EINVAL; - if (!reporter->ops->dump) { - err = -EOPNOTSUPP; - goto out; - } + if (!reporter->ops->dump) + return -EOPNOTSUPP; + mutex_lock(&reporter->dump_lock); if (!state->idx) { err = devlink_health_do_dump(reporter, NULL, cb->extack); @@ -7997,8 +7955,6 @@ devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET); unlock: mutex_unlock(&reporter->dump_lock); -out: - devlink_health_reporter_put(reporter); return err; } @@ -8013,15 +7969,12 @@ devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, if (!reporter) return -EINVAL; - if (!reporter->ops->dump) { - devlink_health_reporter_put(reporter); + if (!reporter->ops->dump) return -EOPNOTSUPP; - } mutex_lock(&reporter->dump_lock); devlink_health_dump_clear(reporter); mutex_unlock(&reporter->dump_lock); - devlink_health_reporter_put(reporter); return 0; } @@ -8030,21 +7983,15 @@ static int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, { struct devlink *devlink = info->user_ptr[0]; struct devlink_health_reporter *reporter; - int err; reporter = devlink_health_reporter_get_from_info(devlink, info); if (!reporter) return -EINVAL; - if (!reporter->ops->test) { - devlink_health_reporter_put(reporter); + if (!reporter->ops->test) return -EOPNOTSUPP; - } - - err = reporter->ops->test(reporter, info->extack); - devlink_health_reporter_put(reporter); - return err; + return reporter->ops->test(reporter, info->extack); } struct devlink_stats { -- cgit v1.2.3 From 2557396808d916189ec5812f64e2b08bfb905807 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:12 +0100 Subject: devlink: convert linecards dump to devlink_nl_instance_iter_dump() Benefit from recently introduced instance iteration and convert linecards .dumpit generic netlink callback to use it. Signed-off-by: Jiri Pirko Reviewed-by: Ido Schimmel Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 1 + net/devlink/leftover.c | 64 ++++++++++++++++++++------------------------- net/devlink/netlink.c | 1 + 3 files changed, 30 insertions(+), 36 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index bdb83014b4b5..d8f8e4bff5b4 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -167,6 +167,7 @@ extern const struct devlink_gen_cmd devl_gen_info; extern const struct devlink_gen_cmd devl_gen_trap; extern const struct devlink_gen_cmd devl_gen_trap_group; extern const struct devlink_gen_cmd devl_gen_trap_policer; +extern const struct devlink_gen_cmd devl_gen_linecard; /* Ports */ int devlink_port_netdevice_event(struct notifier_block *nb, diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 114baecebaeb..685845a2fc78 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -2105,50 +2105,42 @@ static int devlink_nl_cmd_linecard_get_doit(struct sk_buff *skb, return genlmsg_reply(msg, info); } -static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +static int devlink_nl_cmd_linecard_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_linecard *linecard; - struct devlink *devlink; - int err; - - devlink_dump_for_each_instance_get(msg, state, devlink) { - int idx = 0; - - devl_lock(devlink); - if (!devl_is_registered(devlink)) - goto next_devlink; + int idx = 0; + int err = 0; - list_for_each_entry(linecard, &devlink->linecard_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - mutex_lock(&linecard->state_lock); - err = devlink_nl_linecard_fill(msg, devlink, linecard, - DEVLINK_CMD_LINECARD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI, - cb->extack); - mutex_unlock(&linecard->state_lock); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } + list_for_each_entry(linecard, &devlink->linecard_list, list) { + if (idx < state->idx) { idx++; + continue; } -next_devlink: - devl_unlock(devlink); - devlink_put(devlink); + mutex_lock(&linecard->state_lock); + err = devlink_nl_linecard_fill(msg, devlink, linecard, + DEVLINK_CMD_LINECARD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI, + cb->extack); + mutex_unlock(&linecard->state_lock); + if (err) { + state->idx = idx; + break; + } + idx++; } -out: - return msg->len; + + return err; } +const struct devlink_gen_cmd devl_gen_linecard = { + .dump_one = devlink_nl_cmd_linecard_get_dump_one, +}; + static struct devlink_linecard_type * devlink_linecard_type_lookup(struct devlink_linecard *linecard, const char *type) @@ -9010,7 +9002,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_LINECARD_GET, .doit = devlink_nl_cmd_linecard_get_doit, - .dumpit = devlink_nl_cmd_linecard_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD, /* can be retrieved by unprivileged users */ }, diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 3f2ab4360f11..b18e216e09b0 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -191,6 +191,7 @@ static const struct devlink_gen_cmd *devl_gen_cmds[] = { [DEVLINK_CMD_TRAP_GET] = &devl_gen_trap, [DEVLINK_CMD_TRAP_GROUP_GET] = &devl_gen_trap_group, [DEVLINK_CMD_TRAP_POLICER_GET] = &devl_gen_trap_policer, + [DEVLINK_CMD_LINECARD_GET] = &devl_gen_linecard, [DEVLINK_CMD_SELFTESTS_GET] = &devl_gen_selftests, }; -- cgit v1.2.3 From 19be51a93d9986edcfecae010920ab1d5714479b Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:13 +0100 Subject: devlink: convert reporters dump to devlink_nl_instance_iter_dump() Benefit from recently introduced instance iteration and convert reporters .dumpit generic netlink callback to use it. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 1 + net/devlink/leftover.c | 87 ++++++++++++++++++++------------------------- net/devlink/netlink.c | 1 + 3 files changed, 40 insertions(+), 49 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index d8f8e4bff5b4..10975e4ea2f4 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -164,6 +164,7 @@ extern const struct devlink_gen_cmd devl_gen_selftests; extern const struct devlink_gen_cmd devl_gen_param; extern const struct devlink_gen_cmd devl_gen_region; extern const struct devlink_gen_cmd devl_gen_info; +extern const struct devlink_gen_cmd devl_gen_health_reporter; extern const struct devlink_gen_cmd devl_gen_trap; extern const struct devlink_gen_cmd devl_gen_trap_group; extern const struct devlink_gen_cmd devl_gen_trap_policer; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 685845a2fc78..8eab95cae917 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7768,70 +7768,59 @@ static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, } static int -devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg, - struct netlink_callback *cb) +devlink_nl_cmd_health_reporter_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink *devlink; + struct devlink_health_reporter *reporter; + struct devlink_port *port; + unsigned long port_index; + int idx = 0; int err; - devlink_dump_for_each_instance_get(msg, state, devlink) { - struct devlink_health_reporter *reporter; - struct devlink_port *port; - unsigned long port_index; - int idx = 0; - - devl_lock(devlink); - if (!devl_is_registered(devlink)) - goto next_devlink; - - list_for_each_entry(reporter, &devlink->reporter_list, - list) { + list_for_each_entry(reporter, &devlink->reporter_list, list) { + if (idx < state->idx) { + idx++; + continue; + } + err = devlink_nl_health_reporter_fill(msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + return err; + } + idx++; + } + xa_for_each(&devlink->ports, port_index, port) { + list_for_each_entry(reporter, &port->reporter_list, list) { if (idx < state->idx) { idx++; continue; } - err = devlink_nl_health_reporter_fill( - msg, reporter, DEVLINK_CMD_HEALTH_REPORTER_GET, - NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, - NLM_F_MULTI); + err = devlink_nl_health_reporter_fill(msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); if (err) { - devl_unlock(devlink); - devlink_put(devlink); state->idx = idx; - goto out; + return err; } idx++; } - - xa_for_each(&devlink->ports, port_index, port) { - list_for_each_entry(reporter, &port->reporter_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_health_reporter_fill( - msg, reporter, - DEVLINK_CMD_HEALTH_REPORTER_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI); - if (err) { - devl_unlock(devlink); - devlink_put(devlink); - state->idx = idx; - goto out; - } - idx++; - } - } -next_devlink: - devl_unlock(devlink); - devlink_put(devlink); } -out: - return msg->len; + + return 0; } +const struct devlink_gen_cmd devl_gen_health_reporter = { + .dump_one = devlink_nl_cmd_health_reporter_get_dump_one, +}; + static int devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, struct genl_info *info) @@ -9193,7 +9182,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_HEALTH_REPORTER_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_health_reporter_get_doit, - .dumpit = devlink_nl_cmd_health_reporter_get_dumpit, + .dumpit = devlink_nl_instance_iter_dump, .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, /* can be retrieved by unprivileged users */ }, diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index b18e216e09b0..d4539c1aedea 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -187,6 +187,7 @@ static const struct devlink_gen_cmd *devl_gen_cmds[] = { [DEVLINK_CMD_PARAM_GET] = &devl_gen_param, [DEVLINK_CMD_REGION_GET] = &devl_gen_region, [DEVLINK_CMD_INFO_GET] = &devl_gen_info, + [DEVLINK_CMD_HEALTH_REPORTER_GET] = &devl_gen_health_reporter, [DEVLINK_CMD_RATE_GET] = &devl_gen_rate_get, [DEVLINK_CMD_TRAP_GET] = &devl_gen_trap, [DEVLINK_CMD_TRAP_GROUP_GET] = &devl_gen_trap_group, -- cgit v1.2.3 From 543753d9e22e3943e2c31c4b5241afafb55dd52a Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:14 +0100 Subject: devlink: remove devlink_dump_for_each_instance_get() helper devlink_dump_for_each_instance_get() is currently called from a single place in netlink.c. As there is no need to use this helper anywhere else in the future, remove it and call devlinks_xa_find_get() directly from while loop in devlink_nl_instance_iter_dump(). Also remove redundant idx clear on loop end as it is already done in devlink_nl_instance_iter_dump(). Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 11 ----------- net/devlink/netlink.c | 5 ++++- 2 files changed, 4 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 10975e4ea2f4..75752f8c4a26 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -123,17 +123,6 @@ struct devlink_gen_cmd { struct netlink_callback *cb); }; -/* Iterate over registered devlink instances for devlink dump. - * devlink_put() needs to be called for each iterated devlink pointer - * in loop body in order to release the reference. - * Note: this is NOT a generic iterator, it makes assumptions about the use - * of @state and can only be used once per dumpit implementation. - */ -#define devlink_dump_for_each_instance_get(msg, state, devlink) \ - for (; (devlink = devlinks_xa_find_get(sock_net(msg->sk), \ - &state->instance)); \ - state->instance++, state->idx = 0) - extern const struct genl_small_ops devlink_nl_ops[56]; struct devlink * diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index d4539c1aedea..3f44633af01c 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -207,7 +207,8 @@ int devlink_nl_instance_iter_dump(struct sk_buff *msg, cmd = devl_gen_cmds[info->op.cmd]; - devlink_dump_for_each_instance_get(msg, state, devlink) { + while ((devlink = devlinks_xa_find_get(sock_net(msg->sk), + &state->instance))) { devl_lock(devlink); if (devl_is_registered(devlink)) @@ -221,6 +222,8 @@ int devlink_nl_instance_iter_dump(struct sk_buff *msg, if (err) break; + state->instance++; + /* restart sub-object walk for the next instance */ state->idx = 0; } -- cgit v1.2.3 From 63ba54a52c417a0c632a5f85a2b912b8a2320358 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Wed, 18 Jan 2023 16:21:15 +0100 Subject: devlink: add instance lock assertion in devl_is_registered() After region and linecard lock removals, this helper is always supposed to be called with instance lock held. So put the assertion here and remove the comment which is no longer accurate. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 75752f8c4a26..1aa1a9549549 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -85,9 +85,7 @@ struct devlink *devlinks_xa_find_get(struct net *net, unsigned long *indexp); static inline bool devl_is_registered(struct devlink *devlink) { - /* To prevent races the caller must hold the instance lock - * or another lock taken during unregistration. - */ + devl_assert_locked(devlink); return xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); } -- cgit v1.2.3 From 34b7074d3fba0d3f3ca8c66b6105b7f575e77e98 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Wed, 18 Jan 2023 22:08:25 +0100 Subject: net: dcb: modify dcb_app_add to take list_head ptr as parameter In preparation to DCB rewrite. Modify dcb_app_add to take new struct list_head * as parameter, to make the used list configurable. This is done to allow reusing the function for adding rewrite entries to the rewrite table, which is introduced in a later patch. Signed-off-by: Daniel Machon Reviewed-by: Petr Machata Signed-off-by: David S. Miller --- net/dcb/dcbnl.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c index f9949e051f49..a76bdf6f0198 100644 --- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -1955,7 +1955,8 @@ static struct dcb_app_type *dcb_app_lookup(const struct dcb_app *app, return NULL; } -static int dcb_app_add(const struct dcb_app *app, int ifindex) +static int dcb_app_add(struct list_head *list, const struct dcb_app *app, + int ifindex) { struct dcb_app_type *entry; @@ -1965,7 +1966,7 @@ static int dcb_app_add(const struct dcb_app *app, int ifindex) memcpy(&entry->app, app, sizeof(*app)); entry->ifindex = ifindex; - list_add(&entry->list, &dcb_app_list); + list_add(&entry->list, list); return 0; } @@ -2028,7 +2029,7 @@ int dcb_setapp(struct net_device *dev, struct dcb_app *new) } /* App type does not exist add new application type */ if (new->priority) - err = dcb_app_add(new, dev->ifindex); + err = dcb_app_add(&dcb_app_list, new, dev->ifindex); out: spin_unlock_bh(&dcb_lock); if (!err) @@ -2088,7 +2089,7 @@ int dcb_ieee_setapp(struct net_device *dev, struct dcb_app *new) goto out; } - err = dcb_app_add(new, dev->ifindex); + err = dcb_app_add(&dcb_app_list, new, dev->ifindex); out: spin_unlock_bh(&dcb_lock); if (!err) -- cgit v1.2.3 From 30568334b657e77629293dd88a35d9c21b3258fb Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Wed, 18 Jan 2023 22:08:26 +0100 Subject: net: dcb: add new common function for set/del of app/rewr entries In preparation for DCB rewrite. Add a new function for setting and deleting both app and rewrite entries. Moving this into a separate function reduces duplicate code, as both type of entries requires the same set of checks. The function will now iterate through a configurable nested attribute (app or rewrite attr), validate each attribute and call the appropriate set- or delete function. Note that this function always checks for nla_len(attr_itr) < sizeof(struct dcb_app), which was only done in dcbnl_ieee_set and not in dcbnl_ieee_del prior to this patch. This means, that any userspace tool that used to shove in data < sizeof(struct dcb_app) would now receive -ERANGE. Signed-off-by: Daniel Machon Reviewed-by: Petr Machata Signed-off-by: David S. Miller --- net/dcb/dcbnl.c | 100 +++++++++++++++++++++++++------------------------------- 1 file changed, 45 insertions(+), 55 deletions(-) (limited to 'net') diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c index a76bdf6f0198..cb5319c6afe6 100644 --- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -1099,6 +1099,41 @@ out: return err; } +/* Set or delete APP table or rewrite table entries. The APP struct is validated + * and the appropriate callback function is called. + */ +static int dcbnl_app_table_setdel(struct nlattr *attr, + struct net_device *netdev, + int (*setdel)(struct net_device *dev, + struct dcb_app *app)) +{ + struct dcb_app *app_data; + enum ieee_attrs_app type; + struct nlattr *attr_itr; + int rem, err; + + nla_for_each_nested(attr_itr, attr, rem) { + type = nla_type(attr_itr); + + if (!dcbnl_app_attr_type_validate(type)) + continue; + + if (nla_len(attr_itr) < sizeof(struct dcb_app)) + return -ERANGE; + + app_data = nla_data(attr_itr); + + if (!dcbnl_app_selector_validate(type, app_data->selector)) + return -EINVAL; + + err = setdel(netdev, app_data); + if (err) + return err; + } + + return 0; +} + /* Handle IEEE 802.1Qaz/802.1Qau/802.1Qbb GET commands. */ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev) { @@ -1568,36 +1603,11 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh, } if (ieee[DCB_ATTR_IEEE_APP_TABLE]) { - struct nlattr *attr; - int rem; - - nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) { - enum ieee_attrs_app type = nla_type(attr); - struct dcb_app *app_data; - - if (!dcbnl_app_attr_type_validate(type)) - continue; - - if (nla_len(attr) < sizeof(struct dcb_app)) { - err = -ERANGE; - goto err; - } - - app_data = nla_data(attr); - - if (!dcbnl_app_selector_validate(type, - app_data->selector)) { - err = -EINVAL; - goto err; - } - - if (ops->ieee_setapp) - err = ops->ieee_setapp(netdev, app_data); - else - err = dcb_ieee_setapp(netdev, app_data); - if (err) - goto err; - } + err = dcbnl_app_table_setdel(ieee[DCB_ATTR_IEEE_APP_TABLE], + netdev, ops->ieee_setapp ?: + dcb_ieee_setapp); + if (err) + goto err; } if (ieee[DCB_ATTR_DCB_APP_TRUST_TABLE]) { @@ -1684,31 +1694,11 @@ static int dcbnl_ieee_del(struct net_device *netdev, struct nlmsghdr *nlh, return err; if (ieee[DCB_ATTR_IEEE_APP_TABLE]) { - struct nlattr *attr; - int rem; - - nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) { - enum ieee_attrs_app type = nla_type(attr); - struct dcb_app *app_data; - - if (!dcbnl_app_attr_type_validate(type)) - continue; - - app_data = nla_data(attr); - - if (!dcbnl_app_selector_validate(type, - app_data->selector)) { - err = -EINVAL; - goto err; - } - - if (ops->ieee_delapp) - err = ops->ieee_delapp(netdev, app_data); - else - err = dcb_ieee_delapp(netdev, app_data); - if (err) - goto err; - } + err = dcbnl_app_table_setdel(ieee[DCB_ATTR_IEEE_APP_TABLE], + netdev, ops->ieee_delapp ?: + dcb_ieee_delapp); + if (err) + goto err; } err: -- cgit v1.2.3 From 622f1b2fae2eea28a80b04f130e3bb54227699f8 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Wed, 18 Jan 2023 22:08:27 +0100 Subject: net: dcb: add new rewrite table Add new rewrite table and all the required functions, offload hooks and bookkeeping for maintaining it. The rewrite table reuses the app struct, and the entire set of app selectors. As such, some bookeeping code can be shared between the rewrite- and the APP table. New functions for getting, setting and deleting entries has been added. Apart from operating on the rewrite list, these functions do not emit a DCB_APP_EVENT when the list os modified. The new dcb_getrewr does a lookup based on selector and priority and returns the protocol, so that mappings from priority to protocol, for a given selector and ifindex is obtained. Also, a new nested attribute has been added, that encapsulates one or more app structs. This attribute is used to distinguish the two tables. The dcb_lock used for the APP table is reused for the rewrite table. Signed-off-by: Daniel Machon Reviewed-by: Petr Machata Signed-off-by: David S. Miller --- include/net/dcbnl.h | 8 ++++ include/uapi/linux/dcbnl.h | 2 + net/dcb/dcbnl.c | 113 ++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 122 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h index 8841ab6c2de7..fe7dfb8bcb5b 100644 --- a/include/net/dcbnl.h +++ b/include/net/dcbnl.h @@ -19,6 +19,10 @@ struct dcb_app_type { u8 dcbx; }; +u16 dcb_getrewr(struct net_device *dev, struct dcb_app *app); +int dcb_setrewr(struct net_device *dev, struct dcb_app *app); +int dcb_delrewr(struct net_device *dev, struct dcb_app *app); + int dcb_setapp(struct net_device *, struct dcb_app *); u8 dcb_getapp(struct net_device *, struct dcb_app *); int dcb_ieee_setapp(struct net_device *, struct dcb_app *); @@ -113,6 +117,10 @@ struct dcbnl_rtnl_ops { /* apptrust */ int (*dcbnl_setapptrust)(struct net_device *, u8 *, int); int (*dcbnl_getapptrust)(struct net_device *, u8 *, int *); + + /* rewrite */ + int (*dcbnl_setrewr)(struct net_device *dev, struct dcb_app *app); + int (*dcbnl_delrewr)(struct net_device *dev, struct dcb_app *app); }; #endif /* __NET_DCBNL_H__ */ diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h index 99047223ab26..7e15214aa5dd 100644 --- a/include/uapi/linux/dcbnl.h +++ b/include/uapi/linux/dcbnl.h @@ -411,6 +411,7 @@ enum dcbnl_attrs { * @DCB_ATTR_IEEE_PEER_PFC: peer PFC configuration - get only * @DCB_ATTR_IEEE_PEER_APP: peer APP tlv - get only * @DCB_ATTR_DCB_APP_TRUST_TABLE: selector trust table + * @DCB_ATTR_DCB_REWR_TABLE: rewrite configuration */ enum ieee_attrs { DCB_ATTR_IEEE_UNSPEC, @@ -425,6 +426,7 @@ enum ieee_attrs { DCB_ATTR_IEEE_QCN_STATS, DCB_ATTR_DCB_BUFFER, DCB_ATTR_DCB_APP_TRUST_TABLE, + DCB_ATTR_DCB_REWR_TABLE, __DCB_ATTR_IEEE_MAX }; #define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1) diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c index cb5319c6afe6..2ca7fffcf069 100644 --- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -178,6 +178,7 @@ static const struct nla_policy dcbnl_featcfg_nest[DCB_FEATCFG_ATTR_MAX + 1] = { }; static LIST_HEAD(dcb_app_list); +static LIST_HEAD(dcb_rewr_list); static DEFINE_SPINLOCK(dcb_lock); static enum ieee_attrs_app dcbnl_app_attr_type_get(u8 selector) @@ -1138,7 +1139,7 @@ static int dcbnl_app_table_setdel(struct nlattr *attr, static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev) { const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops; - struct nlattr *ieee, *app; + struct nlattr *ieee, *app, *rewr; struct dcb_app_type *itr; int dcbx; int err; @@ -1241,6 +1242,27 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev) spin_unlock_bh(&dcb_lock); nla_nest_end(skb, app); + rewr = nla_nest_start(skb, DCB_ATTR_DCB_REWR_TABLE); + if (!rewr) + return -EMSGSIZE; + + spin_lock_bh(&dcb_lock); + list_for_each_entry(itr, &dcb_rewr_list, list) { + if (itr->ifindex == netdev->ifindex) { + enum ieee_attrs_app type = + dcbnl_app_attr_type_get(itr->app.selector); + err = nla_put(skb, type, sizeof(itr->app), &itr->app); + if (err) { + spin_unlock_bh(&dcb_lock); + nla_nest_cancel(skb, rewr); + return -EMSGSIZE; + } + } + } + + spin_unlock_bh(&dcb_lock); + nla_nest_end(skb, rewr); + if (ops->dcbnl_getapptrust) { err = dcbnl_getapptrust(netdev, skb); if (err) @@ -1602,6 +1624,14 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh, goto err; } + if (ieee[DCB_ATTR_DCB_REWR_TABLE]) { + err = dcbnl_app_table_setdel(ieee[DCB_ATTR_DCB_REWR_TABLE], + netdev, + ops->dcbnl_setrewr ?: dcb_setrewr); + if (err) + goto err; + } + if (ieee[DCB_ATTR_IEEE_APP_TABLE]) { err = dcbnl_app_table_setdel(ieee[DCB_ATTR_IEEE_APP_TABLE], netdev, ops->ieee_setapp ?: @@ -1701,6 +1731,14 @@ static int dcbnl_ieee_del(struct net_device *netdev, struct nlmsghdr *nlh, goto err; } + if (ieee[DCB_ATTR_DCB_REWR_TABLE]) { + err = dcbnl_app_table_setdel(ieee[DCB_ATTR_DCB_REWR_TABLE], + netdev, + ops->dcbnl_delrewr ?: dcb_delrewr); + if (err) + goto err; + } + err: err = nla_put_u8(skb, DCB_ATTR_IEEE, err); dcbnl_ieee_notify(netdev, RTM_SETDCB, DCB_CMD_IEEE_DEL, seq, 0); @@ -1929,6 +1967,22 @@ out: return ret; } +static struct dcb_app_type *dcb_rewr_lookup(const struct dcb_app *app, + int ifindex, int proto) +{ + struct dcb_app_type *itr; + + list_for_each_entry(itr, &dcb_rewr_list, list) { + if (itr->app.selector == app->selector && + itr->app.priority == app->priority && + itr->ifindex == ifindex && + ((proto == -1) || itr->app.protocol == proto)) + return itr; + } + + return NULL; +} + static struct dcb_app_type *dcb_app_lookup(const struct dcb_app *app, int ifindex, int prio) { @@ -2052,6 +2106,63 @@ u8 dcb_ieee_getapp_mask(struct net_device *dev, struct dcb_app *app) } EXPORT_SYMBOL(dcb_ieee_getapp_mask); +/* Get protocol value from rewrite entry. */ +u16 dcb_getrewr(struct net_device *dev, struct dcb_app *app) +{ + struct dcb_app_type *itr; + u16 proto = 0; + + spin_lock_bh(&dcb_lock); + itr = dcb_rewr_lookup(app, dev->ifindex, -1); + if (itr) + proto = itr->app.protocol; + spin_unlock_bh(&dcb_lock); + + return proto; +} +EXPORT_SYMBOL(dcb_getrewr); + + /* Add rewrite entry to the rewrite list. */ +int dcb_setrewr(struct net_device *dev, struct dcb_app *new) +{ + int err; + + spin_lock_bh(&dcb_lock); + /* Search for existing match and abort if found. */ + if (dcb_rewr_lookup(new, dev->ifindex, new->protocol)) { + err = -EEXIST; + goto out; + } + + err = dcb_app_add(&dcb_rewr_list, new, dev->ifindex); +out: + spin_unlock_bh(&dcb_lock); + + return err; +} +EXPORT_SYMBOL(dcb_setrewr); + +/* Delete rewrite entry from the rewrite list. */ +int dcb_delrewr(struct net_device *dev, struct dcb_app *del) +{ + struct dcb_app_type *itr; + int err = -ENOENT; + + spin_lock_bh(&dcb_lock); + /* Search for existing match and remove it. */ + itr = dcb_rewr_lookup(del, dev->ifindex, del->protocol); + if (itr) { + list_del(&itr->list); + kfree(itr); + err = 0; + } + + spin_unlock_bh(&dcb_lock); + + return err; +} +EXPORT_SYMBOL(dcb_delrewr); + /** * dcb_ieee_setapp - add IEEE dcb application data to app list * @dev: network interface -- cgit v1.2.3 From 1df99338e6d4e96178b68b3e17bab33e9f1eb628 Mon Sep 17 00:00:00 2001 From: Daniel Machon Date: Wed, 18 Jan 2023 22:08:28 +0100 Subject: net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps Add two new helper functions to retrieve a mapping of priority to PCP and DSCP bitmasks, where each bitmap contains ones in positions that match a rewrite entry. dcb_ieee_getrewr_prio_dscp_mask_map() reuses the dcb_ieee_app_prio_map, as this struct is already used for a similar mapping in the app table. Signed-off-by: Daniel Machon Reviewed-by: Petr Machata Signed-off-by: David S. Miller --- include/net/dcbnl.h | 10 ++++++++++ net/dcb/dcbnl.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 62 insertions(+) (limited to 'net') diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h index fe7dfb8bcb5b..42207fc44660 100644 --- a/include/net/dcbnl.h +++ b/include/net/dcbnl.h @@ -29,12 +29,22 @@ int dcb_ieee_setapp(struct net_device *, struct dcb_app *); int dcb_ieee_delapp(struct net_device *, struct dcb_app *); u8 dcb_ieee_getapp_mask(struct net_device *, struct dcb_app *); +struct dcb_rewr_prio_pcp_map { + u16 map[IEEE_8021QAZ_MAX_TCS]; +}; + +void dcb_getrewr_prio_pcp_mask_map(const struct net_device *dev, + struct dcb_rewr_prio_pcp_map *p_map); + struct dcb_ieee_app_prio_map { u64 map[IEEE_8021QAZ_MAX_TCS]; }; void dcb_ieee_getapp_prio_dscp_mask_map(const struct net_device *dev, struct dcb_ieee_app_prio_map *p_map); +void dcb_getrewr_prio_dscp_mask_map(const struct net_device *dev, + struct dcb_ieee_app_prio_map *p_map); + struct dcb_ieee_app_dscp_map { u8 map[64]; }; diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c index 2ca7fffcf069..c0c438128575 100644 --- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -2232,6 +2232,58 @@ int dcb_ieee_delapp(struct net_device *dev, struct dcb_app *del) } EXPORT_SYMBOL(dcb_ieee_delapp); +/* dcb_getrewr_prio_pcp_mask_map - For a given device, find mapping from + * priorities to the PCP and DEI values assigned to that priority. + */ +void dcb_getrewr_prio_pcp_mask_map(const struct net_device *dev, + struct dcb_rewr_prio_pcp_map *p_map) +{ + int ifindex = dev->ifindex; + struct dcb_app_type *itr; + u8 prio; + + memset(p_map->map, 0, sizeof(p_map->map)); + + spin_lock_bh(&dcb_lock); + list_for_each_entry(itr, &dcb_rewr_list, list) { + if (itr->ifindex == ifindex && + itr->app.selector == DCB_APP_SEL_PCP && + itr->app.protocol < 16 && + itr->app.priority < IEEE_8021QAZ_MAX_TCS) { + prio = itr->app.priority; + p_map->map[prio] |= 1 << itr->app.protocol; + } + } + spin_unlock_bh(&dcb_lock); +} +EXPORT_SYMBOL(dcb_getrewr_prio_pcp_mask_map); + +/* dcb_getrewr_prio_dscp_mask_map - For a given device, find mapping from + * priorities to the DSCP values assigned to that priority. + */ +void dcb_getrewr_prio_dscp_mask_map(const struct net_device *dev, + struct dcb_ieee_app_prio_map *p_map) +{ + int ifindex = dev->ifindex; + struct dcb_app_type *itr; + u8 prio; + + memset(p_map->map, 0, sizeof(p_map->map)); + + spin_lock_bh(&dcb_lock); + list_for_each_entry(itr, &dcb_rewr_list, list) { + if (itr->ifindex == ifindex && + itr->app.selector == IEEE_8021QAZ_APP_SEL_DSCP && + itr->app.protocol < 64 && + itr->app.priority < IEEE_8021QAZ_MAX_TCS) { + prio = itr->app.priority; + p_map->map[prio] |= 1ULL << itr->app.protocol; + } + } + spin_unlock_bh(&dcb_lock); +} +EXPORT_SYMBOL(dcb_getrewr_prio_dscp_mask_map); + /* * dcb_ieee_getapp_prio_dscp_mask_map - For a given device, find mapping from * priorities to the DSCP values assigned to that priority. Initialize p_map -- cgit v1.2.3 From e7d6127b89a93f0711a653f06863c3a03d33265f Mon Sep 17 00:00:00 2001 From: Linus Lüssing Date: Tue, 27 Dec 2022 20:34:05 +0100 Subject: batman-adv: mcast: remove now redundant single ucast forwarding MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The multicast code to send a multicast packet via multiple batman-adv unicast packets is not only capable of sending to multiple but also to a single node. Therefore we can safely remove the old, specialized, now redundant multicast-to-single-unicast code. The only functional change of this simplification is that the edge case of allowing a multicast packet with an unsnoopable destination address (224.0.0.0/24 or ff02::1) where only a single node has signaled interest in it via the batman-adv want-all-unsnoopables multicast flag is now transmitted via a batman-adv broadcast instead of a batman-adv unicast packet. Maintaining this edge case feature does not seem worth the extra lines of code and people should just not expect to be able to snoop and optimize such unsnoopable multicast addresses when bridges are involved. While at it also renaming a few items in the batadv_forw_mode enum to prepare for the new batman-adv multicast packet type. Signed-off-by: Linus Lüssing Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich --- net/batman-adv/multicast.c | 249 +++------------------------------------- net/batman-adv/multicast.h | 38 ++---- net/batman-adv/soft-interface.c | 26 ++--- 3 files changed, 33 insertions(+), 280 deletions(-) (limited to 'net') diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index b238455913df..7e2822c01e00 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -26,7 +26,6 @@ #include #include #include -#include #include #include #include @@ -1136,223 +1135,20 @@ static int batadv_mcast_forw_rtr_count(struct batadv_priv *bat_priv, } } -/** - * batadv_mcast_forw_tt_node_get() - get a multicast tt node - * @bat_priv: the bat priv with all the soft interface information - * @ethhdr: the ether header containing the multicast destination - * - * Return: an orig_node matching the multicast address provided by ethhdr - * via a translation table lookup. This increases the returned nodes refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_tt_node_get(struct batadv_priv *bat_priv, - struct ethhdr *ethhdr) -{ - return batadv_transtable_search(bat_priv, NULL, ethhdr->h_dest, - BATADV_NO_FLAGS); -} - -/** - * batadv_mcast_forw_ipv4_node_get() - get a node with an ipv4 flag - * @bat_priv: the bat priv with all the soft interface information - * - * Return: an orig_node which has the BATADV_MCAST_WANT_ALL_IPV4 flag set and - * increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_ipv4_node_get(struct batadv_priv *bat_priv) -{ - struct batadv_orig_node *tmp_orig_node, *orig_node = NULL; - - rcu_read_lock(); - hlist_for_each_entry_rcu(tmp_orig_node, - &bat_priv->mcast.want_all_ipv4_list, - mcast_want_all_ipv4_node) { - if (!kref_get_unless_zero(&tmp_orig_node->refcount)) - continue; - - orig_node = tmp_orig_node; - break; - } - rcu_read_unlock(); - - return orig_node; -} - -/** - * batadv_mcast_forw_ipv6_node_get() - get a node with an ipv6 flag - * @bat_priv: the bat priv with all the soft interface information - * - * Return: an orig_node which has the BATADV_MCAST_WANT_ALL_IPV6 flag set - * and increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_ipv6_node_get(struct batadv_priv *bat_priv) -{ - struct batadv_orig_node *tmp_orig_node, *orig_node = NULL; - - rcu_read_lock(); - hlist_for_each_entry_rcu(tmp_orig_node, - &bat_priv->mcast.want_all_ipv6_list, - mcast_want_all_ipv6_node) { - if (!kref_get_unless_zero(&tmp_orig_node->refcount)) - continue; - - orig_node = tmp_orig_node; - break; - } - rcu_read_unlock(); - - return orig_node; -} - -/** - * batadv_mcast_forw_ip_node_get() - get a node with an ipv4/ipv6 flag - * @bat_priv: the bat priv with all the soft interface information - * @ethhdr: an ethernet header to determine the protocol family from - * - * Return: an orig_node which has the BATADV_MCAST_WANT_ALL_IPV4 or - * BATADV_MCAST_WANT_ALL_IPV6 flag, depending on the provided ethhdr, sets and - * increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_ip_node_get(struct batadv_priv *bat_priv, - struct ethhdr *ethhdr) -{ - switch (ntohs(ethhdr->h_proto)) { - case ETH_P_IP: - return batadv_mcast_forw_ipv4_node_get(bat_priv); - case ETH_P_IPV6: - return batadv_mcast_forw_ipv6_node_get(bat_priv); - default: - /* we shouldn't be here... */ - return NULL; - } -} - -/** - * batadv_mcast_forw_unsnoop_node_get() - get a node with an unsnoopable flag - * @bat_priv: the bat priv with all the soft interface information - * - * Return: an orig_node which has the BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag - * set and increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_unsnoop_node_get(struct batadv_priv *bat_priv) -{ - struct batadv_orig_node *tmp_orig_node, *orig_node = NULL; - - rcu_read_lock(); - hlist_for_each_entry_rcu(tmp_orig_node, - &bat_priv->mcast.want_all_unsnoopables_list, - mcast_want_all_unsnoopables_node) { - if (!kref_get_unless_zero(&tmp_orig_node->refcount)) - continue; - - orig_node = tmp_orig_node; - break; - } - rcu_read_unlock(); - - return orig_node; -} - -/** - * batadv_mcast_forw_rtr4_node_get() - get a node with an ipv4 mcast router flag - * @bat_priv: the bat priv with all the soft interface information - * - * Return: an orig_node which has the BATADV_MCAST_WANT_NO_RTR4 flag unset and - * increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_rtr4_node_get(struct batadv_priv *bat_priv) -{ - struct batadv_orig_node *tmp_orig_node, *orig_node = NULL; - - rcu_read_lock(); - hlist_for_each_entry_rcu(tmp_orig_node, - &bat_priv->mcast.want_all_rtr4_list, - mcast_want_all_rtr4_node) { - if (!kref_get_unless_zero(&tmp_orig_node->refcount)) - continue; - - orig_node = tmp_orig_node; - break; - } - rcu_read_unlock(); - - return orig_node; -} - -/** - * batadv_mcast_forw_rtr6_node_get() - get a node with an ipv6 mcast router flag - * @bat_priv: the bat priv with all the soft interface information - * - * Return: an orig_node which has the BATADV_MCAST_WANT_NO_RTR6 flag unset - * and increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_rtr6_node_get(struct batadv_priv *bat_priv) -{ - struct batadv_orig_node *tmp_orig_node, *orig_node = NULL; - - rcu_read_lock(); - hlist_for_each_entry_rcu(tmp_orig_node, - &bat_priv->mcast.want_all_rtr6_list, - mcast_want_all_rtr6_node) { - if (!kref_get_unless_zero(&tmp_orig_node->refcount)) - continue; - - orig_node = tmp_orig_node; - break; - } - rcu_read_unlock(); - - return orig_node; -} - -/** - * batadv_mcast_forw_rtr_node_get() - get a node with an ipv4/ipv6 router flag - * @bat_priv: the bat priv with all the soft interface information - * @ethhdr: an ethernet header to determine the protocol family from - * - * Return: an orig_node which has no BATADV_MCAST_WANT_NO_RTR4 or - * BATADV_MCAST_WANT_NO_RTR6 flag, depending on the provided ethhdr, set and - * increases its refcount. - */ -static struct batadv_orig_node * -batadv_mcast_forw_rtr_node_get(struct batadv_priv *bat_priv, - struct ethhdr *ethhdr) -{ - switch (ntohs(ethhdr->h_proto)) { - case ETH_P_IP: - return batadv_mcast_forw_rtr4_node_get(bat_priv); - case ETH_P_IPV6: - return batadv_mcast_forw_rtr6_node_get(bat_priv); - default: - /* we shouldn't be here... */ - return NULL; - } -} - /** * batadv_mcast_forw_mode() - check on how to forward a multicast packet * @bat_priv: the bat priv with all the soft interface information - * @skb: The multicast packet to check - * @orig: an originator to be set to forward the skb to + * @skb: the multicast packet to check * @is_routable: stores whether the destination is routable * - * Return: the forwarding mode as enum batadv_forw_mode and in case of - * BATADV_FORW_SINGLE set the orig to the single originator the skb - * should be forwarded to. + * Return: The forwarding mode as enum batadv_forw_mode. */ enum batadv_forw_mode batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, - struct batadv_orig_node **orig, int *is_routable) + int *is_routable) { int ret, tt_count, ip_count, unsnoop_count, total_count; bool is_unsnoopable = false; - unsigned int mcast_fanout; struct ethhdr *ethhdr; int rtr_count = 0; @@ -1361,7 +1157,7 @@ batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, if (ret == -ENOMEM) return BATADV_FORW_NONE; else if (ret < 0) - return BATADV_FORW_ALL; + return BATADV_FORW_BCAST; ethhdr = eth_hdr(skb); @@ -1374,32 +1170,15 @@ batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, total_count = tt_count + ip_count + unsnoop_count + rtr_count; - switch (total_count) { - case 1: - if (tt_count) - *orig = batadv_mcast_forw_tt_node_get(bat_priv, ethhdr); - else if (ip_count) - *orig = batadv_mcast_forw_ip_node_get(bat_priv, ethhdr); - else if (unsnoop_count) - *orig = batadv_mcast_forw_unsnoop_node_get(bat_priv); - else if (rtr_count) - *orig = batadv_mcast_forw_rtr_node_get(bat_priv, - ethhdr); - - if (*orig) - return BATADV_FORW_SINGLE; - - fallthrough; - case 0: + if (!total_count) return BATADV_FORW_NONE; - default: - mcast_fanout = atomic_read(&bat_priv->multicast_fanout); + else if (unsnoop_count) + return BATADV_FORW_BCAST; - if (!unsnoop_count && total_count <= mcast_fanout) - return BATADV_FORW_SOME; - } + if (total_count <= atomic_read(&bat_priv->multicast_fanout)) + return BATADV_FORW_UCASTS; - return BATADV_FORW_ALL; + return BATADV_FORW_BCAST; } /** @@ -1411,10 +1190,10 @@ batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, * * Return: NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. */ -int batadv_mcast_forw_send_orig(struct batadv_priv *bat_priv, - struct sk_buff *skb, - unsigned short vid, - struct batadv_orig_node *orig_node) +static int batadv_mcast_forw_send_orig(struct batadv_priv *bat_priv, + struct sk_buff *skb, + unsigned short vid, + struct batadv_orig_node *orig_node) { /* Avoid sending multicast-in-unicast packets to other BLA * gateways - they already got the frame from the LAN side diff --git a/net/batman-adv/multicast.h b/net/batman-adv/multicast.h index 8aec818d0bf6..a9770d8d6d36 100644 --- a/net/batman-adv/multicast.h +++ b/net/batman-adv/multicast.h @@ -17,23 +17,16 @@ */ enum batadv_forw_mode { /** - * @BATADV_FORW_ALL: forward the packet to all nodes (currently via - * classic flooding) + * @BATADV_FORW_BCAST: forward the packet to all nodes via a batman-adv + * broadcast packet */ - BATADV_FORW_ALL, + BATADV_FORW_BCAST, /** - * @BATADV_FORW_SOME: forward the packet to some nodes (currently via - * a multicast-to-unicast conversion and the BATMAN unicast routing - * protocol) + * @BATADV_FORW_UCASTS: forward the packet to some nodes via one + * or more batman-adv unicast packets */ - BATADV_FORW_SOME, - - /** - * @BATADV_FORW_SINGLE: forward the packet to a single node (currently - * via the BATMAN unicast routing protocol) - */ - BATADV_FORW_SINGLE, + BATADV_FORW_UCASTS, /** @BATADV_FORW_NONE: don't forward, drop it */ BATADV_FORW_NONE, @@ -43,14 +36,8 @@ enum batadv_forw_mode { enum batadv_forw_mode batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, - struct batadv_orig_node **mcast_single_orig, int *is_routable); -int batadv_mcast_forw_send_orig(struct batadv_priv *bat_priv, - struct sk_buff *skb, - unsigned short vid, - struct batadv_orig_node *orig_node); - int batadv_mcast_forw_send(struct batadv_priv *bat_priv, struct sk_buff *skb, unsigned short vid, int is_routable); @@ -69,20 +56,9 @@ void batadv_mcast_purge_orig(struct batadv_orig_node *orig_node); static inline enum batadv_forw_mode batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, - struct batadv_orig_node **mcast_single_orig, int *is_routable) { - return BATADV_FORW_ALL; -} - -static inline int -batadv_mcast_forw_send_orig(struct batadv_priv *bat_priv, - struct sk_buff *skb, - unsigned short vid, - struct batadv_orig_node *orig_node) -{ - kfree_skb(skb); - return NET_XMIT_DROP; + return BATADV_FORW_BCAST; } static inline int diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 0f5c0679b55a..125f4628687c 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -48,7 +48,6 @@ #include "hard-interface.h" #include "multicast.h" #include "network-coding.h" -#include "originator.h" #include "send.h" #include "translation-table.h" @@ -196,8 +195,7 @@ static netdev_tx_t batadv_interface_tx(struct sk_buff *skb, unsigned short vid; u32 seqno; int gw_mode; - enum batadv_forw_mode forw_mode = BATADV_FORW_SINGLE; - struct batadv_orig_node *mcast_single_orig = NULL; + enum batadv_forw_mode forw_mode = BATADV_FORW_BCAST; int mcast_is_routable = 0; int network_offset = ETH_HLEN; __be16 proto; @@ -301,14 +299,18 @@ static netdev_tx_t batadv_interface_tx(struct sk_buff *skb, send: if (do_bcast && !is_broadcast_ether_addr(ethhdr->h_dest)) { forw_mode = batadv_mcast_forw_mode(bat_priv, skb, - &mcast_single_orig, &mcast_is_routable); - if (forw_mode == BATADV_FORW_NONE) - goto dropped; - - if (forw_mode == BATADV_FORW_SINGLE || - forw_mode == BATADV_FORW_SOME) + switch (forw_mode) { + case BATADV_FORW_BCAST: + break; + case BATADV_FORW_UCASTS: do_bcast = false; + break; + case BATADV_FORW_NONE: + fallthrough; + default: + goto dropped; + } } } @@ -357,10 +359,7 @@ send: if (ret) goto dropped; ret = batadv_send_skb_via_gw(bat_priv, skb, vid); - } else if (mcast_single_orig) { - ret = batadv_mcast_forw_send_orig(bat_priv, skb, vid, - mcast_single_orig); - } else if (forw_mode == BATADV_FORW_SOME) { + } else if (forw_mode == BATADV_FORW_UCASTS) { ret = batadv_mcast_forw_send(bat_priv, skb, vid, mcast_is_routable); } else { @@ -386,7 +385,6 @@ dropped: dropped_freed: batadv_inc_counter(bat_priv, BATADV_CNT_TX_DROPPED); end: - batadv_orig_node_put(mcast_single_orig); batadv_hardif_put(primary_if); return NETDEV_TX_OK; } -- cgit v1.2.3 From 0c4061c0d0e2c381ffe4d8b7c62ea69ad8132071 Mon Sep 17 00:00:00 2001 From: Linus Lüssing Date: Tue, 27 Dec 2022 20:34:06 +0100 Subject: batman-adv: tvlv: prepare for tvlv enabled multicast packet type MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Prepare TVLV infrastructure for more packet types, in particular the upcoming batman-adv multicast packet type. For that swap the OGM vs. unicast-tvlv packet boolean indicator to an explicit unsigned integer packet type variable. And provide the skb to a call to batadv_tvlv_containers_process(), as later the multicast packet's TVLV handler will need to have access not only to the TVLV but the full skb for forwarding. Forwarding will be invoked from the multicast packet's TVLVs' contents later. Signed-off-by: Linus Lüssing Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich --- include/uapi/linux/batadv_packet.h | 2 + net/batman-adv/bat_v_ogm.c | 4 +- net/batman-adv/distributed-arp-table.c | 2 +- net/batman-adv/gateway_common.c | 2 +- net/batman-adv/multicast.c | 2 +- net/batman-adv/network-coding.c | 2 +- net/batman-adv/routing.c | 7 ++-- net/batman-adv/translation-table.c | 4 +- net/batman-adv/tvlv.c | 71 +++++++++++++++++++++++----------- net/batman-adv/tvlv.h | 9 +++-- net/batman-adv/types.h | 6 +++ 11 files changed, 74 insertions(+), 37 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/batadv_packet.h b/include/uapi/linux/batadv_packet.h index ea4692c339ce..9204e4494b25 100644 --- a/include/uapi/linux/batadv_packet.h +++ b/include/uapi/linux/batadv_packet.h @@ -26,6 +26,7 @@ * @BATADV_CODED: network coded packets * @BATADV_ELP: echo location packets for B.A.T.M.A.N. V * @BATADV_OGM2: originator messages for B.A.T.M.A.N. V + * @BATADV_MCAST: multicast packet with multiple destination addresses * * @BATADV_UNICAST: unicast packets carrying unicast payload traffic * @BATADV_UNICAST_FRAG: unicast packets carrying a fragment of the original @@ -42,6 +43,7 @@ enum batadv_packettype { BATADV_CODED = 0x02, BATADV_ELP = 0x03, BATADV_OGM2 = 0x04, + BATADV_MCAST = 0x05, /* 0x40 - 0x7f: unicast */ #define BATADV_UNICAST_MIN 0x40 BATADV_UNICAST = 0x40, diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c index 96e027364ddd..e710e9afe78f 100644 --- a/net/batman-adv/bat_v_ogm.c +++ b/net/batman-adv/bat_v_ogm.c @@ -799,8 +799,8 @@ batadv_v_ogm_process_per_outif(struct batadv_priv *bat_priv, /* only unknown & newer OGMs contain TVLVs we are interested in */ if (seqno_age > 0 && if_outgoing == BATADV_IF_DEFAULT) - batadv_tvlv_containers_process(bat_priv, true, orig_node, - NULL, NULL, + batadv_tvlv_containers_process(bat_priv, BATADV_OGM2, orig_node, + NULL, (unsigned char *)(ogm2 + 1), ntohs(ogm2->tvlv_len)); diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c index fefb51a5f606..6968e55eb971 100644 --- a/net/batman-adv/distributed-arp-table.c +++ b/net/batman-adv/distributed-arp-table.c @@ -822,7 +822,7 @@ int batadv_dat_init(struct batadv_priv *bat_priv) batadv_dat_start_timer(bat_priv); batadv_tvlv_handler_register(bat_priv, batadv_dat_tvlv_ogm_handler_v1, - NULL, BATADV_TVLV_DAT, 1, + NULL, NULL, BATADV_TVLV_DAT, 1, BATADV_TVLV_HANDLER_OGM_CIFNOTFND); batadv_dat_tvlv_container_update(bat_priv); return 0; diff --git a/net/batman-adv/gateway_common.c b/net/batman-adv/gateway_common.c index 9349c76f30c5..6a964a773f57 100644 --- a/net/batman-adv/gateway_common.c +++ b/net/batman-adv/gateway_common.c @@ -259,7 +259,7 @@ void batadv_gw_init(struct batadv_priv *bat_priv) atomic_set(&bat_priv->gw.sel_class, 1); batadv_tvlv_handler_register(bat_priv, batadv_gw_tvlv_ogm_handler_v1, - NULL, BATADV_TVLV_GW, 1, + NULL, NULL, BATADV_TVLV_GW, 1, BATADV_TVLV_HANDLER_OGM_CIFNOTFND); } diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index 7e2822c01e00..315394f12c55 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -1818,7 +1818,7 @@ static void batadv_mcast_tvlv_ogm_handler(struct batadv_priv *bat_priv, void batadv_mcast_init(struct batadv_priv *bat_priv) { batadv_tvlv_handler_register(bat_priv, batadv_mcast_tvlv_ogm_handler, - NULL, BATADV_TVLV_MCAST, 2, + NULL, NULL, BATADV_TVLV_MCAST, 2, BATADV_TVLV_HANDLER_OGM_CIFNOTFND); INIT_DELAYED_WORK(&bat_priv->mcast.work, batadv_mcast_mla_update); diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c index ecd871abda34..71ebd0284f95 100644 --- a/net/batman-adv/network-coding.c +++ b/net/batman-adv/network-coding.c @@ -160,7 +160,7 @@ int batadv_nc_mesh_init(struct batadv_priv *bat_priv) batadv_nc_start_timer(bat_priv); batadv_tvlv_handler_register(bat_priv, batadv_nc_tvlv_ogm_handler_v1, - NULL, BATADV_TVLV_NC, 1, + NULL, NULL, BATADV_TVLV_NC, 1, BATADV_TVLV_HANDLER_OGM_CIFNOTFND); batadv_nc_tvlv_container_update(bat_priv); return 0; diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index 83f31494ea4d..163cd43c4821 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -1073,10 +1073,9 @@ int batadv_recv_unicast_tvlv(struct sk_buff *skb, if (tvlv_buff_len > skb->len - hdr_size) goto free_skb; - ret = batadv_tvlv_containers_process(bat_priv, false, NULL, - unicast_tvlv_packet->src, - unicast_tvlv_packet->dst, - tvlv_buff, tvlv_buff_len); + ret = batadv_tvlv_containers_process(bat_priv, BATADV_UNICAST_TVLV, + NULL, skb, tvlv_buff, + tvlv_buff_len); if (ret != NET_RX_SUCCESS) { ret = batadv_route_unicast_packet(skb, recv_if); diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 01d30c1e412c..36ca31252a73 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -4168,11 +4168,11 @@ int batadv_tt_init(struct batadv_priv *bat_priv) } batadv_tvlv_handler_register(bat_priv, batadv_tt_tvlv_ogm_handler_v1, - batadv_tt_tvlv_unicast_handler_v1, + batadv_tt_tvlv_unicast_handler_v1, NULL, BATADV_TVLV_TT, 1, BATADV_NO_FLAGS); batadv_tvlv_handler_register(bat_priv, NULL, - batadv_roam_tvlv_unicast_handler_v1, + batadv_roam_tvlv_unicast_handler_v1, NULL, BATADV_TVLV_ROAM, 1, BATADV_NO_FLAGS); INIT_DELAYED_WORK(&bat_priv->tt.work, batadv_tt_purge); diff --git a/net/batman-adv/tvlv.c b/net/batman-adv/tvlv.c index 7ec2e2343884..2a583215d439 100644 --- a/net/batman-adv/tvlv.c +++ b/net/batman-adv/tvlv.c @@ -352,10 +352,9 @@ end: * appropriate handlers * @bat_priv: the bat priv with all the soft interface information * @tvlv_handler: tvlv callback function handling the tvlv content - * @ogm_source: flag indicating whether the tvlv is an ogm or a unicast packet + * @packet_type: indicates for which packet type the TVLV handler is called * @orig_node: orig node emitting the ogm packet - * @src: source mac address of the unicast packet - * @dst: destination mac address of the unicast packet + * @skb: the skb the TVLV handler is called for * @tvlv_value: tvlv content * @tvlv_value_len: tvlv content length * @@ -364,15 +363,20 @@ end: */ static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv, struct batadv_tvlv_handler *tvlv_handler, - bool ogm_source, + u8 packet_type, struct batadv_orig_node *orig_node, - u8 *src, u8 *dst, - void *tvlv_value, u16 tvlv_value_len) + struct sk_buff *skb, void *tvlv_value, + u16 tvlv_value_len) { + unsigned int tvlv_offset; + u8 *src, *dst; + if (!tvlv_handler) return NET_RX_SUCCESS; - if (ogm_source) { + switch (packet_type) { + case BATADV_IV_OGM: + case BATADV_OGM2: if (!tvlv_handler->ogm_handler) return NET_RX_SUCCESS; @@ -383,19 +387,32 @@ static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv, BATADV_NO_FLAGS, tvlv_value, tvlv_value_len); tvlv_handler->flags |= BATADV_TVLV_HANDLER_OGM_CALLED; - } else { - if (!src) - return NET_RX_SUCCESS; - - if (!dst) + break; + case BATADV_UNICAST_TVLV: + if (!skb) return NET_RX_SUCCESS; if (!tvlv_handler->unicast_handler) return NET_RX_SUCCESS; + src = ((struct batadv_unicast_tvlv_packet *)skb->data)->src; + dst = ((struct batadv_unicast_tvlv_packet *)skb->data)->dst; + return tvlv_handler->unicast_handler(bat_priv, src, dst, tvlv_value, tvlv_value_len); + case BATADV_MCAST: + if (!skb) + return NET_RX_SUCCESS; + + if (!tvlv_handler->mcast_handler) + return NET_RX_SUCCESS; + + tvlv_offset = (unsigned char *)tvlv_value - skb->data; + skb_set_network_header(skb, tvlv_offset); + skb_set_transport_header(skb, tvlv_offset + tvlv_value_len); + + return tvlv_handler->mcast_handler(bat_priv, skb); } return NET_RX_SUCCESS; @@ -405,10 +422,9 @@ static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv, * batadv_tvlv_containers_process() - parse the given tvlv buffer to call the * appropriate handlers * @bat_priv: the bat priv with all the soft interface information - * @ogm_source: flag indicating whether the tvlv is an ogm or a unicast packet + * @packet_type: indicates for which packet type the TVLV handler is called * @orig_node: orig node emitting the ogm packet - * @src: source mac address of the unicast packet - * @dst: destination mac address of the unicast packet + * @skb: the skb the TVLV handler is called for * @tvlv_value: tvlv content * @tvlv_value_len: tvlv content length * @@ -416,10 +432,10 @@ static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv, * handler callbacks. */ int batadv_tvlv_containers_process(struct batadv_priv *bat_priv, - bool ogm_source, + u8 packet_type, struct batadv_orig_node *orig_node, - u8 *src, u8 *dst, - void *tvlv_value, u16 tvlv_value_len) + struct sk_buff *skb, void *tvlv_value, + u16 tvlv_value_len) { struct batadv_tvlv_handler *tvlv_handler; struct batadv_tvlv_hdr *tvlv_hdr; @@ -441,20 +457,24 @@ int batadv_tvlv_containers_process(struct batadv_priv *bat_priv, tvlv_hdr->version); ret |= batadv_tvlv_call_handler(bat_priv, tvlv_handler, - ogm_source, orig_node, - src, dst, tvlv_value, + packet_type, orig_node, skb, + tvlv_value, tvlv_value_cont_len); batadv_tvlv_handler_put(tvlv_handler); tvlv_value = (u8 *)tvlv_value + tvlv_value_cont_len; tvlv_value_len -= tvlv_value_cont_len; } - if (!ogm_source) + if (packet_type != BATADV_IV_OGM && + packet_type != BATADV_OGM2) return ret; rcu_read_lock(); hlist_for_each_entry_rcu(tvlv_handler, &bat_priv->tvlv.handler_list, list) { + if (!tvlv_handler->ogm_handler) + continue; + if ((tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) && !(tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CALLED)) tvlv_handler->ogm_handler(bat_priv, orig_node, @@ -490,7 +510,7 @@ void batadv_tvlv_ogm_receive(struct batadv_priv *bat_priv, tvlv_value = batadv_ogm_packet + 1; - batadv_tvlv_containers_process(bat_priv, true, orig_node, NULL, NULL, + batadv_tvlv_containers_process(bat_priv, BATADV_IV_OGM, orig_node, NULL, tvlv_value, tvlv_value_len); } @@ -504,6 +524,10 @@ void batadv_tvlv_ogm_receive(struct batadv_priv *bat_priv, * @uptr: unicast tvlv handler callback function. This function receives the * source & destination of the unicast packet as well as the tvlv content * to process. + * @mptr: multicast packet tvlv handler callback function. This function + * receives the full skb to process, with the skb network header pointing + * to the current tvlv and the skb transport header pointing to the first + * byte after the current tvlv. * @type: tvlv handler type to be registered * @version: tvlv handler version to be registered * @flags: flags to enable or disable TVLV API behavior @@ -518,6 +542,8 @@ void batadv_tvlv_handler_register(struct batadv_priv *bat_priv, u8 *src, u8 *dst, void *tvlv_value, u16 tvlv_value_len), + int (*mptr)(struct batadv_priv *bat_priv, + struct sk_buff *skb), u8 type, u8 version, u8 flags) { struct batadv_tvlv_handler *tvlv_handler; @@ -539,6 +565,7 @@ void batadv_tvlv_handler_register(struct batadv_priv *bat_priv, tvlv_handler->ogm_handler = optr; tvlv_handler->unicast_handler = uptr; + tvlv_handler->mcast_handler = mptr; tvlv_handler->type = type; tvlv_handler->version = version; tvlv_handler->flags = flags; diff --git a/net/batman-adv/tvlv.h b/net/batman-adv/tvlv.h index 4cf8af00fc11..e5697230d991 100644 --- a/net/batman-adv/tvlv.h +++ b/net/batman-adv/tvlv.h @@ -9,6 +9,7 @@ #include "main.h" +#include #include #include @@ -34,14 +35,16 @@ void batadv_tvlv_handler_register(struct batadv_priv *bat_priv, u8 *src, u8 *dst, void *tvlv_value, u16 tvlv_value_len), + int (*mptr)(struct batadv_priv *bat_priv, + struct sk_buff *skb), u8 type, u8 version, u8 flags); void batadv_tvlv_handler_unregister(struct batadv_priv *bat_priv, u8 type, u8 version); int batadv_tvlv_containers_process(struct batadv_priv *bat_priv, - bool ogm_source, + u8 packet_type, struct batadv_orig_node *orig_node, - u8 *src, u8 *dst, - void *tvlv_buff, u16 tvlv_buff_len); + struct sk_buff *skb, void *tvlv_buff, + u16 tvlv_buff_len); void batadv_tvlv_unicast_send(struct batadv_priv *bat_priv, const u8 *src, const u8 *dst, u8 type, u8 version, void *tvlv_value, u16 tvlv_value_len); diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h index 758cd797a063..ca9449ec9836 100644 --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -2335,6 +2335,12 @@ struct batadv_tvlv_handler { u8 *src, u8 *dst, void *tvlv_value, u16 tvlv_value_len); + /** + * @mcast_handler: handler callback which is given the tvlv payload to + * process on incoming mcast packet + */ + int (*mcast_handler)(struct batadv_priv *bat_priv, struct sk_buff *skb); + /** @type: tvlv type this handler feels responsible for */ u8 type; -- cgit v1.2.3 From 40e0b09081420853542571c38875b48b60404ebb Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Thu, 19 Jan 2023 16:45:16 -0800 Subject: net/sock: Introduce trace_sk_data_ready() As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang Signed-off-by: Peilin Ye Signed-off-by: David S. Miller --- drivers/infiniband/hw/erdma/erdma_cm.c | 3 +++ drivers/infiniband/sw/siw/siw_cm.c | 5 +++++ drivers/infiniband/sw/siw/siw_qp.c | 3 +++ drivers/nvme/host/tcp.c | 3 +++ drivers/nvme/target/tcp.c | 5 +++++ drivers/scsi/iscsi_tcp.c | 3 +++ drivers/soc/qcom/qmi_interface.c | 3 +++ drivers/target/iscsi/iscsi_target_nego.c | 2 ++ drivers/xen/pvcalls-back.c | 5 +++++ fs/dlm/lowcomms.c | 5 +++++ fs/ocfs2/cluster/tcp.c | 5 +++++ include/trace/events/sock.h | 24 ++++++++++++++++++++++++ net/bluetooth/rfcomm/core.c | 4 ++++ net/ceph/messenger.c | 4 ++++ net/core/net-traces.c | 2 ++ net/core/skmsg.c | 5 +++++ net/core/sock.c | 2 ++ net/kcm/kcmsock.c | 3 +++ net/mptcp/subflow.c | 3 +++ net/phonet/pep-gprs.c | 4 ++++ net/qrtr/ns.c | 3 +++ net/rds/tcp_listen.c | 2 ++ net/rds/tcp_recv.c | 2 ++ net/sctp/socket.c | 3 +++ net/smc/smc_rx.c | 3 +++ net/sunrpc/svcsock.c | 5 +++++ net/sunrpc/xprtsock.c | 3 +++ net/tipc/socket.c | 3 +++ net/tipc/topsrv.c | 5 +++++ net/tls/tls_sw.c | 3 +++ net/xfrm/espintcp.c | 3 +++ 31 files changed, 128 insertions(+) (limited to 'net') diff --git a/drivers/infiniband/hw/erdma/erdma_cm.c b/drivers/infiniband/hw/erdma/erdma_cm.c index 74f6348f240a..771059a8eb7d 100644 --- a/drivers/infiniband/hw/erdma/erdma_cm.c +++ b/drivers/infiniband/hw/erdma/erdma_cm.c @@ -11,6 +11,7 @@ /* Copyright (c) 2017, Open Grid Computing, Inc. */ #include +#include #include "erdma.h" #include "erdma_cm.h" @@ -925,6 +926,8 @@ static void erdma_cm_llp_data_ready(struct sock *sk) { struct erdma_cep *cep; + trace_sk_data_ready(sk); + read_lock(&sk->sk_callback_lock); cep = sk_to_cep(sk); diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c index f88d2971c2c6..da530c0404da 100644 --- a/drivers/infiniband/sw/siw/siw_cm.c +++ b/drivers/infiniband/sw/siw/siw_cm.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -109,6 +110,8 @@ static void siw_rtr_data_ready(struct sock *sk) struct siw_qp *qp = NULL; read_descriptor_t rd_desc; + trace_sk_data_ready(sk); + read_lock(&sk->sk_callback_lock); cep = sk_to_cep(sk); @@ -1216,6 +1219,8 @@ static void siw_cm_llp_data_ready(struct sock *sk) { struct siw_cep *cep; + trace_sk_data_ready(sk); + read_lock(&sk->sk_callback_lock); cep = sk_to_cep(sk); diff --git a/drivers/infiniband/sw/siw/siw_qp.c b/drivers/infiniband/sw/siw/siw_qp.c index e6f634971228..81e9bbd9ebda 100644 --- a/drivers/infiniband/sw/siw/siw_qp.c +++ b/drivers/infiniband/sw/siw/siw_qp.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "siw.h" #include "siw_verbs.h" @@ -94,6 +95,8 @@ void siw_qp_llp_data_ready(struct sock *sk) { struct siw_qp *qp; + trace_sk_data_ready(sk); + read_lock(&sk->sk_callback_lock); if (unlikely(!sk->sk_user_data || !sk_to_qp(sk))) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 8cedc1ef496c..70e273b565de 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "nvme.h" #include "fabrics.h" @@ -905,6 +906,8 @@ static void nvme_tcp_data_ready(struct sock *sk) { struct nvme_tcp_queue *queue; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); queue = sk->sk_user_data; if (likely(queue && queue->rd_enabled) && diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index cc05c094de22..4a161c8cb531 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "nvmet.h" @@ -1470,6 +1471,8 @@ static void nvmet_tcp_data_ready(struct sock *sk) { struct nvmet_tcp_queue *queue; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); queue = sk->sk_user_data; if (likely(queue)) @@ -1667,6 +1670,8 @@ static void nvmet_tcp_listen_data_ready(struct sock *sk) { struct nvmet_tcp_port *port; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); port = sk->sk_user_data; if (!port) diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 1d1cf641937c..08f204ba9cd1 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -36,6 +36,7 @@ #include #include #include +#include #include "iscsi_tcp.h" @@ -170,6 +171,8 @@ static void iscsi_sw_tcp_data_ready(struct sock *sk) struct iscsi_tcp_conn *tcp_conn; struct iscsi_conn *conn; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); conn = sk->sk_user_data; if (!conn) { diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c index 57052726299d..820bdd9f8e46 100644 --- a/drivers/soc/qcom/qmi_interface.c +++ b/drivers/soc/qcom/qmi_interface.c @@ -12,6 +12,7 @@ #include #include #include +#include #include static struct socket *qmi_sock_create(struct qmi_handle *qmi, @@ -569,6 +570,8 @@ static void qmi_data_ready(struct sock *sk) { struct qmi_handle *qmi = sk->sk_user_data; + trace_sk_data_ready(sk); + /* * This will be NULL if we receive data while being in * qmi_handle_release() diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index ff49c8f3fe24..24040c118e49 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -384,6 +385,7 @@ static void iscsi_target_sk_data_ready(struct sock *sk) struct iscsit_conn *conn = sk->sk_user_data; bool rc; + trace_sk_data_ready(sk); pr_debug("Entering iscsi_target_sk_data_ready: conn: %p\n", conn); write_lock_bh(&sk->sk_callback_lock); diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 0d4f8f4f4948..e2abc3474d85 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -300,6 +301,8 @@ static void pvcalls_sk_data_ready(struct sock *sock) struct sock_mapping *map = sock->sk_user_data; struct pvcalls_ioworker *iow; + trace_sk_data_ready(sock); + if (map == NULL) return; @@ -588,6 +591,8 @@ static void pvcalls_pass_sk_data_ready(struct sock *sock) unsigned long flags; int notify; + trace_sk_data_ready(sock); + if (mappass == NULL) return; diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index 4450721ec83c..7920c655173c 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -54,6 +54,7 @@ #include #include +#include #include "dlm_internal.h" #include "lowcomms.h" @@ -499,6 +500,8 @@ static void lowcomms_data_ready(struct sock *sk) { struct connection *con = sock2con(sk); + trace_sk_data_ready(sk); + set_bit(CF_RECV_INTR, &con->flags); lowcomms_queue_rwork(con); } @@ -530,6 +533,8 @@ static void lowcomms_state_change(struct sock *sk) static void lowcomms_listen_data_ready(struct sock *sk) { + trace_sk_data_ready(sk); + queue_work(io_workqueue, &listen_con.rwork); } diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index a07b24d170f2..aecbd712a00c 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -46,6 +46,7 @@ #include #include #include +#include #include @@ -585,6 +586,8 @@ static void o2net_data_ready(struct sock *sk) void (*ready)(struct sock *sk); struct o2net_sock_container *sc; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); sc = sk->sk_user_data; if (sc) { @@ -1931,6 +1934,8 @@ static void o2net_listen_data_ready(struct sock *sk) { void (*ready)(struct sock *sk); + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); ready = sk->sk_user_data; if (ready == NULL) { /* check for teardown race */ diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h index 71492e8276da..03d19fc562f8 100644 --- a/include/trace/events/sock.h +++ b/include/trace/events/sock.h @@ -263,6 +263,30 @@ TRACE_EVENT(inet_sk_error_report, __entry->error) ); +TRACE_EVENT(sk_data_ready, + + TP_PROTO(const struct sock *sk), + + TP_ARGS(sk), + + TP_STRUCT__entry( + __field(const void *, skaddr) + __field(__u16, family) + __field(__u16, protocol) + __field(unsigned long, ip) + ), + + TP_fast_assign( + __entry->skaddr = sk; + __entry->family = sk->sk_family; + __entry->protocol = sk->sk_protocol; + __entry->ip = _RET_IP_; + ), + + TP_printk("family=%u protocol=%u func=%ps", + __entry->family, __entry->protocol, (void *)__entry->ip) +); + /* * sock send/recv msg length */ diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c index 8d6fce9005bd..053ef8f25fae 100644 --- a/net/bluetooth/rfcomm/core.c +++ b/net/bluetooth/rfcomm/core.c @@ -35,6 +35,8 @@ #include #include +#include + #define VERSION "1.11" static bool disable_cfc; @@ -186,6 +188,8 @@ static void rfcomm_l2state_change(struct sock *sk) static void rfcomm_l2data_ready(struct sock *sk) { + trace_sk_data_ready(sk); + BT_DBG("%p", sk); rfcomm_schedule(); } diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 1d06e114ba3f..cd7b0bf5369e 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -17,6 +17,7 @@ #endif /* CONFIG_BLOCK */ #include #include +#include #include #include @@ -344,6 +345,9 @@ static void con_sock_state_closed(struct ceph_connection *con) static void ceph_sock_data_ready(struct sock *sk) { struct ceph_connection *con = sk->sk_user_data; + + trace_sk_data_ready(sk); + if (atomic_read(&con->msgr->stopping)) { return; } diff --git a/net/core/net-traces.c b/net/core/net-traces.c index c40cd8dd75c7..ee7006bbe49b 100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -61,3 +61,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(napi_poll); EXPORT_TRACEPOINT_SYMBOL_GPL(tcp_send_reset); EXPORT_TRACEPOINT_SYMBOL_GPL(tcp_bad_csum); + +EXPORT_TRACEPOINT_SYMBOL_GPL(sk_data_ready); diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 53d0251788aa..f81883759d38 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -8,6 +8,7 @@ #include #include #include +#include static bool sk_msg_try_coalesce_ok(struct sk_msg *msg, int elem_first_coalesce) { @@ -1114,6 +1115,8 @@ static void sk_psock_strp_data_ready(struct sock *sk) { struct sk_psock *psock; + trace_sk_data_ready(sk); + rcu_read_lock(); psock = sk_psock(sk); if (likely(psock)) { @@ -1210,6 +1213,8 @@ static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; + trace_sk_data_ready(sk); + if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; sock->ops->read_skb(sk, sk_psock_verdict_recv); diff --git a/net/core/sock.c b/net/core/sock.c index f954d5893e79..7ba4891460ad 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3291,6 +3291,8 @@ void sock_def_readable(struct sock *sk) { struct socket_wq *wq; + trace_sk_data_ready(sk); + rcu_read_lock(); wq = rcu_dereference(sk->sk_wq); if (skwq_has_sleeper(wq)) diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 890a2423f559..cfe828bd7fc6 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -28,6 +28,7 @@ #include #include #include +#include unsigned int kcm_net_id; @@ -349,6 +350,8 @@ static void psock_data_ready(struct sock *sk) { struct kcm_psock *psock; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); psock = (struct kcm_psock *)sk->sk_user_data; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index ec54413fb31f..beaec843f5ca 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -26,6 +26,7 @@ #include "mib.h" #include +#include static void mptcp_subflow_ops_undo_override(struct sock *ssk); @@ -1438,6 +1439,8 @@ static void subflow_data_ready(struct sock *sk) struct sock *parent = subflow->conn; struct mptcp_sock *msk; + trace_sk_data_ready(sk); + msk = mptcp_sk(parent); if (state & TCPF_LISTEN) { /* MPJ subflow are removed from accept queue before reaching here, diff --git a/net/phonet/pep-gprs.c b/net/phonet/pep-gprs.c index 1f5df0432d37..7f68d8662cfb 100644 --- a/net/phonet/pep-gprs.c +++ b/net/phonet/pep-gprs.c @@ -19,6 +19,8 @@ #include #include +#include + #define GPRS_DEFAULT_MTU 1400 struct gprs_dev { @@ -138,6 +140,8 @@ static void gprs_data_ready(struct sock *sk) struct gprs_dev *gp = sk->sk_user_data; struct sk_buff *skb; + trace_sk_data_ready(sk); + while ((skb = pep_read(sk)) != NULL) { skb_orphan(skb); gprs_recv(gp, skb); diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c index 1990d496fcfc..97bfdf9fd028 100644 --- a/net/qrtr/ns.c +++ b/net/qrtr/ns.c @@ -12,6 +12,7 @@ #include "qrtr.h" +#include #define CREATE_TRACE_POINTS #include @@ -752,6 +753,8 @@ static void qrtr_ns_worker(struct work_struct *work) static void qrtr_ns_data_ready(struct sock *sk) { + trace_sk_data_ready(sk); + queue_work(qrtr_ns.workqueue, &qrtr_ns.work); } diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c index 7edf2e69d3fe..014fa24418c1 100644 --- a/net/rds/tcp_listen.c +++ b/net/rds/tcp_listen.c @@ -34,6 +34,7 @@ #include #include #include +#include #include "rds.h" #include "tcp.h" @@ -234,6 +235,7 @@ void rds_tcp_listen_data_ready(struct sock *sk) { void (*ready)(struct sock *sk); + trace_sk_data_ready(sk); rdsdebug("listen data ready sk %p\n", sk); read_lock_bh(&sk->sk_callback_lock); diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c index f4ee13da90c7..c00f04a1a534 100644 --- a/net/rds/tcp_recv.c +++ b/net/rds/tcp_recv.c @@ -33,6 +33,7 @@ #include #include #include +#include #include "rds.h" #include "tcp.h" @@ -309,6 +310,7 @@ void rds_tcp_data_ready(struct sock *sk) struct rds_conn_path *cp; struct rds_tcp_connection *tc; + trace_sk_data_ready(sk); rdsdebug("data ready sk %p\n", sk); read_lock_bh(&sk->sk_callback_lock); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 84021a6c4f9d..a98511b676cd 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -59,6 +59,7 @@ #include #include #include +#include #include /* for sa_family_t */ #include @@ -9244,6 +9245,8 @@ void sctp_data_ready(struct sock *sk) { struct socket_wq *wq; + trace_sk_data_ready(sk); + rcu_read_lock(); wq = rcu_dereference(sk->sk_wq); if (skwq_has_sleeper(wq)) diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index 17c5aee7ee4f..0a6e615f000c 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -15,6 +15,7 @@ #include #include +#include #include "smc.h" #include "smc_core.h" @@ -31,6 +32,8 @@ static void smc_rx_wake_up(struct sock *sk) { struct socket_wq *wq; + trace_sk_data_ready(sk); + /* derived from sock_def_readable() */ /* called already in smc_listen_work() */ rcu_read_lock(); diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 815baf308236..99eafe87b1d5 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -55,6 +55,7 @@ #include #include +#include #include #include "socklib.h" @@ -310,6 +311,8 @@ static void svc_data_ready(struct sock *sk) { struct svc_sock *svsk = (struct svc_sock *)sk->sk_user_data; + trace_sk_data_ready(sk); + if (svsk) { /* Refer to svc_setup_socket() for details. */ rmb(); @@ -687,6 +690,8 @@ static void svc_tcp_listen_data_ready(struct sock *sk) { struct svc_sock *svsk = (struct svc_sock *)sk->sk_user_data; + trace_sk_data_ready(sk); + if (svsk) { /* Refer to svc_setup_socket() for details. */ rmb(); diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index aaa5b2741b79..adcbedc244d6 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -52,6 +52,7 @@ #include #include +#include #include #include "socklib.h" @@ -1378,6 +1379,8 @@ static void xs_data_ready(struct sock *sk) { struct rpc_xprt *xprt; + trace_sk_data_ready(sk); + xprt = xprt_from_sock(sk); if (xprt != NULL) { struct sock_xprt *transport = container_of(xprt, diff --git a/net/tipc/socket.c b/net/tipc/socket.c index b35c8701876a..07c9bf5f7f5c 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -37,6 +37,7 @@ #include #include +#include #include "core.h" #include "name_table.h" @@ -2130,6 +2131,8 @@ static void tipc_data_ready(struct sock *sk) { struct socket_wq *wq; + trace_sk_data_ready(sk); + rcu_read_lock(); wq = rcu_dereference(sk->sk_wq); if (skwq_has_sleeper(wq)) diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c index 69c88cc03887..8ee0c07d00e9 100644 --- a/net/tipc/topsrv.c +++ b/net/tipc/topsrv.c @@ -43,6 +43,7 @@ #include "bearer.h" #include #include +#include /* Number of messages to send before rescheduling */ #define MAX_SEND_MSG_COUNT 25 @@ -439,6 +440,8 @@ static void tipc_conn_data_ready(struct sock *sk) { struct tipc_conn *con; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); con = sk->sk_user_data; if (connected(con)) { @@ -496,6 +499,8 @@ static void tipc_topsrv_listener_data_ready(struct sock *sk) { struct tipc_topsrv *srv; + trace_sk_data_ready(sk); + read_lock_bh(&sk->sk_callback_lock); srv = sk->sk_user_data; if (srv) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 9ed978634125..fa137063aaa0 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -43,6 +43,7 @@ #include #include +#include #include "tls.h" @@ -2284,6 +2285,8 @@ static void tls_data_ready(struct sock *sk) struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct sk_psock *psock; + trace_sk_data_ready(sk); + tls_strp_data_ready(&ctx->strp); psock = sk_psock_get(sk); diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c index 74a54295c164..872b80188e83 100644 --- a/net/xfrm/espintcp.c +++ b/net/xfrm/espintcp.c @@ -6,6 +6,7 @@ #include #include #include +#include #if IS_ENABLED(CONFIG_IPV6) #include #endif @@ -397,6 +398,8 @@ static void espintcp_data_ready(struct sock *sk) { struct espintcp_ctx *ctx = espintcp_getctx(sk); + trace_sk_data_ready(sk); + strp_data_ready(&ctx->strp); } -- cgit v1.2.3 From 2b30f8291a30305eeedcd787b4d0e352410f7268 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 19 Jan 2023 14:26:54 +0200 Subject: net: ethtool: add support for MAC Merge layer The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2 specifications (the other being Frame Preemption; IEEE 802.1Q-2018 clause 6.7.2), which work together to minimize latency caused by frame interference at TX. The overall goal of TSN is for normal traffic and traffic with a bounded deadline to be able to cohabitate on the same L2 network and not bother each other too much. The standards achieve this (partly) by introducing the concept of preemptible traffic, i.e. Ethernet frames that have a custom value for the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented and reassembled at L2 on a link-local basis. The non-preemptible frames are called express traffic, they are transmitted using a normal SFD, and they can preempt preemptible frames, therefore having lower latency, which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo frames around 9K). Preemption is not recursive, i.e. a P frame cannot preempt another P frame. Preemption also does not depend upon priority, or otherwise said, an E frame with prio 0 will still preempt a P frame with prio 7. In terms of implementation, the standards talk about the presence of an express MAC (eMAC) which handles express traffic, and a preemptible MAC (pMAC) which handles preemptible traffic, and these MACs are multiplexed on the same MII by a MAC merge layer. To support frame preemption, the definition of the SFD was generalized to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an Ethernet frame fragment, or a complete frame. Stations unaware of an SMD value different from the standard SFD will treat P frames as error frames. To prevent that from happening, a negotiation process is defined. On RX, packets are dispatched to the eMAC or pMAC after being filtered by their SMD. On TX, the eMAC/pMAC classification decision is taken by the 802.1Q spec, based on packet priority (each of the 8 user priority values may have an admin-status of preemptible or express). The MAC Merge layer and the Frame Preemption parameters have some degree of independence in terms of how software stacks are supposed to deal with them. The activation of the MM layer is supposed to be controlled by an LLDP daemon (after it has been communicated that the link partner also supports it), after which a (hardware-based or not) verification handshake takes place, before actually enabling the feature. So the process is intended to be relatively plug-and-play. Whereas FP settings are supposed to be coordinated across a network using something approximating NETCONF. The support contained here is exclusively for the 802.3 (MAC Merge) portions and not for the 802.1Q (Frame Preemption) parts. This API is sufficient for an LLDP daemon to do its job. The FP adminStatus variable from 802.1Q is outside the scope of an LLDP daemon. I have taken a few creative licenses and augmented the Linux kernel UAPI compared to the standard managed objects recommended by IEEE 802.3. These are: - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive Processing state diagram, a MAC Merge layer is always supposed to be able to receive P frames. However, this implies keeping the pMAC powered on, which will consume needless power in applications where FP will never be used. If LLDP is used, the reception of an Additional Ethernet Capabilities TLV from the link partner is sufficient indication that the pMAC should be enabled. So my proposal is that in Linux, we keep the pMAC turned off by default and that user space turns it on when needed. - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in the boolean netlink attributes offered, so this is also positive here. Other than the meaning being reversed, they correspond to the same thing. - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP daemon to maximize the verifyTime variable (delay between SMD-V transmissions), to maximize its chances that the LP replies. IEEE says that the verifyTime can range between 1 and 128 ms, but the NXP ENETC stupidly keeps this variable in a 7 bit register, so the maximum supported value is 127 ms. I could have chosen to hardcode this in the LLDP daemon to a lower value, but why not let the kernel expose its supported range directly. - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called aMACMergeAddFragSize, and expresses the "additional" fragment size (on top of ETH_ZLEN), whereas this expresses the absolute value of the fragment size. - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed object mandated by the standard, but user space clearly needs to know what is the minimum supported fragment size of our local receiver, since LLDP must advertise a value no lower than that. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/ethtool.h | 99 ++++++++++++++ include/uapi/linux/ethtool.h | 25 ++++ include/uapi/linux/ethtool_netlink.h | 47 +++++++ net/ethtool/Makefile | 4 +- net/ethtool/mm.c | 255 +++++++++++++++++++++++++++++++++++ net/ethtool/netlink.c | 19 +++ net/ethtool/netlink.h | 4 + 7 files changed, 451 insertions(+), 2 deletions(-) create mode 100644 net/ethtool/mm.c (limited to 'net') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 20d197693d34..37eba38da502 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -477,6 +477,98 @@ struct ethtool_module_power_mode_params { enum ethtool_module_power_mode mode; }; +/** + * struct ethtool_mm_state - 802.3 MAC merge layer state + * @verify_time: + * wait time between verification attempts in ms (according to clause + * 30.14.1.6 aMACMergeVerifyTime) + * @max_verify_time: + * maximum accepted value for the @verify_time variable in set requests + * @verify_status: + * state of the verification state machine of the MM layer (according to + * clause 30.14.1.2 aMACMergeStatusVerify) + * @tx_enabled: + * set if the MM layer is administratively enabled in the TX direction + * (according to clause 30.14.1.3 aMACMergeEnableTx) + * @tx_active: + * set if the MM layer is enabled in the TX direction, which makes FP + * possible (according to 30.14.1.5 aMACMergeStatusTx). This should be + * true if MM is enabled, and the verification status is either verified, + * or disabled. + * @pmac_enabled: + * set if the preemptible MAC is powered on and is able to receive + * preemptible packets and respond to verification frames. + * @verify_enabled: + * set if the Verify function of the MM layer (which sends SMD-V + * verification requests) is administratively enabled (regardless of + * whether it is currently in the ETHTOOL_MM_VERIFY_STATUS_DISABLED state + * or not), according to clause 30.14.1.4 aMACMergeVerifyDisableTx (but + * using positive rather than negative logic). The device should always + * respond to received SMD-V requests as long as @pmac_enabled is set. + * @tx_min_frag_size: + * the minimum size of non-final mPacket fragments that the link partner + * supports receiving, expressed in octets. Compared to the definition + * from clause 30.14.1.7 aMACMergeAddFragSize which is expressed in the + * range 0 to 3 (requiring a translation to the size in octets according + * to the formula 64 * (1 + addFragSize) - 4), a value in a continuous and + * unbounded range can be specified here. + * @rx_min_frag_size: + * the minimum size of non-final mPacket fragments that this device + * supports receiving, expressed in octets. + */ +struct ethtool_mm_state { + u32 verify_time; + u32 max_verify_time; + enum ethtool_mm_verify_status verify_status; + bool tx_enabled; + bool tx_active; + bool pmac_enabled; + bool verify_enabled; + u32 tx_min_frag_size; + u32 rx_min_frag_size; +}; + +/** + * struct ethtool_mm_cfg - 802.3 MAC merge layer configuration + * @verify_time: see struct ethtool_mm_state + * @verify_enabled: see struct ethtool_mm_state + * @tx_enabled: see struct ethtool_mm_state + * @pmac_enabled: see struct ethtool_mm_state + * @tx_min_frag_size: see struct ethtool_mm_state + */ +struct ethtool_mm_cfg { + u32 verify_time; + bool verify_enabled; + bool tx_enabled; + bool pmac_enabled; + u32 tx_min_frag_size; +}; + +/** + * struct ethtool_mm_stats - 802.3 MAC merge layer statistics + * @MACMergeFrameAssErrorCount: + * received MAC frames with reassembly errors + * @MACMergeFrameSmdErrorCount: + * received MAC frames/fragments rejected due to unknown or incorrect SMD + * @MACMergeFrameAssOkCount: + * received MAC frames that were successfully reassembled and passed up + * @MACMergeFragCountRx: + * number of additional correct SMD-C mPackets received due to preemption + * @MACMergeFragCountTx: + * number of additional mPackets sent due to preemption + * @MACMergeHoldCount: + * number of times the MM layer entered the HOLD state, which blocks + * transmission of preemptible traffic + */ +struct ethtool_mm_stats { + u64 MACMergeFrameAssErrorCount; + u64 MACMergeFrameSmdErrorCount; + u64 MACMergeFrameAssOkCount; + u64 MACMergeFragCountRx; + u64 MACMergeFragCountTx; + u64 MACMergeHoldCount; +}; + /** * struct ethtool_ops - optional netdev operations * @cap_link_lanes_supported: indicates if the driver supports lanes @@ -649,6 +741,9 @@ struct ethtool_module_power_mode_params { * plugged-in. * @set_module_power_mode: Set the power mode policy for the plug-in module * used by the network device. + * @get_mm: Query the 802.3 MAC Merge layer state. + * @set_mm: Set the 802.3 MAC Merge layer parameters. + * @get_mm_stats: Query the 802.3 MAC Merge layer statistics. * * All operations are optional (i.e. the function pointer may be set * to %NULL) and callers must take this into account. Callers must @@ -787,6 +882,10 @@ struct ethtool_ops { int (*set_module_power_mode)(struct net_device *dev, const struct ethtool_module_power_mode_params *params, struct netlink_ext_ack *extack); + int (*get_mm)(struct net_device *dev, struct ethtool_mm_state *state); + int (*set_mm)(struct net_device *dev, struct ethtool_mm_cfg *cfg, + struct netlink_ext_ack *extack); + void (*get_mm_stats)(struct net_device *dev, struct ethtool_mm_stats *stats); }; int ethtool_check_ops(const struct ethtool_ops *ops); diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 6389953c32cf..529a93696ab6 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -779,6 +779,31 @@ enum ethtool_podl_pse_pw_d_status { ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR, }; +/** + * enum ethtool_mm_verify_status - status of MAC Merge Verify function + * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN: + * verification status is unknown + * @ETHTOOL_MM_VERIFY_STATUS_INITIAL: + * the 802.3 Verify State diagram is in the state INIT_VERIFICATION + * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING: + * the Verify State diagram is in the state VERIFICATION_IDLE, + * SEND_VERIFY or WAIT_FOR_RESPONSE + * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED: + * indicates that the Verify State diagram is in the state VERIFIED + * @ETHTOOL_MM_VERIFY_STATUS_FAILED: + * the Verify State diagram is in the state VERIFY_FAIL + * @ETHTOOL_MM_VERIFY_STATUS_DISABLED: + * verification of preemption operation is disabled + */ +enum ethtool_mm_verify_status { + ETHTOOL_MM_VERIFY_STATUS_UNKNOWN, + ETHTOOL_MM_VERIFY_STATUS_INITIAL, + ETHTOOL_MM_VERIFY_STATUS_VERIFYING, + ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED, + ETHTOOL_MM_VERIFY_STATUS_FAILED, + ETHTOOL_MM_VERIFY_STATUS_DISABLED, +}; + /** * struct ethtool_gstrings - string set for data tagging * @cmd: Command number = %ETHTOOL_GSTRINGS diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 83557cae0b87..58af390823b0 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -55,6 +55,8 @@ enum { ETHTOOL_MSG_PLCA_GET_CFG, ETHTOOL_MSG_PLCA_SET_CFG, ETHTOOL_MSG_PLCA_GET_STATUS, + ETHTOOL_MSG_MM_GET, + ETHTOOL_MSG_MM_SET, /* add new constants above here */ __ETHTOOL_MSG_USER_CNT, @@ -105,6 +107,8 @@ enum { ETHTOOL_MSG_PLCA_GET_CFG_REPLY, ETHTOOL_MSG_PLCA_GET_STATUS_REPLY, ETHTOOL_MSG_PLCA_NTF, + ETHTOOL_MSG_MM_GET_REPLY, + ETHTOOL_MSG_MM_NTF, /* add new constants above here */ __ETHTOOL_MSG_KERNEL_CNT, @@ -922,6 +926,49 @@ enum { ETHTOOL_A_PLCA_MAX = (__ETHTOOL_A_PLCA_CNT - 1) }; +/* MAC Merge (802.3) */ + +enum { + ETHTOOL_A_MM_STAT_UNSPEC, + ETHTOOL_A_MM_STAT_PAD, + + /* aMACMergeFrameAssErrorCount */ + ETHTOOL_A_MM_STAT_REASSEMBLY_ERRORS, /* u64 */ + /* aMACMergeFrameSmdErrorCount */ + ETHTOOL_A_MM_STAT_SMD_ERRORS, /* u64 */ + /* aMACMergeFrameAssOkCount */ + ETHTOOL_A_MM_STAT_REASSEMBLY_OK, /* u64 */ + /* aMACMergeFragCountRx */ + ETHTOOL_A_MM_STAT_RX_FRAG_COUNT, /* u64 */ + /* aMACMergeFragCountTx */ + ETHTOOL_A_MM_STAT_TX_FRAG_COUNT, /* u64 */ + /* aMACMergeHoldCount */ + ETHTOOL_A_MM_STAT_HOLD_COUNT, /* u64 */ + + /* add new constants above here */ + __ETHTOOL_A_MM_STAT_CNT, + ETHTOOL_A_MM_STAT_MAX = (__ETHTOOL_A_MM_STAT_CNT - 1) +}; + +enum { + ETHTOOL_A_MM_UNSPEC, + ETHTOOL_A_MM_HEADER, /* nest - _A_HEADER_* */ + ETHTOOL_A_MM_PMAC_ENABLED, /* u8 */ + ETHTOOL_A_MM_TX_ENABLED, /* u8 */ + ETHTOOL_A_MM_TX_ACTIVE, /* u8 */ + ETHTOOL_A_MM_TX_MIN_FRAG_SIZE, /* u32 */ + ETHTOOL_A_MM_RX_MIN_FRAG_SIZE, /* u32 */ + ETHTOOL_A_MM_VERIFY_ENABLED, /* u8 */ + ETHTOOL_A_MM_VERIFY_STATUS, /* u8 */ + ETHTOOL_A_MM_VERIFY_TIME, /* u32 */ + ETHTOOL_A_MM_MAX_VERIFY_TIME, /* u32 */ + ETHTOOL_A_MM_STATS, /* nest - _A_MM_STAT_* */ + + /* add new constants above here */ + __ETHTOOL_A_MM_CNT, + ETHTOOL_A_MM_MAX = (__ETHTOOL_A_MM_CNT - 1) +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile index 563864c1bf5a..504f954a1b28 100644 --- a/net/ethtool/Makefile +++ b/net/ethtool/Makefile @@ -7,5 +7,5 @@ obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o ethtool_nl-y := netlink.o bitset.o strset.o linkinfo.o linkmodes.o rss.o \ linkstate.o debug.o wol.o features.o privflags.o rings.o \ channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \ - tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o \ - pse-pd.o plca.o + tunnels.o fec.o eeprom.o stats.o phc_vclocks.o mm.o \ + module.o pse-pd.o plca.o mm.o diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c new file mode 100644 index 000000000000..76b209ad53d2 --- /dev/null +++ b/net/ethtool/mm.c @@ -0,0 +1,255 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright 2022-2023 NXP + */ +#include "common.h" +#include "netlink.h" + +struct mm_req_info { + struct ethnl_req_info base; +}; + +struct mm_reply_data { + struct ethnl_reply_data base; + struct ethtool_mm_state state; + struct ethtool_mm_stats stats; +}; + +#define MM_REPDATA(__reply_base) \ + container_of(__reply_base, struct mm_reply_data, base) + +#define ETHTOOL_MM_STAT_CNT \ + (__ETHTOOL_A_MM_STAT_CNT - (ETHTOOL_A_MM_STAT_PAD + 1)) + +const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1] = { + [ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy_stats), +}; + +static int mm_prepare_data(const struct ethnl_req_info *req_base, + struct ethnl_reply_data *reply_base, + struct genl_info *info) +{ + struct mm_reply_data *data = MM_REPDATA(reply_base); + struct net_device *dev = reply_base->dev; + const struct ethtool_ops *ops; + int ret; + + ops = dev->ethtool_ops; + + if (!ops->get_mm) + return -EOPNOTSUPP; + + ethtool_stats_init((u64 *)&data->stats, + sizeof(data->stats) / sizeof(u64)); + + ret = ethnl_ops_begin(dev); + if (ret < 0) + return ret; + + ret = ops->get_mm(dev, &data->state); + if (ret) + goto out_complete; + + if (ops->get_mm_stats && (req_base->flags & ETHTOOL_FLAG_STATS)) + ops->get_mm_stats(dev, &data->stats); + +out_complete: + ethnl_ops_complete(dev); + + return 0; +} + +static int mm_reply_size(const struct ethnl_req_info *req_base, + const struct ethnl_reply_data *reply_base) +{ + int len = 0; + + len += nla_total_size(sizeof(u8)); /* _MM_PMAC_ENABLED */ + len += nla_total_size(sizeof(u8)); /* _MM_TX_ENABLED */ + len += nla_total_size(sizeof(u8)); /* _MM_TX_ACTIVE */ + len += nla_total_size(sizeof(u8)); /* _MM_VERIFY_ENABLED */ + len += nla_total_size(sizeof(u8)); /* _MM_VERIFY_STATUS */ + len += nla_total_size(sizeof(u32)); /* _MM_VERIFY_TIME */ + len += nla_total_size(sizeof(u32)); /* _MM_MAX_VERIFY_TIME */ + len += nla_total_size(sizeof(u32)); /* _MM_TX_MIN_FRAG_SIZE */ + len += nla_total_size(sizeof(u32)); /* _MM_RX_MIN_FRAG_SIZE */ + + if (req_base->flags & ETHTOOL_FLAG_STATS) + len += nla_total_size(0) + /* _MM_STATS */ + nla_total_size_64bit(sizeof(u64)) * ETHTOOL_MM_STAT_CNT; + + return len; +} + +static int mm_put_stat(struct sk_buff *skb, u64 val, u16 attrtype) +{ + if (val == ETHTOOL_STAT_NOT_SET) + return 0; + if (nla_put_u64_64bit(skb, attrtype, val, ETHTOOL_A_MM_STAT_PAD)) + return -EMSGSIZE; + return 0; +} + +static int mm_put_stats(struct sk_buff *skb, + const struct ethtool_mm_stats *stats) +{ + struct nlattr *nest; + + nest = nla_nest_start(skb, ETHTOOL_A_MM_STATS); + if (!nest) + return -EMSGSIZE; + + if (mm_put_stat(skb, stats->MACMergeFrameAssErrorCount, + ETHTOOL_A_MM_STAT_REASSEMBLY_ERRORS) || + mm_put_stat(skb, stats->MACMergeFrameSmdErrorCount, + ETHTOOL_A_MM_STAT_SMD_ERRORS) || + mm_put_stat(skb, stats->MACMergeFrameAssOkCount, + ETHTOOL_A_MM_STAT_REASSEMBLY_OK) || + mm_put_stat(skb, stats->MACMergeFragCountRx, + ETHTOOL_A_MM_STAT_RX_FRAG_COUNT) || + mm_put_stat(skb, stats->MACMergeFragCountTx, + ETHTOOL_A_MM_STAT_TX_FRAG_COUNT) || + mm_put_stat(skb, stats->MACMergeHoldCount, + ETHTOOL_A_MM_STAT_HOLD_COUNT)) + goto err_cancel; + + nla_nest_end(skb, nest); + return 0; + +err_cancel: + nla_nest_cancel(skb, nest); + return -EMSGSIZE; +} + +static int mm_fill_reply(struct sk_buff *skb, + const struct ethnl_req_info *req_base, + const struct ethnl_reply_data *reply_base) +{ + const struct mm_reply_data *data = MM_REPDATA(reply_base); + const struct ethtool_mm_state *state = &data->state; + + if (nla_put_u8(skb, ETHTOOL_A_MM_TX_ENABLED, state->tx_enabled) || + nla_put_u8(skb, ETHTOOL_A_MM_TX_ACTIVE, state->tx_active) || + nla_put_u8(skb, ETHTOOL_A_MM_PMAC_ENABLED, state->pmac_enabled) || + nla_put_u8(skb, ETHTOOL_A_MM_VERIFY_ENABLED, state->verify_enabled) || + nla_put_u8(skb, ETHTOOL_A_MM_VERIFY_STATUS, state->verify_status) || + nla_put_u32(skb, ETHTOOL_A_MM_VERIFY_TIME, state->verify_time) || + nla_put_u32(skb, ETHTOOL_A_MM_MAX_VERIFY_TIME, state->max_verify_time) || + nla_put_u32(skb, ETHTOOL_A_MM_TX_MIN_FRAG_SIZE, state->tx_min_frag_size) || + nla_put_u32(skb, ETHTOOL_A_MM_RX_MIN_FRAG_SIZE, state->rx_min_frag_size)) + return -EMSGSIZE; + + if (req_base->flags & ETHTOOL_FLAG_STATS && + mm_put_stats(skb, &data->stats)) + return -EMSGSIZE; + + return 0; +} + +const struct ethnl_request_ops ethnl_mm_request_ops = { + .request_cmd = ETHTOOL_MSG_MM_GET, + .reply_cmd = ETHTOOL_MSG_MM_GET_REPLY, + .hdr_attr = ETHTOOL_A_MM_HEADER, + .req_info_size = sizeof(struct mm_req_info), + .reply_data_size = sizeof(struct mm_reply_data), + + .prepare_data = mm_prepare_data, + .reply_size = mm_reply_size, + .fill_reply = mm_fill_reply, +}; + +const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1] = { + [ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), + [ETHTOOL_A_MM_VERIFY_ENABLED] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_MM_VERIFY_TIME] = NLA_POLICY_RANGE(NLA_U32, 1, 128), + [ETHTOOL_A_MM_TX_ENABLED] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_MM_PMAC_ENABLED] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_MM_TX_MIN_FRAG_SIZE] = NLA_POLICY_RANGE(NLA_U32, 60, 252), +}; + +static void mm_state_to_cfg(const struct ethtool_mm_state *state, + struct ethtool_mm_cfg *cfg) +{ + /* We could also compare state->verify_status against + * ETHTOOL_MM_VERIFY_STATUS_DISABLED, but state->verify_enabled + * is more like an administrative state which should be seen in + * ETHTOOL_MSG_MM_GET replies. For example, a port with verification + * disabled might be in the ETHTOOL_MM_VERIFY_STATUS_INITIAL + * if it's down. + */ + cfg->verify_enabled = state->verify_enabled; + cfg->verify_time = state->verify_time; + cfg->tx_enabled = state->tx_enabled; + cfg->pmac_enabled = state->pmac_enabled; + cfg->tx_min_frag_size = state->tx_min_frag_size; +} + +int ethnl_set_mm(struct sk_buff *skb, struct genl_info *info) +{ + struct netlink_ext_ack *extack = info->extack; + struct ethnl_req_info req_info = {}; + struct ethtool_mm_state state = {}; + struct nlattr **tb = info->attrs; + struct ethtool_mm_cfg cfg = {}; + const struct ethtool_ops *ops; + struct net_device *dev; + bool mod = false; + int ret; + + ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_MM_HEADER], + genl_info_net(info), extack, true); + if (ret) + return ret; + + dev = req_info.dev; + ops = dev->ethtool_ops; + + if (!ops->get_mm || !ops->set_mm) { + ret = -EOPNOTSUPP; + goto out_dev_put; + } + + rtnl_lock(); + ret = ethnl_ops_begin(dev); + if (ret < 0) + goto out_rtnl_unlock; + + ret = ops->get_mm(dev, &state); + if (ret) + goto out_complete; + + mm_state_to_cfg(&state, &cfg); + + ethnl_update_bool(&cfg.verify_enabled, tb[ETHTOOL_A_MM_VERIFY_ENABLED], + &mod); + ethnl_update_u32(&cfg.verify_time, tb[ETHTOOL_A_MM_VERIFY_TIME], &mod); + ethnl_update_bool(&cfg.tx_enabled, tb[ETHTOOL_A_MM_TX_ENABLED], &mod); + ethnl_update_bool(&cfg.pmac_enabled, tb[ETHTOOL_A_MM_PMAC_ENABLED], + &mod); + ethnl_update_u32(&cfg.tx_min_frag_size, + tb[ETHTOOL_A_MM_TX_MIN_FRAG_SIZE], &mod); + + if (!mod) + goto out_complete; + + if (cfg.verify_time > state.max_verify_time) { + NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MM_VERIFY_TIME], + "verifyTime exceeds device maximum"); + ret = -ERANGE; + goto out_complete; + } + + ret = ops->set_mm(dev, &cfg, extack); + if (ret) + goto out_complete; + + ethtool_notify(dev, ETHTOOL_MSG_MM_NTF, NULL); + +out_complete: + ethnl_ops_complete(dev); +out_rtnl_unlock: + rtnl_unlock(); +out_dev_put: + ethnl_parse_header_dev_put(&req_info); + return ret; +} diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index 9f924875bba9..6412c4dc663d 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -290,6 +290,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = { [ETHTOOL_MSG_RSS_GET] = ðnl_rss_request_ops, [ETHTOOL_MSG_PLCA_GET_CFG] = ðnl_plca_cfg_request_ops, [ETHTOOL_MSG_PLCA_GET_STATUS] = ðnl_plca_status_request_ops, + [ETHTOOL_MSG_MM_GET] = ðnl_mm_request_ops, }; static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb) @@ -606,6 +607,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = { [ETHTOOL_MSG_FEC_NTF] = ðnl_fec_request_ops, [ETHTOOL_MSG_MODULE_NTF] = ðnl_module_request_ops, [ETHTOOL_MSG_PLCA_NTF] = ðnl_plca_cfg_request_ops, + [ETHTOOL_MSG_MM_NTF] = ðnl_mm_request_ops, }; /* default notification handler */ @@ -700,6 +702,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = { [ETHTOOL_MSG_FEC_NTF] = ethnl_default_notify, [ETHTOOL_MSG_MODULE_NTF] = ethnl_default_notify, [ETHTOOL_MSG_PLCA_NTF] = ethnl_default_notify, + [ETHTOOL_MSG_MM_NTF] = ethnl_default_notify, }; void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data) @@ -1076,6 +1079,22 @@ static const struct genl_ops ethtool_genl_ops[] = { .policy = ethnl_plca_get_status_policy, .maxattr = ARRAY_SIZE(ethnl_plca_get_status_policy) - 1, }, + { + .cmd = ETHTOOL_MSG_MM_GET, + .doit = ethnl_default_doit, + .start = ethnl_default_start, + .dumpit = ethnl_default_dumpit, + .done = ethnl_default_done, + .policy = ethnl_mm_get_policy, + .maxattr = ARRAY_SIZE(ethnl_mm_get_policy) - 1, + }, + { + .cmd = ETHTOOL_MSG_MM_SET, + .flags = GENL_UNS_ADMIN_PERM, + .doit = ethnl_set_mm, + .policy = ethnl_mm_set_policy, + .maxattr = ARRAY_SIZE(ethnl_mm_set_policy) - 1, + }, }; static const struct genl_multicast_group ethtool_nl_mcgrps[] = { diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index f271266f6e28..657b0c80620c 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -349,6 +349,7 @@ extern const struct ethnl_request_ops ethnl_pse_request_ops; extern const struct ethnl_request_ops ethnl_rss_request_ops; extern const struct ethnl_request_ops ethnl_plca_cfg_request_ops; extern const struct ethnl_request_ops ethnl_plca_status_request_ops; +extern const struct ethnl_request_ops ethnl_mm_request_ops; extern const struct nla_policy ethnl_header_policy[ETHTOOL_A_HEADER_FLAGS + 1]; extern const struct nla_policy ethnl_header_policy_stats[ETHTOOL_A_HEADER_FLAGS + 1]; @@ -393,6 +394,8 @@ extern const struct nla_policy ethnl_rss_get_policy[ETHTOOL_A_RSS_CONTEXT + 1]; extern const struct nla_policy ethnl_plca_get_cfg_policy[ETHTOOL_A_PLCA_HEADER + 1]; extern const struct nla_policy ethnl_plca_set_cfg_policy[ETHTOOL_A_PLCA_MAX + 1]; extern const struct nla_policy ethnl_plca_get_status_policy[ETHTOOL_A_PLCA_HEADER + 1]; +extern const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1]; +extern const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1]; int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info); int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info); @@ -414,6 +417,7 @@ int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info); int ethnl_set_module(struct sk_buff *skb, struct genl_info *info); int ethnl_set_pse(struct sk_buff *skb, struct genl_info *info); int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info); +int ethnl_set_mm(struct sk_buff *skb, struct genl_info *info); extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN]; extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN]; -- cgit v1.2.3 From 04692c9020b76939715d6f2b4ff84d832246e0fc Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 19 Jan 2023 14:26:56 +0200 Subject: net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC) IEEE 802.3-2018 clause 99 defines a MAC Merge sublayer which contains an Express MAC and a Preemptible MAC. Both MACs are hidden to higher and lower layers and visible as a single MAC (packet classification to eMAC or pMAC on TX is done based on priority; classification on RX is done based on SFD). For devices which support a MAC Merge sublayer, it is desirable to retrieve individual packet counters from the eMAC and the pMAC, as well as aggregate statistics (their sum). Introduce a new ETHTOOL_A_STATS_SRC attribute which is part of the policy of ETHTOOL_MSG_STATS_GET and, and an ETHTOOL_A_PAUSE_STATS_SRC which is part of the policy of ETHTOOL_MSG_PAUSE_GET (accepted when ETHTOOL_FLAG_STATS is set in the common ethtool header). Both of these take values from enum ethtool_mac_stats_src, defaulting to "aggregate" in the absence of the attribute. Existing drivers do not need to pay attention to this enum which was added to all driver-facing structures, just the ones which report the MAC merge layer as supported. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/ethtool.h | 9 +++++++ include/uapi/linux/ethtool.h | 18 ++++++++++++++ include/uapi/linux/ethtool_netlink.h | 3 +++ net/ethtool/common.h | 2 ++ net/ethtool/mm.c | 16 ++++++++++++ net/ethtool/netlink.h | 4 +-- net/ethtool/pause.c | 48 ++++++++++++++++++++++++++++++++++++ net/ethtool/stats.c | 32 +++++++++++++++++++++++- 8 files changed, 129 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 37eba38da502..0ccba6612190 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -311,6 +311,7 @@ static inline void ethtool_stats_init(u64 *stats, unsigned int n) * via a more targeted API. */ struct ethtool_eth_mac_stats { + enum ethtool_mac_stats_src src; u64 FramesTransmittedOK; u64 SingleCollisionFrames; u64 MultipleCollisionFrames; @@ -339,6 +340,7 @@ struct ethtool_eth_mac_stats { * via a more targeted API. */ struct ethtool_eth_phy_stats { + enum ethtool_mac_stats_src src; u64 SymbolErrorDuringCarrier; }; @@ -346,6 +348,7 @@ struct ethtool_eth_phy_stats { * via a more targeted API. */ struct ethtool_eth_ctrl_stats { + enum ethtool_mac_stats_src src; u64 MACControlFramesTransmitted; u64 MACControlFramesReceived; u64 UnsupportedOpcodesReceived; @@ -353,6 +356,8 @@ struct ethtool_eth_ctrl_stats { /** * struct ethtool_pause_stats - statistics for IEEE 802.3x pause frames + * @src: input field denoting whether stats should be queried from the eMAC or + * pMAC (if the MM layer is supported). To be ignored otherwise. * @tx_pause_frames: transmitted pause frame count. Reported to user space * as %ETHTOOL_A_PAUSE_STAT_TX_FRAMES. * @@ -366,6 +371,7 @@ struct ethtool_eth_ctrl_stats { * from the standard. */ struct ethtool_pause_stats { + enum ethtool_mac_stats_src src; u64 tx_pause_frames; u64 rx_pause_frames; }; @@ -417,6 +423,8 @@ struct ethtool_rmon_hist_range { /** * struct ethtool_rmon_stats - selected RMON (RFC 2819) statistics + * @src: input field denoting whether stats should be queried from the eMAC or + * pMAC (if the MM layer is supported). To be ignored otherwise. * @undersize_pkts: Equivalent to `etherStatsUndersizePkts` from the RFC. * @oversize_pkts: Equivalent to `etherStatsOversizePkts` from the RFC. * @fragments: Equivalent to `etherStatsFragments` from the RFC. @@ -432,6 +440,7 @@ struct ethtool_rmon_hist_range { * ranges is left to the driver. */ struct ethtool_rmon_stats { + enum ethtool_mac_stats_src src; u64 undersize_pkts; u64 oversize_pkts; u64 fragments; diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 529a93696ab6..f7fba0dc87e5 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -711,6 +711,24 @@ enum ethtool_stringset { ETH_SS_COUNT }; +/** + * enum ethtool_mac_stats_src - source of ethtool MAC statistics + * @ETHTOOL_MAC_STATS_SRC_AGGREGATE: + * if device supports a MAC merge layer, this retrieves the aggregate + * statistics of the eMAC and pMAC. Otherwise, it retrieves just the + * statistics of the single (express) MAC. + * @ETHTOOL_MAC_STATS_SRC_EMAC: + * if device supports a MM layer, this retrieves the eMAC statistics. + * Otherwise, it retrieves the statistics of the single (express) MAC. + * @ETHTOOL_MAC_STATS_SRC_PMAC: + * if device supports a MM layer, this retrieves the pMAC statistics. + */ +enum ethtool_mac_stats_src { + ETHTOOL_MAC_STATS_SRC_AGGREGATE, + ETHTOOL_MAC_STATS_SRC_EMAC, + ETHTOOL_MAC_STATS_SRC_PMAC, +}; + /** * enum ethtool_module_power_mode_policy - plug-in module power mode policy * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode. diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 58af390823b0..ffb073c0dbb4 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -428,6 +428,7 @@ enum { ETHTOOL_A_PAUSE_RX, /* u8 */ ETHTOOL_A_PAUSE_TX, /* u8 */ ETHTOOL_A_PAUSE_STATS, /* nest - _PAUSE_STAT_* */ + ETHTOOL_A_PAUSE_STATS_SRC, /* u32 */ /* add new constants above here */ __ETHTOOL_A_PAUSE_CNT, @@ -744,6 +745,8 @@ enum { ETHTOOL_A_STATS_GRP, /* nest - _A_STATS_GRP_* */ + ETHTOOL_A_STATS_SRC, /* u32 */ + /* add new constants above here */ __ETHTOOL_A_STATS_CNT, ETHTOOL_A_STATS_MAX = (__ETHTOOL_A_STATS_CNT - 1) diff --git a/net/ethtool/common.h b/net/ethtool/common.h index b1b9db810eca..28b8aaaf9bcb 100644 --- a/net/ethtool/common.h +++ b/net/ethtool/common.h @@ -54,4 +54,6 @@ int ethtool_get_module_info_call(struct net_device *dev, int ethtool_get_module_eeprom_call(struct net_device *dev, struct ethtool_eeprom *ee, u8 *data); +bool __ethtool_dev_mm_supported(struct net_device *dev); + #endif /* _ETHTOOL_COMMON_H */ diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c index 76b209ad53d2..809d196665c6 100644 --- a/net/ethtool/mm.c +++ b/net/ethtool/mm.c @@ -253,3 +253,19 @@ out_dev_put: ethnl_parse_header_dev_put(&req_info); return ret; } + +/* Returns whether a given device supports the MAC merge layer + * (has an eMAC and a pMAC). Must be called under rtnl_lock() and + * ethnl_ops_begin(). + */ +bool __ethtool_dev_mm_supported(struct net_device *dev) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct ethtool_mm_state state = {}; + int ret = -EOPNOTSUPP; + + if (ops && ops->get_mm) + ret = ops->get_mm(dev, &state); + + return !!ret; +} diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index 657b0c80620c..deb057c4ea62 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -373,7 +373,7 @@ extern const struct nla_policy ethnl_channels_get_policy[ETHTOOL_A_CHANNELS_HEAD extern const struct nla_policy ethnl_channels_set_policy[ETHTOOL_A_CHANNELS_COMBINED_COUNT + 1]; extern const struct nla_policy ethnl_coalesce_get_policy[ETHTOOL_A_COALESCE_HEADER + 1]; extern const struct nla_policy ethnl_coalesce_set_policy[ETHTOOL_A_COALESCE_MAX + 1]; -extern const struct nla_policy ethnl_pause_get_policy[ETHTOOL_A_PAUSE_HEADER + 1]; +extern const struct nla_policy ethnl_pause_get_policy[ETHTOOL_A_PAUSE_STATS_SRC + 1]; extern const struct nla_policy ethnl_pause_set_policy[ETHTOOL_A_PAUSE_TX + 1]; extern const struct nla_policy ethnl_eee_get_policy[ETHTOOL_A_EEE_HEADER + 1]; extern const struct nla_policy ethnl_eee_set_policy[ETHTOOL_A_EEE_TX_LPI_TIMER + 1]; @@ -384,7 +384,7 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1]; extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1]; extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1]; -extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1]; +extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_SRC + 1]; extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1]; extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1]; extern const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1]; diff --git a/net/ethtool/pause.c b/net/ethtool/pause.c index a8c113d244db..e2be9e89c9d9 100644 --- a/net/ethtool/pause.c +++ b/net/ethtool/pause.c @@ -5,8 +5,12 @@ struct pause_req_info { struct ethnl_req_info base; + enum ethtool_mac_stats_src src; }; +#define PAUSE_REQINFO(__req_base) \ + container_of(__req_base, struct pause_req_info, base) + struct pause_reply_data { struct ethnl_reply_data base; struct ethtool_pauseparam pauseparam; @@ -19,13 +23,40 @@ struct pause_reply_data { const struct nla_policy ethnl_pause_get_policy[] = { [ETHTOOL_A_PAUSE_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy_stats), + [ETHTOOL_A_PAUSE_STATS_SRC] = + NLA_POLICY_MAX(NLA_U32, ETHTOOL_MAC_STATS_SRC_PMAC), }; +static int pause_parse_request(struct ethnl_req_info *req_base, + struct nlattr **tb, + struct netlink_ext_ack *extack) +{ + enum ethtool_mac_stats_src src = ETHTOOL_MAC_STATS_SRC_AGGREGATE; + struct pause_req_info *req_info = PAUSE_REQINFO(req_base); + + if (tb[ETHTOOL_A_PAUSE_STATS_SRC]) { + if (!(req_base->flags & ETHTOOL_FLAG_STATS)) { + NL_SET_ERR_MSG_MOD(extack, + "ETHTOOL_FLAG_STATS must be set when requesting a source of stats"); + return -EINVAL; + } + + src = nla_get_u32(tb[ETHTOOL_A_PAUSE_STATS_SRC]); + } + + req_info->src = src; + + return 0; +} + static int pause_prepare_data(const struct ethnl_req_info *req_base, struct ethnl_reply_data *reply_base, struct genl_info *info) { + const struct pause_req_info *req_info = PAUSE_REQINFO(req_base); struct pause_reply_data *data = PAUSE_REPDATA(reply_base); + enum ethtool_mac_stats_src src = req_info->src; + struct netlink_ext_ack *extack = info->extack; struct net_device *dev = reply_base->dev; int ret; @@ -34,14 +65,26 @@ static int pause_prepare_data(const struct ethnl_req_info *req_base, ethtool_stats_init((u64 *)&data->pausestat, sizeof(data->pausestat) / 8); + data->pausestat.src = src; ret = ethnl_ops_begin(dev); if (ret < 0) return ret; + + if ((src == ETHTOOL_MAC_STATS_SRC_EMAC || + src == ETHTOOL_MAC_STATS_SRC_PMAC) && + !__ethtool_dev_mm_supported(dev)) { + NL_SET_ERR_MSG_MOD(extack, + "Device does not support MAC merge layer"); + ethnl_ops_complete(dev); + return -EOPNOTSUPP; + } + dev->ethtool_ops->get_pauseparam(dev, &data->pauseparam); if (req_base->flags & ETHTOOL_FLAG_STATS && dev->ethtool_ops->get_pause_stats) dev->ethtool_ops->get_pause_stats(dev, &data->pausestat); + ethnl_ops_complete(dev); return 0; @@ -56,6 +99,7 @@ static int pause_reply_size(const struct ethnl_req_info *req_base, if (req_base->flags & ETHTOOL_FLAG_STATS) n += nla_total_size(0) + /* _PAUSE_STATS */ + nla_total_size(sizeof(u32)) + /* _PAUSE_STATS_SRC */ nla_total_size_64bit(sizeof(u64)) * ETHTOOL_PAUSE_STAT_CNT; return n; } @@ -77,6 +121,9 @@ static int pause_put_stats(struct sk_buff *skb, const u16 pad = ETHTOOL_A_PAUSE_STAT_PAD; struct nlattr *nest; + if (nla_put_u32(skb, ETHTOOL_A_PAUSE_STATS_SRC, pause_stats->src)) + return -EMSGSIZE; + nest = nla_nest_start(skb, ETHTOOL_A_PAUSE_STATS); if (!nest) return -EMSGSIZE; @@ -121,6 +168,7 @@ const struct ethnl_request_ops ethnl_pause_request_ops = { .req_info_size = sizeof(struct pause_req_info), .reply_data_size = sizeof(struct pause_reply_data), + .parse_request = pause_parse_request, .prepare_data = pause_prepare_data, .reply_size = pause_reply_size, .fill_reply = pause_fill_reply, diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c index a20e0a24ff61..26a70320e01d 100644 --- a/net/ethtool/stats.c +++ b/net/ethtool/stats.c @@ -7,6 +7,7 @@ struct stats_req_info { struct ethnl_req_info base; DECLARE_BITMAP(stat_mask, __ETHTOOL_STATS_CNT); + enum ethtool_mac_stats_src src; }; #define STATS_REQINFO(__req_base) \ @@ -75,16 +76,19 @@ const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN] = { [ETHTOOL_A_STATS_RMON_JABBER] = "etherStatsJabbers", }; -const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1] = { +const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_SRC + 1] = { [ETHTOOL_A_STATS_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), [ETHTOOL_A_STATS_GROUPS] = { .type = NLA_NESTED }, + [ETHTOOL_A_STATS_SRC] = + NLA_POLICY_MAX(NLA_U32, ETHTOOL_MAC_STATS_SRC_PMAC), }; static int stats_parse_request(struct ethnl_req_info *req_base, struct nlattr **tb, struct netlink_ext_ack *extack) { + enum ethtool_mac_stats_src src = ETHTOOL_MAC_STATS_SRC_AGGREGATE; struct stats_req_info *req_info = STATS_REQINFO(req_base); bool mod = false; int err; @@ -100,6 +104,11 @@ static int stats_parse_request(struct ethnl_req_info *req_base, return -EINVAL; } + if (tb[ETHTOOL_A_STATS_SRC]) + src = nla_get_u32(tb[ETHTOOL_A_STATS_SRC]); + + req_info->src = src; + return 0; } @@ -109,6 +118,8 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base, { const struct stats_req_info *req_info = STATS_REQINFO(req_base); struct stats_reply_data *data = STATS_REPDATA(reply_base); + enum ethtool_mac_stats_src src = req_info->src; + struct netlink_ext_ack *extack = info->extack; struct net_device *dev = reply_base->dev; int ret; @@ -116,11 +127,25 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base, if (ret < 0) return ret; + if ((src == ETHTOOL_MAC_STATS_SRC_EMAC || + src == ETHTOOL_MAC_STATS_SRC_PMAC) && + !__ethtool_dev_mm_supported(dev)) { + NL_SET_ERR_MSG_MOD(extack, + "Device does not support MAC merge layer"); + ethnl_ops_complete(dev); + return -EOPNOTSUPP; + } + /* Mark all stats as unset (see ETHTOOL_STAT_NOT_SET) to prevent them * from being reported to user space in case driver did not set them. */ memset(&data->stats, 0xff, sizeof(data->stats)); + data->phy_stats.src = src; + data->mac_stats.src = src; + data->ctrl_stats.src = src; + data->rmon_stats.src = src; + if (test_bit(ETHTOOL_STATS_ETH_PHY, req_info->stat_mask) && dev->ethtool_ops->get_eth_phy_stats) dev->ethtool_ops->get_eth_phy_stats(dev, &data->phy_stats); @@ -146,6 +171,8 @@ static int stats_reply_size(const struct ethnl_req_info *req_base, unsigned int n_grps = 0, n_stats = 0; int len = 0; + len += nla_total_size(sizeof(u32)); /* _STATS_SRC */ + if (test_bit(ETHTOOL_STATS_ETH_PHY, req_info->stat_mask)) { n_stats += sizeof(struct ethtool_eth_phy_stats) / sizeof(u64); n_grps++; @@ -379,6 +406,9 @@ static int stats_fill_reply(struct sk_buff *skb, const struct stats_reply_data *data = STATS_REPDATA(reply_base); int ret = 0; + if (nla_put_u32(skb, ETHTOOL_A_STATS_SRC, req_info->src)) + return -EMSGSIZE; + if (!ret && test_bit(ETHTOOL_STATS_ETH_PHY, req_info->stat_mask)) ret = stats_put_stats(skb, data, ETHTOOL_STATS_ETH_PHY, ETH_SS_STATS_ETH_PHY, -- cgit v1.2.3 From 449c5459641ad72a504884abb9fb9b19ee31397b Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 19 Jan 2023 14:26:58 +0200 Subject: net: ethtool: add helpers for aggregate statistics When a pMAC exists but the driver is unable to atomically query the aggregate eMAC+pMAC statistics, the user should be given back at least the sum of eMAC and pMAC counters queried separately. This is a generic problem, so add helpers in ethtool to do this operation, if the driver doesn't have a better way to report aggregate stats. Do this in a way that does not require changes to these functions when new stats are added (basically treat the structures as an array of u64 values, except for the first element which is the stats source). In include/linux/ethtool.h, there is already a section where helper function prototypes should be placed. The trouble is, this section is too early, before the definitions of struct ethtool_eth_mac_stats et.al. Move that section at the end and append these new helpers to it. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/ethtool.h | 100 +++++++++++++++++++++++--------------- net/ethtool/stats.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 187 insertions(+), 40 deletions(-) (limited to 'net') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 0ccba6612190..6746dee5a3fd 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -106,11 +106,6 @@ enum ethtool_supported_ring_param { struct net_device; struct netlink_ext_ack; -/* Some generic methods drivers may use in their ethtool_ops */ -u32 ethtool_op_get_link(struct net_device *dev); -int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *eti); - - /* Link extended state and substate. */ struct ethtool_link_ext_state_info { enum ethtool_link_ext_state link_ext_state; @@ -312,28 +307,30 @@ static inline void ethtool_stats_init(u64 *stats, unsigned int n) */ struct ethtool_eth_mac_stats { enum ethtool_mac_stats_src src; - u64 FramesTransmittedOK; - u64 SingleCollisionFrames; - u64 MultipleCollisionFrames; - u64 FramesReceivedOK; - u64 FrameCheckSequenceErrors; - u64 AlignmentErrors; - u64 OctetsTransmittedOK; - u64 FramesWithDeferredXmissions; - u64 LateCollisions; - u64 FramesAbortedDueToXSColls; - u64 FramesLostDueToIntMACXmitError; - u64 CarrierSenseErrors; - u64 OctetsReceivedOK; - u64 FramesLostDueToIntMACRcvError; - u64 MulticastFramesXmittedOK; - u64 BroadcastFramesXmittedOK; - u64 FramesWithExcessiveDeferral; - u64 MulticastFramesReceivedOK; - u64 BroadcastFramesReceivedOK; - u64 InRangeLengthErrors; - u64 OutOfRangeLengthField; - u64 FrameTooLongErrors; + struct_group(stats, + u64 FramesTransmittedOK; + u64 SingleCollisionFrames; + u64 MultipleCollisionFrames; + u64 FramesReceivedOK; + u64 FrameCheckSequenceErrors; + u64 AlignmentErrors; + u64 OctetsTransmittedOK; + u64 FramesWithDeferredXmissions; + u64 LateCollisions; + u64 FramesAbortedDueToXSColls; + u64 FramesLostDueToIntMACXmitError; + u64 CarrierSenseErrors; + u64 OctetsReceivedOK; + u64 FramesLostDueToIntMACRcvError; + u64 MulticastFramesXmittedOK; + u64 BroadcastFramesXmittedOK; + u64 FramesWithExcessiveDeferral; + u64 MulticastFramesReceivedOK; + u64 BroadcastFramesReceivedOK; + u64 InRangeLengthErrors; + u64 OutOfRangeLengthField; + u64 FrameTooLongErrors; + ); }; /* Basic IEEE 802.3 PHY statistics (30.3.2.1.*), not otherwise exposed @@ -341,7 +338,9 @@ struct ethtool_eth_mac_stats { */ struct ethtool_eth_phy_stats { enum ethtool_mac_stats_src src; - u64 SymbolErrorDuringCarrier; + struct_group(stats, + u64 SymbolErrorDuringCarrier; + ); }; /* Basic IEEE 802.3 MAC Ctrl statistics (30.3.3.*), not otherwise exposed @@ -349,9 +348,11 @@ struct ethtool_eth_phy_stats { */ struct ethtool_eth_ctrl_stats { enum ethtool_mac_stats_src src; - u64 MACControlFramesTransmitted; - u64 MACControlFramesReceived; - u64 UnsupportedOpcodesReceived; + struct_group(stats, + u64 MACControlFramesTransmitted; + u64 MACControlFramesReceived; + u64 UnsupportedOpcodesReceived; + ); }; /** @@ -372,8 +373,10 @@ struct ethtool_eth_ctrl_stats { */ struct ethtool_pause_stats { enum ethtool_mac_stats_src src; - u64 tx_pause_frames; - u64 rx_pause_frames; + struct_group(stats, + u64 tx_pause_frames; + u64 rx_pause_frames; + ); }; #define ETHTOOL_MAX_LANES 8 @@ -441,13 +444,15 @@ struct ethtool_rmon_hist_range { */ struct ethtool_rmon_stats { enum ethtool_mac_stats_src src; - u64 undersize_pkts; - u64 oversize_pkts; - u64 fragments; - u64 jabbers; - - u64 hist[ETHTOOL_RMON_HIST_MAX]; - u64 hist_tx[ETHTOOL_RMON_HIST_MAX]; + struct_group(stats, + u64 undersize_pkts; + u64 oversize_pkts; + u64 fragments; + u64 jabbers; + + u64 hist[ETHTOOL_RMON_HIST_MAX]; + u64 hist_tx[ETHTOOL_RMON_HIST_MAX]; + ); }; #define ETH_MODULE_EEPROM_PAGE_LEN 128 @@ -981,6 +986,21 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings, */ int ethtool_get_phc_vclocks(struct net_device *dev, int **vclock_index); +/* Some generic methods drivers may use in their ethtool_ops */ +u32 ethtool_op_get_link(struct net_device *dev); +int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *eti); + +void ethtool_aggregate_mac_stats(struct net_device *dev, + struct ethtool_eth_mac_stats *mac_stats); +void ethtool_aggregate_phy_stats(struct net_device *dev, + struct ethtool_eth_phy_stats *phy_stats); +void ethtool_aggregate_ctrl_stats(struct net_device *dev, + struct ethtool_eth_ctrl_stats *ctrl_stats); +void ethtool_aggregate_pause_stats(struct net_device *dev, + struct ethtool_pause_stats *pause_stats); +void ethtool_aggregate_rmon_stats(struct net_device *dev, + struct ethtool_rmon_stats *rmon_stats); + /** * ethtool_sprintf - Write formatted string to ethtool string data * @data: Pointer to start of string to update diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c index 26a70320e01d..7294be5855d4 100644 --- a/net/ethtool/stats.c +++ b/net/ethtool/stats.c @@ -440,3 +440,130 @@ const struct ethnl_request_ops ethnl_stats_request_ops = { .reply_size = stats_reply_size, .fill_reply = stats_fill_reply, }; + +static u64 ethtool_stats_sum(u64 a, u64 b) +{ + if (a == ETHTOOL_STAT_NOT_SET) + return b; + if (b == ETHTOOL_STAT_NOT_SET) + return a; + return a + b; +} + +/* Avoid modifying the aggregation procedure every time a new counter is added + * by treating the structures as an array of u64 statistics. + */ +static void ethtool_aggregate_stats(void *aggr_stats, const void *emac_stats, + const void *pmac_stats, size_t stats_size, + size_t stats_offset) +{ + size_t num_stats = stats_size / sizeof(u64); + const u64 *s1 = emac_stats + stats_offset; + const u64 *s2 = pmac_stats + stats_offset; + u64 *s = aggr_stats + stats_offset; + int i; + + for (i = 0; i < num_stats; i++) + s[i] = ethtool_stats_sum(s1[i], s2[i]); +} + +void ethtool_aggregate_mac_stats(struct net_device *dev, + struct ethtool_eth_mac_stats *mac_stats) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct ethtool_eth_mac_stats pmac, emac; + + memset(&emac, 0xff, sizeof(emac)); + memset(&pmac, 0xff, sizeof(pmac)); + emac.src = ETHTOOL_MAC_STATS_SRC_EMAC; + pmac.src = ETHTOOL_MAC_STATS_SRC_PMAC; + + ops->get_eth_mac_stats(dev, &emac); + ops->get_eth_mac_stats(dev, &pmac); + + ethtool_aggregate_stats(mac_stats, &emac, &pmac, + sizeof(mac_stats->stats), + offsetof(struct ethtool_eth_mac_stats, stats)); +} +EXPORT_SYMBOL(ethtool_aggregate_mac_stats); + +void ethtool_aggregate_phy_stats(struct net_device *dev, + struct ethtool_eth_phy_stats *phy_stats) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct ethtool_eth_phy_stats pmac, emac; + + memset(&emac, 0xff, sizeof(emac)); + memset(&pmac, 0xff, sizeof(pmac)); + emac.src = ETHTOOL_MAC_STATS_SRC_EMAC; + pmac.src = ETHTOOL_MAC_STATS_SRC_PMAC; + + ops->get_eth_phy_stats(dev, &emac); + ops->get_eth_phy_stats(dev, &pmac); + + ethtool_aggregate_stats(phy_stats, &emac, &pmac, + sizeof(phy_stats->stats), + offsetof(struct ethtool_eth_phy_stats, stats)); +} +EXPORT_SYMBOL(ethtool_aggregate_phy_stats); + +void ethtool_aggregate_ctrl_stats(struct net_device *dev, + struct ethtool_eth_ctrl_stats *ctrl_stats) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct ethtool_eth_ctrl_stats pmac, emac; + + memset(&emac, 0xff, sizeof(emac)); + memset(&pmac, 0xff, sizeof(pmac)); + emac.src = ETHTOOL_MAC_STATS_SRC_EMAC; + pmac.src = ETHTOOL_MAC_STATS_SRC_PMAC; + + ops->get_eth_ctrl_stats(dev, &emac); + ops->get_eth_ctrl_stats(dev, &pmac); + + ethtool_aggregate_stats(ctrl_stats, &emac, &pmac, + sizeof(ctrl_stats->stats), + offsetof(struct ethtool_eth_ctrl_stats, stats)); +} +EXPORT_SYMBOL(ethtool_aggregate_ctrl_stats); + +void ethtool_aggregate_pause_stats(struct net_device *dev, + struct ethtool_pause_stats *pause_stats) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct ethtool_pause_stats pmac, emac; + + memset(&emac, 0xff, sizeof(emac)); + memset(&pmac, 0xff, sizeof(pmac)); + emac.src = ETHTOOL_MAC_STATS_SRC_EMAC; + pmac.src = ETHTOOL_MAC_STATS_SRC_PMAC; + + ops->get_pause_stats(dev, &emac); + ops->get_pause_stats(dev, &pmac); + + ethtool_aggregate_stats(pause_stats, &emac, &pmac, + sizeof(pause_stats->stats), + offsetof(struct ethtool_pause_stats, stats)); +} +EXPORT_SYMBOL(ethtool_aggregate_pause_stats); + +void ethtool_aggregate_rmon_stats(struct net_device *dev, + struct ethtool_rmon_stats *rmon_stats) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + const struct ethtool_rmon_hist_range *dummy; + struct ethtool_rmon_stats pmac, emac; + + memset(&emac, 0xff, sizeof(emac)); + memset(&pmac, 0xff, sizeof(pmac)); + emac.src = ETHTOOL_MAC_STATS_SRC_EMAC; + pmac.src = ETHTOOL_MAC_STATS_SRC_PMAC; + + ops->get_rmon_stats(dev, &emac, &dummy); + ops->get_rmon_stats(dev, &pmac, &dummy); + + ethtool_aggregate_stats(rmon_stats, &emac, &pmac, + sizeof(rmon_stats->stats), + offsetof(struct ethtool_rmon_stats, stats)); +} +EXPORT_SYMBOL(ethtool_aggregate_rmon_stats); -- cgit v1.2.3 From 5f6c2d498ad97cf9f85b81c0fbb205abbcdfe3f8 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 19 Jan 2023 14:27:00 +0200 Subject: net: dsa: add plumbing for changing and getting MAC merge layer state The DSA core is in charge of the ethtool_ops of the net devices associated with switch ports, so in case a hardware driver supports the MAC merge layer, DSA must pass the callbacks through to the driver. Add support for precisely that. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/net/dsa.h | 11 +++++++++++ net/dsa/slave.c | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+) (limited to 'net') diff --git a/include/net/dsa.h b/include/net/dsa.h index 96086289aa9b..a15f17a38eca 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -937,6 +937,17 @@ struct dsa_switch_ops { int (*get_ts_info)(struct dsa_switch *ds, int port, struct ethtool_ts_info *ts); + /* + * ethtool MAC merge layer + */ + int (*get_mm)(struct dsa_switch *ds, int port, + struct ethtool_mm_state *state); + int (*set_mm)(struct dsa_switch *ds, int port, + struct ethtool_mm_cfg *cfg, + struct netlink_ext_ack *extack); + void (*get_mm_stats)(struct dsa_switch *ds, int port, + struct ethtool_mm_stats *stats); + /* * DCB ops */ diff --git a/net/dsa/slave.c b/net/dsa/slave.c index aab79c355224..6014ac3aad34 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1117,6 +1117,40 @@ static void dsa_slave_net_selftest(struct net_device *ndev, net_selftest(ndev, etest, buf); } +static int dsa_slave_get_mm(struct net_device *dev, + struct ethtool_mm_state *state) +{ + struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_switch *ds = dp->ds; + + if (!ds->ops->get_mm) + return -EOPNOTSUPP; + + return ds->ops->get_mm(ds, dp->index, state); +} + +static int dsa_slave_set_mm(struct net_device *dev, struct ethtool_mm_cfg *cfg, + struct netlink_ext_ack *extack) +{ + struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_switch *ds = dp->ds; + + if (!ds->ops->set_mm) + return -EOPNOTSUPP; + + return ds->ops->set_mm(ds, dp->index, cfg, extack); +} + +static void dsa_slave_get_mm_stats(struct net_device *dev, + struct ethtool_mm_stats *stats) +{ + struct dsa_port *dp = dsa_slave_to_port(dev); + struct dsa_switch *ds = dp->ds; + + if (ds->ops->get_mm_stats) + ds->ops->get_mm_stats(ds, dp->index, stats); +} + static void dsa_slave_get_wol(struct net_device *dev, struct ethtool_wolinfo *w) { struct dsa_port *dp = dsa_slave_to_port(dev); @@ -2205,6 +2239,9 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = { .set_rxnfc = dsa_slave_set_rxnfc, .get_ts_info = dsa_slave_get_ts_info, .self_test = dsa_slave_net_selftest, + .get_mm = dsa_slave_get_mm, + .set_mm = dsa_slave_set_mm, + .get_mm_stats = dsa_slave_get_mm_stats, }; static const struct dcbnl_rtnl_ops __maybe_unused dsa_slave_dcbnl_ops = { -- cgit v1.2.3 From dc0b98a1758f0d2296349ca23ac88804b922e88d Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Mon, 23 Jan 2023 13:57:39 +0000 Subject: ethtool: Add and use ethnl_update_bool. Signed-off-by: David S. Miller --- net/ethtool/mm.c | 2 +- net/ethtool/netlink.h | 26 ++++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c index 809d196665c6..3e8acdb806fd 100644 --- a/net/ethtool/mm.c +++ b/net/ethtool/mm.c @@ -225,7 +225,7 @@ int ethnl_set_mm(struct sk_buff *skb, struct genl_info *info) ethnl_update_u32(&cfg.verify_time, tb[ETHTOOL_A_MM_VERIFY_TIME], &mod); ethnl_update_bool(&cfg.tx_enabled, tb[ETHTOOL_A_MM_TX_ENABLED], &mod); ethnl_update_bool(&cfg.pmac_enabled, tb[ETHTOOL_A_MM_PMAC_ENABLED], - &mod); + &mod); ethnl_update_u32(&cfg.tx_min_frag_size, tb[ETHTOOL_A_MM_TX_MIN_FRAG_SIZE], &mod); diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index deb057c4ea62..b01f7cd542c4 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -137,6 +137,32 @@ static inline void ethnl_update_bool32(u32 *dst, const struct nlattr *attr, *mod = true; } +/** + * ethnl_update_bool() - updateb bool used as bool from NLA_U8 attribute + * @dst: value to update + * @attr: netlink attribute with new value or null + * @mod: pointer to bool for modification tracking + * + * Use the bool value from NLA_U8 netlink attribute @attr to set bool variable + * pointed to by @dst to 0 (if zero) or 1 (if not); do nothing if @attr is + * null. Bool pointed to by @mod is set to true if this function changed the + * logical value of *dst, otherwise it is left as is. + */ +static inline void ethnl_update_bool(bool *dst, const struct nlattr *attr, + bool *mod) +{ + u8 val; + + if (!attr) + return; + val = !!nla_get_u8(attr); + if (!!*dst == val) + return; + + *dst = val; + *mod = true; +} + /** * ethnl_update_binary() - update binary data from NLA_BINARY attribute * @dst: value to update -- cgit v1.2.3 From 9d03ebc71a027ca495c60f6e94d3cda81921791f Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 19 Jan 2023 14:15:21 -0800 Subject: bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded BPF offloading infra will be reused to implement bound-but-not-offloaded bpf programs. Rename existing helpers for clarity. No functional changes. Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Reviewed-by: Jakub Kicinski Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.com Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 8 ++++---- kernel/bpf/core.c | 4 ++-- kernel/bpf/offload.c | 4 ++-- kernel/bpf/syscall.c | 22 +++++++++++----------- kernel/bpf/verifier.c | 18 +++++++++--------- net/core/dev.c | 4 ++-- net/core/filter.c | 2 +- 7 files changed, 31 insertions(+), 31 deletions(-) (limited to 'net') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index ae7771c7d750..1bb525c0130e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2481,12 +2481,12 @@ void unpriv_ebpf_notify(int new_state); #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr); -static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux) +static inline bool bpf_prog_is_offloaded(const struct bpf_prog_aux *aux) { return aux->offload_requested; } -static inline bool bpf_map_is_dev_bound(struct bpf_map *map) +static inline bool bpf_map_is_offloaded(struct bpf_map *map) { return unlikely(map->ops == &bpf_map_offload_ops); } @@ -2513,12 +2513,12 @@ static inline int bpf_prog_offload_init(struct bpf_prog *prog, return -EOPNOTSUPP; } -static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux) +static inline bool bpf_prog_is_offloaded(struct bpf_prog_aux *aux) { return false; } -static inline bool bpf_map_is_dev_bound(struct bpf_map *map) +static inline bool bpf_map_is_offloaded(struct bpf_map *map) { return false; } diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ba3fff17e2f9..515f4f08591c 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2182,7 +2182,7 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err) * valid program, which in this case would simply not * be JITed, but falls back to the interpreter. */ - if (!bpf_prog_is_dev_bound(fp->aux)) { + if (!bpf_prog_is_offloaded(fp->aux)) { *err = bpf_prog_alloc_jited_linfo(fp); if (*err) return fp; @@ -2553,7 +2553,7 @@ static void bpf_prog_free_deferred(struct work_struct *work) #endif bpf_free_used_maps(aux); bpf_free_used_btfs(aux); - if (bpf_prog_is_dev_bound(aux)) + if (bpf_prog_is_offloaded(aux)) bpf_prog_offload_destroy(aux->prog); #ifdef CONFIG_PERF_EVENTS if (aux->prog->has_callchain_buf) diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 13e4efc971e6..f5769a8ecbee 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -549,7 +549,7 @@ static bool __bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_offload_netdev *ondev1, *ondev2; struct bpf_prog_offload *offload; - if (!bpf_prog_is_dev_bound(prog->aux)) + if (!bpf_prog_is_offloaded(prog->aux)) return false; offload = prog->aux->offload; @@ -581,7 +581,7 @@ bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map) struct bpf_offloaded_map *offmap; bool ret; - if (!bpf_map_is_dev_bound(map)) + if (!bpf_map_is_offloaded(map)) return bpf_map_offload_neutral(map); offmap = map_to_offmap(map); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 35ffd808f281..5e90b697f908 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -181,7 +181,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file, int err; /* Need to create a kthread, thus must support schedule */ - if (bpf_map_is_dev_bound(map)) { + if (bpf_map_is_offloaded(map)) { return bpf_map_offload_update_elem(map, key, value, flags); } else if (map->map_type == BPF_MAP_TYPE_CPUMAP || map->map_type == BPF_MAP_TYPE_STRUCT_OPS) { @@ -238,7 +238,7 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value, void *ptr; int err; - if (bpf_map_is_dev_bound(map)) + if (bpf_map_is_offloaded(map)) return bpf_map_offload_lookup_elem(map, key, value); bpf_disable_instrumentation(); @@ -1483,7 +1483,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr) goto err_put; } - if (bpf_map_is_dev_bound(map)) { + if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_delete_elem(map, key); goto out; } else if (IS_FD_PROG_ARRAY(map) || @@ -1547,7 +1547,7 @@ static int map_get_next_key(union bpf_attr *attr) if (!next_key) goto free_key; - if (bpf_map_is_dev_bound(map)) { + if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_get_next_key(map, key, next_key); goto out; } @@ -1605,7 +1605,7 @@ int generic_map_delete_batch(struct bpf_map *map, map->key_size)) break; - if (bpf_map_is_dev_bound(map)) { + if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_delete_elem(map, key); break; } @@ -1851,7 +1851,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr) map->map_type == BPF_MAP_TYPE_PERCPU_HASH || map->map_type == BPF_MAP_TYPE_LRU_HASH || map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) { - if (!bpf_map_is_dev_bound(map)) { + if (!bpf_map_is_offloaded(map)) { bpf_disable_instrumentation(); rcu_read_lock(); err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags); @@ -1944,7 +1944,7 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog) if (!ops) return -EINVAL; - if (!bpf_prog_is_dev_bound(prog->aux)) + if (!bpf_prog_is_offloaded(prog->aux)) prog->aux->ops = ops; else prog->aux->ops = &bpf_offload_prog_ops; @@ -2255,7 +2255,7 @@ bool bpf_prog_get_ok(struct bpf_prog *prog, if (prog->type != *attach_type) return false; - if (bpf_prog_is_dev_bound(prog->aux) && !attach_drv) + if (bpf_prog_is_offloaded(prog->aux) && !attach_drv) return false; return true; @@ -2598,7 +2598,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) atomic64_set(&prog->aux->refcnt, 1); prog->gpl_compatible = is_gpl ? 1 : 0; - if (bpf_prog_is_dev_bound(prog->aux)) { + if (bpf_prog_is_offloaded(prog->aux)) { err = bpf_prog_offload_init(prog, attr); if (err) goto free_prog_sec; @@ -3997,7 +3997,7 @@ static int bpf_prog_get_info_by_fd(struct file *file, return -EFAULT; } - if (bpf_prog_is_dev_bound(prog->aux)) { + if (bpf_prog_is_offloaded(prog->aux)) { err = bpf_prog_offload_info_fill(&info, prog); if (err) return err; @@ -4225,7 +4225,7 @@ static int bpf_map_get_info_by_fd(struct file *file, } info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; - if (bpf_map_is_dev_bound(map)) { + if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_info_fill(&info, map); if (err) return err; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ecf7fed7881c..bba68eefb4b2 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -14099,7 +14099,7 @@ static int do_check(struct bpf_verifier_env *env) env->prev_log_len = env->log.len_used; } - if (bpf_prog_is_dev_bound(env->prog->aux)) { + if (bpf_prog_is_offloaded(env->prog->aux)) { err = bpf_prog_offload_verify_insn(env, env->insn_idx, env->prev_insn_idx); if (err) @@ -14579,7 +14579,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } } - if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) && + if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) && !bpf_offload_prog_map_match(prog, map)) { verbose(env, "offload device mismatch between prog and map\n"); return -EINVAL; @@ -15060,7 +15060,7 @@ static int verifier_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt) unsigned int orig_prog_len = env->prog->len; int err; - if (bpf_prog_is_dev_bound(env->prog->aux)) + if (bpf_prog_is_offloaded(env->prog->aux)) bpf_prog_offload_remove_insns(env, off, cnt); err = bpf_remove_insns(env->prog, off, cnt); @@ -15141,7 +15141,7 @@ static void opt_hard_wire_dead_code_branches(struct bpf_verifier_env *env) else continue; - if (bpf_prog_is_dev_bound(env->prog->aux)) + if (bpf_prog_is_offloaded(env->prog->aux)) bpf_prog_offload_replace_insn(env, i, &ja); memcpy(insn, &ja, sizeof(ja)); @@ -15328,7 +15328,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) } } - if (bpf_prog_is_dev_bound(env->prog->aux)) + if (bpf_prog_is_offloaded(env->prog->aux)) return 0; insn = env->prog->insnsi + delta; @@ -15728,7 +15728,7 @@ static int fixup_call_args(struct bpf_verifier_env *env) int err = 0; if (env->prog->jit_requested && - !bpf_prog_is_dev_bound(env->prog->aux)) { + !bpf_prog_is_offloaded(env->prog->aux)) { err = jit_subprogs(env); if (err == 0) return 0; @@ -17231,7 +17231,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) if (ret < 0) goto skip_full_check; - if (bpf_prog_is_dev_bound(env->prog->aux)) { + if (bpf_prog_is_offloaded(env->prog->aux)) { ret = bpf_prog_offload_verifier_prep(env->prog); if (ret) goto skip_full_check; @@ -17244,7 +17244,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) ret = do_check_subprogs(env); ret = ret ?: do_check_main(env); - if (ret == 0 && bpf_prog_is_dev_bound(env->prog->aux)) + if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux)) ret = bpf_prog_offload_finalize(env); skip_full_check: @@ -17279,7 +17279,7 @@ skip_full_check: /* do 32-bit optimization after insn patching has done so those patched * insns could be handled correctly. */ - if (ret == 0 && !bpf_prog_is_dev_bound(env->prog->aux)) { + if (ret == 0 && !bpf_prog_is_offloaded(env->prog->aux)) { ret = opt_subreg_zext_lo32_rnd_hi32(env, attr); env->prog->aux->verifier_zext = bpf_jit_needs_zext() ? !ret : false; diff --git a/net/core/dev.c b/net/core/dev.c index cf78f35bc0b9..a37829de6529 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9224,8 +9224,8 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time"); return -EEXIST; } - if (!offload && bpf_prog_is_dev_bound(new_prog->aux)) { - NL_SET_ERR_MSG(extack, "Using device-bound program without HW_MODE flag is not supported"); + if (!offload && bpf_prog_is_offloaded(new_prog->aux)) { + NL_SET_ERR_MSG(extack, "Using offloaded program without HW_MODE flag is not supported"); return -EINVAL; } if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) { diff --git a/net/core/filter.c b/net/core/filter.c index b4547a2c02f4..ed08dbf10338 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -8760,7 +8760,7 @@ static bool xdp_is_valid_access(int off, int size, } if (type == BPF_WRITE) { - if (bpf_prog_is_dev_bound(prog->aux)) { + if (bpf_prog_is_offloaded(prog->aux)) { switch (off) { case offsetof(struct xdp_md, rx_queue_index): return __is_valid_xdp_access(off, size); -- cgit v1.2.3 From 2b3486bc2d237ec345b3942b7be5deabf8c8fed1 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 19 Jan 2023 14:15:24 -0800 Subject: bpf: Introduce device-bound XDP programs New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way to associate a netdev with a BPF program at load time. netdevsim checks are dropped in favor of generic check in dev_xdp_attach. Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com Signed-off-by: Martin KaFai Lau --- drivers/net/netdevsim/bpf.c | 4 -- include/linux/bpf.h | 24 +++++++++-- include/uapi/linux/bpf.h | 5 +++ kernel/bpf/core.c | 4 +- kernel/bpf/offload.c | 95 +++++++++++++++++++++++++++++++----------- kernel/bpf/syscall.c | 9 ++-- net/core/dev.c | 5 +++ tools/include/uapi/linux/bpf.h | 5 +++ 8 files changed, 113 insertions(+), 38 deletions(-) (limited to 'net') diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c index 50854265864d..f60eb97e3a62 100644 --- a/drivers/net/netdevsim/bpf.c +++ b/drivers/net/netdevsim/bpf.c @@ -315,10 +315,6 @@ nsim_setup_prog_hw_checks(struct netdevsim *ns, struct netdev_bpf *bpf) NSIM_EA(bpf->extack, "xdpoffload of non-bound program"); return -EINVAL; } - if (!bpf_offload_dev_match(bpf->prog, ns->netdev)) { - NSIM_EA(bpf->extack, "program bound to different dev"); - return -EINVAL; - } state = bpf->prog->aux->offload->dev_priv; if (WARN_ON(strcmp(state->state, "xlated"))) { diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1bb525c0130e..b97a05bb47be 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1261,7 +1261,8 @@ struct bpf_prog_aux { enum bpf_prog_type saved_dst_prog_type; enum bpf_attach_type saved_dst_attach_type; bool verifier_zext; /* Zero extensions has been inserted by verifier. */ - bool offload_requested; + bool dev_bound; /* Program is bound to the netdev. */ + bool offload_requested; /* Program is bound and offloaded to the netdev. */ bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */ bool func_proto_unreliable; bool sleepable; @@ -2451,7 +2452,7 @@ void __bpf_free_used_maps(struct bpf_prog_aux *aux, bool bpf_prog_get_ok(struct bpf_prog *, enum bpf_prog_type *, bool); int bpf_prog_offload_compile(struct bpf_prog *prog); -void bpf_prog_offload_destroy(struct bpf_prog *prog); +void bpf_prog_dev_bound_destroy(struct bpf_prog *prog); int bpf_prog_offload_info_fill(struct bpf_prog_info *info, struct bpf_prog *prog); @@ -2479,7 +2480,13 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev); void unpriv_ebpf_notify(int new_state); #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) -int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr); +int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr); +void bpf_dev_bound_netdev_unregister(struct net_device *dev); + +static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux) +{ + return aux->dev_bound; +} static inline bool bpf_prog_is_offloaded(const struct bpf_prog_aux *aux) { @@ -2507,12 +2514,21 @@ void sock_map_unhash(struct sock *sk); void sock_map_destroy(struct sock *sk); void sock_map_close(struct sock *sk, long timeout); #else -static inline int bpf_prog_offload_init(struct bpf_prog *prog, +static inline int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr) { return -EOPNOTSUPP; } +static inline void bpf_dev_bound_netdev_unregister(struct net_device *dev) +{ +} + +static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux) +{ + return false; +} + static inline bool bpf_prog_is_offloaded(struct bpf_prog_aux *aux) { return false; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index adae5b168f9d..ba0f0cfb5e42 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1156,6 +1156,11 @@ enum bpf_link_type { */ #define BPF_F_XDP_HAS_FRAGS (1U << 5) +/* If BPF_F_XDP_DEV_BOUND_ONLY is used in BPF_PROG_LOAD command, the loaded + * program becomes device-bound but can access XDP metadata. + */ +#define BPF_F_XDP_DEV_BOUND_ONLY (1U << 6) + /* link_create.kprobe_multi.flags used in LINK_CREATE command for * BPF_TRACE_KPROBE_MULTI attach type to create return probe. */ diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 515f4f08591c..1cf19da3c128 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2553,8 +2553,8 @@ static void bpf_prog_free_deferred(struct work_struct *work) #endif bpf_free_used_maps(aux); bpf_free_used_btfs(aux); - if (bpf_prog_is_offloaded(aux)) - bpf_prog_offload_destroy(aux->prog); + if (bpf_prog_is_dev_bound(aux)) + bpf_prog_dev_bound_destroy(aux->prog); #ifdef CONFIG_PERF_EVENTS if (aux->prog->has_callchain_buf) put_callchain_buffers(); diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index deb06498da0b..f767455ed732 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -41,7 +41,7 @@ struct bpf_offload_dev { struct bpf_offload_netdev { struct rhash_head l; struct net_device *netdev; - struct bpf_offload_dev *offdev; + struct bpf_offload_dev *offdev; /* NULL when bound-only */ struct list_head progs; struct list_head maps; struct list_head offdev_netdevs; @@ -89,19 +89,17 @@ static int __bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev, INIT_LIST_HEAD(&ondev->progs); INIT_LIST_HEAD(&ondev->maps); - down_write(&bpf_devs_lock); err = rhashtable_insert_fast(&offdevs, &ondev->l, offdevs_params); if (err) { netdev_warn(netdev, "failed to register for BPF offload\n"); - goto err_unlock_free; + goto err_free; } - list_add(&ondev->offdev_netdevs, &offdev->netdevs); - up_write(&bpf_devs_lock); + if (offdev) + list_add(&ondev->offdev_netdevs, &offdev->netdevs); return 0; -err_unlock_free: - up_write(&bpf_devs_lock); +err_free: kfree(ondev); return err; } @@ -149,24 +147,26 @@ static void __bpf_map_offload_destroy(struct bpf_offloaded_map *offmap) static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev, struct net_device *netdev) { - struct bpf_offload_netdev *ondev, *altdev; + struct bpf_offload_netdev *ondev, *altdev = NULL; struct bpf_offloaded_map *offmap, *mtmp; struct bpf_prog_offload *offload, *ptmp; ASSERT_RTNL(); - down_write(&bpf_devs_lock); ondev = rhashtable_lookup_fast(&offdevs, &netdev, offdevs_params); if (WARN_ON(!ondev)) - goto unlock; + return; WARN_ON(rhashtable_remove_fast(&offdevs, &ondev->l, offdevs_params)); - list_del(&ondev->offdev_netdevs); /* Try to move the objects to another netdev of the device */ - altdev = list_first_entry_or_null(&offdev->netdevs, - struct bpf_offload_netdev, - offdev_netdevs); + if (offdev) { + list_del(&ondev->offdev_netdevs); + altdev = list_first_entry_or_null(&offdev->netdevs, + struct bpf_offload_netdev, + offdev_netdevs); + } + if (altdev) { list_for_each_entry(offload, &ondev->progs, offloads) offload->netdev = altdev->netdev; @@ -185,11 +185,9 @@ static void __bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev, WARN_ON(!list_empty(&ondev->progs)); WARN_ON(!list_empty(&ondev->maps)); kfree(ondev); -unlock: - up_write(&bpf_devs_lock); } -int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr) +int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr) { struct bpf_offload_netdev *ondev; struct bpf_prog_offload *offload; @@ -199,7 +197,11 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr) attr->prog_type != BPF_PROG_TYPE_XDP) return -EINVAL; - if (attr->prog_flags) + if (attr->prog_flags & ~BPF_F_XDP_DEV_BOUND_ONLY) + return -EINVAL; + + if (attr->prog_type == BPF_PROG_TYPE_SCHED_CLS && + attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY) return -EINVAL; offload = kzalloc(sizeof(*offload), GFP_USER); @@ -214,11 +216,23 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr) if (err) goto err_maybe_put; + prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_DEV_BOUND_ONLY); + down_write(&bpf_devs_lock); ondev = bpf_offload_find_netdev(offload->netdev); if (!ondev) { - err = -EINVAL; - goto err_unlock; + if (bpf_prog_is_offloaded(prog->aux)) { + err = -EINVAL; + goto err_unlock; + } + + /* When only binding to the device, explicitly + * create an entry in the hashtable. + */ + err = __bpf_offload_dev_netdev_register(NULL, offload->netdev); + if (err) + goto err_unlock; + ondev = bpf_offload_find_netdev(offload->netdev); } offload->offdev = ondev->offdev; prog->aux->offload = offload; @@ -321,12 +335,25 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt) up_read(&bpf_devs_lock); } -void bpf_prog_offload_destroy(struct bpf_prog *prog) +void bpf_prog_dev_bound_destroy(struct bpf_prog *prog) { + struct bpf_offload_netdev *ondev; + struct net_device *netdev; + + rtnl_lock(); down_write(&bpf_devs_lock); - if (prog->aux->offload) + if (prog->aux->offload) { + list_del_init(&prog->aux->offload->offloads); + + netdev = prog->aux->offload->netdev; __bpf_prog_offload_destroy(prog); + + ondev = bpf_offload_find_netdev(netdev); + if (!ondev->offdev && list_empty(&ondev->progs)) + __bpf_offload_dev_netdev_unregister(NULL, netdev); + } up_write(&bpf_devs_lock); + rtnl_unlock(); } static int bpf_prog_offload_translate(struct bpf_prog *prog) @@ -621,7 +648,7 @@ static bool __bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_offload_netdev *ondev1, *ondev2; struct bpf_prog_offload *offload; - if (!bpf_prog_is_offloaded(prog->aux)) + if (!bpf_prog_is_dev_bound(prog->aux)) return false; offload = prog->aux->offload; @@ -667,14 +694,21 @@ bool bpf_offload_prog_map_match(struct bpf_prog *prog, struct bpf_map *map) int bpf_offload_dev_netdev_register(struct bpf_offload_dev *offdev, struct net_device *netdev) { - return __bpf_offload_dev_netdev_register(offdev, netdev); + int err; + + down_write(&bpf_devs_lock); + err = __bpf_offload_dev_netdev_register(offdev, netdev); + up_write(&bpf_devs_lock); + return err; } EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_register); void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev, struct net_device *netdev) { + down_write(&bpf_devs_lock); __bpf_offload_dev_netdev_unregister(offdev, netdev); + up_write(&bpf_devs_lock); } EXPORT_SYMBOL_GPL(bpf_offload_dev_netdev_unregister); @@ -708,6 +742,19 @@ void *bpf_offload_dev_priv(struct bpf_offload_dev *offdev) } EXPORT_SYMBOL_GPL(bpf_offload_dev_priv); +void bpf_dev_bound_netdev_unregister(struct net_device *dev) +{ + struct bpf_offload_netdev *ondev; + + ASSERT_RTNL(); + + down_write(&bpf_devs_lock); + ondev = bpf_offload_find_netdev(dev); + if (ondev && !ondev->offdev) + __bpf_offload_dev_netdev_unregister(NULL, ondev->netdev); + up_write(&bpf_devs_lock); +} + static int __init bpf_offload_init(void) { return rhashtable_init(&offdevs, &offdevs_params); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5e90b697f908..fdf4ff3d5a7f 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2491,7 +2491,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) BPF_F_TEST_STATE_FREQ | BPF_F_SLEEPABLE | BPF_F_TEST_RND_HI32 | - BPF_F_XDP_HAS_FRAGS)) + BPF_F_XDP_HAS_FRAGS | + BPF_F_XDP_DEV_BOUND_ONLY)) return -EINVAL; if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && @@ -2575,7 +2576,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) prog->aux->attach_btf = attach_btf; prog->aux->attach_btf_id = attr->attach_btf_id; prog->aux->dst_prog = dst_prog; - prog->aux->offload_requested = !!attr->prog_ifindex; + prog->aux->dev_bound = !!attr->prog_ifindex; prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE; prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS; @@ -2598,8 +2599,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) atomic64_set(&prog->aux->refcnt, 1); prog->gpl_compatible = is_gpl ? 1 : 0; - if (bpf_prog_is_offloaded(prog->aux)) { - err = bpf_prog_offload_init(prog, attr); + if (bpf_prog_is_dev_bound(prog->aux)) { + err = bpf_prog_dev_bound_init(prog, attr); if (err) goto free_prog_sec; } diff --git a/net/core/dev.c b/net/core/dev.c index a37829de6529..e66da626df84 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9228,6 +9228,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack NL_SET_ERR_MSG(extack, "Using offloaded program without HW_MODE flag is not supported"); return -EINVAL; } + if (bpf_prog_is_dev_bound(new_prog->aux) && !bpf_offload_dev_match(new_prog, dev)) { + NL_SET_ERR_MSG(extack, "Program bound to different device"); + return -EINVAL; + } if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) { NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device"); return -EINVAL; @@ -10830,6 +10834,7 @@ void unregister_netdevice_many_notify(struct list_head *head, dev_shutdown(dev); dev_xdp_uninstall(dev); + bpf_dev_bound_netdev_unregister(dev); netdev_offload_xstats_disable_all(dev); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 142b81bcbb2e..7f024ac22edd 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1156,6 +1156,11 @@ enum bpf_link_type { */ #define BPF_F_XDP_HAS_FRAGS (1U << 5) +/* If BPF_F_XDP_DEV_BOUND_ONLY is used in BPF_PROG_LOAD command, the loaded + * program becomes device-bound but can access XDP metadata. + */ +#define BPF_F_XDP_DEV_BOUND_ONLY (1U << 6) + /* link_create.kprobe_multi.flags used in LINK_CREATE command for * BPF_TRACE_KPROBE_MULTI attach type to create return probe. */ -- cgit v1.2.3 From 3d76a4d3d4e591af3e789698affaad88a5a8e8ab Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 19 Jan 2023 14:15:26 -0800 Subject: bpf: XDP metadata RX kfuncs Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not supported by the target device, the default implementation is called instead. The verifier, at load time, replaces a call to the generic kfunc with a call to the per-device one. Per-device kfunc pointers are stored in separate struct xdp_metadata_ops. Cc: John Fastabend Cc: David Ahern Cc: Martin KaFai Lau Cc: Jakub Kicinski Cc: Willem de Bruijn Cc: Jesper Dangaard Brouer Cc: Anatoly Burakov Cc: Alexander Lobakin Cc: Magnus Karlsson Cc: Maryam Tahhan Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20230119221536.3349901-8-sdf@google.com Signed-off-by: Martin KaFai Lau --- include/linux/bpf.h | 17 ++++++++++++- include/linux/netdevice.h | 8 ++++++ include/net/xdp.h | 21 ++++++++++++++++ kernel/bpf/core.c | 8 ++++++ kernel/bpf/offload.c | 44 ++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 25 +++++++++++++++++- net/bpf/test_run.c | 3 +++ net/core/xdp.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++ 8 files changed, 188 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b97a05bb47be..bb26c2e18092 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2480,6 +2480,9 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev); void unpriv_ebpf_notify(int new_state); #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) +int bpf_dev_bound_kfunc_check(struct bpf_verifier_log *log, + struct bpf_prog_aux *prog_aux); +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id); int bpf_prog_dev_bound_init(struct bpf_prog *prog, union bpf_attr *attr); void bpf_dev_bound_netdev_unregister(struct net_device *dev); @@ -2514,8 +2517,20 @@ void sock_map_unhash(struct sock *sk); void sock_map_destroy(struct sock *sk); void sock_map_close(struct sock *sk, long timeout); #else +static inline int bpf_dev_bound_kfunc_check(struct bpf_verifier_log *log, + struct bpf_prog_aux *prog_aux) +{ + return -EOPNOTSUPP; +} + +static inline void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, + u32 func_id) +{ + return NULL; +} + static inline int bpf_prog_dev_bound_init(struct bpf_prog *prog, - union bpf_attr *attr) + union bpf_attr *attr) { return -EOPNOTSUPP; } diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index aad12a179e54..90f2be194bc5 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -74,6 +74,7 @@ struct udp_tunnel_nic_info; struct udp_tunnel_nic; struct bpf_prog; struct xdp_buff; +struct xdp_md; void synchronize_net(void); void netdev_set_default_ethtool_ops(struct net_device *dev, @@ -1618,6 +1619,11 @@ struct net_device_ops { bool cycles); }; +struct xdp_metadata_ops { + int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); + int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash); +}; + /** * enum netdev_priv_flags - &struct net_device priv_flags * @@ -1801,6 +1807,7 @@ enum netdev_ml_priv_type { * * @netdev_ops: Includes several pointers to callbacks, * if one wants to override the ndo_*() functions + * @xdp_metadata_ops: Includes pointers to XDP metadata callbacks. * @ethtool_ops: Management operations * @l3mdev_ops: Layer 3 master device operations * @ndisc_ops: Includes callbacks for different IPv6 neighbour @@ -2050,6 +2057,7 @@ struct net_device { unsigned int flags; unsigned long long priv_flags; const struct net_device_ops *netdev_ops; + const struct xdp_metadata_ops *xdp_metadata_ops; int ifindex; unsigned short gflags; unsigned short hard_header_len; diff --git a/include/net/xdp.h b/include/net/xdp.h index 55dbc68bfffc..91292aa13bc0 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -409,4 +409,25 @@ void xdp_attachment_setup(struct xdp_attachment_info *info, #define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE +#define XDP_METADATA_KFUNC_xxx \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_TIMESTAMP, \ + bpf_xdp_metadata_rx_timestamp) \ + XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_HASH, \ + bpf_xdp_metadata_rx_hash) \ + +enum { +#define XDP_METADATA_KFUNC(name, _) name, +XDP_METADATA_KFUNC_xxx +#undef XDP_METADATA_KFUNC +MAX_XDP_METADATA_KFUNC, +}; + +#ifdef CONFIG_NET +u32 bpf_xdp_metadata_kfunc_id(int id); +bool bpf_dev_bound_kfunc_id(u32 btf_id); +#else +static inline u32 bpf_xdp_metadata_kfunc_id(int id) { return 0; } +static inline bool bpf_dev_bound_kfunc_id(u32 btf_id) { return false; } +#endif + #endif /* __LINUX_NET_XDP_H__ */ diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 1cf19da3c128..16da51093aff 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2096,6 +2096,14 @@ bool bpf_prog_map_compatible(struct bpf_map *map, if (fp->kprobe_override) return false; + /* XDP programs inserted into maps are not guaranteed to run on + * a particular netdev (and can run outside driver context entirely + * in the case of devmap and cpumap). Until device checks + * are implemented, prohibit adding dev-bound programs to program maps. + */ + if (bpf_prog_is_dev_bound(fp->aux)) + return false; + spin_lock(&map->owner.lock); if (!map->owner.type) { /* There's no owner yet where we could check for diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index f767455ed732..3e173c694bbb 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -755,6 +755,50 @@ void bpf_dev_bound_netdev_unregister(struct net_device *dev) up_write(&bpf_devs_lock); } +int bpf_dev_bound_kfunc_check(struct bpf_verifier_log *log, + struct bpf_prog_aux *prog_aux) +{ + if (!bpf_prog_is_dev_bound(prog_aux)) { + bpf_log(log, "metadata kfuncs require device-bound program\n"); + return -EINVAL; + } + + if (bpf_prog_is_offloaded(prog_aux)) { + bpf_log(log, "metadata kfuncs can't be offloaded\n"); + return -EINVAL; + } + + return 0; +} + +void *bpf_dev_bound_resolve_kfunc(struct bpf_prog *prog, u32 func_id) +{ + const struct xdp_metadata_ops *ops; + void *p = NULL; + + /* We don't hold bpf_devs_lock while resolving several + * kfuncs and can race with the unregister_netdevice(). + * We rely on bpf_dev_bound_match() check at attach + * to render this program unusable. + */ + down_read(&bpf_devs_lock); + if (!prog->aux->offload) + goto out; + + ops = prog->aux->offload->netdev->xdp_metadata_ops; + if (!ops) + goto out; + + if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_TIMESTAMP)) + p = ops->xmo_rx_timestamp; + else if (func_id == bpf_xdp_metadata_kfunc_id(XDP_METADATA_KFUNC_RX_HASH)) + p = ops->xmo_rx_hash; +out: + up_read(&bpf_devs_lock); + + return p; +} + static int __init bpf_offload_init(void) { return rhashtable_init(&offdevs, &offdevs_params); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index bba68eefb4b2..9009395206f8 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2333,6 +2333,12 @@ static int add_kfunc_call(struct bpf_verifier_env *env, u32 func_id, s16 offset) return -EINVAL; } + if (bpf_dev_bound_kfunc_id(func_id)) { + err = bpf_dev_bound_kfunc_check(&env->log, prog_aux); + if (err) + return err; + } + desc = &tab->descs[tab->nr_descs++]; desc->func_id = func_id; desc->imm = call_imm; @@ -15772,12 +15778,25 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, struct bpf_insn *insn_buf, int insn_idx, int *cnt) { const struct bpf_kfunc_desc *desc; + void *xdp_kfunc; if (!insn->imm) { verbose(env, "invalid kernel function call not eliminated in verifier pass\n"); return -EINVAL; } + *cnt = 0; + + if (bpf_dev_bound_kfunc_id(insn->imm)) { + xdp_kfunc = bpf_dev_bound_resolve_kfunc(env->prog, insn->imm); + if (xdp_kfunc) { + insn->imm = BPF_CALL_IMM(xdp_kfunc); + return 0; + } + + /* fallback to default kfunc when not supported by netdev */ + } + /* insn->imm has the btf func_id. Replace it with * an address (relative to __bpf_call_base). */ @@ -15788,7 +15807,6 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, return -EFAULT; } - *cnt = 0; insn->imm = desc->imm; if (insn->off) return 0; @@ -16795,6 +16813,11 @@ int bpf_check_attach_target(struct bpf_verifier_log *log, if (tgt_prog) { struct bpf_prog_aux *aux = tgt_prog->aux; + if (bpf_prog_is_dev_bound(tgt_prog->aux)) { + bpf_log(log, "Replacing device-bound programs not supported\n"); + return -EINVAL; + } + for (i = 0; i < aux->func_info_cnt; i++) if (aux->func_info[i].type_id == btf_id) { subprog = i; diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 2723623429ac..8da0d73b368e 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1300,6 +1300,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES) return -EINVAL; + if (bpf_prog_is_dev_bound(prog->aux)) + return -EINVAL; + if (do_live) { if (!batch_size) batch_size = NAPI_POLL_WEIGHT; diff --git a/net/core/xdp.c b/net/core/xdp.c index 844c9d99dc0e..a5a7ecf6391c 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -4,6 +4,7 @@ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. */ #include +#include #include #include #include @@ -709,3 +710,66 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) return nxdpf; } + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); + +/** + * bpf_xdp_metadata_rx_timestamp - Read XDP frame RX timestamp. + * @ctx: XDP context pointer. + * @timestamp: Return value pointer. + * + * Returns 0 on success or ``-errno`` on error. + */ +int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp) +{ + return -EOPNOTSUPP; +} + +/** + * bpf_xdp_metadata_rx_hash - Read XDP frame RX hash. + * @ctx: XDP context pointer. + * @hash: Return value pointer. + * + * Returns 0 on success or ``-errno`` on error. + */ +int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash) +{ + return -EOPNOTSUPP; +} + +__diag_pop(); + +BTF_SET8_START(xdp_metadata_kfunc_ids) +#define XDP_METADATA_KFUNC(_, name) BTF_ID_FLAGS(func, name, 0) +XDP_METADATA_KFUNC_xxx +#undef XDP_METADATA_KFUNC +BTF_SET8_END(xdp_metadata_kfunc_ids) + +static const struct btf_kfunc_id_set xdp_metadata_kfunc_set = { + .owner = THIS_MODULE, + .set = &xdp_metadata_kfunc_ids, +}; + +BTF_ID_LIST(xdp_metadata_kfunc_ids_unsorted) +#define XDP_METADATA_KFUNC(name, str) BTF_ID(func, str) +XDP_METADATA_KFUNC_xxx +#undef XDP_METADATA_KFUNC + +u32 bpf_xdp_metadata_kfunc_id(int id) +{ + /* xdp_metadata_kfunc_ids is sorted and can't be used */ + return xdp_metadata_kfunc_ids_unsorted[id]; +} + +bool bpf_dev_bound_kfunc_id(u32 btf_id) +{ + return btf_id_set8_contains(&xdp_metadata_kfunc_ids, btf_id); +} + +static int __init xdp_metadata_init(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); +} +late_initcall(xdp_metadata_init); -- cgit v1.2.3 From f72ff8b81ebc6a0a25e41b7e6c1dc42e3aa33e7e Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Fri, 20 Jan 2023 11:34:44 +0100 Subject: net: fix kfree_skb_list use of skb_mark_not_on_list A bug was introduced by commit eedade12f4cb ("net: kfree_skb_list use kmem_cache_free_bulk"). It unconditionally unlinked the SKB list via invoking skb_mark_not_on_list(). In this patch we choose to remove the skb_mark_not_on_list() call as it isn't necessary. It would be possible and correct to call skb_mark_not_on_list() only when __kfree_skb_reason() returns true, meaning the SKB is ready to be free'ed, as it calls/check skb_unref(). This fix is needed as kfree_skb_list() is also invoked on skb_shared_info frag_list (skb_drop_fraglist() calling kfree_skb_list()). A frag_list can have SKBs with elevated refcnt due to cloning via skb_clone_fraglist(), which takes a reference on all SKBs in the list. This implies the invariant that all SKBs in the list must have the same refcnt, when using kfree_skb_list(). Reported-by: syzbot+c8a2e66e37eee553c4fd@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+c8a2e66e37eee553c4fd@syzkaller.appspotmail.com Fixes: eedade12f4cb ("net: kfree_skb_list use kmem_cache_free_bulk") Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/167421088417.1125894.9761158218878962159.stgit@firesoul Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4e73ab3482b8..180df58e85c7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -999,8 +999,6 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) while (segs) { struct sk_buff *next = segs->next; - skb_mark_not_on_list(segs); - if (__kfree_skb_reason(segs, reason)) kfree_skb_add_bulk(segs, &sa, reason); -- cgit v1.2.3 From 3176eb82681ec9c8af31c6588ddedcc6cfb9e445 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Fri, 20 Jan 2023 13:07:43 +0100 Subject: net: avoid irqsave in skb_defer_free_flush The spin_lock irqsave/restore API variant in skb_defer_free_flush can be replaced with the faster spin_lock irq variant, which doesn't need to read and restore the CPU flags. Using the unconditional irq "disable/enable" API variant is safe, because the skb_defer_free_flush() function is only called during NAPI-RX processing in net_rx_action(), where it is known the IRQs are enabled. Expected gain is 14 cycles from avoiding reading and restoring CPU flags in a spin_lock_irqsave/restore operation, measured via a microbencmark kernel module[1] on CPU E5-1650 v4 @ 3.60GHz. Microbenchmark overhead of spin_lock+unlock: - spin_lock_unlock_irq cost: 34 cycles(tsc) 9.486 ns - spin_lock_unlock_irqsave cost: 48 cycles(tsc) 13.567 ns We don't expect to see a measurable packet performance gain, as skb_defer_free_flush() is called infrequently once per NIC device NAPI bulk cycle and conditionally only if SKBs have been deferred by other CPUs via skb_attempt_defer_free(). [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c Reviewed-by: Jacob Keller Signed-off-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/167421646327.1321776.7390743166998776914.stgit@firesoul Signed-off-by: Jakub Kicinski --- net/core/dev.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/core/dev.c b/net/core/dev.c index cf78f35bc0b9..9c60190fe352 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6616,17 +6616,16 @@ static int napi_threaded_poll(void *data) static void skb_defer_free_flush(struct softnet_data *sd) { struct sk_buff *skb, *next; - unsigned long flags; /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */ if (!READ_ONCE(sd->defer_list)) return; - spin_lock_irqsave(&sd->defer_lock, flags); + spin_lock_irq(&sd->defer_lock); skb = sd->defer_list; sd->defer_list = NULL; sd->defer_count = 0; - spin_unlock_irqrestore(&sd->defer_lock, flags); + spin_unlock_irq(&sd->defer_lock); while (skb != NULL) { next = skb->next; -- cgit v1.2.3 From e30f33a5f5c74f278feaa57517d851874dfc640f Mon Sep 17 00:00:00 2001 From: Arun Ramadoss Date: Fri, 20 Jan 2023 10:51:34 +0530 Subject: net: dsa: microchip: enable port queues for tc mqprio LAN937x family of switches has 8 queues per port where the KSZ switches has 4 queues per port. By default, only one queue per port is enabled. The queues are configurable in 2, 4 or 8. This patch add 8 number of queues for LAN937x and 4 for other switches. In the tag_ksz.c file, prioirty of the packet is queried using the skb buffer and the corresponding value is updated in the tag. Signed-off-by: Arun Ramadoss Signed-off-by: Jakub Kicinski --- drivers/net/dsa/microchip/ksz9477.c | 18 ++++++++++++++++++ drivers/net/dsa/microchip/ksz9477.h | 1 + drivers/net/dsa/microchip/ksz9477_reg.h | 6 +++++- drivers/net/dsa/microchip/ksz_common.c | 18 ++++++++++++++++++ drivers/net/dsa/microchip/ksz_common.h | 1 + drivers/net/dsa/microchip/lan937x_main.c | 4 ++++ net/dsa/tag_ksz.c | 15 +++++++++++++++ 7 files changed, 62 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c index 6178a96e389f..1b4bb2fefe2a 100644 --- a/drivers/net/dsa/microchip/ksz9477.c +++ b/drivers/net/dsa/microchip/ksz9477.c @@ -980,6 +980,22 @@ int ksz9477_set_ageing_time(struct ksz_device *dev, unsigned int msecs) return ksz_write8(dev, REG_SW_LUE_CTRL_0, value); } +void ksz9477_port_queue_split(struct ksz_device *dev, int port) +{ + u8 data; + + if (dev->info->num_tx_queues == 8) + data = PORT_EIGHT_QUEUE; + else if (dev->info->num_tx_queues == 4) + data = PORT_FOUR_QUEUE; + else if (dev->info->num_tx_queues == 2) + data = PORT_TWO_QUEUE; + else + data = PORT_SINGLE_QUEUE; + + ksz_prmw8(dev, port, REG_PORT_CTRL_0, PORT_QUEUE_SPLIT_MASK, data); +} + void ksz9477_port_setup(struct ksz_device *dev, int port, bool cpu_port) { struct dsa_switch *ds = dev->ds; @@ -991,6 +1007,8 @@ void ksz9477_port_setup(struct ksz_device *dev, int port, bool cpu_port) ksz_port_cfg(dev, port, REG_PORT_CTRL_0, PORT_TAIL_TAG_ENABLE, true); + ksz9477_port_queue_split(dev, port); + ksz_port_cfg(dev, port, REG_PORT_CTRL_0, PORT_MAC_LOOPBACK, false); /* set back pressure */ diff --git a/drivers/net/dsa/microchip/ksz9477.h b/drivers/net/dsa/microchip/ksz9477.h index 7c5bb3032772..2554cb63d326 100644 --- a/drivers/net/dsa/microchip/ksz9477.h +++ b/drivers/net/dsa/microchip/ksz9477.h @@ -56,5 +56,6 @@ int ksz9477_reset_switch(struct ksz_device *dev); int ksz9477_dsa_init(struct ksz_device *dev); int ksz9477_switch_init(struct ksz_device *dev); void ksz9477_switch_exit(struct ksz_device *dev); +void ksz9477_port_queue_split(struct ksz_device *dev, int port); #endif diff --git a/drivers/net/dsa/microchip/ksz9477_reg.h b/drivers/net/dsa/microchip/ksz9477_reg.h index cc457fa64939..b433a529cfec 100644 --- a/drivers/net/dsa/microchip/ksz9477_reg.h +++ b/drivers/net/dsa/microchip/ksz9477_reg.h @@ -850,7 +850,11 @@ #define PORT_FORCE_TX_FLOW_CTRL BIT(4) #define PORT_FORCE_RX_FLOW_CTRL BIT(3) #define PORT_TAIL_TAG_ENABLE BIT(2) -#define PORT_QUEUE_SPLIT_ENABLE 0x3 +#define PORT_QUEUE_SPLIT_MASK GENMASK(1, 0) +#define PORT_EIGHT_QUEUE 0x3 +#define PORT_FOUR_QUEUE 0x2 +#define PORT_TWO_QUEUE 0x1 +#define PORT_SINGLE_QUEUE 0x0 #define REG_PORT_CTRL_1 0x0021 diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 28d26e80e256..9d4dcbe2949b 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -1080,6 +1080,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x07, /* can be configured as cpu port */ .port_cnt = 3, /* total port count */ .port_nirqs = 3, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1106,6 +1107,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_statics = 8, .cpu_ports = 0x10, /* can be configured as cpu port */ .port_cnt = 5, /* total cpu and user ports */ + .num_tx_queues = 4, .ops = &ksz8_dev_ops, .ksz87xx_eee_link_erratum = true, .mib_names = ksz9477_mib_names, @@ -1144,6 +1146,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_statics = 8, .cpu_ports = 0x10, /* can be configured as cpu port */ .port_cnt = 5, /* total cpu and user ports */ + .num_tx_queues = 4, .ops = &ksz8_dev_ops, .ksz87xx_eee_link_erratum = true, .mib_names = ksz9477_mib_names, @@ -1168,6 +1171,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_statics = 8, .cpu_ports = 0x10, /* can be configured as cpu port */ .port_cnt = 5, /* total cpu and user ports */ + .num_tx_queues = 4, .ops = &ksz8_dev_ops, .ksz87xx_eee_link_erratum = true, .mib_names = ksz9477_mib_names, @@ -1192,6 +1196,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .num_statics = 8, .cpu_ports = 0x4, /* can be configured as cpu port */ .port_cnt = 3, + .num_tx_queues = 4, .ops = &ksz8_dev_ops, .mib_names = ksz88xx_mib_names, .mib_cnt = ARRAY_SIZE(ksz88xx_mib_names), @@ -1213,6 +1218,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x7F, /* can be configured as cpu port */ .port_cnt = 7, /* total physical port count */ .port_nirqs = 4, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, @@ -1245,6 +1251,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x3F, /* can be configured as cpu port */ .port_cnt = 6, /* total physical port count */ .port_nirqs = 2, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, @@ -1277,6 +1284,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x7F, /* can be configured as cpu port */ .port_cnt = 7, /* total physical port count */ .port_nirqs = 2, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, @@ -1307,6 +1315,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x07, /* can be configured as cpu port */ .port_cnt = 3, /* total port count */ .port_nirqs = 2, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1332,6 +1341,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x07, /* can be configured as cpu port */ .port_cnt = 3, /* total port count */ .port_nirqs = 3, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1357,6 +1367,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x7F, /* can be configured as cpu port */ .port_cnt = 7, /* total physical port count */ .port_nirqs = 3, + .num_tx_queues = 4, .ops = &ksz9477_dev_ops, .phy_errata_9477 = true, .mib_names = ksz9477_mib_names, @@ -1387,6 +1398,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x10, /* can be configured as cpu port */ .port_cnt = 5, /* total physical port count */ .port_nirqs = 6, + .num_tx_queues = 8, .ops = &lan937x_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1411,6 +1423,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x30, /* can be configured as cpu port */ .port_cnt = 6, /* total physical port count */ .port_nirqs = 6, + .num_tx_queues = 8, .ops = &lan937x_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1435,6 +1448,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x30, /* can be configured as cpu port */ .port_cnt = 8, /* total physical port count */ .port_nirqs = 6, + .num_tx_queues = 8, .ops = &lan937x_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1463,6 +1477,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x38, /* can be configured as cpu port */ .port_cnt = 5, /* total physical port count */ .port_nirqs = 6, + .num_tx_queues = 8, .ops = &lan937x_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -1491,6 +1506,7 @@ const struct ksz_chip_data ksz_switch_chips[] = { .cpu_ports = 0x30, /* can be configured as cpu port */ .port_cnt = 8, /* total physical port count */ .port_nirqs = 6, + .num_tx_queues = 8, .ops = &lan937x_dev_ops, .mib_names = ksz9477_mib_names, .mib_cnt = ARRAY_SIZE(ksz9477_mib_names), @@ -2065,6 +2081,8 @@ static int ksz_setup(struct dsa_switch *ds) dev->dev_ops->enable_stp_addr(dev); + ds->num_tx_queues = dev->info->num_tx_queues; + regmap_update_bits(dev->regmap[0], regs[S_MULTICAST_CTRL], MULTICAST_STORM_DISABLE, MULTICAST_STORM_DISABLE); diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index 7260528e5c57..1a00143b0345 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -49,6 +49,7 @@ struct ksz_chip_data { int cpu_ports; int port_cnt; u8 port_nirqs; + u8 num_tx_queues; const struct ksz_dev_ops *ops; bool phy_errata_9477; bool ksz87xx_eee_link_erratum; diff --git a/drivers/net/dsa/microchip/lan937x_main.c b/drivers/net/dsa/microchip/lan937x_main.c index 06d3d0308cba..923388f87996 100644 --- a/drivers/net/dsa/microchip/lan937x_main.c +++ b/drivers/net/dsa/microchip/lan937x_main.c @@ -15,6 +15,7 @@ #include "lan937x_reg.h" #include "ksz_common.h" +#include "ksz9477.h" #include "lan937x.h" static int lan937x_cfg(struct ksz_device *dev, u32 addr, u8 bits, bool set) @@ -180,6 +181,9 @@ void lan937x_port_setup(struct ksz_device *dev, int port, bool cpu_port) lan937x_port_cfg(dev, port, REG_PORT_CTRL_0, PORT_TAIL_TAG_ENABLE, true); + /* Enable the Port Queue split */ + ksz9477_port_queue_split(dev, port); + /* set back pressure for half duplex */ lan937x_port_cfg(dev, port, REG_PORT_MAC_CTRL_1, PORT_BACK_PRESSURE, true); diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index 694478fe07d6..0eb1c7784c3d 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -178,6 +178,7 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ8795, KSZ8795_NAME); #define KSZ9477_PTP_TAG_LEN 4 #define KSZ9477_PTP_TAG_INDICATION 0x80 +#define KSZ9477_TAIL_TAG_PRIO GENMASK(8, 7) #define KSZ9477_TAIL_TAG_OVERRIDE BIT(9) #define KSZ9477_TAIL_TAG_LOOKUP BIT(10) @@ -269,6 +270,8 @@ static struct sk_buff *ksz_defer_xmit(struct dsa_port *dp, struct sk_buff *skb) static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, struct net_device *dev) { + u16 queue_mapping = skb_get_queue_mapping(skb); + u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); __be16 *tag; u8 *addr; @@ -285,6 +288,8 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, val = BIT(dp->index); + val |= FIELD_PREP(KSZ9477_TAIL_TAG_PRIO, prio); + if (is_link_local_ether_addr(addr)) val |= KSZ9477_TAIL_TAG_OVERRIDE; @@ -322,12 +327,15 @@ static const struct dsa_device_ops ksz9477_netdev_ops = { DSA_TAG_DRIVER(ksz9477_netdev_ops); MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ9477, KSZ9477_NAME); +#define KSZ9893_TAIL_TAG_PRIO GENMASK(4, 3) #define KSZ9893_TAIL_TAG_OVERRIDE BIT(5) #define KSZ9893_TAIL_TAG_LOOKUP BIT(6) static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, struct net_device *dev) { + u16 queue_mapping = skb_get_queue_mapping(skb); + u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); u8 *addr; u8 *tag; @@ -343,6 +351,8 @@ static struct sk_buff *ksz9893_xmit(struct sk_buff *skb, *tag = BIT(dp->index); + *tag |= FIELD_PREP(KSZ9893_TAIL_TAG_PRIO, prio); + if (is_link_local_ether_addr(addr)) *tag |= KSZ9893_TAIL_TAG_OVERRIDE; @@ -384,11 +394,14 @@ MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_KSZ9893, KSZ9893_NAME); #define LAN937X_TAIL_TAG_BLOCKING_OVERRIDE BIT(11) #define LAN937X_TAIL_TAG_LOOKUP BIT(12) #define LAN937X_TAIL_TAG_VALID BIT(13) +#define LAN937X_TAIL_TAG_PRIO GENMASK(10, 8) #define LAN937X_TAIL_TAG_PORT_MASK 7 static struct sk_buff *lan937x_xmit(struct sk_buff *skb, struct net_device *dev) { + u16 queue_mapping = skb_get_queue_mapping(skb); + u8 prio = netdev_txq_to_tc(dev, queue_mapping); struct dsa_port *dp = dsa_slave_to_port(dev); const struct ethhdr *hdr = eth_hdr(skb); __be16 *tag; @@ -403,6 +416,8 @@ static struct sk_buff *lan937x_xmit(struct sk_buff *skb, val = BIT(dp->index); + val |= FIELD_PREP(LAN937X_TAIL_TAG_PRIO, prio); + if (is_link_local_ether_addr(hdr->h_dest)) val |= LAN937X_TAIL_TAG_BLOCKING_OVERRIDE; -- cgit v1.2.3 From 78dcdffe0418ac8f3f057f26fe71ccf4d8ed851f Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Fri, 20 Jan 2023 18:01:39 +0100 Subject: net/sched: act_mirred: better wording on protection against excessive stack growth with commit e2ca070f89ec ("net: sched: protect against stack overflow in TC act_mirred"), act_mirred protected itself against excessive stack growth using per_cpu counter of nested calls to tcf_mirred_act(), and capping it to MIRRED_RECURSION_LIMIT. However, such protection does not detect recursion/loops in case the packet is enqueued to the backlog (for example, when the mirred target device has RPS or skb timestamping enabled). Change the wording from "recursion" to "nesting" to make it more clear to readers. CC: Jamal Hadi Salim Signed-off-by: Davide Caratti Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- net/sched/act_mirred.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 7284bcea7b0b..c8abb5136491 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -29,8 +29,8 @@ static LIST_HEAD(mirred_list); static DEFINE_SPINLOCK(mirred_list_lock); -#define MIRRED_RECURSION_LIMIT 4 -static DEFINE_PER_CPU(unsigned int, mirred_rec_level); +#define MIRRED_NEST_LIMIT 4 +static DEFINE_PER_CPU(unsigned int, mirred_nest_level); static bool tcf_mirred_is_act_redirect(int action) { @@ -226,7 +226,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, struct sk_buff *skb2 = skb; bool m_mac_header_xmit; struct net_device *dev; - unsigned int rec_level; + unsigned int nest_level; int retval, err = 0; bool use_reinsert; bool want_ingress; @@ -237,11 +237,11 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, int mac_len; bool at_nh; - rec_level = __this_cpu_inc_return(mirred_rec_level); - if (unlikely(rec_level > MIRRED_RECURSION_LIMIT)) { + nest_level = __this_cpu_inc_return(mirred_nest_level); + if (unlikely(nest_level > MIRRED_NEST_LIMIT)) { net_warn_ratelimited("Packet exceeded mirred recursion limit on dev %s\n", netdev_name(skb->dev)); - __this_cpu_dec(mirred_rec_level); + __this_cpu_dec(mirred_nest_level); return TC_ACT_SHOT; } @@ -310,7 +310,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, err = tcf_mirred_forward(want_ingress, skb); if (err) tcf_action_inc_overlimit_qstats(&m->common); - __this_cpu_dec(mirred_rec_level); + __this_cpu_dec(mirred_nest_level); return TC_ACT_CONSUMED; } } @@ -322,7 +322,7 @@ out: if (tcf_mirred_is_act_redirect(m_eaction)) retval = TC_ACT_SHOT; } - __this_cpu_dec(mirred_rec_level); + __this_cpu_dec(mirred_nest_level); return retval; } -- cgit v1.2.3 From ca22da2fbd693b54dc8e3b7b54ccc9f7e9ba3640 Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Fri, 20 Jan 2023 18:01:40 +0100 Subject: act_mirred: use the backlog for nested calls to mirred ingress William reports kernel soft-lockups on some OVS topologies when TC mirred egress->ingress action is hit by local TCP traffic [1]. The same can also be reproduced with SCTP (thanks Xin for verifying), when client and server reach themselves through mirred egress to ingress, and one of the two peers sends a "heartbeat" packet (from within a timer). Enqueueing to backlog proved to fix this soft lockup; however, as Cong noticed [2], we should preserve - when possible - the current mirred behavior that counts as "overlimits" any eventual packet drop subsequent to the mirred forwarding action [3]. A compromise solution might use the backlog only when tcf_mirred_act() has a nest level greater than one: change tcf_mirred_forward() accordingly. Also, add a kselftest that can reproduce the lockup and verifies TC mirred ability to account for further packet drops after TC mirred egress->ingress (when the nest level is 1). [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/ [3] such behavior is not guaranteed: for example, if RPS or skb RX timestamping is enabled on the mirred target device, the kernel can defer receiving the skb and return NET_RX_SUCCESS inside tcf_mirred_forward(). Reported-by: William Zhao CC: Xin Long Signed-off-by: Davide Caratti Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- net/sched/act_mirred.c | 7 ++++ .../testing/selftests/net/forwarding/tc_actions.sh | 49 +++++++++++++++++++++- 2 files changed, 55 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index c8abb5136491..8037ec9b1d31 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -206,12 +206,19 @@ release_idr: return err; } +static bool is_mirred_nested(void) +{ + return unlikely(__this_cpu_read(mirred_nest_level) > 1); +} + static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb) { int err; if (!want_ingress) err = tcf_dev_queue_xmit(skb, dev_queue_xmit); + else if (is_mirred_nested()) + err = netif_rx(skb); else err = netif_receive_skb(skb); diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh index 1e0a62f638fe..919c0dd9fe4b 100755 --- a/tools/testing/selftests/net/forwarding/tc_actions.sh +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh @@ -3,7 +3,8 @@ ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \ mirred_egress_mirror_test matchall_mirred_egress_mirror_test \ - gact_trap_test mirred_egress_to_ingress_test" + gact_trap_test mirred_egress_to_ingress_test \ + mirred_egress_to_ingress_tcp_test" NUM_NETIFS=4 source tc_common.sh source lib.sh @@ -198,6 +199,52 @@ mirred_egress_to_ingress_test() log_test "mirred_egress_to_ingress ($tcflags)" } +mirred_egress_to_ingress_tcp_test() +{ + local tmpfile=$(mktemp) tmpfile1=$(mktemp) + + RET=0 + dd conv=sparse status=none if=/dev/zero bs=1M count=2 of=$tmpfile + tc filter add dev $h1 protocol ip pref 100 handle 100 egress flower \ + $tcflags ip_proto tcp src_ip 192.0.2.1 dst_ip 192.0.2.2 \ + action ct commit nat src addr 192.0.2.2 pipe \ + action ct clear pipe \ + action ct commit nat dst addr 192.0.2.1 pipe \ + action ct clear pipe \ + action skbedit ptype host pipe \ + action mirred ingress redirect dev $h1 + tc filter add dev $h1 protocol ip pref 101 handle 101 egress flower \ + $tcflags ip_proto icmp \ + action mirred ingress redirect dev $h1 + tc filter add dev $h1 protocol ip pref 102 handle 102 ingress flower \ + ip_proto icmp \ + action drop + + ip vrf exec v$h1 nc --recv-only -w10 -l -p 12345 -o $tmpfile1 & + local rpid=$! + ip vrf exec v$h1 nc -w1 --send-only 192.0.2.2 12345 <$tmpfile + wait -n $rpid + cmp -s $tmpfile $tmpfile1 + check_err $? "server output check failed" + + $MZ $h1 -c 10 -p 64 -a $h1mac -b $h1mac -A 192.0.2.1 -B 192.0.2.1 \ + -t icmp "ping,id=42,seq=5" -q + tc_check_packets "dev $h1 egress" 101 10 + check_err $? "didn't mirred redirect ICMP" + tc_check_packets "dev $h1 ingress" 102 10 + check_err $? "didn't drop mirred ICMP" + local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits) + test ${overlimits} = 10 + check_err $? "wrong overlimits, expected 10 got ${overlimits}" + + tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower + tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower + tc filter del dev $h1 ingress protocol ip pref 102 handle 102 flower + + rm -f $tmpfile $tmpfile1 + log_test "mirred_egress_to_ingress_tcp ($tcflags)" +} + setup_prepare() { h1=${NETIFS[p1]} -- cgit v1.2.3 From 08d323234d10eab077cbf0093eeb5991478a261a Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 20 Jan 2023 09:50:39 -0800 Subject: net: fou: rename the source for linking We'll need to link two objects together to form the fou module. This means the source can't be called fou, the build system expects fou.o to be the combined object. Acked-by: Stanislav Fomichev Signed-off-by: Jakub Kicinski Signed-off-by: Paolo Abeni --- net/ipv4/Makefile | 1 + net/ipv4/fou.c | 1294 --------------------------------------------------- net/ipv4/fou_core.c | 1294 +++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 1295 insertions(+), 1294 deletions(-) delete mode 100644 net/ipv4/fou.c create mode 100644 net/ipv4/fou_core.c (limited to 'net') diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index af7d2cf490fb..fabbe46897ce 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -26,6 +26,7 @@ obj-$(CONFIG_IP_MROUTE) += ipmr.o obj-$(CONFIG_IP_MROUTE_COMMON) += ipmr_base.o obj-$(CONFIG_NET_IPIP) += ipip.o gre-y := gre_demux.o +fou-y := fou_core.o obj-$(CONFIG_NET_FOU) += fou.o obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o obj-$(CONFIG_NET_IPGRE) += ip_gre.o diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c deleted file mode 100644 index 0c3c6d0cee29..000000000000 --- a/net/ipv4/fou.c +++ /dev/null @@ -1,1294 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -struct fou { - struct socket *sock; - u8 protocol; - u8 flags; - __be16 port; - u8 family; - u16 type; - struct list_head list; - struct rcu_head rcu; -}; - -#define FOU_F_REMCSUM_NOPARTIAL BIT(0) - -struct fou_cfg { - u16 type; - u8 protocol; - u8 flags; - struct udp_port_cfg udp_config; -}; - -static unsigned int fou_net_id; - -struct fou_net { - struct list_head fou_list; - struct mutex fou_lock; -}; - -static inline struct fou *fou_from_sock(struct sock *sk) -{ - return sk->sk_user_data; -} - -static int fou_recv_pull(struct sk_buff *skb, struct fou *fou, size_t len) -{ - /* Remove 'len' bytes from the packet (UDP header and - * FOU header if present). - */ - if (fou->family == AF_INET) - ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len); - else - ipv6_hdr(skb)->payload_len = - htons(ntohs(ipv6_hdr(skb)->payload_len) - len); - - __skb_pull(skb, len); - skb_postpull_rcsum(skb, udp_hdr(skb), len); - skb_reset_transport_header(skb); - return iptunnel_pull_offloads(skb); -} - -static int fou_udp_recv(struct sock *sk, struct sk_buff *skb) -{ - struct fou *fou = fou_from_sock(sk); - - if (!fou) - return 1; - - if (fou_recv_pull(skb, fou, sizeof(struct udphdr))) - goto drop; - - return -fou->protocol; - -drop: - kfree_skb(skb); - return 0; -} - -static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr, - void *data, size_t hdrlen, u8 ipproto, - bool nopartial) -{ - __be16 *pd = data; - size_t start = ntohs(pd[0]); - size_t offset = ntohs(pd[1]); - size_t plen = sizeof(struct udphdr) + hdrlen + - max_t(size_t, offset + sizeof(u16), start); - - if (skb->remcsum_offload) - return guehdr; - - if (!pskb_may_pull(skb, plen)) - return NULL; - guehdr = (struct guehdr *)&udp_hdr(skb)[1]; - - skb_remcsum_process(skb, (void *)guehdr + hdrlen, - start, offset, nopartial); - - return guehdr; -} - -static int gue_control_message(struct sk_buff *skb, struct guehdr *guehdr) -{ - /* No support yet */ - kfree_skb(skb); - return 0; -} - -static int gue_udp_recv(struct sock *sk, struct sk_buff *skb) -{ - struct fou *fou = fou_from_sock(sk); - size_t len, optlen, hdrlen; - struct guehdr *guehdr; - void *data; - u16 doffset = 0; - u8 proto_ctype; - - if (!fou) - return 1; - - len = sizeof(struct udphdr) + sizeof(struct guehdr); - if (!pskb_may_pull(skb, len)) - goto drop; - - guehdr = (struct guehdr *)&udp_hdr(skb)[1]; - - switch (guehdr->version) { - case 0: /* Full GUE header present */ - break; - - case 1: { - /* Direct encapsulation of IPv4 or IPv6 */ - - int prot; - - switch (((struct iphdr *)guehdr)->version) { - case 4: - prot = IPPROTO_IPIP; - break; - case 6: - prot = IPPROTO_IPV6; - break; - default: - goto drop; - } - - if (fou_recv_pull(skb, fou, sizeof(struct udphdr))) - goto drop; - - return -prot; - } - - default: /* Undefined version */ - goto drop; - } - - optlen = guehdr->hlen << 2; - len += optlen; - - if (!pskb_may_pull(skb, len)) - goto drop; - - /* guehdr may change after pull */ - guehdr = (struct guehdr *)&udp_hdr(skb)[1]; - - if (validate_gue_flags(guehdr, optlen)) - goto drop; - - hdrlen = sizeof(struct guehdr) + optlen; - - if (fou->family == AF_INET) - ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len); - else - ipv6_hdr(skb)->payload_len = - htons(ntohs(ipv6_hdr(skb)->payload_len) - len); - - /* Pull csum through the guehdr now . This can be used if - * there is a remote checksum offload. - */ - skb_postpull_rcsum(skb, udp_hdr(skb), len); - - data = &guehdr[1]; - - if (guehdr->flags & GUE_FLAG_PRIV) { - __be32 flags = *(__be32 *)(data + doffset); - - doffset += GUE_LEN_PRIV; - - if (flags & GUE_PFLAG_REMCSUM) { - guehdr = gue_remcsum(skb, guehdr, data + doffset, - hdrlen, guehdr->proto_ctype, - !!(fou->flags & - FOU_F_REMCSUM_NOPARTIAL)); - if (!guehdr) - goto drop; - - data = &guehdr[1]; - - doffset += GUE_PLEN_REMCSUM; - } - } - - if (unlikely(guehdr->control)) - return gue_control_message(skb, guehdr); - - proto_ctype = guehdr->proto_ctype; - __skb_pull(skb, sizeof(struct udphdr) + hdrlen); - skb_reset_transport_header(skb); - - if (iptunnel_pull_offloads(skb)) - goto drop; - - return -proto_ctype; - -drop: - kfree_skb(skb); - return 0; -} - -static struct sk_buff *fou_gro_receive(struct sock *sk, - struct list_head *head, - struct sk_buff *skb) -{ - const struct net_offload __rcu **offloads; - u8 proto = fou_from_sock(sk)->protocol; - const struct net_offload *ops; - struct sk_buff *pp = NULL; - - /* We can clear the encap_mark for FOU as we are essentially doing - * one of two possible things. We are either adding an L4 tunnel - * header to the outer L3 tunnel header, or we are simply - * treating the GRE tunnel header as though it is a UDP protocol - * specific header such as VXLAN or GENEVE. - */ - NAPI_GRO_CB(skb)->encap_mark = 0; - - /* Flag this frame as already having an outer encap header */ - NAPI_GRO_CB(skb)->is_fou = 1; - - offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; - ops = rcu_dereference(offloads[proto]); - if (!ops || !ops->callbacks.gro_receive) - goto out; - - pp = call_gro_receive(ops->callbacks.gro_receive, head, skb); - -out: - return pp; -} - -static int fou_gro_complete(struct sock *sk, struct sk_buff *skb, - int nhoff) -{ - const struct net_offload __rcu **offloads; - u8 proto = fou_from_sock(sk)->protocol; - const struct net_offload *ops; - int err = -ENOSYS; - - offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; - ops = rcu_dereference(offloads[proto]); - if (WARN_ON(!ops || !ops->callbacks.gro_complete)) - goto out; - - err = ops->callbacks.gro_complete(skb, nhoff); - - skb_set_inner_mac_header(skb, nhoff); - -out: - return err; -} - -static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off, - struct guehdr *guehdr, void *data, - size_t hdrlen, struct gro_remcsum *grc, - bool nopartial) -{ - __be16 *pd = data; - size_t start = ntohs(pd[0]); - size_t offset = ntohs(pd[1]); - - if (skb->remcsum_offload) - return guehdr; - - if (!NAPI_GRO_CB(skb)->csum_valid) - return NULL; - - guehdr = skb_gro_remcsum_process(skb, (void *)guehdr, off, hdrlen, - start, offset, grc, nopartial); - - skb->remcsum_offload = 1; - - return guehdr; -} - -static struct sk_buff *gue_gro_receive(struct sock *sk, - struct list_head *head, - struct sk_buff *skb) -{ - const struct net_offload __rcu **offloads; - const struct net_offload *ops; - struct sk_buff *pp = NULL; - struct sk_buff *p; - struct guehdr *guehdr; - size_t len, optlen, hdrlen, off; - void *data; - u16 doffset = 0; - int flush = 1; - struct fou *fou = fou_from_sock(sk); - struct gro_remcsum grc; - u8 proto; - - skb_gro_remcsum_init(&grc); - - off = skb_gro_offset(skb); - len = off + sizeof(*guehdr); - - guehdr = skb_gro_header(skb, len, off); - if (unlikely(!guehdr)) - goto out; - - switch (guehdr->version) { - case 0: - break; - case 1: - switch (((struct iphdr *)guehdr)->version) { - case 4: - proto = IPPROTO_IPIP; - break; - case 6: - proto = IPPROTO_IPV6; - break; - default: - goto out; - } - goto next_proto; - default: - goto out; - } - - optlen = guehdr->hlen << 2; - len += optlen; - - if (skb_gro_header_hard(skb, len)) { - guehdr = skb_gro_header_slow(skb, len, off); - if (unlikely(!guehdr)) - goto out; - } - - if (unlikely(guehdr->control) || guehdr->version != 0 || - validate_gue_flags(guehdr, optlen)) - goto out; - - hdrlen = sizeof(*guehdr) + optlen; - - /* Adjust NAPI_GRO_CB(skb)->csum to account for guehdr, - * this is needed if there is a remote checkcsum offload. - */ - skb_gro_postpull_rcsum(skb, guehdr, hdrlen); - - data = &guehdr[1]; - - if (guehdr->flags & GUE_FLAG_PRIV) { - __be32 flags = *(__be32 *)(data + doffset); - - doffset += GUE_LEN_PRIV; - - if (flags & GUE_PFLAG_REMCSUM) { - guehdr = gue_gro_remcsum(skb, off, guehdr, - data + doffset, hdrlen, &grc, - !!(fou->flags & - FOU_F_REMCSUM_NOPARTIAL)); - - if (!guehdr) - goto out; - - data = &guehdr[1]; - - doffset += GUE_PLEN_REMCSUM; - } - } - - skb_gro_pull(skb, hdrlen); - - list_for_each_entry(p, head, list) { - const struct guehdr *guehdr2; - - if (!NAPI_GRO_CB(p)->same_flow) - continue; - - guehdr2 = (struct guehdr *)(p->data + off); - - /* Compare base GUE header to be equal (covers - * hlen, version, proto_ctype, and flags. - */ - if (guehdr->word != guehdr2->word) { - NAPI_GRO_CB(p)->same_flow = 0; - continue; - } - - /* Compare optional fields are the same. */ - if (guehdr->hlen && memcmp(&guehdr[1], &guehdr2[1], - guehdr->hlen << 2)) { - NAPI_GRO_CB(p)->same_flow = 0; - continue; - } - } - - proto = guehdr->proto_ctype; - -next_proto: - - /* We can clear the encap_mark for GUE as we are essentially doing - * one of two possible things. We are either adding an L4 tunnel - * header to the outer L3 tunnel header, or we are simply - * treating the GRE tunnel header as though it is a UDP protocol - * specific header such as VXLAN or GENEVE. - */ - NAPI_GRO_CB(skb)->encap_mark = 0; - - /* Flag this frame as already having an outer encap header */ - NAPI_GRO_CB(skb)->is_fou = 1; - - offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; - ops = rcu_dereference(offloads[proto]); - if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive)) - goto out; - - pp = call_gro_receive(ops->callbacks.gro_receive, head, skb); - flush = 0; - -out: - skb_gro_flush_final_remcsum(skb, pp, flush, &grc); - - return pp; -} - -static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff) -{ - struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff); - const struct net_offload __rcu **offloads; - const struct net_offload *ops; - unsigned int guehlen = 0; - u8 proto; - int err = -ENOENT; - - switch (guehdr->version) { - case 0: - proto = guehdr->proto_ctype; - guehlen = sizeof(*guehdr) + (guehdr->hlen << 2); - break; - case 1: - switch (((struct iphdr *)guehdr)->version) { - case 4: - proto = IPPROTO_IPIP; - break; - case 6: - proto = IPPROTO_IPV6; - break; - default: - return err; - } - break; - default: - return err; - } - - offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; - ops = rcu_dereference(offloads[proto]); - if (WARN_ON(!ops || !ops->callbacks.gro_complete)) - goto out; - - err = ops->callbacks.gro_complete(skb, nhoff + guehlen); - - skb_set_inner_mac_header(skb, nhoff + guehlen); - -out: - return err; -} - -static bool fou_cfg_cmp(struct fou *fou, struct fou_cfg *cfg) -{ - struct sock *sk = fou->sock->sk; - struct udp_port_cfg *udp_cfg = &cfg->udp_config; - - if (fou->family != udp_cfg->family || - fou->port != udp_cfg->local_udp_port || - sk->sk_dport != udp_cfg->peer_udp_port || - sk->sk_bound_dev_if != udp_cfg->bind_ifindex) - return false; - - if (fou->family == AF_INET) { - if (sk->sk_rcv_saddr != udp_cfg->local_ip.s_addr || - sk->sk_daddr != udp_cfg->peer_ip.s_addr) - return false; - else - return true; -#if IS_ENABLED(CONFIG_IPV6) - } else { - if (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, &udp_cfg->local_ip6) || - ipv6_addr_cmp(&sk->sk_v6_daddr, &udp_cfg->peer_ip6)) - return false; - else - return true; -#endif - } - - return false; -} - -static int fou_add_to_port_list(struct net *net, struct fou *fou, - struct fou_cfg *cfg) -{ - struct fou_net *fn = net_generic(net, fou_net_id); - struct fou *fout; - - mutex_lock(&fn->fou_lock); - list_for_each_entry(fout, &fn->fou_list, list) { - if (fou_cfg_cmp(fout, cfg)) { - mutex_unlock(&fn->fou_lock); - return -EALREADY; - } - } - - list_add(&fou->list, &fn->fou_list); - mutex_unlock(&fn->fou_lock); - - return 0; -} - -static void fou_release(struct fou *fou) -{ - struct socket *sock = fou->sock; - - list_del(&fou->list); - udp_tunnel_sock_release(sock); - - kfree_rcu(fou, rcu); -} - -static int fou_create(struct net *net, struct fou_cfg *cfg, - struct socket **sockp) -{ - struct socket *sock = NULL; - struct fou *fou = NULL; - struct sock *sk; - struct udp_tunnel_sock_cfg tunnel_cfg; - int err; - - /* Open UDP socket */ - err = udp_sock_create(net, &cfg->udp_config, &sock); - if (err < 0) - goto error; - - /* Allocate FOU port structure */ - fou = kzalloc(sizeof(*fou), GFP_KERNEL); - if (!fou) { - err = -ENOMEM; - goto error; - } - - sk = sock->sk; - - fou->port = cfg->udp_config.local_udp_port; - fou->family = cfg->udp_config.family; - fou->flags = cfg->flags; - fou->type = cfg->type; - fou->sock = sock; - - memset(&tunnel_cfg, 0, sizeof(tunnel_cfg)); - tunnel_cfg.encap_type = 1; - tunnel_cfg.sk_user_data = fou; - tunnel_cfg.encap_destroy = NULL; - - /* Initial for fou type */ - switch (cfg->type) { - case FOU_ENCAP_DIRECT: - tunnel_cfg.encap_rcv = fou_udp_recv; - tunnel_cfg.gro_receive = fou_gro_receive; - tunnel_cfg.gro_complete = fou_gro_complete; - fou->protocol = cfg->protocol; - break; - case FOU_ENCAP_GUE: - tunnel_cfg.encap_rcv = gue_udp_recv; - tunnel_cfg.gro_receive = gue_gro_receive; - tunnel_cfg.gro_complete = gue_gro_complete; - break; - default: - err = -EINVAL; - goto error; - } - - setup_udp_tunnel_sock(net, sock, &tunnel_cfg); - - sk->sk_allocation = GFP_ATOMIC; - - err = fou_add_to_port_list(net, fou, cfg); - if (err) - goto error; - - if (sockp) - *sockp = sock; - - return 0; - -error: - kfree(fou); - if (sock) - udp_tunnel_sock_release(sock); - - return err; -} - -static int fou_destroy(struct net *net, struct fou_cfg *cfg) -{ - struct fou_net *fn = net_generic(net, fou_net_id); - int err = -EINVAL; - struct fou *fou; - - mutex_lock(&fn->fou_lock); - list_for_each_entry(fou, &fn->fou_list, list) { - if (fou_cfg_cmp(fou, cfg)) { - fou_release(fou); - err = 0; - break; - } - } - mutex_unlock(&fn->fou_lock); - - return err; -} - -static struct genl_family fou_nl_family; - -static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = { - [FOU_ATTR_PORT] = { .type = NLA_U16, }, - [FOU_ATTR_AF] = { .type = NLA_U8, }, - [FOU_ATTR_IPPROTO] = { .type = NLA_U8, }, - [FOU_ATTR_TYPE] = { .type = NLA_U8, }, - [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, }, - [FOU_ATTR_LOCAL_V4] = { .type = NLA_U32, }, - [FOU_ATTR_PEER_V4] = { .type = NLA_U32, }, - [FOU_ATTR_LOCAL_V6] = { .len = sizeof(struct in6_addr), }, - [FOU_ATTR_PEER_V6] = { .len = sizeof(struct in6_addr), }, - [FOU_ATTR_PEER_PORT] = { .type = NLA_U16, }, - [FOU_ATTR_IFINDEX] = { .type = NLA_S32, }, -}; - -static int parse_nl_config(struct genl_info *info, - struct fou_cfg *cfg) -{ - bool has_local = false, has_peer = false; - struct nlattr *attr; - int ifindex; - __be16 port; - - memset(cfg, 0, sizeof(*cfg)); - - cfg->udp_config.family = AF_INET; - - if (info->attrs[FOU_ATTR_AF]) { - u8 family = nla_get_u8(info->attrs[FOU_ATTR_AF]); - - switch (family) { - case AF_INET: - break; - case AF_INET6: - cfg->udp_config.ipv6_v6only = 1; - break; - default: - return -EAFNOSUPPORT; - } - - cfg->udp_config.family = family; - } - - if (info->attrs[FOU_ATTR_PORT]) { - port = nla_get_be16(info->attrs[FOU_ATTR_PORT]); - cfg->udp_config.local_udp_port = port; - } - - if (info->attrs[FOU_ATTR_IPPROTO]) - cfg->protocol = nla_get_u8(info->attrs[FOU_ATTR_IPPROTO]); - - if (info->attrs[FOU_ATTR_TYPE]) - cfg->type = nla_get_u8(info->attrs[FOU_ATTR_TYPE]); - - if (info->attrs[FOU_ATTR_REMCSUM_NOPARTIAL]) - cfg->flags |= FOU_F_REMCSUM_NOPARTIAL; - - if (cfg->udp_config.family == AF_INET) { - if (info->attrs[FOU_ATTR_LOCAL_V4]) { - attr = info->attrs[FOU_ATTR_LOCAL_V4]; - cfg->udp_config.local_ip.s_addr = nla_get_in_addr(attr); - has_local = true; - } - - if (info->attrs[FOU_ATTR_PEER_V4]) { - attr = info->attrs[FOU_ATTR_PEER_V4]; - cfg->udp_config.peer_ip.s_addr = nla_get_in_addr(attr); - has_peer = true; - } -#if IS_ENABLED(CONFIG_IPV6) - } else { - if (info->attrs[FOU_ATTR_LOCAL_V6]) { - attr = info->attrs[FOU_ATTR_LOCAL_V6]; - cfg->udp_config.local_ip6 = nla_get_in6_addr(attr); - has_local = true; - } - - if (info->attrs[FOU_ATTR_PEER_V6]) { - attr = info->attrs[FOU_ATTR_PEER_V6]; - cfg->udp_config.peer_ip6 = nla_get_in6_addr(attr); - has_peer = true; - } -#endif - } - - if (has_peer) { - if (info->attrs[FOU_ATTR_PEER_PORT]) { - port = nla_get_be16(info->attrs[FOU_ATTR_PEER_PORT]); - cfg->udp_config.peer_udp_port = port; - } else { - return -EINVAL; - } - } - - if (info->attrs[FOU_ATTR_IFINDEX]) { - if (!has_local) - return -EINVAL; - - ifindex = nla_get_s32(info->attrs[FOU_ATTR_IFINDEX]); - - cfg->udp_config.bind_ifindex = ifindex; - } - - return 0; -} - -static int fou_nl_cmd_add_port(struct sk_buff *skb, struct genl_info *info) -{ - struct net *net = genl_info_net(info); - struct fou_cfg cfg; - int err; - - err = parse_nl_config(info, &cfg); - if (err) - return err; - - return fou_create(net, &cfg, NULL); -} - -static int fou_nl_cmd_rm_port(struct sk_buff *skb, struct genl_info *info) -{ - struct net *net = genl_info_net(info); - struct fou_cfg cfg; - int err; - - err = parse_nl_config(info, &cfg); - if (err) - return err; - - return fou_destroy(net, &cfg); -} - -static int fou_fill_info(struct fou *fou, struct sk_buff *msg) -{ - struct sock *sk = fou->sock->sk; - - if (nla_put_u8(msg, FOU_ATTR_AF, fou->sock->sk->sk_family) || - nla_put_be16(msg, FOU_ATTR_PORT, fou->port) || - nla_put_be16(msg, FOU_ATTR_PEER_PORT, sk->sk_dport) || - nla_put_u8(msg, FOU_ATTR_IPPROTO, fou->protocol) || - nla_put_u8(msg, FOU_ATTR_TYPE, fou->type) || - nla_put_s32(msg, FOU_ATTR_IFINDEX, sk->sk_bound_dev_if)) - return -1; - - if (fou->flags & FOU_F_REMCSUM_NOPARTIAL) - if (nla_put_flag(msg, FOU_ATTR_REMCSUM_NOPARTIAL)) - return -1; - - if (fou->sock->sk->sk_family == AF_INET) { - if (nla_put_in_addr(msg, FOU_ATTR_LOCAL_V4, sk->sk_rcv_saddr)) - return -1; - - if (nla_put_in_addr(msg, FOU_ATTR_PEER_V4, sk->sk_daddr)) - return -1; -#if IS_ENABLED(CONFIG_IPV6) - } else { - if (nla_put_in6_addr(msg, FOU_ATTR_LOCAL_V6, - &sk->sk_v6_rcv_saddr)) - return -1; - - if (nla_put_in6_addr(msg, FOU_ATTR_PEER_V6, &sk->sk_v6_daddr)) - return -1; -#endif - } - - return 0; -} - -static int fou_dump_info(struct fou *fou, u32 portid, u32 seq, - u32 flags, struct sk_buff *skb, u8 cmd) -{ - void *hdr; - - hdr = genlmsg_put(skb, portid, seq, &fou_nl_family, flags, cmd); - if (!hdr) - return -ENOMEM; - - if (fou_fill_info(fou, skb) < 0) - goto nla_put_failure; - - genlmsg_end(skb, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(skb, hdr); - return -EMSGSIZE; -} - -static int fou_nl_cmd_get_port(struct sk_buff *skb, struct genl_info *info) -{ - struct net *net = genl_info_net(info); - struct fou_net *fn = net_generic(net, fou_net_id); - struct sk_buff *msg; - struct fou_cfg cfg; - struct fou *fout; - __be16 port; - u8 family; - int ret; - - ret = parse_nl_config(info, &cfg); - if (ret) - return ret; - port = cfg.udp_config.local_udp_port; - if (port == 0) - return -EINVAL; - - family = cfg.udp_config.family; - if (family != AF_INET && family != AF_INET6) - return -EINVAL; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - ret = -ESRCH; - mutex_lock(&fn->fou_lock); - list_for_each_entry(fout, &fn->fou_list, list) { - if (fou_cfg_cmp(fout, &cfg)) { - ret = fou_dump_info(fout, info->snd_portid, - info->snd_seq, 0, msg, - info->genlhdr->cmd); - break; - } - } - mutex_unlock(&fn->fou_lock); - if (ret < 0) - goto out_free; - - return genlmsg_reply(msg, info); - -out_free: - nlmsg_free(msg); - return ret; -} - -static int fou_nl_dump(struct sk_buff *skb, struct netlink_callback *cb) -{ - struct net *net = sock_net(skb->sk); - struct fou_net *fn = net_generic(net, fou_net_id); - struct fou *fout; - int idx = 0, ret; - - mutex_lock(&fn->fou_lock); - list_for_each_entry(fout, &fn->fou_list, list) { - if (idx++ < cb->args[0]) - continue; - ret = fou_dump_info(fout, NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - skb, FOU_CMD_GET); - if (ret) - break; - } - mutex_unlock(&fn->fou_lock); - - cb->args[0] = idx; - return skb->len; -} - -static const struct genl_small_ops fou_nl_ops[] = { - { - .cmd = FOU_CMD_ADD, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = fou_nl_cmd_add_port, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = FOU_CMD_DEL, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = fou_nl_cmd_rm_port, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = FOU_CMD_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = fou_nl_cmd_get_port, - .dumpit = fou_nl_dump, - }, -}; - -static struct genl_family fou_nl_family __ro_after_init = { - .hdrsize = 0, - .name = FOU_GENL_NAME, - .version = FOU_GENL_VERSION, - .maxattr = FOU_ATTR_MAX, - .policy = fou_nl_policy, - .netnsok = true, - .module = THIS_MODULE, - .small_ops = fou_nl_ops, - .n_small_ops = ARRAY_SIZE(fou_nl_ops), - .resv_start_op = FOU_CMD_GET + 1, -}; - -size_t fou_encap_hlen(struct ip_tunnel_encap *e) -{ - return sizeof(struct udphdr); -} -EXPORT_SYMBOL(fou_encap_hlen); - -size_t gue_encap_hlen(struct ip_tunnel_encap *e) -{ - size_t len; - bool need_priv = false; - - len = sizeof(struct udphdr) + sizeof(struct guehdr); - - if (e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) { - len += GUE_PLEN_REMCSUM; - need_priv = true; - } - - len += need_priv ? GUE_LEN_PRIV : 0; - - return len; -} -EXPORT_SYMBOL(gue_encap_hlen); - -int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, - u8 *protocol, __be16 *sport, int type) -{ - int err; - - err = iptunnel_handle_offloads(skb, type); - if (err) - return err; - - *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev), - skb, 0, 0, false); - - return 0; -} -EXPORT_SYMBOL(__fou_build_header); - -int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, - u8 *protocol, __be16 *sport, int type) -{ - struct guehdr *guehdr; - size_t hdrlen, optlen = 0; - void *data; - bool need_priv = false; - int err; - - if ((e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) && - skb->ip_summed == CHECKSUM_PARTIAL) { - optlen += GUE_PLEN_REMCSUM; - type |= SKB_GSO_TUNNEL_REMCSUM; - need_priv = true; - } - - optlen += need_priv ? GUE_LEN_PRIV : 0; - - err = iptunnel_handle_offloads(skb, type); - if (err) - return err; - - /* Get source port (based on flow hash) before skb_push */ - *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev), - skb, 0, 0, false); - - hdrlen = sizeof(struct guehdr) + optlen; - - skb_push(skb, hdrlen); - - guehdr = (struct guehdr *)skb->data; - - guehdr->control = 0; - guehdr->version = 0; - guehdr->hlen = optlen >> 2; - guehdr->flags = 0; - guehdr->proto_ctype = *protocol; - - data = &guehdr[1]; - - if (need_priv) { - __be32 *flags = data; - - guehdr->flags |= GUE_FLAG_PRIV; - *flags = 0; - data += GUE_LEN_PRIV; - - if (type & SKB_GSO_TUNNEL_REMCSUM) { - u16 csum_start = skb_checksum_start_offset(skb); - __be16 *pd = data; - - if (csum_start < hdrlen) - return -EINVAL; - - csum_start -= hdrlen; - pd[0] = htons(csum_start); - pd[1] = htons(csum_start + skb->csum_offset); - - if (!skb_is_gso(skb)) { - skb->ip_summed = CHECKSUM_NONE; - skb->encapsulation = 0; - } - - *flags |= GUE_PFLAG_REMCSUM; - data += GUE_PLEN_REMCSUM; - } - - } - - return 0; -} -EXPORT_SYMBOL(__gue_build_header); - -#ifdef CONFIG_NET_FOU_IP_TUNNELS - -static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e, - struct flowi4 *fl4, u8 *protocol, __be16 sport) -{ - struct udphdr *uh; - - skb_push(skb, sizeof(struct udphdr)); - skb_reset_transport_header(skb); - - uh = udp_hdr(skb); - - uh->dest = e->dport; - uh->source = sport; - uh->len = htons(skb->len); - udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb, - fl4->saddr, fl4->daddr, skb->len); - - *protocol = IPPROTO_UDP; -} - -static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, - u8 *protocol, struct flowi4 *fl4) -{ - int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : - SKB_GSO_UDP_TUNNEL; - __be16 sport; - int err; - - err = __fou_build_header(skb, e, protocol, &sport, type); - if (err) - return err; - - fou_build_udp(skb, e, fl4, protocol, sport); - - return 0; -} - -static int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, - u8 *protocol, struct flowi4 *fl4) -{ - int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : - SKB_GSO_UDP_TUNNEL; - __be16 sport; - int err; - - err = __gue_build_header(skb, e, protocol, &sport, type); - if (err) - return err; - - fou_build_udp(skb, e, fl4, protocol, sport); - - return 0; -} - -static int gue_err_proto_handler(int proto, struct sk_buff *skb, u32 info) -{ - const struct net_protocol *ipprot = rcu_dereference(inet_protos[proto]); - - if (ipprot && ipprot->err_handler) { - if (!ipprot->err_handler(skb, info)) - return 0; - } - - return -ENOENT; -} - -static int gue_err(struct sk_buff *skb, u32 info) -{ - int transport_offset = skb_transport_offset(skb); - struct guehdr *guehdr; - size_t len, optlen; - int ret; - - len = sizeof(struct udphdr) + sizeof(struct guehdr); - if (!pskb_may_pull(skb, transport_offset + len)) - return -EINVAL; - - guehdr = (struct guehdr *)&udp_hdr(skb)[1]; - - switch (guehdr->version) { - case 0: /* Full GUE header present */ - break; - case 1: { - /* Direct encapsulation of IPv4 or IPv6 */ - skb_set_transport_header(skb, -(int)sizeof(struct icmphdr)); - - switch (((struct iphdr *)guehdr)->version) { - case 4: - ret = gue_err_proto_handler(IPPROTO_IPIP, skb, info); - goto out; -#if IS_ENABLED(CONFIG_IPV6) - case 6: - ret = gue_err_proto_handler(IPPROTO_IPV6, skb, info); - goto out; -#endif - default: - ret = -EOPNOTSUPP; - goto out; - } - } - default: /* Undefined version */ - return -EOPNOTSUPP; - } - - if (guehdr->control) - return -ENOENT; - - optlen = guehdr->hlen << 2; - - if (!pskb_may_pull(skb, transport_offset + len + optlen)) - return -EINVAL; - - guehdr = (struct guehdr *)&udp_hdr(skb)[1]; - if (validate_gue_flags(guehdr, optlen)) - return -EINVAL; - - /* Handling exceptions for direct UDP encapsulation in GUE would lead to - * recursion. Besides, this kind of encapsulation can't even be - * configured currently. Discard this. - */ - if (guehdr->proto_ctype == IPPROTO_UDP || - guehdr->proto_ctype == IPPROTO_UDPLITE) - return -EOPNOTSUPP; - - skb_set_transport_header(skb, -(int)sizeof(struct icmphdr)); - ret = gue_err_proto_handler(guehdr->proto_ctype, skb, info); - -out: - skb_set_transport_header(skb, transport_offset); - return ret; -} - - -static const struct ip_tunnel_encap_ops fou_iptun_ops = { - .encap_hlen = fou_encap_hlen, - .build_header = fou_build_header, - .err_handler = gue_err, -}; - -static const struct ip_tunnel_encap_ops gue_iptun_ops = { - .encap_hlen = gue_encap_hlen, - .build_header = gue_build_header, - .err_handler = gue_err, -}; - -static int ip_tunnel_encap_add_fou_ops(void) -{ - int ret; - - ret = ip_tunnel_encap_add_ops(&fou_iptun_ops, TUNNEL_ENCAP_FOU); - if (ret < 0) { - pr_err("can't add fou ops\n"); - return ret; - } - - ret = ip_tunnel_encap_add_ops(&gue_iptun_ops, TUNNEL_ENCAP_GUE); - if (ret < 0) { - pr_err("can't add gue ops\n"); - ip_tunnel_encap_del_ops(&fou_iptun_ops, TUNNEL_ENCAP_FOU); - return ret; - } - - return 0; -} - -static void ip_tunnel_encap_del_fou_ops(void) -{ - ip_tunnel_encap_del_ops(&fou_iptun_ops, TUNNEL_ENCAP_FOU); - ip_tunnel_encap_del_ops(&gue_iptun_ops, TUNNEL_ENCAP_GUE); -} - -#else - -static int ip_tunnel_encap_add_fou_ops(void) -{ - return 0; -} - -static void ip_tunnel_encap_del_fou_ops(void) -{ -} - -#endif - -static __net_init int fou_init_net(struct net *net) -{ - struct fou_net *fn = net_generic(net, fou_net_id); - - INIT_LIST_HEAD(&fn->fou_list); - mutex_init(&fn->fou_lock); - return 0; -} - -static __net_exit void fou_exit_net(struct net *net) -{ - struct fou_net *fn = net_generic(net, fou_net_id); - struct fou *fou, *next; - - /* Close all the FOU sockets */ - mutex_lock(&fn->fou_lock); - list_for_each_entry_safe(fou, next, &fn->fou_list, list) - fou_release(fou); - mutex_unlock(&fn->fou_lock); -} - -static struct pernet_operations fou_net_ops = { - .init = fou_init_net, - .exit = fou_exit_net, - .id = &fou_net_id, - .size = sizeof(struct fou_net), -}; - -static int __init fou_init(void) -{ - int ret; - - ret = register_pernet_device(&fou_net_ops); - if (ret) - goto exit; - - ret = genl_register_family(&fou_nl_family); - if (ret < 0) - goto unregister; - - ret = ip_tunnel_encap_add_fou_ops(); - if (ret == 0) - return 0; - - genl_unregister_family(&fou_nl_family); -unregister: - unregister_pernet_device(&fou_net_ops); -exit: - return ret; -} - -static void __exit fou_fini(void) -{ - ip_tunnel_encap_del_fou_ops(); - genl_unregister_family(&fou_nl_family); - unregister_pernet_device(&fou_net_ops); -} - -module_init(fou_init); -module_exit(fou_fini); -MODULE_AUTHOR("Tom Herbert "); -MODULE_LICENSE("GPL"); -MODULE_DESCRIPTION("Foo over UDP"); diff --git a/net/ipv4/fou_core.c b/net/ipv4/fou_core.c new file mode 100644 index 000000000000..0c3c6d0cee29 --- /dev/null +++ b/net/ipv4/fou_core.c @@ -0,0 +1,1294 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct fou { + struct socket *sock; + u8 protocol; + u8 flags; + __be16 port; + u8 family; + u16 type; + struct list_head list; + struct rcu_head rcu; +}; + +#define FOU_F_REMCSUM_NOPARTIAL BIT(0) + +struct fou_cfg { + u16 type; + u8 protocol; + u8 flags; + struct udp_port_cfg udp_config; +}; + +static unsigned int fou_net_id; + +struct fou_net { + struct list_head fou_list; + struct mutex fou_lock; +}; + +static inline struct fou *fou_from_sock(struct sock *sk) +{ + return sk->sk_user_data; +} + +static int fou_recv_pull(struct sk_buff *skb, struct fou *fou, size_t len) +{ + /* Remove 'len' bytes from the packet (UDP header and + * FOU header if present). + */ + if (fou->family == AF_INET) + ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len); + else + ipv6_hdr(skb)->payload_len = + htons(ntohs(ipv6_hdr(skb)->payload_len) - len); + + __skb_pull(skb, len); + skb_postpull_rcsum(skb, udp_hdr(skb), len); + skb_reset_transport_header(skb); + return iptunnel_pull_offloads(skb); +} + +static int fou_udp_recv(struct sock *sk, struct sk_buff *skb) +{ + struct fou *fou = fou_from_sock(sk); + + if (!fou) + return 1; + + if (fou_recv_pull(skb, fou, sizeof(struct udphdr))) + goto drop; + + return -fou->protocol; + +drop: + kfree_skb(skb); + return 0; +} + +static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr, + void *data, size_t hdrlen, u8 ipproto, + bool nopartial) +{ + __be16 *pd = data; + size_t start = ntohs(pd[0]); + size_t offset = ntohs(pd[1]); + size_t plen = sizeof(struct udphdr) + hdrlen + + max_t(size_t, offset + sizeof(u16), start); + + if (skb->remcsum_offload) + return guehdr; + + if (!pskb_may_pull(skb, plen)) + return NULL; + guehdr = (struct guehdr *)&udp_hdr(skb)[1]; + + skb_remcsum_process(skb, (void *)guehdr + hdrlen, + start, offset, nopartial); + + return guehdr; +} + +static int gue_control_message(struct sk_buff *skb, struct guehdr *guehdr) +{ + /* No support yet */ + kfree_skb(skb); + return 0; +} + +static int gue_udp_recv(struct sock *sk, struct sk_buff *skb) +{ + struct fou *fou = fou_from_sock(sk); + size_t len, optlen, hdrlen; + struct guehdr *guehdr; + void *data; + u16 doffset = 0; + u8 proto_ctype; + + if (!fou) + return 1; + + len = sizeof(struct udphdr) + sizeof(struct guehdr); + if (!pskb_may_pull(skb, len)) + goto drop; + + guehdr = (struct guehdr *)&udp_hdr(skb)[1]; + + switch (guehdr->version) { + case 0: /* Full GUE header present */ + break; + + case 1: { + /* Direct encapsulation of IPv4 or IPv6 */ + + int prot; + + switch (((struct iphdr *)guehdr)->version) { + case 4: + prot = IPPROTO_IPIP; + break; + case 6: + prot = IPPROTO_IPV6; + break; + default: + goto drop; + } + + if (fou_recv_pull(skb, fou, sizeof(struct udphdr))) + goto drop; + + return -prot; + } + + default: /* Undefined version */ + goto drop; + } + + optlen = guehdr->hlen << 2; + len += optlen; + + if (!pskb_may_pull(skb, len)) + goto drop; + + /* guehdr may change after pull */ + guehdr = (struct guehdr *)&udp_hdr(skb)[1]; + + if (validate_gue_flags(guehdr, optlen)) + goto drop; + + hdrlen = sizeof(struct guehdr) + optlen; + + if (fou->family == AF_INET) + ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len); + else + ipv6_hdr(skb)->payload_len = + htons(ntohs(ipv6_hdr(skb)->payload_len) - len); + + /* Pull csum through the guehdr now . This can be used if + * there is a remote checksum offload. + */ + skb_postpull_rcsum(skb, udp_hdr(skb), len); + + data = &guehdr[1]; + + if (guehdr->flags & GUE_FLAG_PRIV) { + __be32 flags = *(__be32 *)(data + doffset); + + doffset += GUE_LEN_PRIV; + + if (flags & GUE_PFLAG_REMCSUM) { + guehdr = gue_remcsum(skb, guehdr, data + doffset, + hdrlen, guehdr->proto_ctype, + !!(fou->flags & + FOU_F_REMCSUM_NOPARTIAL)); + if (!guehdr) + goto drop; + + data = &guehdr[1]; + + doffset += GUE_PLEN_REMCSUM; + } + } + + if (unlikely(guehdr->control)) + return gue_control_message(skb, guehdr); + + proto_ctype = guehdr->proto_ctype; + __skb_pull(skb, sizeof(struct udphdr) + hdrlen); + skb_reset_transport_header(skb); + + if (iptunnel_pull_offloads(skb)) + goto drop; + + return -proto_ctype; + +drop: + kfree_skb(skb); + return 0; +} + +static struct sk_buff *fou_gro_receive(struct sock *sk, + struct list_head *head, + struct sk_buff *skb) +{ + const struct net_offload __rcu **offloads; + u8 proto = fou_from_sock(sk)->protocol; + const struct net_offload *ops; + struct sk_buff *pp = NULL; + + /* We can clear the encap_mark for FOU as we are essentially doing + * one of two possible things. We are either adding an L4 tunnel + * header to the outer L3 tunnel header, or we are simply + * treating the GRE tunnel header as though it is a UDP protocol + * specific header such as VXLAN or GENEVE. + */ + NAPI_GRO_CB(skb)->encap_mark = 0; + + /* Flag this frame as already having an outer encap header */ + NAPI_GRO_CB(skb)->is_fou = 1; + + offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; + ops = rcu_dereference(offloads[proto]); + if (!ops || !ops->callbacks.gro_receive) + goto out; + + pp = call_gro_receive(ops->callbacks.gro_receive, head, skb); + +out: + return pp; +} + +static int fou_gro_complete(struct sock *sk, struct sk_buff *skb, + int nhoff) +{ + const struct net_offload __rcu **offloads; + u8 proto = fou_from_sock(sk)->protocol; + const struct net_offload *ops; + int err = -ENOSYS; + + offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; + ops = rcu_dereference(offloads[proto]); + if (WARN_ON(!ops || !ops->callbacks.gro_complete)) + goto out; + + err = ops->callbacks.gro_complete(skb, nhoff); + + skb_set_inner_mac_header(skb, nhoff); + +out: + return err; +} + +static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off, + struct guehdr *guehdr, void *data, + size_t hdrlen, struct gro_remcsum *grc, + bool nopartial) +{ + __be16 *pd = data; + size_t start = ntohs(pd[0]); + size_t offset = ntohs(pd[1]); + + if (skb->remcsum_offload) + return guehdr; + + if (!NAPI_GRO_CB(skb)->csum_valid) + return NULL; + + guehdr = skb_gro_remcsum_process(skb, (void *)guehdr, off, hdrlen, + start, offset, grc, nopartial); + + skb->remcsum_offload = 1; + + return guehdr; +} + +static struct sk_buff *gue_gro_receive(struct sock *sk, + struct list_head *head, + struct sk_buff *skb) +{ + const struct net_offload __rcu **offloads; + const struct net_offload *ops; + struct sk_buff *pp = NULL; + struct sk_buff *p; + struct guehdr *guehdr; + size_t len, optlen, hdrlen, off; + void *data; + u16 doffset = 0; + int flush = 1; + struct fou *fou = fou_from_sock(sk); + struct gro_remcsum grc; + u8 proto; + + skb_gro_remcsum_init(&grc); + + off = skb_gro_offset(skb); + len = off + sizeof(*guehdr); + + guehdr = skb_gro_header(skb, len, off); + if (unlikely(!guehdr)) + goto out; + + switch (guehdr->version) { + case 0: + break; + case 1: + switch (((struct iphdr *)guehdr)->version) { + case 4: + proto = IPPROTO_IPIP; + break; + case 6: + proto = IPPROTO_IPV6; + break; + default: + goto out; + } + goto next_proto; + default: + goto out; + } + + optlen = guehdr->hlen << 2; + len += optlen; + + if (skb_gro_header_hard(skb, len)) { + guehdr = skb_gro_header_slow(skb, len, off); + if (unlikely(!guehdr)) + goto out; + } + + if (unlikely(guehdr->control) || guehdr->version != 0 || + validate_gue_flags(guehdr, optlen)) + goto out; + + hdrlen = sizeof(*guehdr) + optlen; + + /* Adjust NAPI_GRO_CB(skb)->csum to account for guehdr, + * this is needed if there is a remote checkcsum offload. + */ + skb_gro_postpull_rcsum(skb, guehdr, hdrlen); + + data = &guehdr[1]; + + if (guehdr->flags & GUE_FLAG_PRIV) { + __be32 flags = *(__be32 *)(data + doffset); + + doffset += GUE_LEN_PRIV; + + if (flags & GUE_PFLAG_REMCSUM) { + guehdr = gue_gro_remcsum(skb, off, guehdr, + data + doffset, hdrlen, &grc, + !!(fou->flags & + FOU_F_REMCSUM_NOPARTIAL)); + + if (!guehdr) + goto out; + + data = &guehdr[1]; + + doffset += GUE_PLEN_REMCSUM; + } + } + + skb_gro_pull(skb, hdrlen); + + list_for_each_entry(p, head, list) { + const struct guehdr *guehdr2; + + if (!NAPI_GRO_CB(p)->same_flow) + continue; + + guehdr2 = (struct guehdr *)(p->data + off); + + /* Compare base GUE header to be equal (covers + * hlen, version, proto_ctype, and flags. + */ + if (guehdr->word != guehdr2->word) { + NAPI_GRO_CB(p)->same_flow = 0; + continue; + } + + /* Compare optional fields are the same. */ + if (guehdr->hlen && memcmp(&guehdr[1], &guehdr2[1], + guehdr->hlen << 2)) { + NAPI_GRO_CB(p)->same_flow = 0; + continue; + } + } + + proto = guehdr->proto_ctype; + +next_proto: + + /* We can clear the encap_mark for GUE as we are essentially doing + * one of two possible things. We are either adding an L4 tunnel + * header to the outer L3 tunnel header, or we are simply + * treating the GRE tunnel header as though it is a UDP protocol + * specific header such as VXLAN or GENEVE. + */ + NAPI_GRO_CB(skb)->encap_mark = 0; + + /* Flag this frame as already having an outer encap header */ + NAPI_GRO_CB(skb)->is_fou = 1; + + offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; + ops = rcu_dereference(offloads[proto]); + if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive)) + goto out; + + pp = call_gro_receive(ops->callbacks.gro_receive, head, skb); + flush = 0; + +out: + skb_gro_flush_final_remcsum(skb, pp, flush, &grc); + + return pp; +} + +static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff) +{ + struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff); + const struct net_offload __rcu **offloads; + const struct net_offload *ops; + unsigned int guehlen = 0; + u8 proto; + int err = -ENOENT; + + switch (guehdr->version) { + case 0: + proto = guehdr->proto_ctype; + guehlen = sizeof(*guehdr) + (guehdr->hlen << 2); + break; + case 1: + switch (((struct iphdr *)guehdr)->version) { + case 4: + proto = IPPROTO_IPIP; + break; + case 6: + proto = IPPROTO_IPV6; + break; + default: + return err; + } + break; + default: + return err; + } + + offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads; + ops = rcu_dereference(offloads[proto]); + if (WARN_ON(!ops || !ops->callbacks.gro_complete)) + goto out; + + err = ops->callbacks.gro_complete(skb, nhoff + guehlen); + + skb_set_inner_mac_header(skb, nhoff + guehlen); + +out: + return err; +} + +static bool fou_cfg_cmp(struct fou *fou, struct fou_cfg *cfg) +{ + struct sock *sk = fou->sock->sk; + struct udp_port_cfg *udp_cfg = &cfg->udp_config; + + if (fou->family != udp_cfg->family || + fou->port != udp_cfg->local_udp_port || + sk->sk_dport != udp_cfg->peer_udp_port || + sk->sk_bound_dev_if != udp_cfg->bind_ifindex) + return false; + + if (fou->family == AF_INET) { + if (sk->sk_rcv_saddr != udp_cfg->local_ip.s_addr || + sk->sk_daddr != udp_cfg->peer_ip.s_addr) + return false; + else + return true; +#if IS_ENABLED(CONFIG_IPV6) + } else { + if (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, &udp_cfg->local_ip6) || + ipv6_addr_cmp(&sk->sk_v6_daddr, &udp_cfg->peer_ip6)) + return false; + else + return true; +#endif + } + + return false; +} + +static int fou_add_to_port_list(struct net *net, struct fou *fou, + struct fou_cfg *cfg) +{ + struct fou_net *fn = net_generic(net, fou_net_id); + struct fou *fout; + + mutex_lock(&fn->fou_lock); + list_for_each_entry(fout, &fn->fou_list, list) { + if (fou_cfg_cmp(fout, cfg)) { + mutex_unlock(&fn->fou_lock); + return -EALREADY; + } + } + + list_add(&fou->list, &fn->fou_list); + mutex_unlock(&fn->fou_lock); + + return 0; +} + +static void fou_release(struct fou *fou) +{ + struct socket *sock = fou->sock; + + list_del(&fou->list); + udp_tunnel_sock_release(sock); + + kfree_rcu(fou, rcu); +} + +static int fou_create(struct net *net, struct fou_cfg *cfg, + struct socket **sockp) +{ + struct socket *sock = NULL; + struct fou *fou = NULL; + struct sock *sk; + struct udp_tunnel_sock_cfg tunnel_cfg; + int err; + + /* Open UDP socket */ + err = udp_sock_create(net, &cfg->udp_config, &sock); + if (err < 0) + goto error; + + /* Allocate FOU port structure */ + fou = kzalloc(sizeof(*fou), GFP_KERNEL); + if (!fou) { + err = -ENOMEM; + goto error; + } + + sk = sock->sk; + + fou->port = cfg->udp_config.local_udp_port; + fou->family = cfg->udp_config.family; + fou->flags = cfg->flags; + fou->type = cfg->type; + fou->sock = sock; + + memset(&tunnel_cfg, 0, sizeof(tunnel_cfg)); + tunnel_cfg.encap_type = 1; + tunnel_cfg.sk_user_data = fou; + tunnel_cfg.encap_destroy = NULL; + + /* Initial for fou type */ + switch (cfg->type) { + case FOU_ENCAP_DIRECT: + tunnel_cfg.encap_rcv = fou_udp_recv; + tunnel_cfg.gro_receive = fou_gro_receive; + tunnel_cfg.gro_complete = fou_gro_complete; + fou->protocol = cfg->protocol; + break; + case FOU_ENCAP_GUE: + tunnel_cfg.encap_rcv = gue_udp_recv; + tunnel_cfg.gro_receive = gue_gro_receive; + tunnel_cfg.gro_complete = gue_gro_complete; + break; + default: + err = -EINVAL; + goto error; + } + + setup_udp_tunnel_sock(net, sock, &tunnel_cfg); + + sk->sk_allocation = GFP_ATOMIC; + + err = fou_add_to_port_list(net, fou, cfg); + if (err) + goto error; + + if (sockp) + *sockp = sock; + + return 0; + +error: + kfree(fou); + if (sock) + udp_tunnel_sock_release(sock); + + return err; +} + +static int fou_destroy(struct net *net, struct fou_cfg *cfg) +{ + struct fou_net *fn = net_generic(net, fou_net_id); + int err = -EINVAL; + struct fou *fou; + + mutex_lock(&fn->fou_lock); + list_for_each_entry(fou, &fn->fou_list, list) { + if (fou_cfg_cmp(fou, cfg)) { + fou_release(fou); + err = 0; + break; + } + } + mutex_unlock(&fn->fou_lock); + + return err; +} + +static struct genl_family fou_nl_family; + +static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = { + [FOU_ATTR_PORT] = { .type = NLA_U16, }, + [FOU_ATTR_AF] = { .type = NLA_U8, }, + [FOU_ATTR_IPPROTO] = { .type = NLA_U8, }, + [FOU_ATTR_TYPE] = { .type = NLA_U8, }, + [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, }, + [FOU_ATTR_LOCAL_V4] = { .type = NLA_U32, }, + [FOU_ATTR_PEER_V4] = { .type = NLA_U32, }, + [FOU_ATTR_LOCAL_V6] = { .len = sizeof(struct in6_addr), }, + [FOU_ATTR_PEER_V6] = { .len = sizeof(struct in6_addr), }, + [FOU_ATTR_PEER_PORT] = { .type = NLA_U16, }, + [FOU_ATTR_IFINDEX] = { .type = NLA_S32, }, +}; + +static int parse_nl_config(struct genl_info *info, + struct fou_cfg *cfg) +{ + bool has_local = false, has_peer = false; + struct nlattr *attr; + int ifindex; + __be16 port; + + memset(cfg, 0, sizeof(*cfg)); + + cfg->udp_config.family = AF_INET; + + if (info->attrs[FOU_ATTR_AF]) { + u8 family = nla_get_u8(info->attrs[FOU_ATTR_AF]); + + switch (family) { + case AF_INET: + break; + case AF_INET6: + cfg->udp_config.ipv6_v6only = 1; + break; + default: + return -EAFNOSUPPORT; + } + + cfg->udp_config.family = family; + } + + if (info->attrs[FOU_ATTR_PORT]) { + port = nla_get_be16(info->attrs[FOU_ATTR_PORT]); + cfg->udp_config.local_udp_port = port; + } + + if (info->attrs[FOU_ATTR_IPPROTO]) + cfg->protocol = nla_get_u8(info->attrs[FOU_ATTR_IPPROTO]); + + if (info->attrs[FOU_ATTR_TYPE]) + cfg->type = nla_get_u8(info->attrs[FOU_ATTR_TYPE]); + + if (info->attrs[FOU_ATTR_REMCSUM_NOPARTIAL]) + cfg->flags |= FOU_F_REMCSUM_NOPARTIAL; + + if (cfg->udp_config.family == AF_INET) { + if (info->attrs[FOU_ATTR_LOCAL_V4]) { + attr = info->attrs[FOU_ATTR_LOCAL_V4]; + cfg->udp_config.local_ip.s_addr = nla_get_in_addr(attr); + has_local = true; + } + + if (info->attrs[FOU_ATTR_PEER_V4]) { + attr = info->attrs[FOU_ATTR_PEER_V4]; + cfg->udp_config.peer_ip.s_addr = nla_get_in_addr(attr); + has_peer = true; + } +#if IS_ENABLED(CONFIG_IPV6) + } else { + if (info->attrs[FOU_ATTR_LOCAL_V6]) { + attr = info->attrs[FOU_ATTR_LOCAL_V6]; + cfg->udp_config.local_ip6 = nla_get_in6_addr(attr); + has_local = true; + } + + if (info->attrs[FOU_ATTR_PEER_V6]) { + attr = info->attrs[FOU_ATTR_PEER_V6]; + cfg->udp_config.peer_ip6 = nla_get_in6_addr(attr); + has_peer = true; + } +#endif + } + + if (has_peer) { + if (info->attrs[FOU_ATTR_PEER_PORT]) { + port = nla_get_be16(info->attrs[FOU_ATTR_PEER_PORT]); + cfg->udp_config.peer_udp_port = port; + } else { + return -EINVAL; + } + } + + if (info->attrs[FOU_ATTR_IFINDEX]) { + if (!has_local) + return -EINVAL; + + ifindex = nla_get_s32(info->attrs[FOU_ATTR_IFINDEX]); + + cfg->udp_config.bind_ifindex = ifindex; + } + + return 0; +} + +static int fou_nl_cmd_add_port(struct sk_buff *skb, struct genl_info *info) +{ + struct net *net = genl_info_net(info); + struct fou_cfg cfg; + int err; + + err = parse_nl_config(info, &cfg); + if (err) + return err; + + return fou_create(net, &cfg, NULL); +} + +static int fou_nl_cmd_rm_port(struct sk_buff *skb, struct genl_info *info) +{ + struct net *net = genl_info_net(info); + struct fou_cfg cfg; + int err; + + err = parse_nl_config(info, &cfg); + if (err) + return err; + + return fou_destroy(net, &cfg); +} + +static int fou_fill_info(struct fou *fou, struct sk_buff *msg) +{ + struct sock *sk = fou->sock->sk; + + if (nla_put_u8(msg, FOU_ATTR_AF, fou->sock->sk->sk_family) || + nla_put_be16(msg, FOU_ATTR_PORT, fou->port) || + nla_put_be16(msg, FOU_ATTR_PEER_PORT, sk->sk_dport) || + nla_put_u8(msg, FOU_ATTR_IPPROTO, fou->protocol) || + nla_put_u8(msg, FOU_ATTR_TYPE, fou->type) || + nla_put_s32(msg, FOU_ATTR_IFINDEX, sk->sk_bound_dev_if)) + return -1; + + if (fou->flags & FOU_F_REMCSUM_NOPARTIAL) + if (nla_put_flag(msg, FOU_ATTR_REMCSUM_NOPARTIAL)) + return -1; + + if (fou->sock->sk->sk_family == AF_INET) { + if (nla_put_in_addr(msg, FOU_ATTR_LOCAL_V4, sk->sk_rcv_saddr)) + return -1; + + if (nla_put_in_addr(msg, FOU_ATTR_PEER_V4, sk->sk_daddr)) + return -1; +#if IS_ENABLED(CONFIG_IPV6) + } else { + if (nla_put_in6_addr(msg, FOU_ATTR_LOCAL_V6, + &sk->sk_v6_rcv_saddr)) + return -1; + + if (nla_put_in6_addr(msg, FOU_ATTR_PEER_V6, &sk->sk_v6_daddr)) + return -1; +#endif + } + + return 0; +} + +static int fou_dump_info(struct fou *fou, u32 portid, u32 seq, + u32 flags, struct sk_buff *skb, u8 cmd) +{ + void *hdr; + + hdr = genlmsg_put(skb, portid, seq, &fou_nl_family, flags, cmd); + if (!hdr) + return -ENOMEM; + + if (fou_fill_info(fou, skb) < 0) + goto nla_put_failure; + + genlmsg_end(skb, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(skb, hdr); + return -EMSGSIZE; +} + +static int fou_nl_cmd_get_port(struct sk_buff *skb, struct genl_info *info) +{ + struct net *net = genl_info_net(info); + struct fou_net *fn = net_generic(net, fou_net_id); + struct sk_buff *msg; + struct fou_cfg cfg; + struct fou *fout; + __be16 port; + u8 family; + int ret; + + ret = parse_nl_config(info, &cfg); + if (ret) + return ret; + port = cfg.udp_config.local_udp_port; + if (port == 0) + return -EINVAL; + + family = cfg.udp_config.family; + if (family != AF_INET && family != AF_INET6) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + ret = -ESRCH; + mutex_lock(&fn->fou_lock); + list_for_each_entry(fout, &fn->fou_list, list) { + if (fou_cfg_cmp(fout, &cfg)) { + ret = fou_dump_info(fout, info->snd_portid, + info->snd_seq, 0, msg, + info->genlhdr->cmd); + break; + } + } + mutex_unlock(&fn->fou_lock); + if (ret < 0) + goto out_free; + + return genlmsg_reply(msg, info); + +out_free: + nlmsg_free(msg); + return ret; +} + +static int fou_nl_dump(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct net *net = sock_net(skb->sk); + struct fou_net *fn = net_generic(net, fou_net_id); + struct fou *fout; + int idx = 0, ret; + + mutex_lock(&fn->fou_lock); + list_for_each_entry(fout, &fn->fou_list, list) { + if (idx++ < cb->args[0]) + continue; + ret = fou_dump_info(fout, NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + skb, FOU_CMD_GET); + if (ret) + break; + } + mutex_unlock(&fn->fou_lock); + + cb->args[0] = idx; + return skb->len; +} + +static const struct genl_small_ops fou_nl_ops[] = { + { + .cmd = FOU_CMD_ADD, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = fou_nl_cmd_add_port, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = FOU_CMD_DEL, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = fou_nl_cmd_rm_port, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = FOU_CMD_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = fou_nl_cmd_get_port, + .dumpit = fou_nl_dump, + }, +}; + +static struct genl_family fou_nl_family __ro_after_init = { + .hdrsize = 0, + .name = FOU_GENL_NAME, + .version = FOU_GENL_VERSION, + .maxattr = FOU_ATTR_MAX, + .policy = fou_nl_policy, + .netnsok = true, + .module = THIS_MODULE, + .small_ops = fou_nl_ops, + .n_small_ops = ARRAY_SIZE(fou_nl_ops), + .resv_start_op = FOU_CMD_GET + 1, +}; + +size_t fou_encap_hlen(struct ip_tunnel_encap *e) +{ + return sizeof(struct udphdr); +} +EXPORT_SYMBOL(fou_encap_hlen); + +size_t gue_encap_hlen(struct ip_tunnel_encap *e) +{ + size_t len; + bool need_priv = false; + + len = sizeof(struct udphdr) + sizeof(struct guehdr); + + if (e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) { + len += GUE_PLEN_REMCSUM; + need_priv = true; + } + + len += need_priv ? GUE_LEN_PRIV : 0; + + return len; +} +EXPORT_SYMBOL(gue_encap_hlen); + +int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, + u8 *protocol, __be16 *sport, int type) +{ + int err; + + err = iptunnel_handle_offloads(skb, type); + if (err) + return err; + + *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev), + skb, 0, 0, false); + + return 0; +} +EXPORT_SYMBOL(__fou_build_header); + +int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, + u8 *protocol, __be16 *sport, int type) +{ + struct guehdr *guehdr; + size_t hdrlen, optlen = 0; + void *data; + bool need_priv = false; + int err; + + if ((e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) && + skb->ip_summed == CHECKSUM_PARTIAL) { + optlen += GUE_PLEN_REMCSUM; + type |= SKB_GSO_TUNNEL_REMCSUM; + need_priv = true; + } + + optlen += need_priv ? GUE_LEN_PRIV : 0; + + err = iptunnel_handle_offloads(skb, type); + if (err) + return err; + + /* Get source port (based on flow hash) before skb_push */ + *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev), + skb, 0, 0, false); + + hdrlen = sizeof(struct guehdr) + optlen; + + skb_push(skb, hdrlen); + + guehdr = (struct guehdr *)skb->data; + + guehdr->control = 0; + guehdr->version = 0; + guehdr->hlen = optlen >> 2; + guehdr->flags = 0; + guehdr->proto_ctype = *protocol; + + data = &guehdr[1]; + + if (need_priv) { + __be32 *flags = data; + + guehdr->flags |= GUE_FLAG_PRIV; + *flags = 0; + data += GUE_LEN_PRIV; + + if (type & SKB_GSO_TUNNEL_REMCSUM) { + u16 csum_start = skb_checksum_start_offset(skb); + __be16 *pd = data; + + if (csum_start < hdrlen) + return -EINVAL; + + csum_start -= hdrlen; + pd[0] = htons(csum_start); + pd[1] = htons(csum_start + skb->csum_offset); + + if (!skb_is_gso(skb)) { + skb->ip_summed = CHECKSUM_NONE; + skb->encapsulation = 0; + } + + *flags |= GUE_PFLAG_REMCSUM; + data += GUE_PLEN_REMCSUM; + } + + } + + return 0; +} +EXPORT_SYMBOL(__gue_build_header); + +#ifdef CONFIG_NET_FOU_IP_TUNNELS + +static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e, + struct flowi4 *fl4, u8 *protocol, __be16 sport) +{ + struct udphdr *uh; + + skb_push(skb, sizeof(struct udphdr)); + skb_reset_transport_header(skb); + + uh = udp_hdr(skb); + + uh->dest = e->dport; + uh->source = sport; + uh->len = htons(skb->len); + udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb, + fl4->saddr, fl4->daddr, skb->len); + + *protocol = IPPROTO_UDP; +} + +static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, + u8 *protocol, struct flowi4 *fl4) +{ + int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : + SKB_GSO_UDP_TUNNEL; + __be16 sport; + int err; + + err = __fou_build_header(skb, e, protocol, &sport, type); + if (err) + return err; + + fou_build_udp(skb, e, fl4, protocol, sport); + + return 0; +} + +static int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e, + u8 *protocol, struct flowi4 *fl4) +{ + int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM : + SKB_GSO_UDP_TUNNEL; + __be16 sport; + int err; + + err = __gue_build_header(skb, e, protocol, &sport, type); + if (err) + return err; + + fou_build_udp(skb, e, fl4, protocol, sport); + + return 0; +} + +static int gue_err_proto_handler(int proto, struct sk_buff *skb, u32 info) +{ + const struct net_protocol *ipprot = rcu_dereference(inet_protos[proto]); + + if (ipprot && ipprot->err_handler) { + if (!ipprot->err_handler(skb, info)) + return 0; + } + + return -ENOENT; +} + +static int gue_err(struct sk_buff *skb, u32 info) +{ + int transport_offset = skb_transport_offset(skb); + struct guehdr *guehdr; + size_t len, optlen; + int ret; + + len = sizeof(struct udphdr) + sizeof(struct guehdr); + if (!pskb_may_pull(skb, transport_offset + len)) + return -EINVAL; + + guehdr = (struct guehdr *)&udp_hdr(skb)[1]; + + switch (guehdr->version) { + case 0: /* Full GUE header present */ + break; + case 1: { + /* Direct encapsulation of IPv4 or IPv6 */ + skb_set_transport_header(skb, -(int)sizeof(struct icmphdr)); + + switch (((struct iphdr *)guehdr)->version) { + case 4: + ret = gue_err_proto_handler(IPPROTO_IPIP, skb, info); + goto out; +#if IS_ENABLED(CONFIG_IPV6) + case 6: + ret = gue_err_proto_handler(IPPROTO_IPV6, skb, info); + goto out; +#endif + default: + ret = -EOPNOTSUPP; + goto out; + } + } + default: /* Undefined version */ + return -EOPNOTSUPP; + } + + if (guehdr->control) + return -ENOENT; + + optlen = guehdr->hlen << 2; + + if (!pskb_may_pull(skb, transport_offset + len + optlen)) + return -EINVAL; + + guehdr = (struct guehdr *)&udp_hdr(skb)[1]; + if (validate_gue_flags(guehdr, optlen)) + return -EINVAL; + + /* Handling exceptions for direct UDP encapsulation in GUE would lead to + * recursion. Besides, this kind of encapsulation can't even be + * configured currently. Discard this. + */ + if (guehdr->proto_ctype == IPPROTO_UDP || + guehdr->proto_ctype == IPPROTO_UDPLITE) + return -EOPNOTSUPP; + + skb_set_transport_header(skb, -(int)sizeof(struct icmphdr)); + ret = gue_err_proto_handler(guehdr->proto_ctype, skb, info); + +out: + skb_set_transport_header(skb, transport_offset); + return ret; +} + + +static const struct ip_tunnel_encap_ops fou_iptun_ops = { + .encap_hlen = fou_encap_hlen, + .build_header = fou_build_header, + .err_handler = gue_err, +}; + +static const struct ip_tunnel_encap_ops gue_iptun_ops = { + .encap_hlen = gue_encap_hlen, + .build_header = gue_build_header, + .err_handler = gue_err, +}; + +static int ip_tunnel_encap_add_fou_ops(void) +{ + int ret; + + ret = ip_tunnel_encap_add_ops(&fou_iptun_ops, TUNNEL_ENCAP_FOU); + if (ret < 0) { + pr_err("can't add fou ops\n"); + return ret; + } + + ret = ip_tunnel_encap_add_ops(&gue_iptun_ops, TUNNEL_ENCAP_GUE); + if (ret < 0) { + pr_err("can't add gue ops\n"); + ip_tunnel_encap_del_ops(&fou_iptun_ops, TUNNEL_ENCAP_FOU); + return ret; + } + + return 0; +} + +static void ip_tunnel_encap_del_fou_ops(void) +{ + ip_tunnel_encap_del_ops(&fou_iptun_ops, TUNNEL_ENCAP_FOU); + ip_tunnel_encap_del_ops(&gue_iptun_ops, TUNNEL_ENCAP_GUE); +} + +#else + +static int ip_tunnel_encap_add_fou_ops(void) +{ + return 0; +} + +static void ip_tunnel_encap_del_fou_ops(void) +{ +} + +#endif + +static __net_init int fou_init_net(struct net *net) +{ + struct fou_net *fn = net_generic(net, fou_net_id); + + INIT_LIST_HEAD(&fn->fou_list); + mutex_init(&fn->fou_lock); + return 0; +} + +static __net_exit void fou_exit_net(struct net *net) +{ + struct fou_net *fn = net_generic(net, fou_net_id); + struct fou *fou, *next; + + /* Close all the FOU sockets */ + mutex_lock(&fn->fou_lock); + list_for_each_entry_safe(fou, next, &fn->fou_list, list) + fou_release(fou); + mutex_unlock(&fn->fou_lock); +} + +static struct pernet_operations fou_net_ops = { + .init = fou_init_net, + .exit = fou_exit_net, + .id = &fou_net_id, + .size = sizeof(struct fou_net), +}; + +static int __init fou_init(void) +{ + int ret; + + ret = register_pernet_device(&fou_net_ops); + if (ret) + goto exit; + + ret = genl_register_family(&fou_nl_family); + if (ret < 0) + goto unregister; + + ret = ip_tunnel_encap_add_fou_ops(); + if (ret == 0) + return 0; + + genl_unregister_family(&fou_nl_family); +unregister: + unregister_pernet_device(&fou_net_ops); +exit: + return ret; +} + +static void __exit fou_fini(void) +{ + ip_tunnel_encap_del_fou_ops(); + genl_unregister_family(&fou_nl_family); + unregister_pernet_device(&fou_net_ops); +} + +module_init(fou_init); +module_exit(fou_fini); +MODULE_AUTHOR("Tom Herbert "); +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Foo over UDP"); -- cgit v1.2.3 From 1d562c32e4392cc091c940918ee1ffd7bfcb9e96 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 20 Jan 2023 09:50:40 -0800 Subject: net: fou: use policy and operation tables generated from the spec Generate and plug in the spec-based tables. A little bit of renaming is needed in the FOU code. Acked-by: Stanislav Fomichev Signed-off-by: Jakub Kicinski Signed-off-by: Paolo Abeni --- net/ipv4/Makefile | 2 +- net/ipv4/fou_core.c | 47 +++++++---------------------------------------- net/ipv4/fou_nl.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ net/ipv4/fou_nl.h | 25 +++++++++++++++++++++++++ 4 files changed, 81 insertions(+), 41 deletions(-) create mode 100644 net/ipv4/fou_nl.c create mode 100644 net/ipv4/fou_nl.h (limited to 'net') diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index fabbe46897ce..880277c9fd07 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -26,7 +26,7 @@ obj-$(CONFIG_IP_MROUTE) += ipmr.o obj-$(CONFIG_IP_MROUTE_COMMON) += ipmr_base.o obj-$(CONFIG_NET_IPIP) += ipip.o gre-y := gre_demux.o -fou-y := fou_core.o +fou-y := fou_core.o fou_nl.o obj-$(CONFIG_NET_FOU) += fou.o obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o obj-$(CONFIG_NET_IPGRE) += ip_gre.o diff --git a/net/ipv4/fou_core.c b/net/ipv4/fou_core.c index 0c3c6d0cee29..cafec9b4eee0 100644 --- a/net/ipv4/fou_core.c +++ b/net/ipv4/fou_core.c @@ -19,6 +19,8 @@ #include #include +#include "fou_nl.h" + struct fou { struct socket *sock; u8 protocol; @@ -640,20 +642,6 @@ static int fou_destroy(struct net *net, struct fou_cfg *cfg) static struct genl_family fou_nl_family; -static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = { - [FOU_ATTR_PORT] = { .type = NLA_U16, }, - [FOU_ATTR_AF] = { .type = NLA_U8, }, - [FOU_ATTR_IPPROTO] = { .type = NLA_U8, }, - [FOU_ATTR_TYPE] = { .type = NLA_U8, }, - [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, }, - [FOU_ATTR_LOCAL_V4] = { .type = NLA_U32, }, - [FOU_ATTR_PEER_V4] = { .type = NLA_U32, }, - [FOU_ATTR_LOCAL_V6] = { .len = sizeof(struct in6_addr), }, - [FOU_ATTR_PEER_V6] = { .len = sizeof(struct in6_addr), }, - [FOU_ATTR_PEER_PORT] = { .type = NLA_U16, }, - [FOU_ATTR_IFINDEX] = { .type = NLA_S32, }, -}; - static int parse_nl_config(struct genl_info *info, struct fou_cfg *cfg) { @@ -745,7 +733,7 @@ static int parse_nl_config(struct genl_info *info, return 0; } -static int fou_nl_cmd_add_port(struct sk_buff *skb, struct genl_info *info) +int fou_nl_add_doit(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct fou_cfg cfg; @@ -758,7 +746,7 @@ static int fou_nl_cmd_add_port(struct sk_buff *skb, struct genl_info *info) return fou_create(net, &cfg, NULL); } -static int fou_nl_cmd_rm_port(struct sk_buff *skb, struct genl_info *info) +int fou_nl_del_doit(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct fou_cfg cfg; @@ -827,7 +815,7 @@ nla_put_failure: return -EMSGSIZE; } -static int fou_nl_cmd_get_port(struct sk_buff *skb, struct genl_info *info) +int fou_nl_get_doit(struct sk_buff *skb, struct genl_info *info) { struct net *net = genl_info_net(info); struct fou_net *fn = net_generic(net, fou_net_id); @@ -874,7 +862,7 @@ out_free: return ret; } -static int fou_nl_dump(struct sk_buff *skb, struct netlink_callback *cb) +int fou_nl_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); struct fou_net *fn = net_generic(net, fou_net_id); @@ -897,33 +885,12 @@ static int fou_nl_dump(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } -static const struct genl_small_ops fou_nl_ops[] = { - { - .cmd = FOU_CMD_ADD, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = fou_nl_cmd_add_port, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = FOU_CMD_DEL, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = fou_nl_cmd_rm_port, - .flags = GENL_ADMIN_PERM, - }, - { - .cmd = FOU_CMD_GET, - .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, - .doit = fou_nl_cmd_get_port, - .dumpit = fou_nl_dump, - }, -}; - static struct genl_family fou_nl_family __ro_after_init = { .hdrsize = 0, .name = FOU_GENL_NAME, .version = FOU_GENL_VERSION, .maxattr = FOU_ATTR_MAX, - .policy = fou_nl_policy, + .policy = fou_nl_policy, .netnsok = true, .module = THIS_MODULE, .small_ops = fou_nl_ops, diff --git a/net/ipv4/fou_nl.c b/net/ipv4/fou_nl.c new file mode 100644 index 000000000000..6c3820f41dd5 --- /dev/null +++ b/net/ipv4/fou_nl.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: BSD-3-Clause +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/fou.yaml */ +/* YNL-GEN kernel source */ + +#include +#include + +#include "fou_nl.h" + +#include + +/* Global operation policy for fou */ +const struct nla_policy fou_nl_policy[FOU_ATTR_IFINDEX + 1] = { + [FOU_ATTR_PORT] = { .type = NLA_U16, }, + [FOU_ATTR_AF] = { .type = NLA_U8, }, + [FOU_ATTR_IPPROTO] = { .type = NLA_U8, }, + [FOU_ATTR_TYPE] = { .type = NLA_U8, }, + [FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, }, + [FOU_ATTR_LOCAL_V4] = { .type = NLA_U32, }, + [FOU_ATTR_LOCAL_V6] = { .len = 16, }, + [FOU_ATTR_PEER_V4] = { .type = NLA_U32, }, + [FOU_ATTR_PEER_V6] = { .len = 16, }, + [FOU_ATTR_PEER_PORT] = { .type = NLA_U16, }, + [FOU_ATTR_IFINDEX] = { .type = NLA_S32, }, +}; + +/* Ops table for fou */ +const struct genl_small_ops fou_nl_ops[3] = { + { + .cmd = FOU_CMD_ADD, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = fou_nl_add_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = FOU_CMD_DEL, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = fou_nl_del_doit, + .flags = GENL_ADMIN_PERM, + }, + { + .cmd = FOU_CMD_GET, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, + .doit = fou_nl_get_doit, + .dumpit = fou_nl_get_dumpit, + }, +}; diff --git a/net/ipv4/fou_nl.h b/net/ipv4/fou_nl.h new file mode 100644 index 000000000000..b7a68121ce6f --- /dev/null +++ b/net/ipv4/fou_nl.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/fou.yaml */ +/* YNL-GEN kernel header */ + +#ifndef _LINUX_FOU_GEN_H +#define _LINUX_FOU_GEN_H + +#include +#include + +#include + +/* Global operation policy for fou */ +extern const struct nla_policy fou_nl_policy[FOU_ATTR_IFINDEX + 1]; + +/* Ops table for fou */ +extern const struct genl_small_ops fou_nl_ops[3]; + +int fou_nl_add_doit(struct sk_buff *skb, struct genl_info *info); +int fou_nl_del_doit(struct sk_buff *skb, struct genl_info *info); +int fou_nl_get_doit(struct sk_buff *skb, struct genl_info *info); +int fou_nl_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); + +#endif /* _LINUX_FOU_GEN_H */ -- cgit v1.2.3 From ec8f7d495b3d37c5f0d66cdc7e43ef76939293b6 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 23 Jan 2023 14:22:24 -0800 Subject: netlink: fix spelling mistake in dump size assert Commit 2c7bc10d0f7b ("netlink: add macro for checking dump ctx size") misspelled the name of the assert as asset, missing an R. Reported-by: Ido Schimmel Reviewed-by: Ido Schimmel Link: https://lore.kernel.org/r/20230123222224.732338-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netlink.h | 2 +- net/devlink/devl_internal.h | 2 +- net/netfilter/nf_conntrack_netlink.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 38f6334f408c..fa4d86da0ec7 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -263,7 +263,7 @@ struct netlink_callback { }; }; -#define NL_ASSET_DUMP_CTX_FITS(type_name) \ +#define NL_ASSERT_DUMP_CTX_FITS(type_name) \ BUILD_BUG_ON(sizeof(type_name) > \ sizeof_field(struct netlink_callback, ctx)) diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 1aa1a9549549..d0d889038138 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -135,7 +135,7 @@ int devlink_nl_instance_iter_dump(struct sk_buff *msg, static inline struct devlink_nl_dump_state * devlink_dump_state(struct netlink_callback *cb) { - NL_ASSET_DUMP_CTX_FITS(struct devlink_nl_dump_state); + NL_ASSERT_DUMP_CTX_FITS(struct devlink_nl_dump_state); return (struct devlink_nl_dump_state *)cb->ctx; } diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 90672e293e57..308fc0023c7e 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -3866,7 +3866,7 @@ static int __init ctnetlink_init(void) { int ret; - NL_ASSET_DUMP_CTX_FITS(struct ctnetlink_list_dump_ctx); + NL_ASSERT_DUMP_CTX_FITS(struct ctnetlink_list_dump_ctx); ret = nfnetlink_subsys_register(&ctnl_subsys); if (ret < 0) { -- cgit v1.2.3 From 90317bcdbd337b9e88f253650f6ab9dfe667be64 Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Mon, 23 Jan 2023 18:47:09 +0100 Subject: ipv6: Make ip6_route_output_flags_noref() static. This function is only used in net/ipv6/route.c and has no reason to be visible outside of it. Signed-off-by: Guillaume Nault Reviewed-by: David Ahern Link: https://lore.kernel.org/r/50706db7f675e40b3594d62011d9363dce32b92e.1674495822.git.gnault@redhat.com Signed-off-by: Jakub Kicinski --- include/net/ip6_route.h | 4 ---- net/ipv6/route.c | 8 ++++---- 2 files changed, 4 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 035d61d50a98..81ee387a1fc4 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -84,10 +84,6 @@ struct dst_entry *ip6_route_input_lookup(struct net *net, struct flowi6 *fl6, const struct sk_buff *skb, int flags); -struct dst_entry *ip6_route_output_flags_noref(struct net *net, - const struct sock *sk, - struct flowi6 *fl6, int flags); - struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk, struct flowi6 *fl6, int flags); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 76889ceeead9..c180c2ef17c5 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2593,9 +2593,10 @@ INDIRECT_CALLABLE_SCOPE struct rt6_info *ip6_pol_route_output(struct net *net, return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, skb, flags); } -struct dst_entry *ip6_route_output_flags_noref(struct net *net, - const struct sock *sk, - struct flowi6 *fl6, int flags) +static struct dst_entry *ip6_route_output_flags_noref(struct net *net, + const struct sock *sk, + struct flowi6 *fl6, + int flags) { bool any_src; @@ -2624,7 +2625,6 @@ struct dst_entry *ip6_route_output_flags_noref(struct net *net, return fib6_rule_lookup(net, fl6, NULL, flags, ip6_pol_route_output); } -EXPORT_SYMBOL_GPL(ip6_route_output_flags_noref); struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk, -- cgit v1.2.3 From 4373a023e0388fc19e27d37f61401bce6ff4c9d7 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 23 Jan 2023 19:52:31 -0800 Subject: devlink: remove a dubious assumption in fmsg dumping Build bot detects that err may be returned uninitialized in devlink_fmsg_prepare_skb(). This is not really true because all fmsgs users should create at least one outer nest, and therefore fmsg can't be completely empty. That said the assumption is not trivial to confirm, so let's follow the bots advice, anyway. This code does not seem to have changed since its inception in commit 1db64e8733f6 ("devlink: Add devlink formatted message (fmsg) API") Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230124035231.787381-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 8eab95cae917..74621287f4e5 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -7116,8 +7116,8 @@ devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb, { struct devlink_fmsg_item *item; struct nlattr *fmsg_nlattr; + int err = 0; int i = 0; - int err; fmsg_nlattr = nla_nest_start_noflag(skb, DEVLINK_ATTR_FMSG); if (!fmsg_nlattr) -- cgit v1.2.3 From c40bff4132e5c1320635ae809d001ccb5598dac6 Mon Sep 17 00:00:00 2001 From: Stefan Raspl Date: Mon, 23 Jan 2023 19:17:45 +0100 Subject: net/smc: Terminate connections prior to device removal Removing an ISM device prior to terminating its associated connections doesn't end well. Signed-off-by: Stefan Raspl Signed-off-by: Jan Karcher Signed-off-by: Wenjia Zhang Signed-off-by: David S. Miller --- net/smc/smc_ism.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 911fe08bc54b..28e1641f990c 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -462,11 +462,11 @@ void smcd_unregister_dev(struct smcd_dev *smcd) { pr_warn_ratelimited("smc: removing smcd device %s\n", dev_name(&smcd->dev)); + smcd->going_away = 1; + smc_smcd_terminate_all(smcd); mutex_lock(&smcd_dev_list.mutex); list_del_init(&smcd->list); mutex_unlock(&smcd_dev_list.mutex); - smcd->going_away = 1; - smc_smcd_terminate_all(smcd); destroy_workqueue(smcd->event_wq); device_del(&smcd->dev); -- cgit v1.2.3 From 89e7d2ba61b742a7525ff06ea4d4378c4a5560d0 Mon Sep 17 00:00:00 2001 From: Stefan Raspl Date: Mon, 23 Jan 2023 19:17:48 +0100 Subject: net/ism: Add new API for client registration Add a new API that allows other drivers to concurrently access ISM devices. To do so, we introduce a new API that allows other modules to register for ISM device usage. Furthermore, we move the GID to struct ism, where it belongs conceptually, and rename and relocate struct smcd_event to struct ism_event. This is the first part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl Signed-off-by: Jan Karcher Signed-off-by: Wenjia Zhang Signed-off-by: David S. Miller --- drivers/s390/net/ism.h | 18 +---- drivers/s390/net/ism_drv.c | 172 ++++++++++++++++++++++++++++++++++++++++++--- include/linux/ism.h | 67 ++++++++++++++++++ include/net/smc.h | 11 +-- net/smc/smc_ism.c | 4 +- 5 files changed, 236 insertions(+), 36 deletions(-) (limited to 'net') diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h index 90af51370183..70c5bbda0fea 100644 --- a/drivers/s390/net/ism.h +++ b/drivers/s390/net/ism.h @@ -16,7 +16,6 @@ */ #define ISM_DMB_WORD_OFFSET 1 #define ISM_DMB_BIT_OFFSET (ISM_DMB_WORD_OFFSET * 32) -#define ISM_NR_DMBS 1920 #define ISM_IDENT_MASK 0x00FFFF #define ISM_REG_SBA 0x1 @@ -178,7 +177,7 @@ struct ism_eq_header { struct ism_eq { struct ism_eq_header header; - struct smcd_event entry[15]; + struct ism_event entry[15]; }; struct ism_sba { @@ -190,21 +189,6 @@ struct ism_sba { u16 dmbe_mask[ISM_NR_DMBS]; }; -struct ism_dev { - spinlock_t lock; - struct pci_dev *pdev; - struct smcd_dev *smcd; - - struct ism_sba *sba; - dma_addr_t sba_dma_addr; - DECLARE_BITMAP(sba_bitmap, ISM_NR_DMBS); - - struct ism_eq *ieq; - dma_addr_t ieq_dma_addr; - - int ieq_idx; -}; - #define ISM_CREATE_REQ(dmb, idx, sf, offset) \ ((dmb) | (idx) << 24 | (sf) << 23 | (offset)) diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index b9f33f411d78..24983224f47e 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -15,9 +15,6 @@ #include #include #include -#include - -#include #include "ism.h" @@ -34,6 +31,84 @@ static const struct pci_device_id ism_device_table[] = { MODULE_DEVICE_TABLE(pci, ism_device_table); static debug_info_t *ism_debug_info; +static const struct smcd_ops ism_ops; + +#define NO_CLIENT 0xff /* must be >= MAX_CLIENTS */ +static struct ism_client *clients[MAX_CLIENTS]; /* use an array rather than */ + /* a list for fast mapping */ +static u8 max_client; +static DEFINE_SPINLOCK(clients_lock); +struct ism_dev_list { + struct list_head list; + struct mutex mutex; /* protects ism device list */ +}; + +static struct ism_dev_list ism_dev_list = { + .list = LIST_HEAD_INIT(ism_dev_list.list), + .mutex = __MUTEX_INITIALIZER(ism_dev_list.mutex), +}; + +int ism_register_client(struct ism_client *client) +{ + struct ism_dev *ism; + unsigned long flags; + int i, rc = -ENOSPC; + + mutex_lock(&ism_dev_list.mutex); + spin_lock_irqsave(&clients_lock, flags); + for (i = 0; i < MAX_CLIENTS; ++i) { + if (!clients[i]) { + clients[i] = client; + client->id = i; + if (i == max_client) + max_client++; + rc = 0; + break; + } + } + spin_unlock_irqrestore(&clients_lock, flags); + if (i < MAX_CLIENTS) { + /* initialize with all devices that we got so far */ + list_for_each_entry(ism, &ism_dev_list.list, list) { + ism->priv[i] = NULL; + client->add(ism); + } + } + mutex_unlock(&ism_dev_list.mutex); + + return rc; +} +EXPORT_SYMBOL_GPL(ism_register_client); + +int ism_unregister_client(struct ism_client *client) +{ + struct ism_dev *ism; + unsigned long flags; + int rc = 0; + + mutex_lock(&ism_dev_list.mutex); + spin_lock_irqsave(&clients_lock, flags); + clients[client->id] = NULL; + if (client->id + 1 == max_client) + max_client--; + spin_unlock_irqrestore(&clients_lock, flags); + list_for_each_entry(ism, &ism_dev_list.list, list) { + for (int i = 0; i < ISM_NR_DMBS; ++i) { + if (ism->sba_client_arr[i] == client->id) { + pr_err("%s: attempt to unregister client '%s'" + "with registered dmb(s)\n", __func__, + client->name); + rc = -EBUSY; + goto out; + } + } + } +out: + mutex_unlock(&ism_dev_list.mutex); + + return rc; +} +EXPORT_SYMBOL_GPL(ism_unregister_client); static int ism_cmd(struct ism_dev *ism, void *cmd) { @@ -193,7 +268,7 @@ static int ism_read_local_gid(struct ism_dev *ism) if (ret) goto out; - ism->smcd->local_gid = cmd.response.gid; + ism->local_gid = cmd.response.gid; out: return ret; } @@ -437,7 +512,8 @@ static u16 ism_get_chid(struct smcd_dev *smcd) static void ism_handle_event(struct ism_dev *ism) { - struct smcd_event *entry; + struct ism_event *entry; + int i; while ((ism->ieq_idx + 1) != READ_ONCE(ism->ieq->header.idx)) { if (++(ism->ieq_idx) == ARRAY_SIZE(ism->ieq->entry)) @@ -445,13 +521,18 @@ static void ism_handle_event(struct ism_dev *ism) entry = &ism->ieq->entry[ism->ieq_idx]; debug_event(ism_debug_info, 2, entry, sizeof(*entry)); - smcd_handle_event(ism->smcd, entry); + spin_lock(&clients_lock); + for (i = 0; i < max_client; ++i) + if (clients[i]) + clients[i]->handle_event(ism, entry); + spin_unlock(&clients_lock); } } static irqreturn_t ism_handle_irq(int irq, void *data) { struct ism_dev *ism = data; + struct ism_client *clt; unsigned long bit, end; unsigned long *bv; u16 dmbemask; @@ -471,7 +552,8 @@ static irqreturn_t ism_handle_irq(int irq, void *data) dmbemask = ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET]; ism->sba->dmbe_mask[bit + ISM_DMB_BIT_OFFSET] = 0; barrier(); - smcd_handle_irq(ism->smcd, bit + ISM_DMB_BIT_OFFSET, dmbemask); + clt = clients[ism->sba_client_arr[bit]]; + clt->handle_irq(ism, bit + ISM_DMB_BIT_OFFSET, dmbemask); } if (ism->sba->e) { @@ -497,10 +579,21 @@ static const struct smcd_ops ism_ops = { .get_chid = ism_get_chid, }; +static void ism_dev_add_work_func(struct work_struct *work) +{ + struct ism_client *client = container_of(work, struct ism_client, + add_work); + + client->add(client->tgt_ism); + atomic_dec(&client->tgt_ism->add_dev_cnt); + wake_up(&client->tgt_ism->waitq); +} + static int ism_dev_init(struct ism_dev *ism) { struct pci_dev *pdev = ism->pdev; - int ret; + unsigned long flags; + int i, ret; ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI); if (ret <= 0) @@ -527,6 +620,28 @@ static int ism_dev_init(struct ism_dev *ism) /* hardware is V2 capable */ ism_create_system_eid(); + init_waitqueue_head(&ism->waitq); + atomic_set(&ism->free_clients_cnt, 0); + atomic_set(&ism->add_dev_cnt, 0); + + wait_event(ism->waitq, !atomic_read(&ism->add_dev_cnt)); + spin_lock_irqsave(&clients_lock, flags); + for (i = 0; i < max_client; ++i) + if (clients[i]) { + INIT_WORK(&clients[i]->add_work, + ism_dev_add_work_func); + clients[i]->tgt_ism = ism; + atomic_inc(&ism->add_dev_cnt); + schedule_work(&clients[i]->add_work); + } + spin_unlock_irqrestore(&clients_lock, flags); + + wait_event(ism->waitq, !atomic_read(&ism->add_dev_cnt)); + + mutex_lock(&ism_dev_list.mutex); + list_add(&ism->list, &ism_dev_list.list); + mutex_unlock(&ism_dev_list.mutex); + ret = smcd_register_dev(ism->smcd); if (ret) goto unreg_ieq; @@ -602,9 +717,36 @@ err: return ret; } +static void ism_dev_remove_work_func(struct work_struct *work) +{ + struct ism_client *client = container_of(work, struct ism_client, + remove_work); + + client->remove(client->tgt_ism); + atomic_dec(&client->tgt_ism->free_clients_cnt); + wake_up(&client->tgt_ism->waitq); +} + +/* Callers must hold ism_dev_list.mutex */ static void ism_dev_exit(struct ism_dev *ism) { struct pci_dev *pdev = ism->pdev; + unsigned long flags; + int i; + + wait_event(ism->waitq, !atomic_read(&ism->free_clients_cnt)); + spin_lock_irqsave(&clients_lock, flags); + for (i = 0; i < max_client; ++i) + if (clients[i]) { + INIT_WORK(&clients[i]->remove_work, + ism_dev_remove_work_func); + clients[i]->tgt_ism = ism; + atomic_inc(&ism->free_clients_cnt); + schedule_work(&clients[i]->remove_work); + } + spin_unlock_irqrestore(&clients_lock, flags); + + wait_event(ism->waitq, !atomic_read(&ism->free_clients_cnt)); smcd_unregister_dev(ism->smcd); if (SYSTEM_EID.serial_number[0] != '0' || @@ -614,18 +756,22 @@ static void ism_dev_exit(struct ism_dev *ism) unregister_sba(ism); free_irq(pci_irq_vector(pdev, 0), ism); pci_free_irq_vectors(pdev); + list_del_init(&ism->list); } static void ism_remove(struct pci_dev *pdev) { struct ism_dev *ism = dev_get_drvdata(&pdev->dev); + mutex_lock(&ism_dev_list.mutex); ism_dev_exit(ism); + mutex_unlock(&ism_dev_list.mutex); smcd_free_dev(ism->smcd); pci_clear_master(pdev); pci_release_mem_regions(pdev); pci_disable_device(pdev); + device_del(&ism->dev); dev_set_drvdata(&pdev->dev, NULL); kfree(ism); } @@ -645,6 +791,8 @@ static int __init ism_init(void) if (!ism_debug_info) return -ENODEV; + memset(clients, 0, sizeof(clients)); + max_client = 0; debug_register_view(ism_debug_info, &debug_hex_ascii_view); ret = pci_register_driver(&ism_driver); if (ret) @@ -655,6 +803,14 @@ static int __init ism_init(void) static void __exit ism_exit(void) { + struct ism_dev *ism; + + mutex_lock(&ism_dev_list.mutex); + list_for_each_entry(ism, &ism_dev_list.list, list) { + ism_dev_exit(ism); + } + mutex_unlock(&ism_dev_list.mutex); + pci_unregister_driver(&ism_driver); debug_unregister(ism_debug_info); } diff --git a/include/linux/ism.h b/include/linux/ism.h index 69bfbf0faaa1..55c8ad306928 100644 --- a/include/linux/ism.h +++ b/include/linux/ism.h @@ -9,6 +9,8 @@ #ifndef _ISM_H #define _ISM_H +#include + struct ism_dmb { u64 dmb_tok; u64 rgid; @@ -20,4 +22,69 @@ struct ism_dmb { dma_addr_t dma_addr; }; +/* Unless we gain unexpected popularity, this limit should hold for a while */ +#define MAX_CLIENTS 8 +#define ISM_NR_DMBS 1920 + +struct ism_dev { + spinlock_t lock; /* protects the ism device */ + struct list_head list; + struct pci_dev *pdev; + struct smcd_dev *smcd; + + struct ism_sba *sba; + dma_addr_t sba_dma_addr; + DECLARE_BITMAP(sba_bitmap, ISM_NR_DMBS); + u8 *sba_client_arr; /* entries are indices into 'clients' array */ + void *priv[MAX_CLIENTS]; + + struct ism_eq *ieq; + dma_addr_t ieq_dma_addr; + + struct device dev; + u64 local_gid; + int ieq_idx; + + atomic_t free_clients_cnt; + atomic_t add_dev_cnt; + wait_queue_head_t waitq; +}; + +struct ism_event { + u32 type; + u32 code; + u64 tok; + u64 time; + u64 info; +}; + +struct ism_client { + const char *name; + void (*add)(struct ism_dev *dev); + void (*remove)(struct ism_dev *dev); + void (*handle_event)(struct ism_dev *dev, struct ism_event *event); + /* Parameter dmbemask contains a bit vector with updated DMBEs, if sent + * via ism_move_data(). Callback function must handle all active bits + * indicated by dmbemask. + */ + void (*handle_irq)(struct ism_dev *dev, unsigned int bit, u16 dmbemask); + /* Private area - don't touch! */ + struct work_struct remove_work; + struct work_struct add_work; + struct ism_dev *tgt_ism; + u8 id; +}; + +int ism_register_client(struct ism_client *client); +int ism_unregister_client(struct ism_client *client); +static inline void *ism_get_priv(struct ism_dev *dev, + struct ism_client *client) { + return dev->priv[client->id]; +} + +static inline void ism_set_priv(struct ism_dev *dev, struct ism_client *client, + void *priv) { + dev->priv[client->id] = priv; +} + #endif /* _ISM_H */ diff --git a/include/net/smc.h b/include/net/smc.h index c926d3313e05..98689b16b841 100644 --- a/include/net/smc.h +++ b/include/net/smc.h @@ -15,6 +15,7 @@ #include #include #include +#include "linux/ism.h" struct sock; @@ -48,14 +49,6 @@ struct smcd_dmb { #define ISM_ERROR 0xFFFF -struct smcd_event { - u32 type; - u32 code; - u64 tok; - u64 time; - u64 info; -}; - struct smcd_dev; struct smcd_ops { @@ -100,6 +93,6 @@ struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, int smcd_register_dev(struct smcd_dev *smcd); void smcd_unregister_dev(struct smcd_dev *smcd); void smcd_free_dev(struct smcd_dev *smcd); -void smcd_handle_event(struct smcd_dev *dev, struct smcd_event *event); +void smcd_handle_event(struct smcd_dev *dev, struct ism_event *event); void smcd_handle_irq(struct smcd_dev *dev, unsigned int bit, u16 dmbemask); #endif /* _SMC_H */ diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 28e1641f990c..215409889872 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -296,7 +296,7 @@ int smcd_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb) struct smc_ism_event_work { struct work_struct work; struct smcd_dev *smcd; - struct smcd_event event; + struct ism_event event; }; #define ISM_EVENT_REQUEST 0x0001 @@ -490,7 +490,7 @@ EXPORT_SYMBOL_GPL(smcd_free_dev); * Context: * - Function called in IRQ context from ISM device driver event handler. */ -void smcd_handle_event(struct smcd_dev *smcd, struct smcd_event *event) +void smcd_handle_event(struct smcd_dev *smcd, struct ism_event *event) { struct smc_ism_event_work *wrk; -- cgit v1.2.3 From 8747716f3942a610efdd12e3655df47269c268ac Mon Sep 17 00:00:00 2001 From: Stefan Raspl Date: Mon, 23 Jan 2023 19:17:49 +0100 Subject: net/smc: Register SMC-D as ISM client Register the smc module with the new ism device driver API. This is the second part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl Signed-off-by: Jan Karcher Signed-off-by: Wenjia Zhang Signed-off-by: David S. Miller --- drivers/s390/net/ism_drv.c | 5 --- include/net/smc.h | 5 +-- net/smc/af_smc.c | 8 +++-- net/smc/smc_core.c | 1 + net/smc/smc_ism.c | 82 +++++++++++++++++++++++++++++++++------------- net/smc/smc_ism.h | 3 +- 6 files changed, 69 insertions(+), 35 deletions(-) (limited to 'net') diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index 24983224f47e..f35c6077db04 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -642,10 +642,6 @@ static int ism_dev_init(struct ism_dev *ism) list_add(&ism->list, &ism_dev_list.list); mutex_unlock(&ism_dev_list.mutex); - ret = smcd_register_dev(ism->smcd); - if (ret) - goto unreg_ieq; - query_info(ism); return 0; @@ -748,7 +744,6 @@ static void ism_dev_exit(struct ism_dev *ism) wait_event(ism->waitq, !atomic_read(&ism->free_clients_cnt)); - smcd_unregister_dev(ism->smcd); if (SYSTEM_EID.serial_number[0] != '0' || SYSTEM_EID.type[0] != '0') ism_del_vlan_id(ism->smcd, ISM_RESERVED_VLANID); diff --git a/include/net/smc.h b/include/net/smc.h index 98689b16b841..151aa54d9ad2 100644 --- a/include/net/smc.h +++ b/include/net/smc.h @@ -90,9 +90,6 @@ struct smcd_dev { struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, const struct smcd_ops *ops, int max_dmbs); -int smcd_register_dev(struct smcd_dev *smcd); -void smcd_unregister_dev(struct smcd_dev *smcd); void smcd_free_dev(struct smcd_dev *smcd); -void smcd_handle_event(struct smcd_dev *dev, struct ism_event *event); -void smcd_handle_irq(struct smcd_dev *dev, unsigned int bit, u16 dmbemask); + #endif /* _SMC_H */ diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index e12d4fa5aece..5d037714ab78 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3382,12 +3382,14 @@ static int __init smc_init(void) if (rc) goto out_pernet_subsys; - smc_ism_init(); + rc = smc_ism_init(); + if (rc) + goto out_pernet_subsys_stat; smc_clc_init(); rc = smc_nl_init(); if (rc) - goto out_pernet_subsys_stat; + goto out_ism; rc = smc_pnet_init(); if (rc) @@ -3480,6 +3482,8 @@ out_pnet: smc_pnet_exit(); out_nl: smc_nl_exit(); +out_ism: + smc_ism_exit(); out_pernet_subsys_stat: unregister_pernet_subsys(&smc_net_stat_ops); out_pernet_subsys: diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index c305d8dd23f8..e15ee084cc5a 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -2595,6 +2595,7 @@ static int smc_core_reboot_event(struct notifier_block *this, { smc_lgrs_shutdown(); smc_ib_unregister_client(); + smc_ism_exit(); return 0; } diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 215409889872..6d31e9bbc5f9 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -17,6 +17,7 @@ #include "smc_ism.h" #include "smc_pnet.h" #include "smc_netlink.h" +#include "linux/ism.h" struct smcd_dev_list smcd_dev_list = { .list = LIST_HEAD_INIT(smcd_dev_list.list), @@ -26,6 +27,20 @@ struct smcd_dev_list smcd_dev_list = { static bool smc_ism_v2_capable; static u8 smc_ism_v2_system_eid[SMC_MAX_EID_LEN]; +static void smcd_register_dev(struct ism_dev *ism); +static void smcd_unregister_dev(struct ism_dev *ism); +static void smcd_handle_event(struct ism_dev *ism, struct ism_event *event); +static void smcd_handle_irq(struct ism_dev *ism, unsigned int dmbno, + u16 dmbemask); + +static struct ism_client smc_ism_client = { + .name = "SMC-D", + .add = smcd_register_dev, + .remove = smcd_unregister_dev, + .handle_event = smcd_handle_event, + .handle_irq = smcd_handle_irq, +}; + /* Test if an ISM communication is possible - same CPC */ int smc_ism_cantalk(u64 peer_gid, unsigned short vlan_id, struct smcd_dev *smcd) { @@ -409,8 +424,6 @@ struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, device_initialize(&smcd->dev); dev_set_name(&smcd->dev, name); smcd->ops = ops; - if (smc_pnetid_by_dev_port(parent, 0, smcd->pnetid)) - smc_pnetid_by_table_smcd(smcd); spin_lock_init(&smcd->lock); spin_lock_init(&smcd->lgr_lock); @@ -421,9 +434,25 @@ struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, } EXPORT_SYMBOL_GPL(smcd_alloc_dev); -int smcd_register_dev(struct smcd_dev *smcd) +void smcd_free_dev(struct smcd_dev *smcd) { - int rc; + put_device(&smcd->dev); +} +EXPORT_SYMBOL_GPL(smcd_free_dev); + +static void smcd_register_dev(struct ism_dev *ism) +{ + const struct smcd_ops *ops = NULL; + struct smcd_dev *smcd; + + smcd = smcd_alloc_dev(&ism->pdev->dev, dev_name(&ism->pdev->dev), ops, + ISM_NR_DMBS); + if (!smcd) + return; + smcd->priv = ism; + ism_set_priv(ism, &smc_ism_client, smcd); + if (smc_pnetid_by_dev_port(&ism->pdev->dev, 0, smcd->pnetid)) + smc_pnetid_by_table_smcd(smcd); mutex_lock(&smcd_dev_list.mutex); if (list_empty(&smcd_dev_list.list)) { @@ -447,19 +476,20 @@ int smcd_register_dev(struct smcd_dev *smcd) dev_name(&smcd->dev), smcd->pnetid, smcd->pnetid_by_user ? " (user defined)" : ""); - rc = device_add(&smcd->dev); - if (rc) { + if (device_add(&smcd->dev)) { mutex_lock(&smcd_dev_list.mutex); list_del(&smcd->list); mutex_unlock(&smcd_dev_list.mutex); + smcd_free_dev(smcd); } - return rc; + return; } -EXPORT_SYMBOL_GPL(smcd_register_dev); -void smcd_unregister_dev(struct smcd_dev *smcd) +static void smcd_unregister_dev(struct ism_dev *ism) { + struct smcd_dev *smcd = ism_get_priv(ism, &smc_ism_client); + pr_warn_ratelimited("smc: removing smcd device %s\n", dev_name(&smcd->dev)); smcd->going_away = 1; @@ -471,16 +501,9 @@ void smcd_unregister_dev(struct smcd_dev *smcd) device_del(&smcd->dev); } -EXPORT_SYMBOL_GPL(smcd_unregister_dev); - -void smcd_free_dev(struct smcd_dev *smcd) -{ - put_device(&smcd->dev); -} -EXPORT_SYMBOL_GPL(smcd_free_dev); /* SMCD Device event handler. Called from ISM device interrupt handler. - * Parameters are smcd device pointer, + * Parameters are ism device pointer, * - event->type (0 --> DMB, 1 --> GID), * - event->code (event code), * - event->tok (either DMB token when event type 0, or GID when event type 1) @@ -490,8 +513,9 @@ EXPORT_SYMBOL_GPL(smcd_free_dev); * Context: * - Function called in IRQ context from ISM device driver event handler. */ -void smcd_handle_event(struct smcd_dev *smcd, struct ism_event *event) +static void smcd_handle_event(struct ism_dev *ism, struct ism_event *event) { + struct smcd_dev *smcd = ism_get_priv(ism, &smc_ism_client); struct smc_ism_event_work *wrk; if (smcd->going_away) @@ -505,17 +529,18 @@ void smcd_handle_event(struct smcd_dev *smcd, struct ism_event *event) wrk->event = *event; queue_work(smcd->event_wq, &wrk->work); } -EXPORT_SYMBOL_GPL(smcd_handle_event); /* SMCD Device interrupt handler. Called from ISM device interrupt handler. - * Parameters are smcd device pointer, DMB number, and the DMBE bitmask. + * Parameters are the ism device pointer, DMB number, and the DMBE bitmask. * Find the connection and schedule the tasklet for this connection. * * Context: * - Function called in IRQ context from ISM device driver IRQ handler. */ -void smcd_handle_irq(struct smcd_dev *smcd, unsigned int dmbno, u16 dmbemask) +static void smcd_handle_irq(struct ism_dev *ism, unsigned int dmbno, + u16 dmbemask) { + struct smcd_dev *smcd = ism_get_priv(ism, &smc_ism_client); struct smc_connection *conn = NULL; unsigned long flags; @@ -525,10 +550,21 @@ void smcd_handle_irq(struct smcd_dev *smcd, unsigned int dmbno, u16 dmbemask) tasklet_schedule(&conn->rx_tsklet); spin_unlock_irqrestore(&smcd->lock, flags); } -EXPORT_SYMBOL_GPL(smcd_handle_irq); -void __init smc_ism_init(void) +int smc_ism_init(void) { smc_ism_v2_capable = false; memset(smc_ism_v2_system_eid, 0, SMC_MAX_EID_LEN); +#if IS_ENABLED(CONFIG_ISM) + return ism_register_client(&smc_ism_client); +#else + return 0; +#endif +} + +void smc_ism_exit(void) +{ +#if IS_ENABLED(CONFIG_ISM) + ism_unregister_client(&smc_ism_client); +#endif } diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h index d6b2db604fe8..832b2f42d79f 100644 --- a/net/smc/smc_ism.h +++ b/net/smc/smc_ism.h @@ -42,7 +42,8 @@ int smc_ism_signal_shutdown(struct smc_link_group *lgr); void smc_ism_get_system_eid(u8 **eid); u16 smc_ism_get_chid(struct smcd_dev *dev); bool smc_ism_is_v2_capable(void); -void smc_ism_init(void); +int smc_ism_init(void); +void smc_ism_exit(void); int smcd_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb); static inline int smc_ism_write(struct smcd_dev *smcd, u64 dmb_tok, -- cgit v1.2.3 From 9de4df7b6be1cfca500f8ba21137d53eec45418a Mon Sep 17 00:00:00 2001 From: Stefan Raspl Date: Mon, 23 Jan 2023 19:17:50 +0100 Subject: net/smc: Separate SMC-D and ISM APIs We separate the code implementing the struct smcd_ops API in the ISM device driver from the functions that may be used by other exploiters of ISM devices. Note: We start out small, and don't offer the whole breadth of the ISM device for public use, as many functions are specific to or likely only ever used in the context of SMC-D. This is the third part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl Signed-off-by: Jan Karcher Signed-off-by: Wenjia Zhang Signed-off-by: David S. Miller --- drivers/s390/net/ism_drv.c | 92 +++++++++++++++++++++++++++++++--------------- include/linux/ism.h | 7 ++++ include/net/smc.h | 3 +- net/smc/smc_clc.c | 11 ++++-- net/smc/smc_core.c | 6 ++- net/smc/smc_diag.c | 3 +- 6 files changed, 86 insertions(+), 36 deletions(-) (limited to 'net') diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index f35c6077db04..e6c810a96b24 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -273,10 +273,9 @@ out: return ret; } -static int ism_query_rgid(struct smcd_dev *smcd, u64 rgid, u32 vid_valid, +static int ism_query_rgid(struct ism_dev *ism, u64 rgid, u32 vid_valid, u32 vid) { - struct ism_dev *ism = smcd->priv; union ism_query_rgid cmd; memset(&cmd, 0, sizeof(cmd)); @@ -290,6 +289,11 @@ static int ism_query_rgid(struct smcd_dev *smcd, u64 rgid, u32 vid_valid, return ism_cmd(ism, &cmd); } +static int smcd_query_rgid(struct smcd_dev *smcd, u64 rgid, u32 vid_valid, u32 vid) +{ + return ism_query_rgid(smcd->priv, rgid, vid_valid, vid); +} + static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { clear_bit(dmb->sba_idx, ism->sba_bitmap); @@ -326,9 +330,9 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb) return dmb->cpu_addr ? 0 : -ENOMEM; } -static int ism_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb) +int ism_register_dmb(struct ism_dev *ism, struct ism_dmb *dmb, + struct ism_client *client) { - struct ism_dev *ism = smcd->priv; union ism_reg_dmb cmd; int ret; @@ -353,18 +357,19 @@ static int ism_register_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb) goto out; } dmb->dmb_tok = cmd.response.dmb_tok; + ism->sba_client_arr[dmb->sba_idx - ISM_DMB_BIT_OFFSET] = client->id; out: return ret; } +EXPORT_SYMBOL_GPL(ism_register_dmb); static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb) { - return ism_register_dmb(smcd, (struct ism_dmb *)dmb); + return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, NULL); } -static int ism_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb) +int ism_unregister_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { - struct ism_dev *ism = smcd->priv; union ism_unreg_dmb cmd; int ret; @@ -374,6 +379,8 @@ static int ism_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb) cmd.request.dmb_tok = dmb->dmb_tok; + ism->sba_client_arr[dmb->sba_idx - ISM_DMB_BIT_OFFSET] = NO_CLIENT; + ret = ism_cmd(ism, &cmd); if (ret && ret != ISM_ERROR) goto out; @@ -382,15 +389,15 @@ static int ism_unregister_dmb(struct smcd_dev *smcd, struct ism_dmb *dmb) out: return ret; } +EXPORT_SYMBOL_GPL(ism_unregister_dmb); static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb) { - return ism_unregister_dmb(smcd, (struct ism_dmb *)dmb); + return ism_unregister_dmb(smcd->priv, (struct ism_dmb *)dmb); } -static int ism_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id) +static int ism_add_vlan_id(struct ism_dev *ism, u64 vlan_id) { - struct ism_dev *ism = smcd->priv; union ism_set_vlan_id cmd; memset(&cmd, 0, sizeof(cmd)); @@ -402,9 +409,13 @@ static int ism_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id) return ism_cmd(ism, &cmd); } -static int ism_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id) +static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id) +{ + return ism_add_vlan_id(smcd->priv, vlan_id); +} + +static int ism_del_vlan_id(struct ism_dev *ism, u64 vlan_id) { - struct ism_dev *ism = smcd->priv; union ism_set_vlan_id cmd; memset(&cmd, 0, sizeof(cmd)); @@ -416,6 +427,11 @@ static int ism_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id) return ism_cmd(ism, &cmd); } +static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id) +{ + return ism_del_vlan_id(smcd->priv, vlan_id); +} + static int ism_set_vlan_required(struct smcd_dev *smcd) { return ism_cmd_simple(smcd->priv, ISM_SET_VLAN); @@ -426,8 +442,8 @@ static int ism_reset_vlan_required(struct smcd_dev *smcd) return ism_cmd_simple(smcd->priv, ISM_RESET_VLAN); } -static int ism_signal_ieq(struct smcd_dev *smcd, u64 rgid, u32 trigger_irq, - u32 event_code, u64 info) +static int smcd_signal_ieq(struct smcd_dev *smcd, u64 rgid, u32 trigger_irq, + u32 event_code, u64 info) { struct ism_dev *ism = smcd->priv; union ism_sig_ieq cmd; @@ -450,8 +466,9 @@ static unsigned int max_bytes(unsigned int start, unsigned int len, return min(boundary - (start & (boundary - 1)), len); } -static int ism_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx, - bool sf, unsigned int offset, void *data, unsigned int size) +static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx, + bool sf, unsigned int offset, void *data, + unsigned int size) { struct ism_dev *ism = smcd->priv; unsigned int bytes; @@ -495,14 +512,15 @@ static void ism_create_system_eid(void) memcpy(&SYSTEM_EID.type, tmp, 4); } -static u8 *ism_get_system_eid(void) +u8 *ism_get_seid(void) { return SYSTEM_EID.seid_string; } +EXPORT_SYMBOL_GPL(ism_get_seid); -static u16 ism_get_chid(struct smcd_dev *smcd) +static u16 smcd_get_chid(struct smcd_dev *smcd) { - struct ism_dev *ism = (struct ism_dev *)smcd->priv; + struct ism_dev *ism = smcd->priv; if (!ism || !ism->pdev) return 0; @@ -565,18 +583,26 @@ static irqreturn_t ism_handle_irq(int irq, void *data) return IRQ_HANDLED; } +static u64 smcd_get_local_gid(struct smcd_dev *smcd) +{ + struct ism_dev *ism = smcd->priv; + + return ism->local_gid; +} + static const struct smcd_ops ism_ops = { - .query_remote_gid = ism_query_rgid, + .query_remote_gid = smcd_query_rgid, .register_dmb = smcd_register_dmb, .unregister_dmb = smcd_unregister_dmb, - .add_vlan_id = ism_add_vlan_id, - .del_vlan_id = ism_del_vlan_id, + .add_vlan_id = smcd_add_vlan_id, + .del_vlan_id = smcd_del_vlan_id, .set_vlan_required = ism_set_vlan_required, .reset_vlan_required = ism_reset_vlan_required, - .signal_event = ism_signal_ieq, - .move_data = ism_move, - .get_system_eid = ism_get_system_eid, - .get_chid = ism_get_chid, + .signal_event = smcd_signal_ieq, + .move_data = smcd_move, + .get_system_eid = ism_get_seid, + .get_local_gid = smcd_get_local_gid, + .get_chid = smcd_get_chid, }; static void ism_dev_add_work_func(struct work_struct *work) @@ -599,10 +625,15 @@ static int ism_dev_init(struct ism_dev *ism) if (ret <= 0) goto out; + ism->sba_client_arr = kzalloc(ISM_NR_DMBS, GFP_KERNEL); + if (!ism->sba_client_arr) + goto free_vectors; + memset(ism->sba_client_arr, NO_CLIENT, ISM_NR_DMBS); + ret = request_irq(pci_irq_vector(pdev, 0), ism_handle_irq, 0, pci_name(pdev), ism); if (ret) - goto free_vectors; + goto free_client_arr; ret = register_sba(ism); if (ret) @@ -616,7 +647,7 @@ static int ism_dev_init(struct ism_dev *ism) if (ret) goto unreg_ieq; - if (!ism_add_vlan_id(ism->smcd, ISM_RESERVED_VLANID)) + if (!ism_add_vlan_id(ism, ISM_RESERVED_VLANID)) /* hardware is V2 capable */ ism_create_system_eid(); @@ -651,6 +682,8 @@ unreg_sba: unregister_sba(ism); free_irq: free_irq(pci_irq_vector(pdev, 0), ism); +free_client_arr: + kfree(ism->sba_client_arr); free_vectors: pci_free_irq_vectors(pdev); out: @@ -746,10 +779,11 @@ static void ism_dev_exit(struct ism_dev *ism) if (SYSTEM_EID.serial_number[0] != '0' || SYSTEM_EID.type[0] != '0') - ism_del_vlan_id(ism->smcd, ISM_RESERVED_VLANID); + ism_del_vlan_id(ism, ISM_RESERVED_VLANID); unregister_ieq(ism); unregister_sba(ism); free_irq(pci_irq_vector(pdev, 0), ism); + kfree(ism->sba_client_arr); pci_free_irq_vectors(pdev); list_del_init(&ism->list); } diff --git a/include/linux/ism.h b/include/linux/ism.h index 55c8ad306928..bdd29e08d4fe 100644 --- a/include/linux/ism.h +++ b/include/linux/ism.h @@ -87,4 +87,11 @@ static inline void ism_set_priv(struct ism_dev *dev, struct ism_client *client, dev->priv[client->id] = priv; } +int ism_register_dmb(struct ism_dev *dev, struct ism_dmb *dmb, + struct ism_client *client); +int ism_unregister_dmb(struct ism_dev *dev, struct ism_dmb *dmb); +int ism_move(struct ism_dev *dev, u64 dmb_tok, unsigned int idx, bool sf, + unsigned int offset, void *data, unsigned int size); +u8 *ism_get_seid(void); + #endif /* _ISM_H */ diff --git a/include/net/smc.h b/include/net/smc.h index 151aa54d9ad2..d5f8f18169d7 100644 --- a/include/net/smc.h +++ b/include/net/smc.h @@ -66,14 +66,15 @@ struct smcd_ops { bool sf, unsigned int offset, void *data, unsigned int size); u8* (*get_system_eid)(void); + u64 (*get_local_gid)(struct smcd_dev *dev); u16 (*get_chid)(struct smcd_dev *dev); }; struct smcd_dev { const struct smcd_ops *ops; struct device dev; + struct ism_dev *ism; void *priv; - u64 local_gid; struct list_head list; spinlock_t lock; struct smc_connection **conn; diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index dfb9797f7bc6..b9b8b07aa702 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -813,6 +813,7 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini) struct smc_clc_v2_extension *v2_ext; struct smc_clc_msg_smcd *pclc_smcd; struct smc_clc_msg_trail *trl; + struct smcd_dev *smcd; int len, i, plen, rc; int reason_code = 0; struct kvec vec[8]; @@ -868,7 +869,9 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini) if (smcd_indicated(ini->smc_type_v1)) { /* add SMC-D specifics */ if (ini->ism_dev[0]) { - pclc_smcd->ism.gid = htonll(ini->ism_dev[0]->local_gid); + smcd = ini->ism_dev[0]; + pclc_smcd->ism.gid = + htonll(smcd->ops->get_local_gid(smcd)); pclc_smcd->ism.chid = htons(smc_ism_get_chid(ini->ism_dev[0])); } @@ -914,8 +917,9 @@ int smc_clc_send_proposal(struct smc_sock *smc, struct smc_init_info *ini) plen += sizeof(*smcd_v2_ext); if (ini->ism_offered_cnt) { for (i = 1; i <= ini->ism_offered_cnt; i++) { + smcd = ini->ism_dev[i]; gidchids[i - 1].gid = - htonll(ini->ism_dev[i]->local_gid); + htonll(smcd->ops->get_local_gid(smcd)); gidchids[i - 1].chid = htons(smc_ism_get_chid(ini->ism_dev[i])); } @@ -1000,7 +1004,8 @@ static int smc_clc_send_confirm_accept(struct smc_sock *smc, memcpy(clc->hdr.eyecatcher, SMCD_EYECATCHER, sizeof(SMCD_EYECATCHER)); clc->hdr.typev1 = SMC_TYPE_D; - clc->d0.gid = conn->lgr->smcd->local_gid; + clc->d0.gid = + conn->lgr->smcd->ops->get_local_gid(conn->lgr->smcd); clc->d0.token = conn->rmb_desc->token; clc->d0.dmbe_size = conn->rmbe_size_short; clc->d0.dmbe_idx = 0; diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index e15ee084cc5a..ec04966e9bf9 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -500,6 +500,7 @@ static int smc_nl_fill_smcd_lgr(struct smc_link_group *lgr, struct netlink_callback *cb) { char smc_pnet[SMC_MAX_PNETID_LEN + 1]; + struct smcd_dev *smcd = lgr->smcd; struct nlattr *attrs; void *nlh; @@ -515,8 +516,9 @@ static int smc_nl_fill_smcd_lgr(struct smc_link_group *lgr, if (nla_put_u32(skb, SMC_NLA_LGR_D_ID, *((u32 *)&lgr->id))) goto errattr; - if (nla_put_u64_64bit(skb, SMC_NLA_LGR_D_GID, lgr->smcd->local_gid, - SMC_NLA_LGR_D_PAD)) + if (nla_put_u64_64bit(skb, SMC_NLA_LGR_D_GID, + smcd->ops->get_local_gid(smcd), + SMC_NLA_LGR_D_PAD)) goto errattr; if (nla_put_u64_64bit(skb, SMC_NLA_LGR_D_PEER_GID, lgr->peer_gid, SMC_NLA_LGR_D_PAD)) diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 80ea7d954ece..7ff2152971a5 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -167,12 +167,13 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, !list_empty(&smc->conn.lgr->list)) { struct smc_connection *conn = &smc->conn; struct smcd_diag_dmbinfo dinfo; + struct smcd_dev *smcd = conn->lgr->smcd; memset(&dinfo, 0, sizeof(dinfo)); dinfo.linkid = *((u32 *)conn->lgr->id); dinfo.peer_gid = conn->lgr->peer_gid; - dinfo.my_gid = conn->lgr->smcd->local_gid; + dinfo.my_gid = smcd->ops->get_local_gid(smcd); dinfo.token = conn->rmb_desc->token; dinfo.peer_token = conn->peer_token; -- cgit v1.2.3 From 820f21009f1bc7a69e28752f6c6d9544401ca526 Mon Sep 17 00:00:00 2001 From: Stefan Raspl Date: Mon, 23 Jan 2023 19:17:51 +0100 Subject: s390/ism: Consolidate SMC-D-related code The ism module had SMC-D-specific code sprinkled across the entire module. We are now consolidating the SMC-D-specific parts into the latter parts of the module, so it becomes more clear what code is intended for use with ISM, and which parts are glue code for usage in the context of SMC-D. This is the fourth part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl Signed-off-by: Jan Karcher Signed-off-by: Wenjia Zhang Signed-off-by: David S. Miller --- drivers/s390/net/ism_drv.c | 162 +++++++++++++++++++++++++++------------------ include/linux/ism.h | 2 + include/net/smc.h | 5 +- net/smc/smc_ism.c | 63 +++++++++++------- 4 files changed, 143 insertions(+), 89 deletions(-) (limited to 'net') diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index e6c810a96b24..73c8f42a22a7 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -289,11 +289,6 @@ static int ism_query_rgid(struct ism_dev *ism, u64 rgid, u32 vid_valid, return ism_cmd(ism, &cmd); } -static int smcd_query_rgid(struct smcd_dev *smcd, u64 rgid, u32 vid_valid, u32 vid) -{ - return ism_query_rgid(smcd->priv, rgid, vid_valid, vid); -} - static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { clear_bit(dmb->sba_idx, ism->sba_bitmap); @@ -363,11 +358,6 @@ out: } EXPORT_SYMBOL_GPL(ism_register_dmb); -static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb) -{ - return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, NULL); -} - int ism_unregister_dmb(struct ism_dev *ism, struct ism_dmb *dmb) { union ism_unreg_dmb cmd; @@ -391,11 +381,6 @@ out: } EXPORT_SYMBOL_GPL(ism_unregister_dmb); -static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb) -{ - return ism_unregister_dmb(smcd->priv, (struct ism_dmb *)dmb); -} - static int ism_add_vlan_id(struct ism_dev *ism, u64 vlan_id) { union ism_set_vlan_id cmd; @@ -409,11 +394,6 @@ static int ism_add_vlan_id(struct ism_dev *ism, u64 vlan_id) return ism_cmd(ism, &cmd); } -static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id) -{ - return ism_add_vlan_id(smcd->priv, vlan_id); -} - static int ism_del_vlan_id(struct ism_dev *ism, u64 vlan_id) { union ism_set_vlan_id cmd; @@ -427,25 +407,9 @@ static int ism_del_vlan_id(struct ism_dev *ism, u64 vlan_id) return ism_cmd(ism, &cmd); } -static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id) -{ - return ism_del_vlan_id(smcd->priv, vlan_id); -} - -static int ism_set_vlan_required(struct smcd_dev *smcd) +static int ism_signal_ieq(struct ism_dev *ism, u64 rgid, u32 trigger_irq, + u32 event_code, u64 info) { - return ism_cmd_simple(smcd->priv, ISM_SET_VLAN); -} - -static int ism_reset_vlan_required(struct smcd_dev *smcd) -{ - return ism_cmd_simple(smcd->priv, ISM_RESET_VLAN); -} - -static int smcd_signal_ieq(struct smcd_dev *smcd, u64 rgid, u32 trigger_irq, - u32 event_code, u64 info) -{ - struct ism_dev *ism = smcd->priv; union ism_sig_ieq cmd; memset(&cmd, 0, sizeof(cmd)); @@ -466,11 +430,9 @@ static unsigned int max_bytes(unsigned int start, unsigned int len, return min(boundary - (start & (boundary - 1)), len); } -static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx, - bool sf, unsigned int offset, void *data, - unsigned int size) +int ism_move(struct ism_dev *ism, u64 dmb_tok, unsigned int idx, bool sf, + unsigned int offset, void *data, unsigned int size) { - struct ism_dev *ism = smcd->priv; unsigned int bytes; u64 dmb_req; int ret; @@ -491,6 +453,7 @@ static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx, return 0; } +EXPORT_SYMBOL_GPL(ism_move); static struct ism_systemeid SYSTEM_EID = { .seid_string = "IBM-SYSZ-ISMSEID00000000", @@ -518,10 +481,8 @@ u8 *ism_get_seid(void) } EXPORT_SYMBOL_GPL(ism_get_seid); -static u16 smcd_get_chid(struct smcd_dev *smcd) +static u16 ism_get_chid(struct ism_dev *ism) { - struct ism_dev *ism = smcd->priv; - if (!ism || !ism->pdev) return 0; @@ -583,28 +544,11 @@ static irqreturn_t ism_handle_irq(int irq, void *data) return IRQ_HANDLED; } -static u64 smcd_get_local_gid(struct smcd_dev *smcd) +static u64 ism_get_local_gid(struct ism_dev *ism) { - struct ism_dev *ism = smcd->priv; - return ism->local_gid; } -static const struct smcd_ops ism_ops = { - .query_remote_gid = smcd_query_rgid, - .register_dmb = smcd_register_dmb, - .unregister_dmb = smcd_unregister_dmb, - .add_vlan_id = smcd_add_vlan_id, - .del_vlan_id = smcd_del_vlan_id, - .set_vlan_required = ism_set_vlan_required, - .reset_vlan_required = ism_reset_vlan_required, - .signal_event = smcd_signal_ieq, - .move_data = smcd_move, - .get_system_eid = ism_get_seid, - .get_local_gid = smcd_get_local_gid, - .get_chid = smcd_get_chid, -}; - static void ism_dev_add_work_func(struct work_struct *work) { struct ism_client *client = container_of(work, struct ism_client, @@ -846,3 +790,95 @@ static void __exit ism_exit(void) module_init(ism_init); module_exit(ism_exit); + +/*************************** SMC-D Implementation *****************************/ + +#if IS_ENABLED(CONFIG_SMC) +static int smcd_query_rgid(struct smcd_dev *smcd, u64 rgid, u32 vid_valid, + u32 vid) +{ + return ism_query_rgid(smcd->priv, rgid, vid_valid, vid); +} + +static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb, + struct ism_client *client) +{ + return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, client); +} + +static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb) +{ + return ism_unregister_dmb(smcd->priv, (struct ism_dmb *)dmb); +} + +static int smcd_add_vlan_id(struct smcd_dev *smcd, u64 vlan_id) +{ + return ism_add_vlan_id(smcd->priv, vlan_id); +} + +static int smcd_del_vlan_id(struct smcd_dev *smcd, u64 vlan_id) +{ + return ism_del_vlan_id(smcd->priv, vlan_id); +} + +static int smcd_set_vlan_required(struct smcd_dev *smcd) +{ + return ism_cmd_simple(smcd->priv, ISM_SET_VLAN); +} + +static int smcd_reset_vlan_required(struct smcd_dev *smcd) +{ + return ism_cmd_simple(smcd->priv, ISM_RESET_VLAN); +} + +static int smcd_signal_ieq(struct smcd_dev *smcd, u64 rgid, u32 trigger_irq, + u32 event_code, u64 info) +{ + return ism_signal_ieq(smcd->priv, rgid, trigger_irq, event_code, info); +} + +static int smcd_move(struct smcd_dev *smcd, u64 dmb_tok, unsigned int idx, + bool sf, unsigned int offset, void *data, + unsigned int size) +{ + return ism_move(smcd->priv, dmb_tok, idx, sf, offset, data, size); +} + +static u64 smcd_get_local_gid(struct smcd_dev *smcd) +{ + return ism_get_local_gid(smcd->priv); +} + +static u16 smcd_get_chid(struct smcd_dev *smcd) +{ + return ism_get_chid(smcd->priv); +} + +static inline struct device *smcd_get_dev(struct smcd_dev *dev) +{ + struct ism_dev *ism = dev->priv; + + return &ism->dev; +} + +static const struct smcd_ops ism_ops = { + .query_remote_gid = smcd_query_rgid, + .register_dmb = smcd_register_dmb, + .unregister_dmb = smcd_unregister_dmb, + .add_vlan_id = smcd_add_vlan_id, + .del_vlan_id = smcd_del_vlan_id, + .set_vlan_required = smcd_set_vlan_required, + .reset_vlan_required = smcd_reset_vlan_required, + .signal_event = smcd_signal_ieq, + .move_data = smcd_move, + .get_system_eid = ism_get_seid, + .get_local_gid = smcd_get_local_gid, + .get_chid = smcd_get_chid, +}; + +const struct smcd_ops *ism_get_smcd_ops(void) +{ + return &ism_ops; +} +EXPORT_SYMBOL_GPL(ism_get_smcd_ops); +#endif diff --git a/include/linux/ism.h b/include/linux/ism.h index bdd29e08d4fe..104ce2fd503a 100644 --- a/include/linux/ism.h +++ b/include/linux/ism.h @@ -94,4 +94,6 @@ int ism_move(struct ism_dev *dev, u64 dmb_tok, unsigned int idx, bool sf, unsigned int offset, void *data, unsigned int size); u8 *ism_get_seid(void); +const struct smcd_ops *ism_get_smcd_ops(void); + #endif /* _ISM_H */ diff --git a/include/net/smc.h b/include/net/smc.h index d5f8f18169d7..556b96c12279 100644 --- a/include/net/smc.h +++ b/include/net/smc.h @@ -50,11 +50,13 @@ struct smcd_dmb { #define ISM_ERROR 0xFFFF struct smcd_dev; +struct ism_client; struct smcd_ops { int (*query_remote_gid)(struct smcd_dev *dev, u64 rgid, u32 vid_valid, u32 vid); - int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb); + int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb, + struct ism_client *client); int (*unregister_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb); int (*add_vlan_id)(struct smcd_dev *dev, u64 vlan_id); int (*del_vlan_id)(struct smcd_dev *dev, u64 vlan_id); @@ -73,7 +75,6 @@ struct smcd_ops { struct smcd_dev { const struct smcd_ops *ops; struct device dev; - struct ism_dev *ism; void *priv; struct list_head list; spinlock_t lock; diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 6d31e9bbc5f9..6196b305df44 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -27,6 +27,7 @@ struct smcd_dev_list smcd_dev_list = { static bool smc_ism_v2_capable; static u8 smc_ism_v2_system_eid[SMC_MAX_EID_LEN]; +#if IS_ENABLED(CONFIG_ISM) static void smcd_register_dev(struct ism_dev *ism); static void smcd_unregister_dev(struct ism_dev *ism); static void smcd_handle_event(struct ism_dev *ism, struct ism_event *event); @@ -40,6 +41,7 @@ static struct ism_client smc_ism_client = { .handle_event = smcd_handle_event, .handle_irq = smcd_handle_irq, }; +#endif /* Test if an ISM communication is possible - same CPC */ int smc_ism_cantalk(u64 peer_gid, unsigned short vlan_id, struct smcd_dev *smcd) @@ -198,6 +200,7 @@ int smc_ism_unregister_dmb(struct smcd_dev *smcd, struct smc_buf_desc *dmb_desc) int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len, struct smc_buf_desc *dmb_desc) { +#if IS_ENABLED(CONFIG_ISM) struct smcd_dmb dmb; int rc; @@ -206,7 +209,7 @@ int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len, dmb.sba_idx = dmb_desc->sba_idx; dmb.vlan_id = lgr->vlan_id; dmb.rgid = lgr->peer_gid; - rc = lgr->smcd->ops->register_dmb(lgr->smcd, &dmb); + rc = lgr->smcd->ops->register_dmb(lgr->smcd, &dmb, &smc_ism_client); if (!rc) { dmb_desc->sba_idx = dmb.sba_idx; dmb_desc->token = dmb.dmb_tok; @@ -215,6 +218,9 @@ int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len, dmb_desc->len = dmb.dmb_len; } return rc; +#else + return 0; +#endif } static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd, @@ -308,6 +314,7 @@ int smcd_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } +#if IS_ENABLED(CONFIG_ISM) struct smc_ism_event_work { struct work_struct work; struct smcd_dev *smcd; @@ -351,24 +358,6 @@ static void smcd_handle_sw_event(struct smc_ism_event_work *wrk) } } -int smc_ism_signal_shutdown(struct smc_link_group *lgr) -{ - int rc; - union smcd_sw_event_info ev_info; - - if (lgr->peer_shutdown) - return 0; - - memcpy(ev_info.uid, lgr->id, SMC_LGR_ID_SIZE); - ev_info.vlan_id = lgr->vlan_id; - ev_info.code = ISM_EVENT_REQUEST; - rc = lgr->smcd->ops->signal_event(lgr->smcd, lgr->peer_gid, - ISM_EVENT_REQUEST_IR, - ISM_EVENT_CODE_SHUTDOWN, - ev_info.info); - return rc; -} - /* worker for SMC-D events */ static void smc_ism_event_work(struct work_struct *work) { @@ -442,9 +431,12 @@ EXPORT_SYMBOL_GPL(smcd_free_dev); static void smcd_register_dev(struct ism_dev *ism) { - const struct smcd_ops *ops = NULL; + const struct smcd_ops *ops = ism_get_smcd_ops(); struct smcd_dev *smcd; + if (!ops) + return; + smcd = smcd_alloc_dev(&ism->pdev->dev, dev_name(&ism->pdev->dev), ops, ISM_NR_DMBS); if (!smcd) @@ -550,16 +542,39 @@ static void smcd_handle_irq(struct ism_dev *ism, unsigned int dmbno, tasklet_schedule(&conn->rx_tsklet); spin_unlock_irqrestore(&smcd->lock, flags); } +#endif + +int smc_ism_signal_shutdown(struct smc_link_group *lgr) +{ + int rc = 0; +#if IS_ENABLED(CONFIG_ISM) + union smcd_sw_event_info ev_info; + + if (lgr->peer_shutdown) + return 0; + + memcpy(ev_info.uid, lgr->id, SMC_LGR_ID_SIZE); + ev_info.vlan_id = lgr->vlan_id; + ev_info.code = ISM_EVENT_REQUEST; + rc = lgr->smcd->ops->signal_event(lgr->smcd, lgr->peer_gid, + ISM_EVENT_REQUEST_IR, + ISM_EVENT_CODE_SHUTDOWN, + ev_info.info); +#endif + return rc; +} int smc_ism_init(void) { + int rc = 0; + +#if IS_ENABLED(CONFIG_ISM) smc_ism_v2_capable = false; memset(smc_ism_v2_system_eid, 0, SMC_MAX_EID_LEN); -#if IS_ENABLED(CONFIG_ISM) - return ism_register_client(&smc_ism_client); -#else - return 0; + + rc = ism_register_client(&smc_ism_client); #endif + return rc; } void smc_ism_exit(void) -- cgit v1.2.3 From 8c81ba20349daf9f7e58bb05a0c12f4b71813a30 Mon Sep 17 00:00:00 2001 From: Stefan Raspl Date: Mon, 23 Jan 2023 19:17:52 +0100 Subject: net/smc: De-tangle ism and smc device initialization The struct device for ISM devices was part of struct smcd_dev. Move to struct ism_dev, provide a new API call in struct smcd_ops, and convert existing SMCD code accordingly. Furthermore, remove struct smcd_dev from struct ism_dev. This is the final part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl Signed-off-by: Jan Karcher Signed-off-by: Wenjia Zhang Signed-off-by: David S. Miller --- drivers/s390/net/ism_drv.c | 25 ++++++++++----------- include/linux/ism.h | 1 - include/net/smc.h | 6 +---- net/smc/af_smc.c | 1 + net/smc/smc_core.c | 6 +++-- net/smc/smc_ism.c | 55 ++++++++++------------------------------------ net/smc/smc_pnet.c | 40 ++++++++++++++++++--------------- 7 files changed, 52 insertions(+), 82 deletions(-) (limited to 'net') diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c index 73c8f42a22a7..eb7e13486087 100644 --- a/drivers/s390/net/ism_drv.c +++ b/drivers/s390/net/ism_drv.c @@ -646,6 +646,12 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) spin_lock_init(&ism->lock); dev_set_drvdata(&pdev->dev, ism); ism->pdev = pdev; + ism->dev.parent = &pdev->dev; + device_initialize(&ism->dev); + dev_set_name(&ism->dev, dev_name(&pdev->dev)); + ret = device_add(&ism->dev); + if (ret) + goto err_dev; ret = pci_enable_device_mem(pdev); if (ret) @@ -663,30 +669,23 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id) dma_set_max_seg_size(&pdev->dev, SZ_1M); pci_set_master(pdev); - ism->smcd = smcd_alloc_dev(&pdev->dev, dev_name(&pdev->dev), &ism_ops, - ISM_NR_DMBS); - if (!ism->smcd) { - ret = -ENOMEM; - goto err_resource; - } - - ism->smcd->priv = ism; ret = ism_dev_init(ism); if (ret) - goto err_free; + goto err_resource; return 0; -err_free: - smcd_free_dev(ism->smcd); err_resource: pci_clear_master(pdev); pci_release_mem_regions(pdev); err_disable: pci_disable_device(pdev); err: - kfree(ism); + device_del(&ism->dev); +err_dev: dev_set_drvdata(&pdev->dev, NULL); + kfree(ism); + return ret; } @@ -740,7 +739,6 @@ static void ism_remove(struct pci_dev *pdev) ism_dev_exit(ism); mutex_unlock(&ism_dev_list.mutex); - smcd_free_dev(ism->smcd); pci_clear_master(pdev); pci_release_mem_regions(pdev); pci_disable_device(pdev); @@ -874,6 +872,7 @@ static const struct smcd_ops ism_ops = { .get_system_eid = ism_get_seid, .get_local_gid = smcd_get_local_gid, .get_chid = smcd_get_chid, + .get_dev = smcd_get_dev, }; const struct smcd_ops *ism_get_smcd_ops(void) diff --git a/include/linux/ism.h b/include/linux/ism.h index 104ce2fd503a..ea2bcdae7401 100644 --- a/include/linux/ism.h +++ b/include/linux/ism.h @@ -30,7 +30,6 @@ struct ism_dev { spinlock_t lock; /* protects the ism device */ struct list_head list; struct pci_dev *pdev; - struct smcd_dev *smcd; struct ism_sba *sba; dma_addr_t sba_dma_addr; diff --git a/include/net/smc.h b/include/net/smc.h index 556b96c12279..597cb9381182 100644 --- a/include/net/smc.h +++ b/include/net/smc.h @@ -70,11 +70,11 @@ struct smcd_ops { u8* (*get_system_eid)(void); u64 (*get_local_gid)(struct smcd_dev *dev); u16 (*get_chid)(struct smcd_dev *dev); + struct device* (*get_dev)(struct smcd_dev *dev); }; struct smcd_dev { const struct smcd_ops *ops; - struct device dev; void *priv; struct list_head list; spinlock_t lock; @@ -90,8 +90,4 @@ struct smcd_dev { u8 going_away : 1; }; -struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, - const struct smcd_ops *ops, int max_dmbs); -void smcd_free_dev(struct smcd_dev *smcd); - #endif /* _SMC_H */ diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 5d037714ab78..036532cf39aa 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3499,6 +3499,7 @@ static void __exit smc_exit(void) sock_unregister(PF_SMC); smc_core_exit(); smc_ib_unregister_client(); + smc_ism_exit(); destroy_workqueue(smc_close_wq); destroy_workqueue(smc_tcp_ls_wq); destroy_workqueue(smc_hs_wq); diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index ec04966e9bf9..7642b16c41d1 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -822,6 +822,7 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini) { struct smc_link_group *lgr; struct list_head *lgr_list; + struct smcd_dev *smcd; struct smc_link *lnk; spinlock_t *lgr_lock; u8 link_idx; @@ -868,7 +869,8 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini) lgr->conns_all = RB_ROOT; if (ini->is_smcd) { /* SMC-D specific settings */ - get_device(&ini->ism_dev[ini->ism_selected]->dev); + smcd = ini->ism_dev[ini->ism_selected]; + get_device(smcd->ops->get_dev(smcd)); lgr->peer_gid = ini->ism_peer_gid[ini->ism_selected]; lgr->smcd = ini->ism_dev[ini->ism_selected]; lgr_list = &ini->ism_dev[ini->ism_selected]->lgr_list; @@ -1387,7 +1389,7 @@ static void smc_lgr_free(struct smc_link_group *lgr) destroy_workqueue(lgr->tx_wq); if (lgr->is_smcd) { smc_ism_put_vlan(lgr->smcd, lgr->vlan_id); - put_device(&lgr->smcd->dev); + put_device(lgr->smcd->ops->get_dev(lgr->smcd)); } smc_lgr_put(lgr); /* theoretically last lgr_put */ } diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c index 6196b305df44..3b0b7710c6b0 100644 --- a/net/smc/smc_ism.c +++ b/net/smc/smc_ism.c @@ -231,9 +231,11 @@ static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd, struct smc_pci_dev smc_pci_dev; struct nlattr *port_attrs; struct nlattr *attrs; + struct ism_dev *ism; int use_cnt = 0; void *nlh; + ism = smcd->priv; nlh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, &smc_gen_nl_family, NLM_F_MULTI, SMC_NETLINK_GET_DEV_SMCD); @@ -248,7 +250,7 @@ static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd, if (nla_put_u8(skb, SMC_NLA_DEV_IS_CRIT, use_cnt > 0)) goto errattr; memset(&smc_pci_dev, 0, sizeof(smc_pci_dev)); - smc_set_pci_values(to_pci_dev(smcd->dev.parent), &smc_pci_dev); + smc_set_pci_values(to_pci_dev(ism->dev.parent), &smc_pci_dev); if (nla_put_u32(skb, SMC_NLA_DEV_PCI_FID, smc_pci_dev.pci_fid)) goto errattr; if (nla_put_u16(skb, SMC_NLA_DEV_PCI_CHID, smc_pci_dev.pci_pchid)) @@ -377,41 +379,24 @@ static void smc_ism_event_work(struct work_struct *work) kfree(wrk); } -static void smcd_release(struct device *dev) -{ - struct smcd_dev *smcd = container_of(dev, struct smcd_dev, dev); - - kfree(smcd->conn); - kfree(smcd); -} - -struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, - const struct smcd_ops *ops, int max_dmbs) +static struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, + const struct smcd_ops *ops, int max_dmbs) { struct smcd_dev *smcd; - smcd = kzalloc(sizeof(*smcd), GFP_KERNEL); + smcd = devm_kzalloc(parent, sizeof(*smcd), GFP_KERNEL); if (!smcd) return NULL; - smcd->conn = kcalloc(max_dmbs, sizeof(struct smc_connection *), - GFP_KERNEL); - if (!smcd->conn) { - kfree(smcd); + smcd->conn = devm_kcalloc(parent, max_dmbs, + sizeof(struct smc_connection *), GFP_KERNEL); + if (!smcd->conn) return NULL; - } smcd->event_wq = alloc_ordered_workqueue("ism_evt_wq-%s)", WQ_MEM_RECLAIM, name); - if (!smcd->event_wq) { - kfree(smcd->conn); - kfree(smcd); + if (!smcd->event_wq) return NULL; - } - smcd->dev.parent = parent; - smcd->dev.release = smcd_release; - device_initialize(&smcd->dev); - dev_set_name(&smcd->dev, name); smcd->ops = ops; spin_lock_init(&smcd->lock); @@ -421,13 +406,6 @@ struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name, init_waitqueue_head(&smcd->lgrs_deleted); return smcd; } -EXPORT_SYMBOL_GPL(smcd_alloc_dev); - -void smcd_free_dev(struct smcd_dev *smcd) -{ - put_device(&smcd->dev); -} -EXPORT_SYMBOL_GPL(smcd_free_dev); static void smcd_register_dev(struct ism_dev *ism) { @@ -465,16 +443,9 @@ static void smcd_register_dev(struct ism_dev *ism) mutex_unlock(&smcd_dev_list.mutex); pr_warn_ratelimited("smc: adding smcd device %s with pnetid %.16s%s\n", - dev_name(&smcd->dev), smcd->pnetid, + dev_name(&ism->dev), smcd->pnetid, smcd->pnetid_by_user ? " (user defined)" : ""); - if (device_add(&smcd->dev)) { - mutex_lock(&smcd_dev_list.mutex); - list_del(&smcd->list); - mutex_unlock(&smcd_dev_list.mutex); - smcd_free_dev(smcd); - } - return; } @@ -483,15 +454,13 @@ static void smcd_unregister_dev(struct ism_dev *ism) struct smcd_dev *smcd = ism_get_priv(ism, &smc_ism_client); pr_warn_ratelimited("smc: removing smcd device %s\n", - dev_name(&smcd->dev)); + dev_name(&ism->dev)); smcd->going_away = 1; smc_smcd_terminate_all(smcd); mutex_lock(&smcd_dev_list.mutex); list_del_init(&smcd->list); mutex_unlock(&smcd_dev_list.mutex); destroy_workqueue(smcd->event_wq); - - device_del(&smcd->dev); } /* SMCD Device event handler. Called from ISM device interrupt handler. diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index 25fb2fd186e2..11775401df68 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -103,7 +103,7 @@ static int smc_pnet_remove_by_pnetid(struct net *net, char *pnet_name) struct smc_pnetentry *pnetelem, *tmp_pe; struct smc_pnettable *pnettable; struct smc_ib_device *ibdev; - struct smcd_dev *smcd_dev; + struct smcd_dev *smcd; struct smc_net *sn; int rc = -ENOENT; int ibport; @@ -162,16 +162,17 @@ static int smc_pnet_remove_by_pnetid(struct net *net, char *pnet_name) mutex_unlock(&smc_ib_devices.mutex); /* remove smcd devices */ mutex_lock(&smcd_dev_list.mutex); - list_for_each_entry(smcd_dev, &smcd_dev_list.list, list) { - if (smcd_dev->pnetid_by_user && + list_for_each_entry(smcd, &smcd_dev_list.list, list) { + if (smcd->pnetid_by_user && (!pnet_name || - smc_pnet_match(pnet_name, smcd_dev->pnetid))) { + smc_pnet_match(pnet_name, smcd->pnetid))) { pr_warn_ratelimited("smc: smcd device %s " "erased user defined pnetid " - "%.16s\n", dev_name(&smcd_dev->dev), - smcd_dev->pnetid); - memset(smcd_dev->pnetid, 0, SMC_MAX_PNETID_LEN); - smcd_dev->pnetid_by_user = false; + "%.16s\n", + dev_name(smcd->ops->get_dev(smcd)), + smcd->pnetid); + memset(smcd->pnetid, 0, SMC_MAX_PNETID_LEN); + smcd->pnetid_by_user = false; rc = 0; } } @@ -331,8 +332,8 @@ static struct smcd_dev *smc_pnet_find_smcd(char *smcd_name) mutex_lock(&smcd_dev_list.mutex); list_for_each_entry(smcd_dev, &smcd_dev_list.list, list) { - if (!strncmp(dev_name(&smcd_dev->dev), smcd_name, - IB_DEVICE_NAME_MAX - 1)) + if (!strncmp(dev_name(smcd_dev->ops->get_dev(smcd_dev)), + smcd_name, IB_DEVICE_NAME_MAX - 1)) goto out; } smcd_dev = NULL; @@ -411,7 +412,8 @@ static int smc_pnet_add_ib(struct smc_pnettable *pnettable, char *ib_name, struct smc_ib_device *ib_dev; bool smcddev_applied = true; bool ibdev_applied = true; - struct smcd_dev *smcd_dev; + struct smcd_dev *smcd; + struct device *dev; bool new_ibdev; /* try to apply the pnetid to active devices */ @@ -425,14 +427,16 @@ static int smc_pnet_add_ib(struct smc_pnettable *pnettable, char *ib_name, ib_port, ib_dev->pnetid[ib_port - 1]); } - smcd_dev = smc_pnet_find_smcd(ib_name); - if (smcd_dev) { - smcddev_applied = smc_pnet_apply_smcd(smcd_dev, pnet_name); - if (smcddev_applied) + smcd = smc_pnet_find_smcd(ib_name); + if (smcd) { + smcddev_applied = smc_pnet_apply_smcd(smcd, pnet_name); + if (smcddev_applied) { + dev = smcd->ops->get_dev(smcd); pr_warn_ratelimited("smc: smcd device %s " "applied user defined pnetid " - "%.16s\n", dev_name(&smcd_dev->dev), - smcd_dev->pnetid); + "%.16s\n", dev_name(dev), + smcd->pnetid); + } } /* Apply fails when a device has a hardware-defined pnetid set, do not * add a pnet table entry in that case. @@ -1181,7 +1185,7 @@ int smc_pnetid_by_table_ib(struct smc_ib_device *smcibdev, u8 ib_port) */ int smc_pnetid_by_table_smcd(struct smcd_dev *smcddev) { - const char *ib_name = dev_name(&smcddev->dev); + const char *ib_name = dev_name(smcddev->ops->get_dev(smcddev)); struct smc_pnettable *pnettable; struct smc_pnetentry *tmp_pe; struct smc_net *sn; -- cgit v1.2.3 From c96de136329b38172f21214021fc30d67f05c399 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 24 Jan 2023 13:08:01 +0200 Subject: net: ethtool: fix NULL pointer dereference in stats_prepare_data() In the following call path: ethnl_default_dumpit -> ethnl_default_dump_one -> ctx->ops->prepare_data -> stats_prepare_data struct genl_info *info will be passed as NULL, and stats_prepare_data() dereferences it while getting the extended ack pointer. To avoid that, just set the extack to NULL if "info" is NULL, since the netlink extack handling messages know how to deal with that. The pattern "info ? info->extack : NULL" is present in quite a few other "prepare_data" implementations, so it's clear that it's a more general problem to be dealt with at a higher level, but the code should have at least adhered to the current conventions to avoid the NULL dereference. Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Reported-by: Eric Dumazet Signed-off-by: Vladimir Oltean Reviewed-by: Leon Romanovsky Signed-off-by: David S. Miller --- net/ethtool/stats.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c index 7294be5855d4..010ed19ccc99 100644 --- a/net/ethtool/stats.c +++ b/net/ethtool/stats.c @@ -117,9 +117,9 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base, struct genl_info *info) { const struct stats_req_info *req_info = STATS_REQINFO(req_base); + struct netlink_ext_ack *extack = info ? info->extack : NULL; struct stats_reply_data *data = STATS_REPDATA(reply_base); enum ethtool_mac_stats_src src = req_info->src; - struct netlink_ext_ack *extack = info->extack; struct net_device *dev = reply_base->dev; int ret; -- cgit v1.2.3 From f5be9caf7bf022ab550f62ea68f1c1bb8f5287ee Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 24 Jan 2023 13:13:28 +0200 Subject: net: ethtool: fix NULL pointer dereference in pause_prepare_data() In the following call path: ethnl_default_dumpit -> ethnl_default_dump_one -> ctx->ops->prepare_data -> pause_prepare_data struct genl_info *info will be passed as NULL, and pause_prepare_data() dereferences it while getting the extended ack pointer. To avoid that, just set the extack to NULL if "info" is NULL, since the netlink extack handling messages know how to deal with that. The pattern "info ? info->extack : NULL" is present in quite a few other "prepare_data" implementations, so it's clear that it's a more general problem to be dealt with at a higher level, but the code should have at least adhered to the current conventions to avoid the NULL dereference. Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Reported-by: Eric Dumazet Signed-off-by: Vladimir Oltean Reported-by: syzbot+9d44aae2720fc40b8474@syzkaller.appspotmail.com Signed-off-by: David S. Miller --- net/ethtool/pause.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ethtool/pause.c b/net/ethtool/pause.c index e2be9e89c9d9..dcd34b9a849f 100644 --- a/net/ethtool/pause.c +++ b/net/ethtool/pause.c @@ -54,9 +54,9 @@ static int pause_prepare_data(const struct ethnl_req_info *req_base, struct genl_info *info) { const struct pause_req_info *req_info = PAUSE_REQINFO(req_base); + struct netlink_ext_ack *extack = info ? info->extack : NULL; struct pause_reply_data *data = PAUSE_REPDATA(reply_base); enum ethtool_mac_stats_src src = req_info->src; - struct netlink_ext_ack *extack = info->extack; struct net_device *dev = reply_base->dev; int ret; -- cgit v1.2.3 From 51a52a29ebaa8395de090fa415c6e1b2899a50f1 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Wed, 25 Jan 2023 10:47:34 -0600 Subject: bpf: Pass const struct bpf_prog * to .check_member The .check_member field of struct bpf_struct_ops is currently passed the member's btf_type via const struct btf_type *t, and a const struct btf_member *member. This allows the struct_ops implementation to check whether e.g. an ops is supported, but it would be useful to also enforce that the struct_ops prog being loaded for that member has other qualities, like being sleepable (or not). This patch therefore updates the .check_member() callback to also take a const struct bpf_prog *prog argument. Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230125164735.785732-4-void@manifault.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 ++- kernel/bpf/verifier.c | 2 +- net/ipv4/bpf_tcp_ca.c | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1bec48d9e5d9..0d868ef1b973 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1422,7 +1422,8 @@ struct bpf_struct_ops { const struct bpf_verifier_ops *verifier_ops; int (*init)(struct btf *btf); int (*check_member)(const struct btf_type *t, - const struct btf_member *member); + const struct btf_member *member, + const struct bpf_prog *prog); int (*init_member)(const struct btf_type *t, const struct btf_member *member, void *kdata, const void *udata); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c8907df49f81..6bd097e0d45f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -16792,7 +16792,7 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) } if (st_ops->check_member) { - int err = st_ops->check_member(t, member); + int err = st_ops->check_member(t, member, prog); if (err) { verbose(env, "attach to unsupported member %s of struct %s\n", diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 4517d2bd186a..13fc0c185cd9 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -248,7 +248,8 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t, } static int bpf_tcp_ca_check_member(const struct btf_type *t, - const struct btf_member *member) + const struct btf_member *member, + const struct bpf_prog *prog) { if (is_unsupported(__btf_member_bit_offset(t, member) / 8)) return -ENOTSUPP; -- cgit v1.2.3 From 7dd880592a88799f3ef48fda507849a75f11cbf0 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Wed, 25 Jan 2023 10:47:35 -0600 Subject: bpf/selftests: Verify struct_ops prog sleepable behavior In a set of prior changes, we added the ability for struct_ops programs to be sleepable. This patch enhances the dummy_st_ops selftest suite to validate this behavior by adding a new sleepable struct_ops entry to dummy_st_ops. Signed-off-by: David Vernet Link: https://lore.kernel.org/r/20230125164735.785732-5-void@manifault.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + net/bpf/bpf_dummy_struct_ops.c | 18 ++++++++ .../selftests/bpf/prog_tests/dummy_st_ops.c | 52 ++++++++++++++++------ tools/testing/selftests/bpf/progs/dummy_st_ops.c | 50 --------------------- .../selftests/bpf/progs/dummy_st_ops_fail.c | 27 +++++++++++ .../selftests/bpf/progs/dummy_st_ops_success.c | 47 +++++++++++++++++++ 6 files changed, 132 insertions(+), 63 deletions(-) delete mode 100644 tools/testing/selftests/bpf/progs/dummy_st_ops.c create mode 100644 tools/testing/selftests/bpf/progs/dummy_st_ops_fail.c create mode 100644 tools/testing/selftests/bpf/progs/dummy_st_ops_success.c (limited to 'net') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0d868ef1b973..14a0264fac57 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1474,6 +1474,7 @@ struct bpf_dummy_ops { int (*test_1)(struct bpf_dummy_ops_state *cb); int (*test_2)(struct bpf_dummy_ops_state *cb, int a1, unsigned short a2, char a3, unsigned long a4); + int (*test_sleepable)(struct bpf_dummy_ops_state *cb); }; int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 1ac4467928a9..ff4f89a2b02a 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -154,6 +154,23 @@ static bool bpf_dummy_ops_is_valid_access(int off, int size, return bpf_tracing_btf_ctx_access(off, size, type, prog, info); } +static int bpf_dummy_ops_check_member(const struct btf_type *t, + const struct btf_member *member, + const struct bpf_prog *prog) +{ + u32 moff = __btf_member_bit_offset(t, member) / 8; + + switch (moff) { + case offsetof(struct bpf_dummy_ops, test_sleepable): + break; + default: + if (prog->aux->sleepable) + return -EINVAL; + } + + return 0; +} + static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log, const struct bpf_reg_state *reg, int off, int size, enum bpf_access_type atype, @@ -208,6 +225,7 @@ static void bpf_dummy_unreg(void *kdata) struct bpf_struct_ops bpf_bpf_dummy_ops = { .verifier_ops = &bpf_dummy_verifier_ops, .init = bpf_dummy_init, + .check_member = bpf_dummy_ops_check_member, .init_member = bpf_dummy_init_member, .reg = bpf_dummy_reg, .unreg = bpf_dummy_unreg, diff --git a/tools/testing/selftests/bpf/prog_tests/dummy_st_ops.c b/tools/testing/selftests/bpf/prog_tests/dummy_st_ops.c index c11832657d2b..f43fcb13d2c4 100644 --- a/tools/testing/selftests/bpf/prog_tests/dummy_st_ops.c +++ b/tools/testing/selftests/bpf/prog_tests/dummy_st_ops.c @@ -1,7 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (C) 2021. Huawei Technologies Co., Ltd */ #include -#include "dummy_st_ops.skel.h" +#include "dummy_st_ops_success.skel.h" +#include "dummy_st_ops_fail.skel.h" #include "trace_dummy_st_ops.skel.h" /* Need to keep consistent with definition in include/linux/bpf.h */ @@ -11,17 +12,17 @@ struct bpf_dummy_ops_state { static void test_dummy_st_ops_attach(void) { - struct dummy_st_ops *skel; + struct dummy_st_ops_success *skel; struct bpf_link *link; - skel = dummy_st_ops__open_and_load(); + skel = dummy_st_ops_success__open_and_load(); if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load")) return; link = bpf_map__attach_struct_ops(skel->maps.dummy_1); ASSERT_EQ(libbpf_get_error(link), -EOPNOTSUPP, "dummy_st_ops_attach"); - dummy_st_ops__destroy(skel); + dummy_st_ops_success__destroy(skel); } static void test_dummy_init_ret_value(void) @@ -31,10 +32,10 @@ static void test_dummy_init_ret_value(void) .ctx_in = args, .ctx_size_in = sizeof(args), ); - struct dummy_st_ops *skel; + struct dummy_st_ops_success *skel; int fd, err; - skel = dummy_st_ops__open_and_load(); + skel = dummy_st_ops_success__open_and_load(); if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load")) return; @@ -43,7 +44,7 @@ static void test_dummy_init_ret_value(void) ASSERT_OK(err, "test_run"); ASSERT_EQ(attr.retval, 0xf2f3f4f5, "test_ret"); - dummy_st_ops__destroy(skel); + dummy_st_ops_success__destroy(skel); } static void test_dummy_init_ptr_arg(void) @@ -58,10 +59,10 @@ static void test_dummy_init_ptr_arg(void) .ctx_size_in = sizeof(args), ); struct trace_dummy_st_ops *trace_skel; - struct dummy_st_ops *skel; + struct dummy_st_ops_success *skel; int fd, err; - skel = dummy_st_ops__open_and_load(); + skel = dummy_st_ops_success__open_and_load(); if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load")) return; @@ -91,7 +92,7 @@ static void test_dummy_init_ptr_arg(void) ASSERT_EQ(trace_skel->bss->val, exp_retval, "fentry_val"); done: - dummy_st_ops__destroy(skel); + dummy_st_ops_success__destroy(skel); trace_dummy_st_ops__destroy(trace_skel); } @@ -102,12 +103,12 @@ static void test_dummy_multiple_args(void) .ctx_in = args, .ctx_size_in = sizeof(args), ); - struct dummy_st_ops *skel; + struct dummy_st_ops_success *skel; int fd, err; size_t i; char name[8]; - skel = dummy_st_ops__open_and_load(); + skel = dummy_st_ops_success__open_and_load(); if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load")) return; @@ -119,7 +120,28 @@ static void test_dummy_multiple_args(void) ASSERT_EQ(skel->bss->test_2_args[i], args[i], name); } - dummy_st_ops__destroy(skel); + dummy_st_ops_success__destroy(skel); +} + +static void test_dummy_sleepable(void) +{ + __u64 args[1] = {0}; + LIBBPF_OPTS(bpf_test_run_opts, attr, + .ctx_in = args, + .ctx_size_in = sizeof(args), + ); + struct dummy_st_ops_success *skel; + int fd, err; + + skel = dummy_st_ops_success__open_and_load(); + if (!ASSERT_OK_PTR(skel, "dummy_st_ops_load")) + return; + + fd = bpf_program__fd(skel->progs.test_sleepable); + err = bpf_prog_test_run_opts(fd, &attr); + ASSERT_OK(err, "test_run"); + + dummy_st_ops_success__destroy(skel); } void test_dummy_st_ops(void) @@ -132,4 +154,8 @@ void test_dummy_st_ops(void) test_dummy_init_ptr_arg(); if (test__start_subtest("dummy_multiple_args")) test_dummy_multiple_args(); + if (test__start_subtest("dummy_sleepable")) + test_dummy_sleepable(); + + RUN_TESTS(dummy_st_ops_fail); } diff --git a/tools/testing/selftests/bpf/progs/dummy_st_ops.c b/tools/testing/selftests/bpf/progs/dummy_st_ops.c deleted file mode 100644 index ead87edb75e2..000000000000 --- a/tools/testing/selftests/bpf/progs/dummy_st_ops.c +++ /dev/null @@ -1,50 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright (C) 2021. Huawei Technologies Co., Ltd */ -#include -#include -#include - -struct bpf_dummy_ops_state { - int val; -} __attribute__((preserve_access_index)); - -struct bpf_dummy_ops { - int (*test_1)(struct bpf_dummy_ops_state *state); - int (*test_2)(struct bpf_dummy_ops_state *state, int a1, unsigned short a2, - char a3, unsigned long a4); -}; - -char _license[] SEC("license") = "GPL"; - -SEC("struct_ops/test_1") -int BPF_PROG(test_1, struct bpf_dummy_ops_state *state) -{ - int ret; - - if (!state) - return 0xf2f3f4f5; - - ret = state->val; - state->val = 0x5a; - return ret; -} - -__u64 test_2_args[5]; - -SEC("struct_ops/test_2") -int BPF_PROG(test_2, struct bpf_dummy_ops_state *state, int a1, unsigned short a2, - char a3, unsigned long a4) -{ - test_2_args[0] = (unsigned long)state; - test_2_args[1] = a1; - test_2_args[2] = a2; - test_2_args[3] = a3; - test_2_args[4] = a4; - return 0; -} - -SEC(".struct_ops") -struct bpf_dummy_ops dummy_1 = { - .test_1 = (void *)test_1, - .test_2 = (void *)test_2, -}; diff --git a/tools/testing/selftests/bpf/progs/dummy_st_ops_fail.c b/tools/testing/selftests/bpf/progs/dummy_st_ops_fail.c new file mode 100644 index 000000000000..0bf969a0b5ed --- /dev/null +++ b/tools/testing/selftests/bpf/progs/dummy_st_ops_fail.c @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include +#include + +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +SEC("struct_ops.s/test_2") +__failure __msg("attach to unsupported member test_2 of struct bpf_dummy_ops") +int BPF_PROG(test_unsupported_field_sleepable, + struct bpf_dummy_ops_state *state, int a1, unsigned short a2, + char a3, unsigned long a4) +{ + /* Tries to mark an unsleepable field in struct bpf_dummy_ops as sleepable. */ + return 0; +} + +SEC(".struct_ops") +struct bpf_dummy_ops dummy_1 = { + .test_1 = NULL, + .test_2 = (void *)test_unsupported_field_sleepable, + .test_sleepable = (void *)NULL, +}; diff --git a/tools/testing/selftests/bpf/progs/dummy_st_ops_success.c b/tools/testing/selftests/bpf/progs/dummy_st_ops_success.c new file mode 100644 index 000000000000..1efa746c25dc --- /dev/null +++ b/tools/testing/selftests/bpf/progs/dummy_st_ops_success.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2021. Huawei Technologies Co., Ltd */ +#include "vmlinux.h" +#include +#include + +char _license[] SEC("license") = "GPL"; + +SEC("struct_ops/test_1") +int BPF_PROG(test_1, struct bpf_dummy_ops_state *state) +{ + int ret; + + if (!state) + return 0xf2f3f4f5; + + ret = state->val; + state->val = 0x5a; + return ret; +} + +__u64 test_2_args[5]; + +SEC("struct_ops/test_2") +int BPF_PROG(test_2, struct bpf_dummy_ops_state *state, int a1, unsigned short a2, + char a3, unsigned long a4) +{ + test_2_args[0] = (unsigned long)state; + test_2_args[1] = a1; + test_2_args[2] = a2; + test_2_args[3] = a3; + test_2_args[4] = a4; + return 0; +} + +SEC("struct_ops.s/test_sleepable") +int BPF_PROG(test_sleepable, struct bpf_dummy_ops_state *state) +{ + return 0; +} + +SEC(".struct_ops") +struct bpf_dummy_ops dummy_1 = { + .test_1 = (void *)test_1, + .test_2 = (void *)test_2, + .test_sleepable = (void *)test_sleepable, +}; -- cgit v1.2.3 From 2ab42c7b871f4e85f21e1a85dfa3f87e4d31221d Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Wed, 25 Jan 2023 12:16:07 -0800 Subject: bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). Resolve an issue when calling sol_tcp_sockopt() on a socket with ktls enabled. Prior to this patch, sol_tcp_sockopt() would only allow calls if the function pointer of setsockopt of the socket was set to tcp_setsockopt(). However, any socket with ktls enabled would have its function pointer set to tls_setsockopt(). To resolve this issue, the patch adds a check of the protocol of the linux socket and allows bpf_setsockopt() to be called if ktls is initialized on the linux socket. This ensures that calls to sol_tcp_sockopt() will succeed on sockets with ktls enabled. Signed-off-by: Kui-Feng Lee Link: https://lore.kernel.org/r/20230125201608.908230-2-kuifeng@meta.com Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/core/filter.c b/net/core/filter.c index ed08dbf10338..6da78b3d381e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5204,7 +5204,7 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, char *optval, int *optlen, bool getopt) { - if (sk->sk_prot->setsockopt != tcp_setsockopt) + if (sk->sk_protocol != IPPROTO_TCP) return -EINVAL; switch (optname) { -- cgit v1.2.3 From 6a7a2c18a9defa922dbc096a8d71e7f74da10582 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 24 Jan 2023 10:17:24 -0800 Subject: net: Kconfig: fix spellos Fix spelling in net/ Kconfig files. (reported by codespell) Signed-off-by: Randy Dunlap Cc: Pablo Neira Ayuso Cc: Jozsef Kadlecsik Cc: Florian Westphal Cc: coreteam@netfilter.org Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- net/netfilter/ipset/Kconfig | 2 +- net/sched/Kconfig | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/netfilter/ipset/Kconfig b/net/netfilter/ipset/Kconfig index 3c273483df23..b1ea054bb82c 100644 --- a/net/netfilter/ipset/Kconfig +++ b/net/netfilter/ipset/Kconfig @@ -30,7 +30,7 @@ config IP_SET_BITMAP_IP depends on IP_SET help This option adds the bitmap:ip set type support, by which one - can store IPv4 addresses (or network addresse) from a range. + can store IPv4 addresses (or network addresses) from a range. To compile it as a module, choose M here. If unsure, say N. diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 777d6b50505c..de18a0dda6df 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -337,7 +337,7 @@ config NET_SCH_FQ Say Y here if you want to use the FQ packet scheduling algorithm. FQ does flow separation, and is able to respect pacing requirements - set by TCP stack into sk->sk_pacing_rate (for localy generated + set by TCP stack into sk->sk_pacing_rate (for locally generated traffic) To compile this driver as a module, choose M here: the module -- cgit v1.2.3 From 91d0b78c5177f3e42a4d8738af8ac19c3a90d002 Mon Sep 17 00:00:00 2001 From: Jakub Sitnicki Date: Tue, 24 Jan 2023 14:36:43 +0100 Subject: inet: Add IP_LOCAL_PORT_RANGE socket option Users who want to share a single public IP address for outgoing connections between several hosts traditionally reach for SNAT. However, SNAT requires state keeping on the node(s) performing the NAT. A stateless alternative exists, where a single IP address used for egress can be shared between several hosts by partitioning the available ephemeral port range. In such a setup: 1. Each host gets assigned a disjoint range of ephemeral ports. 2. Applications open connections from the host-assigned port range. 3. Return traffic gets routed to the host based on both, the destination IP and the destination port. An application which wants to open an outgoing connection (connect) from a given port range today can choose between two solutions: 1. Manually pick the source port by bind()'ing to it before connect()'ing the socket. This approach has a couple of downsides: a) Search for a free port has to be implemented in the user-space. If the chosen 4-tuple happens to be busy, the application needs to retry from a different local port number. Detecting if 4-tuple is busy can be either easy (TCP) or hard (UDP). In TCP case, the application simply has to check if connect() returned an error (EADDRNOTAVAIL). That is assuming that the local port sharing was enabled (REUSEADDR) by all the sockets. # Assume desired local port range is 60_000-60_511 s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s.bind(("192.0.2.1", 60_000)) s.connect(("1.1.1.1", 53)) # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy # Application must retry with another local port In case of UDP, the network stack allows binding more than one socket to the same 4-tuple, when local port sharing is enabled (REUSEADDR). Hence detecting the conflict is much harder and involves querying sock_diag and toggling the REUSEADDR flag [1]. b) For TCP, bind()-ing to a port within the ephemeral port range means that no connecting sockets, that is those which leave it to the network stack to find a free local port at connect() time, can use the this port. IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port will be skipped during the free port search at connect() time. 2. Isolate the app in a dedicated netns and use the use the per-netns ip_local_port_range sysctl to adjust the ephemeral port range bounds. The per-netns setting affects all sockets, so this approach can be used only if: - there is just one egress IP address, or - the desired egress port range is the same for all egress IP addresses used by the application. For TCP, this approach avoids the downsides of (1). Free port search and 4-tuple conflict detection is done by the network stack: system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'") s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1) s.bind(("192.0.2.1", 0)) s.connect(("1.1.1.1", 53)) # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy For UDP this approach has limited applicability. Setting the IP_BIND_ADDRESS_NO_PORT socket option does not result in local source port being shared with other connected UDP sockets. Hence relying on the network stack to find a free source port, limits the number of outgoing UDP flows from a single IP address down to the number of available ephemeral ports. To put it another way, partitioning the ephemeral port range between hosts using the existing Linux networking API is cumbersome. To address this use case, add a new socket option at the SOL_IP level, named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the ephemeral port range for each socket individually. The option can be used only to narrow down the per-netns local port range. If the per-socket range lies outside of the per-netns range, the latter takes precedence. UAPI-wise, the low and high range bounds are passed to the kernel as a pair of u16 values in host byte order packed into a u32. This avoids pointer passing. PORT_LO = 40_000 PORT_HI = 40_511 s = socket(AF_INET, SOCK_STREAM) v = struct.pack("I", PORT_HI << 16 | PORT_LO) s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v) s.bind(("127.0.0.1", 0)) s.getsockname() # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511), # if there is a free port. EADDRINUSE otherwise. [1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116 Reviewed-by: Marek Majkowski Reviewed-by: Kuniyuki Iwashima Signed-off-by: Jakub Sitnicki Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/net/inet_sock.h | 4 ++++ include/net/ip.h | 3 ++- include/uapi/linux/in.h | 1 + net/ipv4/inet_connection_sock.c | 25 +++++++++++++++++++++++-- net/ipv4/inet_hashtables.c | 2 +- net/ipv4/ip_sockglue.c | 18 ++++++++++++++++++ net/ipv4/udp.c | 2 +- net/sctp/socket.c | 2 +- 8 files changed, 51 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index bf5654ce711e..51857117ac09 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -249,6 +249,10 @@ struct inet_sock { __be32 mc_addr; struct ip_mc_socklist __rcu *mc_list; struct inet_cork_full cork; + struct { + __u16 lo; + __u16 hi; + } local_port_range; }; #define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */ diff --git a/include/net/ip.h b/include/net/ip.h index 144bdfbb25af..c3fffaa92d6e 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -340,7 +340,8 @@ static inline u64 snmp_fold_field64(void __percpu *mib, int offt, size_t syncp_o } \ } -void inet_get_local_port_range(struct net *net, int *low, int *high); +void inet_get_local_port_range(const struct net *net, int *low, int *high); +void inet_sk_get_local_port_range(const struct sock *sk, int *low, int *high); #ifdef CONFIG_SYSCTL static inline bool inet_is_local_reserved_port(struct net *net, unsigned short port) diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index 07a4cb149305..4b7f2df66b99 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -162,6 +162,7 @@ struct in_addr { #define MCAST_MSFILTER 48 #define IP_MULTICAST_ALL 49 #define IP_UNICAST_IF 50 +#define IP_LOCAL_PORT_RANGE 51 #define MCAST_EXCLUDE 0 #define MCAST_INCLUDE 1 diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index d1f837579398..6ed7e65de494 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -117,7 +117,7 @@ bool inet_rcv_saddr_any(const struct sock *sk) return !sk->sk_rcv_saddr; } -void inet_get_local_port_range(struct net *net, int *low, int *high) +void inet_get_local_port_range(const struct net *net, int *low, int *high) { unsigned int seq; @@ -130,6 +130,27 @@ void inet_get_local_port_range(struct net *net, int *low, int *high) } EXPORT_SYMBOL(inet_get_local_port_range); +void inet_sk_get_local_port_range(const struct sock *sk, int *low, int *high) +{ + const struct inet_sock *inet = inet_sk(sk); + const struct net *net = sock_net(sk); + int lo, hi, sk_lo, sk_hi; + + inet_get_local_port_range(net, &lo, &hi); + + sk_lo = inet->local_port_range.lo; + sk_hi = inet->local_port_range.hi; + + if (unlikely(lo <= sk_lo && sk_lo <= hi)) + lo = sk_lo; + if (unlikely(lo <= sk_hi && sk_hi <= hi)) + hi = sk_hi; + + *low = lo; + *high = hi; +} +EXPORT_SYMBOL(inet_sk_get_local_port_range); + static bool inet_use_bhash2_on_bind(const struct sock *sk) { #if IS_ENABLED(CONFIG_IPV6) @@ -316,7 +337,7 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret, ports_exhausted: attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0; other_half_scan: - inet_get_local_port_range(net, &low, &high); + inet_sk_get_local_port_range(sk, &low, &high); high++; /* [32768, 60999] -> [32768, 61000[ */ if (high - low < 4) attempt_half = 0; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 7a13dd7f546b..e41fdc38ce19 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -1016,7 +1016,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, l3mdev = inet_sk_bound_l3mdev(sk); - inet_get_local_port_range(net, &low, &high); + inet_sk_get_local_port_range(sk, &low, &high); high++; /* [32768, 60999] -> [32768, 61000[ */ remaining = high - low; if (likely(remaining > 1)) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 9f92ae35bb01..b511ff0adc0a 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -923,6 +923,7 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname, case IP_CHECKSUM: case IP_RECVFRAGSIZE: case IP_RECVERR_RFC4884: + case IP_LOCAL_PORT_RANGE: if (optlen >= sizeof(int)) { if (copy_from_sockptr(&val, optval, sizeof(val))) return -EFAULT; @@ -1365,6 +1366,20 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname, WRITE_ONCE(inet->min_ttl, val); break; + case IP_LOCAL_PORT_RANGE: + { + const __u16 lo = val; + const __u16 hi = val >> 16; + + if (optlen != sizeof(__u32)) + goto e_inval; + if (lo != 0 && hi != 0 && lo > hi) + goto e_inval; + + inet->local_port_range.lo = lo; + inet->local_port_range.hi = hi; + break; + } default: err = -ENOPROTOOPT; break; @@ -1743,6 +1758,9 @@ int do_ip_getsockopt(struct sock *sk, int level, int optname, case IP_MINTTL: val = inet->min_ttl; break; + case IP_LOCAL_PORT_RANGE: + val = inet->local_port_range.hi << 16 | inet->local_port_range.lo; + break; default: sockopt_release_sock(sk); return -ENOPROTOOPT; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 9592fe3e444a..c605d171eb2d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -248,7 +248,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, int low, high, remaining; unsigned int rand; - inet_get_local_port_range(net, &low, &high); + inet_sk_get_local_port_range(sk, &low, &high); remaining = (high - low) + 1; rand = get_random_u32(); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index a98511b676cd..b91616f819de 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -8322,7 +8322,7 @@ static int sctp_get_port_local(struct sock *sk, union sctp_addr *addr) int low, high, remaining, index; unsigned int rover; - inet_get_local_port_range(net, &low, &high); + inet_sk_get_local_port_range(sk, &low, &high); remaining = (high - low) + 1; rover = get_random_u32_below(remaining) + low; -- cgit v1.2.3 From d0941130c93515411c8d66fc22bdae407b509a6d Mon Sep 17 00:00:00 2001 From: Jamie Bainbridge Date: Wed, 25 Jan 2023 11:16:52 +1100 Subject: icmp: Add counters for rate limits There are multiple ICMP rate limiting mechanisms: * Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec * v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask * v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask However, when ICMP output is limited, there is no way to tell which limit has been hit or even if the limits are responsible for the lack of ICMP output. Add counters for each of the cases above. As we are within local_bh_disable(), use the __INC stats variant. Example output: # nstat -sz "*RateLimit*" IcmpOutRateLimitGlobal 134 0.0 IcmpOutRateLimitHost 770 0.0 Icmp6OutRateLimitHost 84 0.0 Signed-off-by: Jamie Bainbridge Suggested-by: Abhishek Rawal Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni --- include/uapi/linux/snmp.h | 3 +++ net/ipv4/icmp.c | 3 +++ net/ipv4/proc.c | 8 +++++--- net/ipv6/icmp.c | 4 ++++ net/ipv6/proc.c | 1 + 5 files changed, 16 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h index 6600cb0164c2..26f33a4c253d 100644 --- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -95,6 +95,8 @@ enum ICMP_MIB_OUTADDRMASKS, /* OutAddrMasks */ ICMP_MIB_OUTADDRMASKREPS, /* OutAddrMaskReps */ ICMP_MIB_CSUMERRORS, /* InCsumErrors */ + ICMP_MIB_RATELIMITGLOBAL, /* OutRateLimitGlobal */ + ICMP_MIB_RATELIMITHOST, /* OutRateLimitHost */ __ICMP_MIB_MAX }; @@ -112,6 +114,7 @@ enum ICMP6_MIB_OUTMSGS, /* OutMsgs */ ICMP6_MIB_OUTERRORS, /* OutErrors */ ICMP6_MIB_CSUMERRORS, /* InCsumErrors */ + ICMP6_MIB_RATELIMITHOST, /* OutRateLimitHost */ __ICMP6_MIB_MAX }; diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 46aa2d65e40a..8cebb476b3ab 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -296,6 +296,7 @@ static bool icmpv4_global_allow(struct net *net, int type, int code) if (icmp_global_allow()) return true; + __ICMP_INC_STATS(net, ICMP_MIB_RATELIMITGLOBAL); return false; } @@ -325,6 +326,8 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt, if (peer) inet_putpeer(peer); out: + if (!rc) + __ICMP_INC_STATS(net, ICMP_MIB_RATELIMITHOST); return rc; } diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index f88daace9de3..eaf1d3113b62 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -353,7 +353,7 @@ static void icmp_put(struct seq_file *seq) seq_puts(seq, "\nIcmp: InMsgs InErrors InCsumErrors"); for (i = 0; icmpmibmap[i].name; i++) seq_printf(seq, " In%s", icmpmibmap[i].name); - seq_puts(seq, " OutMsgs OutErrors"); + seq_puts(seq, " OutMsgs OutErrors OutRateLimitGlobal OutRateLimitHost"); for (i = 0; icmpmibmap[i].name; i++) seq_printf(seq, " Out%s", icmpmibmap[i].name); seq_printf(seq, "\nIcmp: %lu %lu %lu", @@ -363,9 +363,11 @@ static void icmp_put(struct seq_file *seq) for (i = 0; icmpmibmap[i].name; i++) seq_printf(seq, " %lu", atomic_long_read(ptr + icmpmibmap[i].index)); - seq_printf(seq, " %lu %lu", + seq_printf(seq, " %lu %lu %lu %lu", snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_OUTMSGS), - snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_OUTERRORS)); + snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_OUTERRORS), + snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_RATELIMITGLOBAL), + snmp_fold_field(net->mib.icmp_statistics, ICMP_MIB_RATELIMITHOST)); for (i = 0; icmpmibmap[i].name; i++) seq_printf(seq, " %lu", atomic_long_read(ptr + (icmpmibmap[i].index | 0x100))); diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 9d92d51c4757..79c769c0d113 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -183,6 +183,7 @@ static bool icmpv6_global_allow(struct net *net, int type) if (icmp_global_allow()) return true; + __ICMP_INC_STATS(net, ICMP_MIB_RATELIMITGLOBAL); return false; } @@ -224,6 +225,9 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type, if (peer) inet_putpeer(peer); } + if (!res) + __ICMP6_INC_STATS(net, ip6_dst_idev(dst), + ICMP6_MIB_RATELIMITHOST); dst_release(dst); return res; } diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c index d6306aa46bb1..e20b3705c2d2 100644 --- a/net/ipv6/proc.c +++ b/net/ipv6/proc.c @@ -94,6 +94,7 @@ static const struct snmp_mib snmp6_icmp6_list[] = { SNMP_MIB_ITEM("Icmp6OutMsgs", ICMP6_MIB_OUTMSGS), SNMP_MIB_ITEM("Icmp6OutErrors", ICMP6_MIB_OUTERRORS), SNMP_MIB_ITEM("Icmp6InCsumErrors", ICMP6_MIB_CSUMERRORS), + SNMP_MIB_ITEM("Icmp6OutRateLimitHost", ICMP6_MIB_RATELIMITHOST), SNMP_MIB_SENTINEL }; -- cgit v1.2.3 From b9d69db87fb77fc80997993d40f091b323b3651e Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Wed, 25 Jan 2023 11:47:21 +0100 Subject: mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addresses Currently the in-kernel PM arbitrary enforces that created subflow's family must match the main MPTCP socket while the RFC allows mixing IPv4 and IPv6 subflows. This patch changes the in-kernel PM logic to create subflows matching the currently selected source (or destination) address. IPv4 sockets can pick only IPv4 addresses (and v4 mapped in v6), while IPv6 sockets not restricted to V6ONLY can pick either IPv4 and IPv6 addresses as long as the source and destination matches. A helper, previously introduced is used to ease family matching checks, taking care of IPv4 vs IPv4-mapped-IPv6 vs IPv6 only addresses. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/269 Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts Signed-off-by: Paolo Abeni --- net/mptcp/pm_netlink.c | 58 +++++++++++++++++++++++++++----------------------- 1 file changed, 31 insertions(+), 27 deletions(-) (limited to 'net') diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index b5505b8167f9..db07cc5b4fcb 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -152,7 +152,6 @@ static struct mptcp_pm_addr_entry * select_local_address(const struct pm_nl_pernet *pernet, const struct mptcp_sock *msk) { - const struct sock *sk = (const struct sock *)msk; struct mptcp_pm_addr_entry *entry, *ret = NULL; msk_owned_by_me(msk); @@ -165,16 +164,6 @@ select_local_address(const struct pm_nl_pernet *pernet, if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap)) continue; - if (entry->addr.family != sk->sk_family) { -#if IS_ENABLED(CONFIG_MPTCP_IPV6) - if ((entry->addr.family == AF_INET && - !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) || - (sk->sk_family == AF_INET && - !ipv6_addr_v4mapped(&entry->addr.addr6))) -#endif - continue; - } - ret = entry; break; } @@ -423,7 +412,9 @@ static bool lookup_address_in_vec(const struct mptcp_addr_info *addrs, unsigned /* Fill all the remote addresses into the array addrs[], * and return the array size. */ -static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullmesh, +static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, + struct mptcp_addr_info *local, + bool fullmesh, struct mptcp_addr_info *addrs) { bool deny_id0 = READ_ONCE(msk->pm.remote_deny_join_id0); @@ -443,6 +434,9 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm if (deny_id0) return 0; + if (!mptcp_pm_addr_families_match(sk, local, &remote)) + return 0; + msk->pm.subflows++; addrs[i++] = remote; } else { @@ -453,6 +447,9 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm if (deny_id0 && !addrs[i].id) continue; + if (!mptcp_pm_addr_families_match(sk, local, &addrs[i])) + continue; + if (!lookup_address_in_vec(addrs, i, &addrs[i]) && msk->pm.subflows < subflows_max) { msk->pm.subflows++; @@ -603,9 +600,11 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) fullmesh = !!(local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH); msk->pm.local_addr_used++; - nr = fill_remote_addresses_vec(msk, fullmesh, addrs); - if (nr) - __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + nr = fill_remote_addresses_vec(msk, &local->addr, fullmesh, addrs); + if (nr == 0) + continue; + spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) __mptcp_subflow_connect(sk, &local->addr, &addrs[i]); @@ -628,11 +627,11 @@ static void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk) * and return the array size. */ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk, + struct mptcp_addr_info *remote, struct mptcp_addr_info *addrs) { struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; - struct mptcp_addr_info local; struct pm_nl_pernet *pernet; unsigned int subflows_max; int i = 0; @@ -645,15 +644,8 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk, if (!(entry->flags & MPTCP_PM_ADDR_FLAG_FULLMESH)) continue; - if (entry->addr.family != sk->sk_family) { -#if IS_ENABLED(CONFIG_MPTCP_IPV6) - if ((entry->addr.family == AF_INET && - !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) || - (sk->sk_family == AF_INET && - !ipv6_addr_v4mapped(&entry->addr.addr6))) -#endif - continue; - } + if (!mptcp_pm_addr_families_match(sk, &entry->addr, remote)) + continue; if (msk->pm.subflows < subflows_max) { msk->pm.subflows++; @@ -666,8 +658,18 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk, * 'IPADDRANY' local address */ if (!i) { + struct mptcp_addr_info local; + memset(&local, 0, sizeof(local)); - local.family = msk->pm.remote.family; + local.family = +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + remote->family == AF_INET6 && + ipv6_addr_v4mapped(&remote->addr6) ? AF_INET : +#endif + remote->family; + + if (!mptcp_pm_addr_families_match(sk, &local, remote)) + return 0; msk->pm.subflows++; addrs[i++] = local; @@ -706,7 +708,9 @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) /* connect to the specified remote address, using whatever * local address the routing configuration will pick. */ - nr = fill_local_addresses_vec(msk, addrs); + nr = fill_local_addresses_vec(msk, &remote, addrs); + if (nr == 0) + return; msk->pm.add_addr_accepted++; if (msk->pm.add_addr_accepted >= add_addr_accept_max || -- cgit v1.2.3 From 7e9740e0e84e21281759590ccec4e3c926eb54bf Mon Sep 17 00:00:00 2001 From: Matthieu Baerts Date: Wed, 25 Jan 2023 11:47:22 +0100 Subject: mptcp: propagate sk_ipv6only to subflows Usually, attributes are propagated to subflows as well. Here, if subflows are created by other ways than the MPTCP path-manager, it is important to make sure they are in v6 if it is asked by the userspace. Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts Signed-off-by: Paolo Abeni --- net/mptcp/sockopt.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net') diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 582ed93bcc8a..9986681aaf40 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -1255,6 +1255,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk) ssk->sk_priority = sk->sk_priority; ssk->sk_bound_dev_if = sk->sk_bound_dev_if; ssk->sk_incoming_cpu = sk->sk_incoming_cpu; + ssk->sk_ipv6only = sk->sk_ipv6only; __ip_sock_set_tos(ssk, inet_sk(sk)->tos); if (sk->sk_userlocks & tx_rx_locks) { -- cgit v1.2.3 From 40c71f763f87b68fe43199c1702220c91afed288 Mon Sep 17 00:00:00 2001 From: Matthieu Baerts Date: Wed, 25 Jan 2023 11:47:24 +0100 Subject: mptcp: userspace pm: use a single point of exit Like in all other functions in this file, a single point of exit is used when extra operations are needed: unlock, decrement refcount, etc. There is no functional change for the moment but it is better to do the same here to make sure all cleanups are done in case of intermediate errors. Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts Signed-off-by: Paolo Abeni --- net/mptcp/pm_userspace.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c index ea6ad9da7493..a02d3cbf2a1b 100644 --- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -59,8 +59,8 @@ int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk, */ e = sock_kmalloc(sk, sizeof(*e), GFP_ATOMIC); if (!e) { - spin_unlock_bh(&msk->pm.lock); - return -ENOMEM; + ret = -ENOMEM; + goto append_err; } *e = *entry; @@ -74,6 +74,7 @@ int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk, ret = entry->addr.id; } +append_err: spin_unlock_bh(&msk->pm.lock); return ret; } -- cgit v1.2.3 From 3089386db0901ac6ac3d99fbd601212c98217e60 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 24 Jan 2023 13:54:57 +0200 Subject: xfrm: extend add policy callback to set failure reason Almost all validation logic is in the drivers, but they are missing reliable way to convey failure reason to userspace applications. Let's use extack to return this information to users. Signed-off-by: Leon Romanovsky Signed-off-by: Jakub Kicinski --- Documentation/networking/xfrm_device.rst | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 3 ++- include/linux/netdevice.h | 2 +- net/xfrm/xfrm_device.c | 3 +-- 4 files changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst index c43ace79e320..b9c53e626982 100644 --- a/Documentation/networking/xfrm_device.rst +++ b/Documentation/networking/xfrm_device.rst @@ -73,7 +73,7 @@ Callbacks to implement /* Solely packet offload callbacks */ void (*xdo_dev_state_update_curlft) (struct xfrm_state *x); - int (*xdo_dev_policy_add) (struct xfrm_policy *x); + int (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack); void (*xdo_dev_policy_delete) (struct xfrm_policy *x); void (*xdo_dev_policy_free) (struct xfrm_policy *x); }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index bb9023957f74..83e0f874484e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -550,7 +550,8 @@ mlx5e_ipsec_build_accel_pol_attrs(struct mlx5e_ipsec_pol_entry *pol_entry, attrs->reqid = x->xfrm_vec[0].reqid; } -static int mlx5e_xfrm_add_policy(struct xfrm_policy *x) +static int mlx5e_xfrm_add_policy(struct xfrm_policy *x, + struct netlink_ext_ack *extack) { struct net_device *netdev = x->xdo.real_dev; struct mlx5e_ipsec_pol_entry *pol_entry; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index aad12a179e54..7c43b9fb9aae 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1042,7 +1042,7 @@ struct xfrmdev_ops { struct xfrm_state *x); void (*xdo_dev_state_advance_esn) (struct xfrm_state *x); void (*xdo_dev_state_update_curlft) (struct xfrm_state *x); - int (*xdo_dev_policy_add) (struct xfrm_policy *x); + int (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack); void (*xdo_dev_policy_delete) (struct xfrm_policy *x); void (*xdo_dev_policy_free) (struct xfrm_policy *x); }; diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 4aff76c6f12e..2cec637a4a9c 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -383,14 +383,13 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp, return -EINVAL; } - err = dev->xfrmdev_ops->xdo_dev_policy_add(xp); + err = dev->xfrmdev_ops->xdo_dev_policy_add(xp, extack); if (err) { xdo->dev = NULL; xdo->real_dev = NULL; xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED; xdo->dir = 0; netdev_put(dev, &xdo->dev_tracker); - NL_SET_ERR_MSG(extack, "Device failed to offload this policy"); return err; } -- cgit v1.2.3 From 7681a4f58fb9c338d6dfe1181607f84c793d77de Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 24 Jan 2023 13:54:59 +0200 Subject: xfrm: extend add state callback to set failure reason Almost all validation logic is in the drivers, but they are missing reliable way to convey failure reason to userspace applications. Let's use extack to return this information to users. Signed-off-by: Leon Romanovsky Signed-off-by: Jakub Kicinski --- Documentation/networking/xfrm_device.rst | 2 +- drivers/net/bonding/bond_main.c | 8 +++++--- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 +++-- drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c | 6 ++++-- drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 6 ++++-- drivers/net/ethernet/intel/ixgbevf/ipsec.c | 4 +++- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 3 ++- drivers/net/ethernet/netronome/nfp/crypto/ipsec.c | 3 ++- drivers/net/netdevsim/ipsec.c | 3 ++- include/linux/netdevice.h | 2 +- net/xfrm/xfrm_device.c | 6 ++---- net/xfrm/xfrm_state.c | 2 +- 12 files changed, 30 insertions(+), 20 deletions(-) (limited to 'net') diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst index b9c53e626982..83abdfef4ec3 100644 --- a/Documentation/networking/xfrm_device.rst +++ b/Documentation/networking/xfrm_device.rst @@ -64,7 +64,7 @@ Callbacks to implement /* from include/linux/netdevice.h */ struct xfrmdev_ops { /* Crypto and Packet offload callbacks */ - int (*xdo_dev_state_add) (struct xfrm_state *x); + int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack); void (*xdo_dev_state_delete) (struct xfrm_state *x); void (*xdo_dev_state_free) (struct xfrm_state *x); bool (*xdo_dev_offload_ok) (struct sk_buff *skb, diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 0363ce597661..686b2a6fd674 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -419,8 +419,10 @@ static int bond_vlan_rx_kill_vid(struct net_device *bond_dev, /** * bond_ipsec_add_sa - program device with a security association * @xs: pointer to transformer state struct + * @extack: extack point to fill failure reason **/ -static int bond_ipsec_add_sa(struct xfrm_state *xs) +static int bond_ipsec_add_sa(struct xfrm_state *xs, + struct netlink_ext_ack *extack) { struct net_device *bond_dev = xs->xso.dev; struct bond_ipsec *ipsec; @@ -454,7 +456,7 @@ static int bond_ipsec_add_sa(struct xfrm_state *xs) } xs->xso.real_dev = slave->dev; - err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs); + err = slave->dev->xfrmdev_ops->xdo_dev_state_add(xs, extack); if (!err) { ipsec->xs = xs; INIT_LIST_HEAD(&ipsec->list); @@ -494,7 +496,7 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) spin_lock_bh(&bond->ipsec_lock); list_for_each_entry(ipsec, &bond->ipsec_list, list) { ipsec->xs->xso.real_dev = slave->dev; - if (slave->dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs)) { + if (slave->dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) { slave_warn(bond_dev, slave->dev, "%s: failed to add SA\n", __func__); ipsec->xs->xso.real_dev = NULL; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 9cbce1faab26..6c0a41f3ae44 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -6490,7 +6490,8 @@ static const struct tlsdev_ops cxgb4_ktls_ops = { #if IS_ENABLED(CONFIG_CHELSIO_IPSEC_INLINE) -static int cxgb4_xfrm_add_state(struct xfrm_state *x) +static int cxgb4_xfrm_add_state(struct xfrm_state *x, + struct netlink_ext_ack *extack) { struct adapter *adap = netdev2adap(x->xso.dev); int ret; @@ -6504,7 +6505,7 @@ static int cxgb4_xfrm_add_state(struct xfrm_state *x) if (ret) goto out_unlock; - ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(x); + ret = adap->uld[CXGB4_ULD_IPSEC].xfrmdev_ops->xdo_dev_state_add(x, extack); out_unlock: mutex_unlock(&uld_mutex); diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c index ca21794281d6..ac2ea6206af1 100644 --- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c +++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ipsec/chcr_ipsec.c @@ -80,7 +80,8 @@ static void *ch_ipsec_uld_add(const struct cxgb4_lld_info *infop); static void ch_ipsec_advance_esn_state(struct xfrm_state *x); static void ch_ipsec_xfrm_free_state(struct xfrm_state *x); static void ch_ipsec_xfrm_del_state(struct xfrm_state *x); -static int ch_ipsec_xfrm_add_state(struct xfrm_state *x); +static int ch_ipsec_xfrm_add_state(struct xfrm_state *x, + struct netlink_ext_ack *extack); static const struct xfrmdev_ops ch_ipsec_xfrmdev_ops = { .xdo_dev_state_add = ch_ipsec_xfrm_add_state, @@ -226,7 +227,8 @@ out: * returns 0 on success, negative error if failed to send message to FPGA * positive error if FPGA returned a bad response */ -static int ch_ipsec_xfrm_add_state(struct xfrm_state *x) +static int ch_ipsec_xfrm_add_state(struct xfrm_state *x, + struct netlink_ext_ack *extack) { struct ipsec_sa_entry *sa_entry; int res = 0; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c index 53a969e34883..07c37dc619e8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c @@ -557,8 +557,10 @@ static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs) /** * ixgbe_ipsec_add_sa - program device with a security association * @xs: pointer to transformer state struct + * @extack: extack point to fill failure reason **/ -static int ixgbe_ipsec_add_sa(struct xfrm_state *xs) +static int ixgbe_ipsec_add_sa(struct xfrm_state *xs, + struct netlink_ext_ack *extack) { struct net_device *dev = xs->xso.real_dev; struct ixgbe_adapter *adapter = netdev_priv(dev); @@ -950,7 +952,7 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) memcpy(xs->aead->alg_name, aes_gcm_name, sizeof(aes_gcm_name)); /* set up the HW offload */ - err = ixgbe_ipsec_add_sa(xs); + err = ixgbe_ipsec_add_sa(xs, NULL); if (err) goto err_aead; diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c index c1cf540d162a..752b9df4fb51 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c +++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c @@ -257,8 +257,10 @@ static int ixgbevf_ipsec_parse_proto_keys(struct xfrm_state *xs, /** * ixgbevf_ipsec_add_sa - program device with a security association * @xs: pointer to transformer state struct + * @extack: extack point to fill failure reason **/ -static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs) +static int ixgbevf_ipsec_add_sa(struct xfrm_state *xs, + struct netlink_ext_ack *extack) { struct net_device *dev = xs->xso.real_dev; struct ixgbevf_adapter *adapter; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 3236c3b43149..a889df77dd2d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -298,7 +298,8 @@ static void _update_xfrm_state(struct work_struct *work) mlx5_accel_esp_modify_xfrm(sa_entry, &modify_work->attrs); } -static int mlx5e_xfrm_add_state(struct xfrm_state *x) +static int mlx5e_xfrm_add_state(struct xfrm_state *x, + struct netlink_ext_ack *extack) { struct mlx5e_ipsec_sa_entry *sa_entry = NULL; struct net_device *netdev = x->xso.real_dev; diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c index 4632268695cb..41b98f2b7402 100644 --- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c +++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c @@ -260,7 +260,8 @@ static void set_sha2_512hmac(struct nfp_ipsec_cfg_add_sa *cfg, int *trunc_len) } } -static int nfp_net_xfrm_add_state(struct xfrm_state *x) +static int nfp_net_xfrm_add_state(struct xfrm_state *x, + struct netlink_ext_ack *extack) { struct net_device *netdev = x->xso.dev; struct nfp_ipsec_cfg_mssg msg = {}; diff --git a/drivers/net/netdevsim/ipsec.c b/drivers/net/netdevsim/ipsec.c index b93baf5c8bee..84a02d69abad 100644 --- a/drivers/net/netdevsim/ipsec.c +++ b/drivers/net/netdevsim/ipsec.c @@ -125,7 +125,8 @@ static int nsim_ipsec_parse_proto_keys(struct xfrm_state *xs, return 0; } -static int nsim_ipsec_add_sa(struct xfrm_state *xs) +static int nsim_ipsec_add_sa(struct xfrm_state *xs, + struct netlink_ext_ack *extack) { struct nsim_ipsec *ipsec; struct net_device *dev; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7c43b9fb9aae..63b77cbc947e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1035,7 +1035,7 @@ struct netdev_bpf { #ifdef CONFIG_XFRM_OFFLOAD struct xfrmdev_ops { - int (*xdo_dev_state_add) (struct xfrm_state *x); + int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack); void (*xdo_dev_state_delete) (struct xfrm_state *x); void (*xdo_dev_state_free) (struct xfrm_state *x); bool (*xdo_dev_offload_ok) (struct sk_buff *skb, diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 2cec637a4a9c..562b9d951598 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -309,7 +309,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, else xso->type = XFRM_DEV_OFFLOAD_CRYPTO; - err = dev->xfrmdev_ops->xdo_dev_state_add(x); + err = dev->xfrmdev_ops->xdo_dev_state_add(x, extack); if (err) { xso->dev = NULL; xso->dir = 0; @@ -325,10 +325,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, * authors to do not return -EOPNOTSUPP in packet offload mode. */ WARN_ON(err == -EOPNOTSUPP && is_packet_offload); - if (err != -EOPNOTSUPP || is_packet_offload) { - NL_SET_ERR_MSG(extack, "Device failed to offload this state"); + if (err != -EOPNOTSUPP || is_packet_offload) return err; - } } return 0; diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 89c731f4f0c7..59fffa02d1cc 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1274,7 +1274,7 @@ found: xso->real_dev = xdo->real_dev; netdev_tracker_alloc(xso->dev, &xso->dev_tracker, GFP_ATOMIC); - error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x); + error = xso->dev->xfrmdev_ops->xdo_dev_state_add(x, NULL); if (error) { xso->dir = 0; netdev_put(xso->dev, &xso->dev_tracker); -- cgit v1.2.3 From 2195e2a024aef567aea6ea0b0dab52f77bcc7b55 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 25 Jan 2023 23:14:17 -0800 Subject: net: skbuff: drop the linux/textsearch.h include This include was added for skb_find_text() but all we need there is a forward declaration of struct ts_config. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 +- net/core/skbuff.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b93818e11da0..7eeb06f9ca1f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -279,6 +278,7 @@ struct napi_struct; struct bpf_prog; union bpf_attr; struct skb_ext; +struct ts_config; #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) struct nf_bridge_info { diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 180df58e85c7..bb79b4cb89db 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -79,6 +79,7 @@ #include #include #include +#include #include "dev.h" #include "sock_destructor.h" -- cgit v1.2.3 From 2870c4d6a5e479659cb84992717189634c1e71e0 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 25 Jan 2023 23:14:18 -0800 Subject: net: add missing includes of linux/sched/clock.h Number of files depend on linux/sched/clock.h getting included by linux/skbuff.h which soon will no longer be the case. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 1 + drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c | 2 ++ drivers/scsi/lpfc/lpfc_init.c | 1 + include/linux/filter.h | 1 + net/rds/ib_recv.c | 1 + net/rds/recv.c | 1 + 6 files changed, 7 insertions(+) (limited to 'net') diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c index 142415c84c6b..a0b46e7d863e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c @@ -2,6 +2,7 @@ /* Copyright (c) 2018-2019 Hisilicon Limited. */ #include +#include #include "hclge_debugfs.h" #include "hclge_err.h" diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c index 6efd768cc07c..3f35227ef1fa 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c @@ -1,6 +1,8 @@ // SPDX-License-Identifier: GPL-2.0+ /* Copyright (c) 2016-2017 Hisilicon Limited. */ +#include + #include "hclge_err.h" static const struct hclge_hw_error hclge_imp_tcm_ecc_int[] = { diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 25ba20e42825..389a35308be3 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include diff --git a/include/linux/filter.h b/include/linux/filter.h index ccc4a4a58c72..1727898f1641 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c index cfbf0e129cba..e53b7f266bd7 100644 --- a/net/rds/ib_recv.c +++ b/net/rds/ib_recv.c @@ -31,6 +31,7 @@ * */ #include +#include #include #include #include diff --git a/net/rds/recv.c b/net/rds/recv.c index 5b426dc3634d..c71b923764fd 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include -- cgit v1.2.3 From 509f15b9c551b750d8f634d404805c3860e9ea17 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 25 Jan 2023 23:14:21 -0800 Subject: net: add missing includes of linux/splice.h Number of files depend on linux/splice.h getting included by linux/skbuff.h which soon will no longer be the case. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/smc/af_smc.c | 1 + net/smc/smc_rx.c | 1 + net/unix/af_unix.c | 1 + 3 files changed, 3 insertions(+) (limited to 'net') diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 036532cf39aa..1c0fe9ba5358 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c index 0a6e615f000c..4380d32f5a5f 100644 --- a/net/smc/smc_rx.c +++ b/net/smc/smc_rx.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 009616fa0256..0be25e712c28 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -112,6 +112,7 @@ #include #include #include +#include #include #include #include -- cgit v1.2.3 From 99132b6eb7927a549351f57638a1d560039f06f9 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 25 Jan 2023 15:05:18 -0800 Subject: ethtool: netlink: handle SET intro/outro in the common code Most ethtool SET callbacks follow the same general structure. ethnl_parse_header_dev_get() rtnl_lock() ethnl_ops_begin() ... do stuff ... ethtool_notify() ethnl_ops_complete() rtnl_unlock() ethnl_parse_header_dev_put() This leads to a lot of copy / pasted code an bugs when people mis-handle the error path. Add a generic implementation of this pattern with a .set callback in struct ethnl_request_ops called to "do stuff". Also add an optional .set_validate which is called before ethnl_ops_begin() -- a lot of implementations do basic request capability / sanity checking at that point. Because we want to avoid generating the notification when no change happened - adopt a slightly hairy return values: - 0 means nothing to do (no notification) - 1 means done / continue - negative error codes on error Reuse .hdr_attr from struct ethnl_request_ops, GET and SET use the same attr spaces in all cases. Convert pause as an example (and to avoid unused function warnings). Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/ethtool/netlink.c | 49 +++++++++++++++++++++++++++++++- net/ethtool/netlink.h | 22 ++++++++++++-- net/ethtool/pause.c | 79 +++++++++++++++++++++------------------------------ 3 files changed, 100 insertions(+), 50 deletions(-) (limited to 'net') diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index 6412c4dc663d..e90ac1f0a1d7 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -279,6 +279,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = { [ETHTOOL_MSG_CHANNELS_GET] = ðnl_channels_request_ops, [ETHTOOL_MSG_COALESCE_GET] = ðnl_coalesce_request_ops, [ETHTOOL_MSG_PAUSE_GET] = ðnl_pause_request_ops, + [ETHTOOL_MSG_PAUSE_SET] = ðnl_pause_request_ops, [ETHTOOL_MSG_EEE_GET] = ðnl_eee_request_ops, [ETHTOOL_MSG_FEC_GET] = ðnl_fec_request_ops, [ETHTOOL_MSG_TSINFO_GET] = ðnl_tsinfo_request_ops, @@ -591,6 +592,52 @@ static int ethnl_default_done(struct netlink_callback *cb) return 0; } +static int ethnl_default_set_doit(struct sk_buff *skb, struct genl_info *info) +{ + const struct ethnl_request_ops *ops; + struct ethnl_req_info req_info = {}; + const u8 cmd = info->genlhdr->cmd; + int ret; + + ops = ethnl_default_requests[cmd]; + if (WARN_ONCE(!ops, "cmd %u has no ethnl_request_ops\n", cmd)) + return -EOPNOTSUPP; + if (GENL_REQ_ATTR_CHECK(info, ops->hdr_attr)) + return -EINVAL; + + ret = ethnl_parse_header_dev_get(&req_info, info->attrs[ops->hdr_attr], + genl_info_net(info), info->extack, + true); + if (ret < 0) + return ret; + + if (ops->set_validate) { + ret = ops->set_validate(&req_info, info); + /* 0 means nothing to do */ + if (ret <= 0) + goto out_dev; + } + + rtnl_lock(); + ret = ethnl_ops_begin(req_info.dev); + if (ret < 0) + goto out_rtnl; + + ret = ops->set(&req_info, info); + if (ret <= 0) + goto out_ops; + ethtool_notify(req_info.dev, ops->set_ntf_cmd, NULL); + + ret = 0; +out_ops: + ethnl_ops_complete(req_info.dev); +out_rtnl: + rtnl_unlock(); +out_dev: + ethnl_parse_header_dev_put(&req_info); + return ret; +} + static const struct ethnl_request_ops * ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = { [ETHTOOL_MSG_LINKINFO_NTF] = ðnl_linkinfo_request_ops, @@ -921,7 +968,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_PAUSE_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_pause, + .doit = ethnl_default_set_doit, .policy = ethnl_pause_set_policy, .maxattr = ARRAY_SIZE(ethnl_pause_set_policy) - 1, }, diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index b01f7cd542c4..aca4bdb637a6 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -284,13 +284,14 @@ int ethnl_ops_begin(struct net_device *dev); void ethnl_ops_complete(struct net_device *dev); /** - * struct ethnl_request_ops - unified handling of GET requests + * struct ethnl_request_ops - unified handling of GET and SET requests * @request_cmd: command id for request (GET) * @reply_cmd: command id for reply (GET_REPLY) * @hdr_attr: attribute type for request header * @req_info_size: size of request info * @reply_data_size: size of reply data * @allow_nodev_do: allow non-dump request with no device identification + * @set_ntf_cmd: notification to generate on changes (SET) * @parse_request: * Parse request except common header (struct ethnl_req_info). Common * header is already filled on entry, the rest up to @repdata_offset @@ -319,6 +320,18 @@ void ethnl_ops_complete(struct net_device *dev); * used e.g. to free any additional data structures outside the main * structure which were allocated by ->prepare_data(). When processing * dump requests, ->cleanup() is called for each message. + * @set_validate: + * Check if set operation is supported for a given device, and perform + * extra input checks. Expected return values: + * - 0 if the operation is a noop for the device (rare) + * - 1 if operation should proceed to calling @set + * - negative errno on errors + * Called without any locks, just a reference on the netdev. + * @set: + * Execute the set operation. The implementation should return + * - 0 if no configuration has changed + * - 1 if configuration changed and notification should be generated + * - negative errno on errors * * Description of variable parts of GET request handling when using the * unified infrastructure. When used, a pointer to an instance of this @@ -335,6 +348,7 @@ struct ethnl_request_ops { unsigned int req_info_size; unsigned int reply_data_size; bool allow_nodev_do; + u8 set_ntf_cmd; int (*parse_request)(struct ethnl_req_info *req_info, struct nlattr **tb, @@ -348,6 +362,11 @@ struct ethnl_request_ops { const struct ethnl_req_info *req_info, const struct ethnl_reply_data *reply_data); void (*cleanup_data)(struct ethnl_reply_data *reply_data); + + int (*set_validate)(struct ethnl_req_info *req_info, + struct genl_info *info); + int (*set)(struct ethnl_req_info *req_info, + struct genl_info *info); }; /* request handlers */ @@ -432,7 +451,6 @@ int ethnl_set_privflags(struct sk_buff *skb, struct genl_info *info); int ethnl_set_rings(struct sk_buff *skb, struct genl_info *info); int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info); int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_pause(struct sk_buff *skb, struct genl_info *info); int ethnl_set_eee(struct sk_buff *skb, struct genl_info *info); int ethnl_act_cable_test(struct sk_buff *skb, struct genl_info *info); int ethnl_act_cable_test_tdr(struct sk_buff *skb, struct genl_info *info); diff --git a/net/ethtool/pause.c b/net/ethtool/pause.c index dcd34b9a849f..6657d0b888d8 100644 --- a/net/ethtool/pause.c +++ b/net/ethtool/pause.c @@ -161,19 +161,6 @@ static int pause_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_pause_request_ops = { - .request_cmd = ETHTOOL_MSG_PAUSE_GET, - .reply_cmd = ETHTOOL_MSG_PAUSE_GET_REPLY, - .hdr_attr = ETHTOOL_A_PAUSE_HEADER, - .req_info_size = sizeof(struct pause_req_info), - .reply_data_size = sizeof(struct pause_reply_data), - - .parse_request = pause_parse_request, - .prepare_data = pause_prepare_data, - .reply_size = pause_reply_size, - .fill_reply = pause_fill_reply, -}; - /* PAUSE_SET */ const struct nla_policy ethnl_pause_set_policy[] = { @@ -184,51 +171,49 @@ const struct nla_policy ethnl_pause_set_policy[] = { [ETHTOOL_A_PAUSE_TX] = { .type = NLA_U8 }, }; -int ethnl_set_pause(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_pause_validate(struct ethnl_req_info *req_info, + struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_pauseparam && ops->set_pauseparam ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_pause(struct ethnl_req_info *req_info, struct genl_info *info) { + struct net_device *dev = req_info->dev; struct ethtool_pauseparam params = {}; - struct ethnl_req_info req_info = {}; struct nlattr **tb = info->attrs; - const struct ethtool_ops *ops; - struct net_device *dev; bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_PAUSE_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; - if (!ops->get_pauseparam || !ops->set_pauseparam) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ops->get_pauseparam(dev, ¶ms); + dev->ethtool_ops->get_pauseparam(dev, ¶ms); ethnl_update_bool32(¶ms.autoneg, tb[ETHTOOL_A_PAUSE_AUTONEG], &mod); ethnl_update_bool32(¶ms.rx_pause, tb[ETHTOOL_A_PAUSE_RX], &mod); ethnl_update_bool32(¶ms.tx_pause, tb[ETHTOOL_A_PAUSE_TX], &mod); - ret = 0; if (!mod) - goto out_ops; + return 0; ret = dev->ethtool_ops->set_pauseparam(dev, ¶ms); - if (ret < 0) - goto out_ops; - ethtool_notify(dev, ETHTOOL_MSG_PAUSE_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return ret < 0 ? ret : 1; } + +const struct ethnl_request_ops ethnl_pause_request_ops = { + .request_cmd = ETHTOOL_MSG_PAUSE_GET, + .reply_cmd = ETHTOOL_MSG_PAUSE_GET_REPLY, + .hdr_attr = ETHTOOL_A_PAUSE_HEADER, + .req_info_size = sizeof(struct pause_req_info), + .reply_data_size = sizeof(struct pause_reply_data), + + .parse_request = pause_parse_request, + .prepare_data = pause_prepare_data, + .reply_size = pause_reply_size, + .fill_reply = pause_fill_reply, + + .set_validate = ethnl_set_pause_validate, + .set = ethnl_set_pause, + .set_ntf_cmd = ETHTOOL_MSG_PAUSE_NTF, +}; -- cgit v1.2.3 From 04007961bfaf76894b65de2af67f96d9d1fa82cf Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 25 Jan 2023 15:05:19 -0800 Subject: ethtool: netlink: convert commands to common SET Convert all SET commands where new common code is applicable. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/ethtool/channels.c | 92 +++++++++++++++++-------------------------- net/ethtool/coalesce.c | 92 +++++++++++++++++++------------------------ net/ethtool/debug.c | 71 ++++++++++++++-------------------- net/ethtool/eee.c | 78 +++++++++++++++---------------------- net/ethtool/fec.c | 83 ++++++++++++++++----------------------- net/ethtool/linkinfo.c | 81 +++++++++++++++++--------------------- net/ethtool/linkmodes.c | 91 ++++++++++++++++++++----------------------- net/ethtool/mm.c | 82 +++++++++++++++------------------------ net/ethtool/module.c | 89 +++++++++++++++++------------------------- net/ethtool/netlink.c | 42 +++++++++++++------- net/ethtool/netlink.h | 14 ------- net/ethtool/plca.c | 75 +++++++++++------------------------ net/ethtool/privflags.c | 84 +++++++++++++++++++--------------------- net/ethtool/pse-pd.c | 79 ++++++++++++++----------------------- net/ethtool/rings.c | 101 ++++++++++++++++++++---------------------------- net/ethtool/wol.c | 79 ++++++++++++++++--------------------- 16 files changed, 508 insertions(+), 725 deletions(-) (limited to 'net') diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c index c7e37130647e..61c40e889a4d 100644 --- a/net/ethtool/channels.c +++ b/net/ethtool/channels.c @@ -86,18 +86,6 @@ static int channels_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_channels_request_ops = { - .request_cmd = ETHTOOL_MSG_CHANNELS_GET, - .reply_cmd = ETHTOOL_MSG_CHANNELS_GET_REPLY, - .hdr_attr = ETHTOOL_A_CHANNELS_HEADER, - .req_info_size = sizeof(struct channels_req_info), - .reply_data_size = sizeof(struct channels_reply_data), - - .prepare_data = channels_prepare_data, - .reply_size = channels_reply_size, - .fill_reply = channels_fill_reply, -}; - /* CHANNELS_SET */ const struct nla_policy ethnl_channels_set_policy[] = { @@ -109,36 +97,28 @@ const struct nla_policy ethnl_channels_set_policy[] = { [ETHTOOL_A_CHANNELS_COMBINED_COUNT] = { .type = NLA_U32 }, }; -int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_channels_validate(struct ethnl_req_info *req_info, + struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_channels && ops->set_channels ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_channels(struct ethnl_req_info *req_info, struct genl_info *info) { unsigned int from_channel, old_total, i; bool mod = false, mod_combined = false; + struct net_device *dev = req_info->dev; struct ethtool_channels channels = {}; - struct ethnl_req_info req_info = {}; struct nlattr **tb = info->attrs; u32 err_attr, max_rxfh_in_use; - const struct ethtool_ops *ops; - struct net_device *dev; u64 max_rxnfc_in_use; int ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_CHANNELS_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; - if (!ops->get_channels || !ops->set_channels) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ops->get_channels(dev, &channels); + dev->ethtool_ops->get_channels(dev, &channels); old_total = channels.combined_count + max(channels.rx_count, channels.tx_count); @@ -151,9 +131,8 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) ethnl_update_u32(&channels.combined_count, tb[ETHTOOL_A_CHANNELS_COMBINED_COUNT], &mod_combined); mod |= mod_combined; - ret = 0; if (!mod) - goto out_ops; + return 0; /* ensure new channel counts are within limits */ if (channels.rx_count > channels.max_rx) @@ -167,10 +146,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) else err_attr = 0; if (err_attr) { - ret = -EINVAL; NL_SET_ERR_MSG_ATTR(info->extack, tb[err_attr], "requested channel count exceeds maximum"); - goto out_ops; + return -EINVAL; } /* ensure there is at least one RX and one TX channel */ @@ -183,10 +161,9 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) if (err_attr) { if (mod_combined) err_attr = ETHTOOL_A_CHANNELS_COMBINED_COUNT; - ret = -EINVAL; NL_SET_ERR_MSG_ATTR(info->extack, tb[err_attr], "requested channel counts would result in no RX or TX channel being configured"); - goto out_ops; + return -EINVAL; } /* ensure the new Rx count fits within the configured Rx flow @@ -198,14 +175,12 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) ethtool_get_max_rxfh_channel(dev, &max_rxfh_in_use)) max_rxfh_in_use = 0; if (channels.combined_count + channels.rx_count <= max_rxfh_in_use) { - ret = -EINVAL; GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing indirection table settings"); - goto out_ops; + return -EINVAL; } if (channels.combined_count + channels.rx_count <= max_rxnfc_in_use) { - ret = -EINVAL; GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing ntuple filter settings"); - goto out_ops; + return -EINVAL; } /* Disabling channels, query zero-copy AF_XDP sockets */ @@ -213,21 +188,26 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) min(channels.rx_count, channels.tx_count); for (i = from_channel; i < old_total; i++) if (xsk_get_pool_from_qid(dev, i)) { - ret = -EINVAL; GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets"); - goto out_ops; + return -EINVAL; } ret = dev->ethtool_ops->set_channels(dev, &channels); - if (ret < 0) - goto out_ops; - ethtool_notify(dev, ETHTOOL_MSG_CHANNELS_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return ret < 0 ? ret : 1; } + +const struct ethnl_request_ops ethnl_channels_request_ops = { + .request_cmd = ETHTOOL_MSG_CHANNELS_GET, + .reply_cmd = ETHTOOL_MSG_CHANNELS_GET_REPLY, + .hdr_attr = ETHTOOL_A_CHANNELS_HEADER, + .req_info_size = sizeof(struct channels_req_info), + .reply_data_size = sizeof(struct channels_reply_data), + + .prepare_data = channels_prepare_data, + .reply_size = channels_reply_size, + .fill_reply = channels_fill_reply, + + .set_validate = ethnl_set_channels_validate, + .set = ethnl_set_channels, + .set_ntf_cmd = ETHTOOL_MSG_CHANNELS_NTF, +}; diff --git a/net/ethtool/coalesce.c b/net/ethtool/coalesce.c index e405b47f7eed..443e7e642c96 100644 --- a/net/ethtool/coalesce.c +++ b/net/ethtool/coalesce.c @@ -195,18 +195,6 @@ static int coalesce_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_coalesce_request_ops = { - .request_cmd = ETHTOOL_MSG_COALESCE_GET, - .reply_cmd = ETHTOOL_MSG_COALESCE_GET_REPLY, - .hdr_attr = ETHTOOL_A_COALESCE_HEADER, - .req_info_size = sizeof(struct coalesce_req_info), - .reply_data_size = sizeof(struct coalesce_reply_data), - - .prepare_data = coalesce_prepare_data, - .reply_size = coalesce_reply_size, - .fill_reply = coalesce_fill_reply, -}; - /* COALESCE_SET */ const struct nla_policy ethnl_coalesce_set_policy[] = { @@ -241,49 +229,44 @@ const struct nla_policy ethnl_coalesce_set_policy[] = { [ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 }, }; -int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_coalesce_validate(struct ethnl_req_info *req_info, + struct genl_info *info) { - struct kernel_ethtool_coalesce kernel_coalesce = {}; - struct ethtool_coalesce coalesce = {}; - struct ethnl_req_info req_info = {}; + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; struct nlattr **tb = info->attrs; - const struct ethtool_ops *ops; - struct net_device *dev; u32 supported_params; - bool mod = false; - int ret; u16 a; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_COALESCE_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; if (!ops->get_coalesce || !ops->set_coalesce) - goto out_dev; + return -EOPNOTSUPP; /* make sure that only supported parameters are present */ supported_params = ops->supported_coalesce_params; for (a = ETHTOOL_A_COALESCE_RX_USECS; a < __ETHTOOL_A_COALESCE_CNT; a++) if (tb[a] && !(supported_params & attr_to_mask(a))) { - ret = -EINVAL; NL_SET_ERR_MSG_ATTR(info->extack, tb[a], "cannot modify an unsupported parameter"); - goto out_dev; + return -EINVAL; } - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ret = ops->get_coalesce(dev, &coalesce, &kernel_coalesce, - info->extack); + return 1; +} + +static int +ethnl_set_coalesce(struct ethnl_req_info *req_info, struct genl_info *info) +{ + struct kernel_ethtool_coalesce kernel_coalesce = {}; + struct net_device *dev = req_info->dev; + struct ethtool_coalesce coalesce = {}; + struct nlattr **tb = info->attrs; + bool mod = false; + int ret; + + ret = dev->ethtool_ops->get_coalesce(dev, &coalesce, &kernel_coalesce, + info->extack); if (ret < 0) - goto out_ops; + return ret; ethnl_update_u32(&coalesce.rx_coalesce_usecs, tb[ETHTOOL_A_COALESCE_RX_USECS], &mod); @@ -339,21 +322,26 @@ int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info) tb[ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES], &mod); ethnl_update_u32(&kernel_coalesce.tx_aggr_time_usecs, tb[ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS], &mod); - ret = 0; if (!mod) - goto out_ops; + return 0; ret = dev->ethtool_ops->set_coalesce(dev, &coalesce, &kernel_coalesce, info->extack); - if (ret < 0) - goto out_ops; - ethtool_notify(dev, ETHTOOL_MSG_COALESCE_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return ret < 0 ? ret : 1; } + +const struct ethnl_request_ops ethnl_coalesce_request_ops = { + .request_cmd = ETHTOOL_MSG_COALESCE_GET, + .reply_cmd = ETHTOOL_MSG_COALESCE_GET_REPLY, + .hdr_attr = ETHTOOL_A_COALESCE_HEADER, + .req_info_size = sizeof(struct coalesce_req_info), + .reply_data_size = sizeof(struct coalesce_reply_data), + + .prepare_data = coalesce_prepare_data, + .reply_size = coalesce_reply_size, + .fill_reply = coalesce_fill_reply, + + .set_validate = ethnl_set_coalesce_validate, + .set = ethnl_set_coalesce, + .set_ntf_cmd = ETHTOOL_MSG_COALESCE_NTF, +}; diff --git a/net/ethtool/debug.c b/net/ethtool/debug.c index d73888c7d19c..e4369769817e 100644 --- a/net/ethtool/debug.c +++ b/net/ethtool/debug.c @@ -63,18 +63,6 @@ static int debug_fill_reply(struct sk_buff *skb, netif_msg_class_names, compact); } -const struct ethnl_request_ops ethnl_debug_request_ops = { - .request_cmd = ETHTOOL_MSG_DEBUG_GET, - .reply_cmd = ETHTOOL_MSG_DEBUG_GET_REPLY, - .hdr_attr = ETHTOOL_A_DEBUG_HEADER, - .req_info_size = sizeof(struct debug_req_info), - .reply_data_size = sizeof(struct debug_reply_data), - - .prepare_data = debug_prepare_data, - .reply_size = debug_reply_size, - .fill_reply = debug_fill_reply, -}; - /* DEBUG_SET */ const struct nla_policy ethnl_debug_set_policy[] = { @@ -83,46 +71,47 @@ const struct nla_policy ethnl_debug_set_policy[] = { [ETHTOOL_A_DEBUG_MSGMASK] = { .type = NLA_NESTED }, }; -int ethnl_set_debug(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_debug_validate(struct ethnl_req_info *req_info, + struct genl_info *info) { - struct ethnl_req_info req_info = {}; + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_msglevel && ops->set_msglevel ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_debug(struct ethnl_req_info *req_info, struct genl_info *info) +{ + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; - struct net_device *dev; bool mod = false; u32 msg_mask; int ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_DEBUG_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ret = -EOPNOTSUPP; - if (!dev->ethtool_ops->get_msglevel || !dev->ethtool_ops->set_msglevel) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - msg_mask = dev->ethtool_ops->get_msglevel(dev); ret = ethnl_update_bitset32(&msg_mask, NETIF_MSG_CLASS_COUNT, tb[ETHTOOL_A_DEBUG_MSGMASK], netif_msg_class_names, info->extack, &mod); if (ret < 0 || !mod) - goto out_ops; + return ret; dev->ethtool_ops->set_msglevel(dev, msg_mask); - ethtool_notify(dev, ETHTOOL_MSG_DEBUG_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return 1; } + +const struct ethnl_request_ops ethnl_debug_request_ops = { + .request_cmd = ETHTOOL_MSG_DEBUG_GET, + .reply_cmd = ETHTOOL_MSG_DEBUG_GET_REPLY, + .hdr_attr = ETHTOOL_A_DEBUG_HEADER, + .req_info_size = sizeof(struct debug_req_info), + .reply_data_size = sizeof(struct debug_reply_data), + + .prepare_data = debug_prepare_data, + .reply_size = debug_reply_size, + .fill_reply = debug_fill_reply, + + .set_validate = ethnl_set_debug_validate, + .set = ethnl_set_debug, + .set_ntf_cmd = ETHTOOL_MSG_DEBUG_NTF, +}; diff --git a/net/ethtool/eee.c b/net/ethtool/eee.c index 45c42b2d5f17..42104bcb0e47 100644 --- a/net/ethtool/eee.c +++ b/net/ethtool/eee.c @@ -108,18 +108,6 @@ static int eee_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_eee_request_ops = { - .request_cmd = ETHTOOL_MSG_EEE_GET, - .reply_cmd = ETHTOOL_MSG_EEE_GET_REPLY, - .hdr_attr = ETHTOOL_A_EEE_HEADER, - .req_info_size = sizeof(struct eee_req_info), - .reply_data_size = sizeof(struct eee_reply_data), - - .prepare_data = eee_prepare_data, - .reply_size = eee_reply_size, - .fill_reply = eee_fill_reply, -}; - /* EEE_SET */ const struct nla_policy ethnl_eee_set_policy[] = { @@ -131,60 +119,56 @@ const struct nla_policy ethnl_eee_set_policy[] = { [ETHTOOL_A_EEE_TX_LPI_TIMER] = { .type = NLA_U32 }, }; -int ethnl_set_eee(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_eee_validate(struct ethnl_req_info *req_info, struct genl_info *info) { - struct ethnl_req_info req_info = {}; + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_eee && ops->set_eee ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_eee(struct ethnl_req_info *req_info, struct genl_info *info) +{ + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; - const struct ethtool_ops *ops; struct ethtool_eee eee = {}; - struct net_device *dev; bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_EEE_HEADER], - genl_info_net(info), info->extack, - true); + ret = dev->ethtool_ops->get_eee(dev, &eee); if (ret < 0) return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; - if (!ops->get_eee || !ops->set_eee) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ret = ops->get_eee(dev, &eee); - if (ret < 0) - goto out_ops; ret = ethnl_update_bitset32(&eee.advertised, EEE_MODES_COUNT, tb[ETHTOOL_A_EEE_MODES_OURS], link_mode_names, info->extack, &mod); if (ret < 0) - goto out_ops; + return ret; ethnl_update_bool32(&eee.eee_enabled, tb[ETHTOOL_A_EEE_ENABLED], &mod); ethnl_update_bool32(&eee.tx_lpi_enabled, tb[ETHTOOL_A_EEE_TX_LPI_ENABLED], &mod); ethnl_update_u32(&eee.tx_lpi_timer, tb[ETHTOOL_A_EEE_TX_LPI_TIMER], &mod); - ret = 0; if (!mod) - goto out_ops; + return 0; ret = dev->ethtool_ops->set_eee(dev, &eee); - if (ret < 0) - goto out_ops; - ethtool_notify(dev, ETHTOOL_MSG_EEE_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return ret < 0 ? ret : 1; } + +const struct ethnl_request_ops ethnl_eee_request_ops = { + .request_cmd = ETHTOOL_MSG_EEE_GET, + .reply_cmd = ETHTOOL_MSG_EEE_GET_REPLY, + .hdr_attr = ETHTOOL_A_EEE_HEADER, + .req_info_size = sizeof(struct eee_req_info), + .reply_data_size = sizeof(struct eee_reply_data), + + .prepare_data = eee_prepare_data, + .reply_size = eee_reply_size, + .fill_reply = eee_fill_reply, + + .set_validate = ethnl_set_eee_validate, + .set = ethnl_set_eee, + .set_ntf_cmd = ETHTOOL_MSG_EEE_NTF, +}; diff --git a/net/ethtool/fec.c b/net/ethtool/fec.c index 9f5a134e2e01..0d9a3d153170 100644 --- a/net/ethtool/fec.c +++ b/net/ethtool/fec.c @@ -217,18 +217,6 @@ static int fec_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_fec_request_ops = { - .request_cmd = ETHTOOL_MSG_FEC_GET, - .reply_cmd = ETHTOOL_MSG_FEC_GET_REPLY, - .hdr_attr = ETHTOOL_A_FEC_HEADER, - .req_info_size = sizeof(struct fec_req_info), - .reply_data_size = sizeof(struct fec_reply_data), - - .prepare_data = fec_prepare_data, - .reply_size = fec_reply_size, - .fill_reply = fec_fill_reply, -}; - /* FEC_SET */ const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1] = { @@ -237,36 +225,28 @@ const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1] = { [ETHTOOL_A_FEC_AUTO] = NLA_POLICY_MAX(NLA_U8, 1), }; -int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_fec_validate(struct ethnl_req_info *req_info, struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_fecparam && ops->set_fecparam ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_fec(struct ethnl_req_info *req_info, struct genl_info *info) { __ETHTOOL_DECLARE_LINK_MODE_MASK(fec_link_modes) = {}; - struct ethnl_req_info req_info = {}; + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; struct ethtool_fecparam fec = {}; - const struct ethtool_ops *ops; - struct net_device *dev; bool mod = false; u8 fec_auto; int ret; - ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_FEC_HEADER], - genl_info_net(info), info->extack, - true); + ret = dev->ethtool_ops->get_fecparam(dev, &fec); if (ret < 0) return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; - if (!ops->get_fecparam || !ops->set_fecparam) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ret = ops->get_fecparam(dev, &fec); - if (ret < 0) - goto out_ops; ethtool_fec_to_link_modes(fec.fec, fec_link_modes, &fec_auto); @@ -275,36 +255,39 @@ int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info) tb[ETHTOOL_A_FEC_MODES], link_mode_names, info->extack, &mod); if (ret < 0) - goto out_ops; + return ret; ethnl_update_u8(&fec_auto, tb[ETHTOOL_A_FEC_AUTO], &mod); - - ret = 0; if (!mod) - goto out_ops; + return 0; ret = ethtool_link_modes_to_fecparam(&fec, fec_link_modes, fec_auto); if (ret) { NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_FEC_MODES], "invalid FEC modes requested"); - goto out_ops; + return ret; } if (!fec.fec) { - ret = -EINVAL; NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_FEC_MODES], "no FEC modes set"); - goto out_ops; + return -EINVAL; } ret = dev->ethtool_ops->set_fecparam(dev, &fec); - if (ret < 0) - goto out_ops; - ethtool_notify(dev, ETHTOOL_MSG_FEC_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return ret < 0 ? ret : 1; } + +const struct ethnl_request_ops ethnl_fec_request_ops = { + .request_cmd = ETHTOOL_MSG_FEC_GET, + .reply_cmd = ETHTOOL_MSG_FEC_GET_REPLY, + .hdr_attr = ETHTOOL_A_FEC_HEADER, + .req_info_size = sizeof(struct fec_req_info), + .reply_data_size = sizeof(struct fec_reply_data), + + .prepare_data = fec_prepare_data, + .reply_size = fec_reply_size, + .fill_reply = fec_fill_reply, + + .set_validate = ethnl_set_fec_validate, + .set = ethnl_set_fec, + .set_ntf_cmd = ETHTOOL_MSG_FEC_NTF, +}; diff --git a/net/ethtool/linkinfo.c b/net/ethtool/linkinfo.c index efa0f7f48836..310dfe63292a 100644 --- a/net/ethtool/linkinfo.c +++ b/net/ethtool/linkinfo.c @@ -73,18 +73,6 @@ static int linkinfo_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_linkinfo_request_ops = { - .request_cmd = ETHTOOL_MSG_LINKINFO_GET, - .reply_cmd = ETHTOOL_MSG_LINKINFO_GET_REPLY, - .hdr_attr = ETHTOOL_A_LINKINFO_HEADER, - .req_info_size = sizeof(struct linkinfo_req_info), - .reply_data_size = sizeof(struct linkinfo_reply_data), - - .prepare_data = linkinfo_prepare_data, - .reply_size = linkinfo_reply_size, - .fill_reply = linkinfo_fill_reply, -}; - /* LINKINFO_SET */ const struct nla_policy ethnl_linkinfo_set_policy[] = { @@ -95,37 +83,31 @@ const struct nla_policy ethnl_linkinfo_set_policy[] = { [ETHTOOL_A_LINKINFO_TP_MDIX_CTRL] = { .type = NLA_U8 }, }; -int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_linkinfo_validate(struct ethnl_req_info *req_info, + struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + if (!ops->get_link_ksettings || !ops->set_link_ksettings) + return -EOPNOTSUPP; + return 1; +} + +static int +ethnl_set_linkinfo(struct ethnl_req_info *req_info, struct genl_info *info) { struct ethtool_link_ksettings ksettings = {}; struct ethtool_link_settings *lsettings; - struct ethnl_req_info req_info = {}; + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; - struct net_device *dev; bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_LINKINFO_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ret = -EOPNOTSUPP; - if (!dev->ethtool_ops->get_link_ksettings || - !dev->ethtool_ops->set_link_ksettings) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ret = __ethtool_get_link_ksettings(dev, &ksettings); if (ret < 0) { GENL_SET_ERR_MSG(info, "failed to retrieve link settings"); - goto out_ops; + return ret; } lsettings = &ksettings.base; @@ -134,21 +116,30 @@ int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info) &mod); ethnl_update_u8(&lsettings->eth_tp_mdix_ctrl, tb[ETHTOOL_A_LINKINFO_TP_MDIX_CTRL], &mod); - ret = 0; if (!mod) - goto out_ops; + return 0; ret = dev->ethtool_ops->set_link_ksettings(dev, &ksettings); - if (ret < 0) + if (ret < 0) { GENL_SET_ERR_MSG(info, "link settings update failed"); - else - ethtool_notify(dev, ETHTOOL_MSG_LINKINFO_NTF, NULL); + return ret; + } -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return 1; } + +const struct ethnl_request_ops ethnl_linkinfo_request_ops = { + .request_cmd = ETHTOOL_MSG_LINKINFO_GET, + .reply_cmd = ETHTOOL_MSG_LINKINFO_GET_REPLY, + .hdr_attr = ETHTOOL_A_LINKINFO_HEADER, + .req_info_size = sizeof(struct linkinfo_req_info), + .reply_data_size = sizeof(struct linkinfo_reply_data), + + .prepare_data = linkinfo_prepare_data, + .reply_size = linkinfo_reply_size, + .fill_reply = linkinfo_fill_reply, + + .set_validate = ethnl_set_linkinfo_validate, + .set = ethnl_set_linkinfo, + .set_ntf_cmd = ETHTOOL_MSG_LINKINFO_NTF, +}; diff --git a/net/ethtool/linkmodes.c b/net/ethtool/linkmodes.c index 126e06c713a3..fab66c169b9f 100644 --- a/net/ethtool/linkmodes.c +++ b/net/ethtool/linkmodes.c @@ -151,18 +151,6 @@ static int linkmodes_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_linkmodes_request_ops = { - .request_cmd = ETHTOOL_MSG_LINKMODES_GET, - .reply_cmd = ETHTOOL_MSG_LINKMODES_GET_REPLY, - .hdr_attr = ETHTOOL_A_LINKMODES_HEADER, - .req_info_size = sizeof(struct linkmodes_req_info), - .reply_data_size = sizeof(struct linkmodes_reply_data), - - .prepare_data = linkmodes_prepare_data, - .reply_size = linkmodes_reply_size, - .fill_reply = linkmodes_fill_reply, -}; - /* LINKMODES_SET */ const struct nla_policy ethnl_linkmodes_set_policy[] = { @@ -310,59 +298,64 @@ static int ethnl_update_linkmodes(struct genl_info *info, struct nlattr **tb, return 0; } -int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_linkmodes_validate(struct ethnl_req_info *req_info, + struct genl_info *info) { - struct ethtool_link_ksettings ksettings = {}; - struct ethnl_req_info req_info = {}; - struct nlattr **tb = info->attrs; - struct net_device *dev; - bool mod = false; + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; int ret; - ret = ethnl_check_linkmodes(info, tb); + ret = ethnl_check_linkmodes(info, info->attrs); if (ret < 0) return ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_LINKMODES_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ret = -EOPNOTSUPP; - if (!dev->ethtool_ops->get_link_ksettings || - !dev->ethtool_ops->set_link_ksettings) - goto out_dev; + if (!ops->get_link_ksettings || !ops->set_link_ksettings) + return -EOPNOTSUPP; + return 1; +} - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; +static int +ethnl_set_linkmodes(struct ethnl_req_info *req_info, struct genl_info *info) +{ + struct ethtool_link_ksettings ksettings = {}; + struct net_device *dev = req_info->dev; + struct nlattr **tb = info->attrs; + bool mod = false; + int ret; ret = __ethtool_get_link_ksettings(dev, &ksettings); if (ret < 0) { GENL_SET_ERR_MSG(info, "failed to retrieve link settings"); - goto out_ops; + return ret; } ret = ethnl_update_linkmodes(info, tb, &ksettings, &mod, dev); if (ret < 0) - goto out_ops; + return ret; + if (!mod) + return 0; - if (mod) { - ret = dev->ethtool_ops->set_link_ksettings(dev, &ksettings); - if (ret < 0) - GENL_SET_ERR_MSG(info, "link settings update failed"); - else - ethtool_notify(dev, ETHTOOL_MSG_LINKMODES_NTF, NULL); + ret = dev->ethtool_ops->set_link_ksettings(dev, &ksettings); + if (ret < 0) { + GENL_SET_ERR_MSG(info, "link settings update failed"); + return ret; } -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return 1; } + +const struct ethnl_request_ops ethnl_linkmodes_request_ops = { + .request_cmd = ETHTOOL_MSG_LINKMODES_GET, + .reply_cmd = ETHTOOL_MSG_LINKMODES_GET_REPLY, + .hdr_attr = ETHTOOL_A_LINKMODES_HEADER, + .req_info_size = sizeof(struct linkmodes_req_info), + .reply_data_size = sizeof(struct linkmodes_reply_data), + + .prepare_data = linkmodes_prepare_data, + .reply_size = linkmodes_reply_size, + .fill_reply = linkmodes_fill_reply, + + .set_validate = ethnl_set_linkmodes_validate, + .set = ethnl_set_linkmodes, + .set_ntf_cmd = ETHTOOL_MSG_LINKMODES_NTF, +}; diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c index 3e8acdb806fd..7e51f7633001 100644 --- a/net/ethtool/mm.c +++ b/net/ethtool/mm.c @@ -146,18 +146,6 @@ static int mm_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_mm_request_ops = { - .request_cmd = ETHTOOL_MSG_MM_GET, - .reply_cmd = ETHTOOL_MSG_MM_GET_REPLY, - .hdr_attr = ETHTOOL_A_MM_HEADER, - .req_info_size = sizeof(struct mm_req_info), - .reply_data_size = sizeof(struct mm_reply_data), - - .prepare_data = mm_prepare_data, - .reply_size = mm_reply_size, - .fill_reply = mm_fill_reply, -}; - const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1] = { [ETHTOOL_A_MM_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), [ETHTOOL_A_MM_VERIFY_ENABLED] = NLA_POLICY_MAX(NLA_U8, 1), @@ -184,40 +172,28 @@ static void mm_state_to_cfg(const struct ethtool_mm_state *state, cfg->tx_min_frag_size = state->tx_min_frag_size; } -int ethnl_set_mm(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_mm_validate(struct ethnl_req_info *req_info, struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_mm && ops->set_mm ? 1 : -EOPNOTSUPP; +} + +static int ethnl_set_mm(struct ethnl_req_info *req_info, struct genl_info *info) { struct netlink_ext_ack *extack = info->extack; - struct ethnl_req_info req_info = {}; + struct net_device *dev = req_info->dev; struct ethtool_mm_state state = {}; struct nlattr **tb = info->attrs; struct ethtool_mm_cfg cfg = {}; - const struct ethtool_ops *ops; - struct net_device *dev; bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_MM_HEADER], - genl_info_net(info), extack, true); + ret = dev->ethtool_ops->get_mm(dev, &state); if (ret) return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - - if (!ops->get_mm || !ops->set_mm) { - ret = -EOPNOTSUPP; - goto out_dev_put; - } - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl_unlock; - - ret = ops->get_mm(dev, &state); - if (ret) - goto out_complete; - mm_state_to_cfg(&state, &cfg); ethnl_update_bool(&cfg.verify_enabled, tb[ETHTOOL_A_MM_VERIFY_ENABLED], @@ -225,34 +201,38 @@ int ethnl_set_mm(struct sk_buff *skb, struct genl_info *info) ethnl_update_u32(&cfg.verify_time, tb[ETHTOOL_A_MM_VERIFY_TIME], &mod); ethnl_update_bool(&cfg.tx_enabled, tb[ETHTOOL_A_MM_TX_ENABLED], &mod); ethnl_update_bool(&cfg.pmac_enabled, tb[ETHTOOL_A_MM_PMAC_ENABLED], - &mod); + &mod); ethnl_update_u32(&cfg.tx_min_frag_size, tb[ETHTOOL_A_MM_TX_MIN_FRAG_SIZE], &mod); if (!mod) - goto out_complete; + return 0; if (cfg.verify_time > state.max_verify_time) { NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_MM_VERIFY_TIME], "verifyTime exceeds device maximum"); - ret = -ERANGE; - goto out_complete; + return -ERANGE; } - ret = ops->set_mm(dev, &cfg, extack); - if (ret) - goto out_complete; + ret = dev->ethtool_ops->set_mm(dev, &cfg, extack); + return ret < 0 ? ret : 1; +} + +const struct ethnl_request_ops ethnl_mm_request_ops = { + .request_cmd = ETHTOOL_MSG_MM_GET, + .reply_cmd = ETHTOOL_MSG_MM_GET_REPLY, + .hdr_attr = ETHTOOL_A_MM_HEADER, + .req_info_size = sizeof(struct mm_req_info), + .reply_data_size = sizeof(struct mm_reply_data), - ethtool_notify(dev, ETHTOOL_MSG_MM_NTF, NULL); + .prepare_data = mm_prepare_data, + .reply_size = mm_reply_size, + .fill_reply = mm_fill_reply, -out_complete: - ethnl_ops_complete(dev); -out_rtnl_unlock: - rtnl_unlock(); -out_dev_put: - ethnl_parse_header_dev_put(&req_info); - return ret; -} + .set_validate = ethnl_set_mm_validate, + .set = ethnl_set_mm, + .set_ntf_cmd = ETHTOOL_MSG_MM_NTF, +}; /* Returns whether a given device supports the MAC merge layer * (has an eMAC and a pMAC). Must be called under rtnl_lock() and diff --git a/net/ethtool/module.c b/net/ethtool/module.c index 898ed436b9e4..e0d539b21423 100644 --- a/net/ethtool/module.c +++ b/net/ethtool/module.c @@ -91,18 +91,6 @@ static int module_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_module_request_ops = { - .request_cmd = ETHTOOL_MSG_MODULE_GET, - .reply_cmd = ETHTOOL_MSG_MODULE_GET_REPLY, - .hdr_attr = ETHTOOL_A_MODULE_HEADER, - .req_info_size = sizeof(struct module_req_info), - .reply_data_size = sizeof(struct module_reply_data), - - .prepare_data = module_prepare_data, - .reply_size = module_reply_size, - .fill_reply = module_fill_reply, -}; - /* MODULE_SET */ const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1] = { @@ -112,69 +100,62 @@ const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLI ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO), }; -static int module_set_power_mode(struct net_device *dev, struct nlattr **tb, - bool *p_mod, struct netlink_ext_ack *extack) +static int +ethnl_set_module_validate(struct ethnl_req_info *req_info, + struct genl_info *info) { - struct ethtool_module_power_mode_params power = {}; - struct ethtool_module_power_mode_params power_new; - const struct ethtool_ops *ops = dev->ethtool_ops; - int ret; + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + struct nlattr **tb = info->attrs; if (!tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY]) return 0; if (!ops->get_module_power_mode || !ops->set_module_power_mode) { - NL_SET_ERR_MSG_ATTR(extack, + NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY], "Setting power mode policy is not supported by this device"); return -EOPNOTSUPP; } - power_new.policy = nla_get_u8(tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY]); - ret = ops->get_module_power_mode(dev, &power, extack); - if (ret < 0) - return ret; - - if (power_new.policy == power.policy) - return 0; - *p_mod = true; - - return ops->set_module_power_mode(dev, &power_new, extack); + return 1; } -int ethnl_set_module(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_module(struct ethnl_req_info *req_info, struct genl_info *info) { - struct ethnl_req_info req_info = {}; + struct ethtool_module_power_mode_params power = {}; + struct ethtool_module_power_mode_params power_new; + const struct ethtool_ops *ops; + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; - struct net_device *dev; - bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_MODULE_HEADER], - genl_info_net(info), info->extack, - true); + ops = dev->ethtool_ops; + + power_new.policy = nla_get_u8(tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY]); + ret = ops->get_module_power_mode(dev, &power, info->extack); if (ret < 0) return ret; - dev = req_info.dev; - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; + if (power_new.policy == power.policy) + return 0; - ret = module_set_power_mode(dev, tb, &mod, info->extack); - if (ret < 0) - goto out_ops; + ret = ops->set_module_power_mode(dev, &power_new, info->extack); + return ret < 0 ? ret : 1; +} - if (!mod) - goto out_ops; +const struct ethnl_request_ops ethnl_module_request_ops = { + .request_cmd = ETHTOOL_MSG_MODULE_GET, + .reply_cmd = ETHTOOL_MSG_MODULE_GET_REPLY, + .hdr_attr = ETHTOOL_A_MODULE_HEADER, + .req_info_size = sizeof(struct module_req_info), + .reply_data_size = sizeof(struct module_reply_data), - ethtool_notify(dev, ETHTOOL_MSG_MODULE_NTF, NULL); + .prepare_data = module_prepare_data, + .reply_size = module_reply_size, + .fill_reply = module_fill_reply, -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); - ethnl_parse_header_dev_put(&req_info); - return ret; -} + .set_validate = ethnl_set_module_validate, + .set = ethnl_set_module, + .set_ntf_cmd = ETHTOOL_MSG_MODULE_NTF, +}; diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index e90ac1f0a1d7..08120095cc68 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -269,29 +269,43 @@ static const struct ethnl_request_ops * ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = { [ETHTOOL_MSG_STRSET_GET] = ðnl_strset_request_ops, [ETHTOOL_MSG_LINKINFO_GET] = ðnl_linkinfo_request_ops, + [ETHTOOL_MSG_LINKINFO_SET] = ðnl_linkinfo_request_ops, [ETHTOOL_MSG_LINKMODES_GET] = ðnl_linkmodes_request_ops, + [ETHTOOL_MSG_LINKMODES_SET] = ðnl_linkmodes_request_ops, [ETHTOOL_MSG_LINKSTATE_GET] = ðnl_linkstate_request_ops, [ETHTOOL_MSG_DEBUG_GET] = ðnl_debug_request_ops, + [ETHTOOL_MSG_DEBUG_SET] = ðnl_debug_request_ops, [ETHTOOL_MSG_WOL_GET] = ðnl_wol_request_ops, + [ETHTOOL_MSG_WOL_SET] = ðnl_wol_request_ops, [ETHTOOL_MSG_FEATURES_GET] = ðnl_features_request_ops, [ETHTOOL_MSG_PRIVFLAGS_GET] = ðnl_privflags_request_ops, + [ETHTOOL_MSG_PRIVFLAGS_SET] = ðnl_privflags_request_ops, [ETHTOOL_MSG_RINGS_GET] = ðnl_rings_request_ops, + [ETHTOOL_MSG_RINGS_SET] = ðnl_rings_request_ops, [ETHTOOL_MSG_CHANNELS_GET] = ðnl_channels_request_ops, + [ETHTOOL_MSG_CHANNELS_SET] = ðnl_channels_request_ops, [ETHTOOL_MSG_COALESCE_GET] = ðnl_coalesce_request_ops, + [ETHTOOL_MSG_COALESCE_SET] = ðnl_coalesce_request_ops, [ETHTOOL_MSG_PAUSE_GET] = ðnl_pause_request_ops, [ETHTOOL_MSG_PAUSE_SET] = ðnl_pause_request_ops, [ETHTOOL_MSG_EEE_GET] = ðnl_eee_request_ops, + [ETHTOOL_MSG_EEE_SET] = ðnl_eee_request_ops, [ETHTOOL_MSG_FEC_GET] = ðnl_fec_request_ops, + [ETHTOOL_MSG_FEC_SET] = ðnl_fec_request_ops, [ETHTOOL_MSG_TSINFO_GET] = ðnl_tsinfo_request_ops, [ETHTOOL_MSG_MODULE_EEPROM_GET] = ðnl_module_eeprom_request_ops, [ETHTOOL_MSG_STATS_GET] = ðnl_stats_request_ops, [ETHTOOL_MSG_PHC_VCLOCKS_GET] = ðnl_phc_vclocks_request_ops, [ETHTOOL_MSG_MODULE_GET] = ðnl_module_request_ops, + [ETHTOOL_MSG_MODULE_SET] = ðnl_module_request_ops, [ETHTOOL_MSG_PSE_GET] = ðnl_pse_request_ops, + [ETHTOOL_MSG_PSE_SET] = ðnl_pse_request_ops, [ETHTOOL_MSG_RSS_GET] = ðnl_rss_request_ops, [ETHTOOL_MSG_PLCA_GET_CFG] = ðnl_plca_cfg_request_ops, + [ETHTOOL_MSG_PLCA_SET_CFG] = ðnl_plca_cfg_request_ops, [ETHTOOL_MSG_PLCA_GET_STATUS] = ðnl_plca_status_request_ops, [ETHTOOL_MSG_MM_GET] = ðnl_mm_request_ops, + [ETHTOOL_MSG_MM_SET] = ðnl_mm_request_ops, }; static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb) @@ -814,7 +828,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_LINKINFO_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_linkinfo, + .doit = ethnl_default_set_doit, .policy = ethnl_linkinfo_set_policy, .maxattr = ARRAY_SIZE(ethnl_linkinfo_set_policy) - 1, }, @@ -830,7 +844,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_LINKMODES_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_linkmodes, + .doit = ethnl_default_set_doit, .policy = ethnl_linkmodes_set_policy, .maxattr = ARRAY_SIZE(ethnl_linkmodes_set_policy) - 1, }, @@ -855,7 +869,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_DEBUG_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_debug, + .doit = ethnl_default_set_doit, .policy = ethnl_debug_set_policy, .maxattr = ARRAY_SIZE(ethnl_debug_set_policy) - 1, }, @@ -872,7 +886,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_WOL_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_wol, + .doit = ethnl_default_set_doit, .policy = ethnl_wol_set_policy, .maxattr = ARRAY_SIZE(ethnl_wol_set_policy) - 1, }, @@ -904,7 +918,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_PRIVFLAGS_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_privflags, + .doit = ethnl_default_set_doit, .policy = ethnl_privflags_set_policy, .maxattr = ARRAY_SIZE(ethnl_privflags_set_policy) - 1, }, @@ -920,7 +934,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_RINGS_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_rings, + .doit = ethnl_default_set_doit, .policy = ethnl_rings_set_policy, .maxattr = ARRAY_SIZE(ethnl_rings_set_policy) - 1, }, @@ -936,7 +950,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_CHANNELS_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_channels, + .doit = ethnl_default_set_doit, .policy = ethnl_channels_set_policy, .maxattr = ARRAY_SIZE(ethnl_channels_set_policy) - 1, }, @@ -952,7 +966,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_COALESCE_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_coalesce, + .doit = ethnl_default_set_doit, .policy = ethnl_coalesce_set_policy, .maxattr = ARRAY_SIZE(ethnl_coalesce_set_policy) - 1, }, @@ -984,7 +998,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_EEE_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_eee, + .doit = ethnl_default_set_doit, .policy = ethnl_eee_set_policy, .maxattr = ARRAY_SIZE(ethnl_eee_set_policy) - 1, }, @@ -1031,7 +1045,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_FEC_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_fec, + .doit = ethnl_default_set_doit, .policy = ethnl_fec_set_policy, .maxattr = ARRAY_SIZE(ethnl_fec_set_policy) - 1, }, @@ -1075,7 +1089,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_MODULE_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_module, + .doit = ethnl_default_set_doit, .policy = ethnl_module_set_policy, .maxattr = ARRAY_SIZE(ethnl_module_set_policy) - 1, }, @@ -1091,7 +1105,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_PSE_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_pse, + .doit = ethnl_default_set_doit, .policy = ethnl_pse_set_policy, .maxattr = ARRAY_SIZE(ethnl_pse_set_policy) - 1, }, @@ -1113,7 +1127,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_PLCA_SET_CFG, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_plca_cfg, + .doit = ethnl_default_set_doit, .policy = ethnl_plca_set_cfg_policy, .maxattr = ARRAY_SIZE(ethnl_plca_set_cfg_policy) - 1, }, @@ -1138,7 +1152,7 @@ static const struct genl_ops ethtool_genl_ops[] = { { .cmd = ETHTOOL_MSG_MM_SET, .flags = GENL_UNS_ADMIN_PERM, - .doit = ethnl_set_mm, + .doit = ethnl_default_set_doit, .policy = ethnl_mm_set_policy, .maxattr = ARRAY_SIZE(ethnl_mm_set_policy) - 1, }, diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index aca4bdb637a6..ae0732460e88 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -442,26 +442,12 @@ extern const struct nla_policy ethnl_plca_get_status_policy[ETHTOOL_A_PLCA_HEADE extern const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1]; extern const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1]; -int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_debug(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_wol(struct sk_buff *skb, struct genl_info *info); int ethnl_set_features(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_privflags(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_rings(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_coalesce(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_eee(struct sk_buff *skb, struct genl_info *info); int ethnl_act_cable_test(struct sk_buff *skb, struct genl_info *info); int ethnl_act_cable_test_tdr(struct sk_buff *skb, struct genl_info *info); int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info); int ethnl_tunnel_info_start(struct netlink_callback *cb); int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb); -int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_module(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_pse(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info); -int ethnl_set_mm(struct sk_buff *skb, struct genl_info *info); extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN]; extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN]; diff --git a/net/ethtool/plca.c b/net/ethtool/plca.c index be7404dc9ef2..5a8cab4df0c9 100644 --- a/net/ethtool/plca.c +++ b/net/ethtool/plca.c @@ -112,18 +112,6 @@ static int plca_get_cfg_fill_reply(struct sk_buff *skb, return 0; }; -const struct ethnl_request_ops ethnl_plca_cfg_request_ops = { - .request_cmd = ETHTOOL_MSG_PLCA_GET_CFG, - .reply_cmd = ETHTOOL_MSG_PLCA_GET_CFG_REPLY, - .hdr_attr = ETHTOOL_A_PLCA_HEADER, - .req_info_size = sizeof(struct plca_req_info), - .reply_data_size = sizeof(struct plca_reply_data), - - .prepare_data = plca_get_cfg_prepare_data, - .reply_size = plca_get_cfg_reply_size, - .fill_reply = plca_get_cfg_fill_reply, -}; - // PLCA set configuration message ------------------------------------------- // const struct nla_policy ethnl_plca_set_cfg_policy[] = { @@ -137,42 +125,23 @@ const struct nla_policy ethnl_plca_set_cfg_policy[] = { [ETHTOOL_A_PLCA_BURST_TMR] = NLA_POLICY_MAX(NLA_U32, 255), }; -int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_plca(struct ethnl_req_info *req_info, struct genl_info *info) { - struct ethnl_req_info req_info = {}; - struct nlattr **tb = info->attrs; + struct net_device *dev = req_info->dev; const struct ethtool_phy_ops *ops; + struct nlattr **tb = info->attrs; struct phy_plca_cfg plca_cfg; - struct net_device *dev; bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_PLCA_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - - dev = req_info.dev; - - rtnl_lock(); - // check that the PHY device is available and connected - if (!dev->phydev) { - ret = -EOPNOTSUPP; - goto out_rtnl; - } + if (!dev->phydev) + return -EOPNOTSUPP; ops = ethtool_phy_ops; - if (!ops || !ops->set_plca_cfg) { - ret = -EOPNOTSUPP; - goto out_rtnl; - } - - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; + if (!ops || !ops->set_plca_cfg) + return -EOPNOTSUPP; memset(&plca_cfg, 0xff, sizeof(plca_cfg)); plca_update_sint(&plca_cfg.enabled, tb[ETHTOOL_A_PLCA_ENABLED], &mod); @@ -183,25 +152,27 @@ int ethnl_set_plca_cfg(struct sk_buff *skb, struct genl_info *info) &mod); plca_update_sint(&plca_cfg.burst_tmr, tb[ETHTOOL_A_PLCA_BURST_TMR], &mod); - - ret = 0; if (!mod) - goto out_ops; + return 0; ret = ops->set_plca_cfg(dev->phydev, &plca_cfg, info->extack); - if (ret < 0) - goto out_ops; + return ret < 0 ? ret : 1; +} - ethtool_notify(dev, ETHTOOL_MSG_PLCA_NTF, NULL); +const struct ethnl_request_ops ethnl_plca_cfg_request_ops = { + .request_cmd = ETHTOOL_MSG_PLCA_GET_CFG, + .reply_cmd = ETHTOOL_MSG_PLCA_GET_CFG_REPLY, + .hdr_attr = ETHTOOL_A_PLCA_HEADER, + .req_info_size = sizeof(struct plca_req_info), + .reply_data_size = sizeof(struct plca_reply_data), -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); - ethnl_parse_header_dev_put(&req_info); + .prepare_data = plca_get_cfg_prepare_data, + .reply_size = plca_get_cfg_reply_size, + .fill_reply = plca_get_cfg_fill_reply, - return ret; -} + .set = ethnl_set_plca, + .set_ntf_cmd = ETHTOOL_MSG_PLCA_NTF, +}; // PLCA get status message -------------------------------------------------- // diff --git a/net/ethtool/privflags.c b/net/ethtool/privflags.c index 4c7bfa81e4ab..23264a1ebf12 100644 --- a/net/ethtool/privflags.c +++ b/net/ethtool/privflags.c @@ -118,19 +118,6 @@ static void privflags_cleanup_data(struct ethnl_reply_data *reply_data) kfree(data->priv_flag_names); } -const struct ethnl_request_ops ethnl_privflags_request_ops = { - .request_cmd = ETHTOOL_MSG_PRIVFLAGS_GET, - .reply_cmd = ETHTOOL_MSG_PRIVFLAGS_GET_REPLY, - .hdr_attr = ETHTOOL_A_PRIVFLAGS_HEADER, - .req_info_size = sizeof(struct privflags_req_info), - .reply_data_size = sizeof(struct privflags_reply_data), - - .prepare_data = privflags_prepare_data, - .reply_size = privflags_reply_size, - .fill_reply = privflags_fill_reply, - .cleanup_data = privflags_cleanup_data, -}; - /* PRIVFLAGS_SET */ const struct nla_policy ethnl_privflags_set_policy[] = { @@ -139,63 +126,70 @@ const struct nla_policy ethnl_privflags_set_policy[] = { [ETHTOOL_A_PRIVFLAGS_FLAGS] = { .type = NLA_NESTED }, }; -int ethnl_set_privflags(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_privflags_validate(struct ethnl_req_info *req_info, + struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + if (!info->attrs[ETHTOOL_A_PRIVFLAGS_FLAGS]) + return -EINVAL; + + if (!ops->get_priv_flags || !ops->set_priv_flags || + !ops->get_sset_count || !ops->get_strings) + return -EOPNOTSUPP; + return 1; +} + +static int +ethnl_set_privflags(struct ethnl_req_info *req_info, struct genl_info *info) { const char (*names)[ETH_GSTRING_LEN] = NULL; - struct ethnl_req_info req_info = {}; + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; - const struct ethtool_ops *ops; - struct net_device *dev; unsigned int nflags; bool mod = false; bool compact; u32 flags; int ret; - if (!tb[ETHTOOL_A_PRIVFLAGS_FLAGS]) - return -EINVAL; ret = ethnl_bitset_is_compact(tb[ETHTOOL_A_PRIVFLAGS_FLAGS], &compact); if (ret < 0) return ret; - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_PRIVFLAGS_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; - if (!ops->get_priv_flags || !ops->set_priv_flags || - !ops->get_sset_count || !ops->get_strings) - goto out_dev; - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; ret = ethnl_get_priv_flags_info(dev, &nflags, compact ? NULL : &names); if (ret < 0) - goto out_ops; - flags = ops->get_priv_flags(dev); + return ret; + flags = dev->ethtool_ops->get_priv_flags(dev); ret = ethnl_update_bitset32(&flags, nflags, tb[ETHTOOL_A_PRIVFLAGS_FLAGS], names, info->extack, &mod); if (ret < 0 || !mod) goto out_free; - ret = ops->set_priv_flags(dev, flags); + ret = dev->ethtool_ops->set_priv_flags(dev, flags); if (ret < 0) goto out_free; - ethtool_notify(dev, ETHTOOL_MSG_PRIVFLAGS_NTF, NULL); + ret = 1; out_free: kfree(names); -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); return ret; } + +const struct ethnl_request_ops ethnl_privflags_request_ops = { + .request_cmd = ETHTOOL_MSG_PRIVFLAGS_GET, + .reply_cmd = ETHTOOL_MSG_PRIVFLAGS_GET_REPLY, + .hdr_attr = ETHTOOL_A_PRIVFLAGS_HEADER, + .req_info_size = sizeof(struct privflags_req_info), + .reply_data_size = sizeof(struct privflags_reply_data), + + .prepare_data = privflags_prepare_data, + .reply_size = privflags_reply_size, + .fill_reply = privflags_fill_reply, + .cleanup_data = privflags_cleanup_data, + + .set_validate = ethnl_set_privflags_validate, + .set = ethnl_set_privflags, + .set_ntf_cmd = ETHTOOL_MSG_PRIVFLAGS_NTF, +}; diff --git a/net/ethtool/pse-pd.c b/net/ethtool/pse-pd.c index e8683e485dc9..a5b607b0a652 100644 --- a/net/ethtool/pse-pd.c +++ b/net/ethtool/pse-pd.c @@ -106,18 +106,6 @@ static int pse_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_pse_request_ops = { - .request_cmd = ETHTOOL_MSG_PSE_GET, - .reply_cmd = ETHTOOL_MSG_PSE_GET_REPLY, - .hdr_attr = ETHTOOL_A_PSE_HEADER, - .req_info_size = sizeof(struct pse_req_info), - .reply_data_size = sizeof(struct pse_reply_data), - - .prepare_data = pse_prepare_data, - .reply_size = pse_reply_size, - .fill_reply = pse_fill_reply, -}; - /* PSE_SET */ const struct nla_policy ethnl_pse_set_policy[ETHTOOL_A_PSE_MAX + 1] = { @@ -127,59 +115,50 @@ const struct nla_policy ethnl_pse_set_policy[ETHTOOL_A_PSE_MAX + 1] = { ETHTOOL_PODL_PSE_ADMIN_STATE_ENABLED), }; -static int pse_set_pse_config(struct net_device *dev, - struct netlink_ext_ack *extack, - struct nlattr **tb) +static int +ethnl_set_pse_validate(struct ethnl_req_info *req_info, struct genl_info *info) { - struct phy_device *phydev = dev->phydev; - struct pse_control_config config = {}; + return !!info->attrs[ETHTOOL_A_PODL_PSE_ADMIN_CONTROL]; +} - /* Optional attribute. Do not return error if not set. */ - if (!tb[ETHTOOL_A_PODL_PSE_ADMIN_CONTROL]) - return 0; +static int +ethnl_set_pse(struct ethnl_req_info *req_info, struct genl_info *info) +{ + struct net_device *dev = req_info->dev; + struct pse_control_config config = {}; + struct nlattr **tb = info->attrs; + struct phy_device *phydev; /* this values are already validated by the ethnl_pse_set_policy */ config.admin_cotrol = nla_get_u32(tb[ETHTOOL_A_PODL_PSE_ADMIN_CONTROL]); + phydev = dev->phydev; if (!phydev) { - NL_SET_ERR_MSG(extack, "No PHY is attached"); + NL_SET_ERR_MSG(info->extack, "No PHY is attached"); return -EOPNOTSUPP; } if (!phydev->psec) { - NL_SET_ERR_MSG(extack, "No PSE is attached"); + NL_SET_ERR_MSG(info->extack, "No PSE is attached"); return -EOPNOTSUPP; } - return pse_ethtool_set_config(phydev->psec, extack, &config); + /* Return errno directly - PSE has no notification */ + return pse_ethtool_set_config(phydev->psec, info->extack, &config); } -int ethnl_set_pse(struct sk_buff *skb, struct genl_info *info) -{ - struct ethnl_req_info req_info = {}; - struct nlattr **tb = info->attrs; - struct net_device *dev; - int ret; - - ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_PSE_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - - dev = req_info.dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - - ret = pse_set_pse_config(dev, info->extack, tb); - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); +const struct ethnl_request_ops ethnl_pse_request_ops = { + .request_cmd = ETHTOOL_MSG_PSE_GET, + .reply_cmd = ETHTOOL_MSG_PSE_GET_REPLY, + .hdr_attr = ETHTOOL_A_PSE_HEADER, + .req_info_size = sizeof(struct pse_req_info), + .reply_data_size = sizeof(struct pse_reply_data), - ethnl_parse_header_dev_put(&req_info); + .prepare_data = pse_prepare_data, + .reply_size = pse_reply_size, + .fill_reply = pse_fill_reply, - return ret; -} + .set_validate = ethnl_set_pse_validate, + .set = ethnl_set_pse, + /* PSE has no notification */ +}; diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c index fa3ec8d438f7..2a2d3539630c 100644 --- a/net/ethtool/rings.c +++ b/net/ethtool/rings.c @@ -102,18 +102,6 @@ static int rings_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_rings_request_ops = { - .request_cmd = ETHTOOL_MSG_RINGS_GET, - .reply_cmd = ETHTOOL_MSG_RINGS_GET_REPLY, - .hdr_attr = ETHTOOL_A_RINGS_HEADER, - .req_info_size = sizeof(struct rings_req_info), - .reply_data_size = sizeof(struct rings_reply_data), - - .prepare_data = rings_prepare_data, - .reply_size = rings_reply_size, - .fill_reply = rings_fill_reply, -}; - /* RINGS_SET */ const struct nla_policy ethnl_rings_set_policy[] = { @@ -128,62 +116,53 @@ const struct nla_policy ethnl_rings_set_policy[] = { [ETHTOOL_A_RINGS_TX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), }; -int ethnl_set_rings(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_rings_validate(struct ethnl_req_info *req_info, + struct genl_info *info) { - struct kernel_ethtool_ringparam kernel_ringparam = {}; - struct ethtool_ringparam ringparam = {}; - struct ethnl_req_info req_info = {}; + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; struct nlattr **tb = info->attrs; - const struct nlattr *err_attr; - const struct ethtool_ops *ops; - struct net_device *dev; - bool mod = false; - int ret; - - ret = ethnl_parse_header_dev_get(&req_info, - tb[ETHTOOL_A_RINGS_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ops = dev->ethtool_ops; - ret = -EOPNOTSUPP; - if (!ops->get_ringparam || !ops->set_ringparam) - goto out_dev; if (tb[ETHTOOL_A_RINGS_RX_BUF_LEN] && !(ops->supported_ring_params & ETHTOOL_RING_USE_RX_BUF_LEN)) { - ret = -EOPNOTSUPP; NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_RINGS_RX_BUF_LEN], "setting rx buf len not supported"); - goto out_dev; + return -EOPNOTSUPP; } if (tb[ETHTOOL_A_RINGS_CQE_SIZE] && !(ops->supported_ring_params & ETHTOOL_RING_USE_CQE_SIZE)) { - ret = -EOPNOTSUPP; NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_RINGS_CQE_SIZE], "setting cqe size not supported"); - goto out_dev; + return -EOPNOTSUPP; } if (tb[ETHTOOL_A_RINGS_TX_PUSH] && !(ops->supported_ring_params & ETHTOOL_RING_USE_TX_PUSH)) { - ret = -EOPNOTSUPP; NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_RINGS_TX_PUSH], "setting tx push not supported"); - goto out_dev; + return -EOPNOTSUPP; } - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - ops->get_ringparam(dev, &ringparam, &kernel_ringparam, info->extack); + return ops->get_ringparam && ops->set_ringparam ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) +{ + struct kernel_ethtool_ringparam kernel_ringparam = {}; + struct ethtool_ringparam ringparam = {}; + struct net_device *dev = req_info->dev; + struct nlattr **tb = info->attrs; + const struct nlattr *err_attr; + bool mod = false; + int ret; + + dev->ethtool_ops->get_ringparam(dev, &ringparam, + &kernel_ringparam, info->extack); ethnl_update_u32(&ringparam.rx_pending, tb[ETHTOOL_A_RINGS_RX], &mod); ethnl_update_u32(&ringparam.rx_mini_pending, @@ -197,9 +176,8 @@ int ethnl_set_rings(struct sk_buff *skb, struct genl_info *info) tb[ETHTOOL_A_RINGS_CQE_SIZE], &mod); ethnl_update_u8(&kernel_ringparam.tx_push, tb[ETHTOOL_A_RINGS_TX_PUSH], &mod); - ret = 0; if (!mod) - goto out_ops; + return 0; /* ensure new ring parameters are within limits */ if (ringparam.rx_pending > ringparam.rx_max_pending) @@ -213,23 +191,28 @@ int ethnl_set_rings(struct sk_buff *skb, struct genl_info *info) else err_attr = NULL; if (err_attr) { - ret = -EINVAL; NL_SET_ERR_MSG_ATTR(info->extack, err_attr, "requested ring size exceeds maximum"); - goto out_ops; + return -EINVAL; } ret = dev->ethtool_ops->set_ringparam(dev, &ringparam, &kernel_ringparam, info->extack); - if (ret < 0) - goto out_ops; - ethtool_notify(dev, ETHTOOL_MSG_RINGS_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return ret < 0 ? ret : 1; } + +const struct ethnl_request_ops ethnl_rings_request_ops = { + .request_cmd = ETHTOOL_MSG_RINGS_GET, + .reply_cmd = ETHTOOL_MSG_RINGS_GET_REPLY, + .hdr_attr = ETHTOOL_A_RINGS_HEADER, + .req_info_size = sizeof(struct rings_req_info), + .reply_data_size = sizeof(struct rings_reply_data), + + .prepare_data = rings_prepare_data, + .reply_size = rings_reply_size, + .fill_reply = rings_fill_reply, + + .set_validate = ethnl_set_rings_validate, + .set = ethnl_set_rings, + .set_ntf_cmd = ETHTOOL_MSG_RINGS_NTF, +}; diff --git a/net/ethtool/wol.c b/net/ethtool/wol.c index 88f435e76481..a4a43d9e6e9d 100644 --- a/net/ethtool/wol.c +++ b/net/ethtool/wol.c @@ -82,18 +82,6 @@ static int wol_fill_reply(struct sk_buff *skb, return 0; } -const struct ethnl_request_ops ethnl_wol_request_ops = { - .request_cmd = ETHTOOL_MSG_WOL_GET, - .reply_cmd = ETHTOOL_MSG_WOL_GET_REPLY, - .hdr_attr = ETHTOOL_A_WOL_HEADER, - .req_info_size = sizeof(struct wol_req_info), - .reply_data_size = sizeof(struct wol_reply_data), - - .prepare_data = wol_prepare_data, - .reply_size = wol_reply_size, - .fill_reply = wol_fill_reply, -}; - /* WOL_SET */ const struct nla_policy ethnl_wol_set_policy[] = { @@ -104,67 +92,66 @@ const struct nla_policy ethnl_wol_set_policy[] = { .len = SOPASS_MAX }, }; -int ethnl_set_wol(struct sk_buff *skb, struct genl_info *info) +static int +ethnl_set_wol_validate(struct ethnl_req_info *req_info, struct genl_info *info) +{ + const struct ethtool_ops *ops = req_info->dev->ethtool_ops; + + return ops->get_wol && ops->set_wol ? 1 : -EOPNOTSUPP; +} + +static int +ethnl_set_wol(struct ethnl_req_info *req_info, struct genl_info *info) { struct ethtool_wolinfo wol = { .cmd = ETHTOOL_GWOL }; - struct ethnl_req_info req_info = {}; + struct net_device *dev = req_info->dev; struct nlattr **tb = info->attrs; - struct net_device *dev; bool mod = false; int ret; - ret = ethnl_parse_header_dev_get(&req_info, tb[ETHTOOL_A_WOL_HEADER], - genl_info_net(info), info->extack, - true); - if (ret < 0) - return ret; - dev = req_info.dev; - ret = -EOPNOTSUPP; - if (!dev->ethtool_ops->get_wol || !dev->ethtool_ops->set_wol) - goto out_dev; - - rtnl_lock(); - ret = ethnl_ops_begin(dev); - if (ret < 0) - goto out_rtnl; - dev->ethtool_ops->get_wol(dev, &wol); ret = ethnl_update_bitset32(&wol.wolopts, WOL_MODE_COUNT, tb[ETHTOOL_A_WOL_MODES], wol_mode_names, info->extack, &mod); if (ret < 0) - goto out_ops; + return ret; if (wol.wolopts & ~wol.supported) { NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_WOL_MODES], "cannot enable unsupported WoL mode"); - ret = -EINVAL; - goto out_ops; + return -EINVAL; } if (tb[ETHTOOL_A_WOL_SOPASS]) { if (!(wol.supported & WAKE_MAGICSECURE)) { NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_WOL_SOPASS], "magicsecure not supported, cannot set password"); - ret = -EINVAL; - goto out_ops; + return -EINVAL; } ethnl_update_binary(wol.sopass, sizeof(wol.sopass), tb[ETHTOOL_A_WOL_SOPASS], &mod); } if (!mod) - goto out_ops; + return 0; ret = dev->ethtool_ops->set_wol(dev, &wol); if (ret) - goto out_ops; + return ret; dev->wol_enabled = !!wol.wolopts; - ethtool_notify(dev, ETHTOOL_MSG_WOL_NTF, NULL); - -out_ops: - ethnl_ops_complete(dev); -out_rtnl: - rtnl_unlock(); -out_dev: - ethnl_parse_header_dev_put(&req_info); - return ret; + return 1; } + +const struct ethnl_request_ops ethnl_wol_request_ops = { + .request_cmd = ETHTOOL_MSG_WOL_GET, + .reply_cmd = ETHTOOL_MSG_WOL_GET_REPLY, + .hdr_attr = ETHTOOL_A_WOL_HEADER, + .req_info_size = sizeof(struct wol_req_info), + .reply_data_size = sizeof(struct wol_reply_data), + + .prepare_data = wol_prepare_data, + .reply_size = wol_reply_size, + .fill_reply = wol_fill_reply, + + .set_validate = ethnl_set_wol_validate, + .set = ethnl_set_wol, + .set_ntf_cmd = ETHTOOL_MSG_WOL_NTF, +}; -- cgit v1.2.3 From 020dd127a3fef9dfc6c03cd5d1c231d5e55d7632 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Thu, 26 Jan 2023 08:58:29 +0100 Subject: devlink: make devlink_param_register/unregister static There is no user outside the devlink code, so remove the export and make the functions static. Move them before callers to avoid forward declarations. Signed-off-by: Jiri Pirko Reviewed-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- include/net/devlink.h | 4 --- net/devlink/leftover.c | 90 +++++++++++++++++++++----------------------------- 2 files changed, 37 insertions(+), 57 deletions(-) (limited to 'net') diff --git a/include/net/devlink.h b/include/net/devlink.h index 608a0c198be8..cf74b6391896 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1773,10 +1773,6 @@ int devlink_params_register(struct devlink *devlink, void devlink_params_unregister(struct devlink *devlink, const struct devlink_param *params, size_t params_count); -int devlink_param_register(struct devlink *devlink, - const struct devlink_param *param); -void devlink_param_unregister(struct devlink *devlink, - const struct devlink_param *param); int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, union devlink_param_value *init_val); int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 74621287f4e5..b1216b8f0acc 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -10793,6 +10793,43 @@ static int devlink_param_verify(const struct devlink_param *param) return devlink_param_driver_verify(param); } +static int devlink_param_register(struct devlink *devlink, + const struct devlink_param *param) +{ + struct devlink_param_item *param_item; + + WARN_ON(devlink_param_verify(param)); + WARN_ON(devlink_param_find_by_name(&devlink->param_list, param->name)); + + if (param->supported_cmodes == BIT(DEVLINK_PARAM_CMODE_DRIVERINIT)) + WARN_ON(param->get || param->set); + else + WARN_ON(!param->get || !param->set); + + param_item = kzalloc(sizeof(*param_item), GFP_KERNEL); + if (!param_item) + return -ENOMEM; + + param_item->param = param; + + list_add_tail(¶m_item->list, &devlink->param_list); + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); + return 0; +} + +static void devlink_param_unregister(struct devlink *devlink, + const struct devlink_param *param) +{ + struct devlink_param_item *param_item; + + param_item = + devlink_param_find_by_name(&devlink->param_list, param->name); + WARN_ON(!param_item); + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_DEL); + list_del(¶m_item->list); + kfree(param_item); +} + /** * devlink_params_register - register configuration parameters * @@ -10844,59 +10881,6 @@ void devlink_params_unregister(struct devlink *devlink, } EXPORT_SYMBOL_GPL(devlink_params_unregister); -/** - * devlink_param_register - register one configuration parameter - * - * @devlink: devlink - * @param: one configuration parameter - * - * Register the configuration parameter supported by the driver. - * Return: returns 0 on successful registration or error code otherwise. - */ -int devlink_param_register(struct devlink *devlink, - const struct devlink_param *param) -{ - struct devlink_param_item *param_item; - - WARN_ON(devlink_param_verify(param)); - WARN_ON(devlink_param_find_by_name(&devlink->param_list, param->name)); - - if (param->supported_cmodes == BIT(DEVLINK_PARAM_CMODE_DRIVERINIT)) - WARN_ON(param->get || param->set); - else - WARN_ON(!param->get || !param->set); - - param_item = kzalloc(sizeof(*param_item), GFP_KERNEL); - if (!param_item) - return -ENOMEM; - - param_item->param = param; - - list_add_tail(¶m_item->list, &devlink->param_list); - devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); - return 0; -} -EXPORT_SYMBOL_GPL(devlink_param_register); - -/** - * devlink_param_unregister - unregister one configuration parameter - * @devlink: devlink - * @param: configuration parameter to unregister - */ -void devlink_param_unregister(struct devlink *devlink, - const struct devlink_param *param) -{ - struct devlink_param_item *param_item; - - param_item = - devlink_param_find_by_name(&devlink->param_list, param->name); - WARN_ON(!param_item); - devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_DEL); - list_del(¶m_item->list); - kfree(param_item); -} -EXPORT_SYMBOL_GPL(devlink_param_unregister); - /** * devlink_param_driverinit_value_get - get configuration parameter * value for driver initializing -- cgit v1.2.3 From bb9bb6bfd1c37b1c0786248c2785536146603fa8 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Thu, 26 Jan 2023 08:58:30 +0100 Subject: devlink: don't work with possible NULL pointer in devlink_param_unregister() There is a WARN_ON checking the param_item for being NULL when the param is not inserted in the list. That indicates a driver BUG. Instead of continuing to work with NULL pointer with its consequences, return. Signed-off-by: Jiri Pirko Reviewed-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index b1216b8f0acc..fca2b6661362 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -10824,7 +10824,8 @@ static void devlink_param_unregister(struct devlink *devlink, param_item = devlink_param_find_by_name(&devlink->param_list, param->name); - WARN_ON(!param_item); + if (WARN_ON(!param_item)) + return; devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_DEL); list_del(¶m_item->list); kfree(param_item); -- cgit v1.2.3 From 85fe0b324c830ac671b811efc70ea80c3dcb2390 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Thu, 26 Jan 2023 08:58:33 +0100 Subject: devlink: make devlink_param_driverinit_value_set() return void devlink_param_driverinit_value_set() currently returns int with possible error, but no user is checking it anyway. The only reason for a fail is a driver bug. So convert the function to return void and put WARN_ONs on error paths. Signed-off-by: Jiri Pirko Reviewed-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- include/net/devlink.h | 4 ++-- net/devlink/leftover.c | 15 +++++++-------- 2 files changed, 9 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/include/net/devlink.h b/include/net/devlink.h index cf74b6391896..e0d773dfa637 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1775,8 +1775,8 @@ void devlink_params_unregister(struct devlink *devlink, size_t params_count); int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, union devlink_param_value *init_val); -int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, - union devlink_param_value init_val); +void devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, + union devlink_param_value init_val); void devlink_param_value_changed(struct devlink *devlink, u32 param_id); struct devlink_region *devl_region_create(struct devlink *devlink, const struct devlink_region_ops *ops, diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index fca2b6661362..693470af548f 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -10931,18 +10931,18 @@ EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_get); * This function should be used by the driver to set driverinit * configuration mode default value. */ -int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, - union devlink_param_value init_val) +void devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, + union devlink_param_value init_val) { struct devlink_param_item *param_item; param_item = devlink_param_find_by_id(&devlink->param_list, param_id); - if (!param_item) - return -EINVAL; + if (WARN_ON(!param_item)) + return; - if (!devlink_param_cmode_is_supported(param_item->param, - DEVLINK_PARAM_CMODE_DRIVERINIT)) - return -EOPNOTSUPP; + if (WARN_ON(!devlink_param_cmode_is_supported(param_item->param, + DEVLINK_PARAM_CMODE_DRIVERINIT))) + return; if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) strcpy(param_item->driverinit_value.vstr, init_val.vstr); @@ -10951,7 +10951,6 @@ int devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, param_item->driverinit_value_valid = true; devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); - return 0; } EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_set); -- cgit v1.2.3 From 3f716a620e1314aa9abe69052bdc9df719372bd4 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Thu, 26 Jan 2023 08:58:34 +0100 Subject: devlink: put couple of WARN_ONs in devlink_param_driverinit_value_get() Put couple of WARN_ONs in devlink_param_driverinit_value_get() function to clearly indicate, that it is a driver bug if used without reload support or for non-driverinit param. Signed-off-by: Jiri Pirko Reviewed-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 693470af548f..512ed4ccbdc7 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -10898,16 +10898,18 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, { struct devlink_param_item *param_item; - if (!devlink_reload_supported(devlink->ops)) + if (WARN_ON(!devlink_reload_supported(devlink->ops))) return -EOPNOTSUPP; param_item = devlink_param_find_by_id(&devlink->param_list, param_id); if (!param_item) return -EINVAL; - if (!param_item->driverinit_value_valid || - !devlink_param_cmode_is_supported(param_item->param, - DEVLINK_PARAM_CMODE_DRIVERINIT)) + if (!param_item->driverinit_value_valid) + return -EOPNOTSUPP; + + if (WARN_ON(!devlink_param_cmode_is_supported(param_item->param, + DEVLINK_PARAM_CMODE_DRIVERINIT))) return -EOPNOTSUPP; if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) -- cgit v1.2.3 From 075935f0ae0fbbe469a911d685f6cc59de892700 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Thu, 26 Jan 2023 08:58:35 +0100 Subject: devlink: protect devlink param list by instance lock Commit 1d18bb1a4ddd ("devlink: allow registering parameters after the instance") as the subject implies introduced possibility to register devlink params even for already registered devlink instance. This is a bit problematic, as the consistency or params list was originally secured by the fact it is static during devlink lifetime. So in order to protect the params list, take devlink instance lock during the params operations. Introduce unlocked function variants and use them in drivers in locked context. Put lock assertions to appropriate places. Signed-off-by: Jiri Pirko Reviewed-by: Jakub Kicinski Reviewed-by: Jacob Keller Reviewed-by: Ido Schimmel Reviewed-by: Simon Horman Tested-by: Simon Horman Signed-off-by: David S. Miller --- drivers/net/ethernet/mellanox/mlx4/main.c | 80 +++++++++---------- drivers/net/ethernet/mellanox/mlx5/core/dev.c | 18 ++--- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 92 +++++++++++----------- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 +-- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 +-- drivers/net/ethernet/mellanox/mlxsw/core.c | 18 +++-- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 16 ++-- drivers/net/ethernet/netronome/nfp/devlink_param.c | 8 +- drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 7 +- drivers/net/netdevsim/dev.c | 36 ++++----- include/net/devlink.h | 16 ++-- net/devlink/leftover.c | 77 ++++++++++++------ 13 files changed, 218 insertions(+), 180 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index 3ae246391549..6152f77dcfd8 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -265,29 +265,29 @@ static void mlx4_devlink_set_params_init_values(struct devlink *devlink) union devlink_param_value value; value.vbool = !!mlx4_internal_err_reset; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, + value); value.vu32 = 1UL << log_num_mac; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + value); value.vbool = enable_64b_cqe_eqe; - devlink_param_driverinit_value_set(devlink, - MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE, - value); + devl_param_driverinit_value_set(devlink, + MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE, + value); value.vbool = enable_4k_uar; - devlink_param_driverinit_value_set(devlink, - MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR, - value); + devl_param_driverinit_value_set(devlink, + MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR, + value); value.vbool = false; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, + value); } static inline void mlx4_set_num_reserved_uars(struct mlx4_dev *dev, @@ -3910,37 +3910,37 @@ static void mlx4_devlink_param_load_driverinit_values(struct devlink *devlink) union devlink_param_value saved_value; int err; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, + &saved_value); if (!err && mlx4_internal_err_reset != saved_value.vbool) { mlx4_internal_err_reset = saved_value.vbool; /* Notify on value changed on runtime configuration mode */ - devlink_param_value_changed(devlink, - DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET); + devl_param_value_changed(devlink, + DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET); } - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + &saved_value); if (!err) log_num_mac = order_base_2(saved_value.vu32); - err = devlink_param_driverinit_value_get(devlink, - MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE, + &saved_value); if (!err) enable_64b_cqe_eqe = saved_value.vbool; - err = devlink_param_driverinit_value_get(devlink, - MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR, + &saved_value); if (!err) enable_4k_uar = saved_value.vbool; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT, + &saved_value); if (!err && crdump->snapshot_enable != saved_value.vbool) { crdump->snapshot_enable = saved_value.vbool; - devlink_param_value_changed(devlink, - DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT); + devl_param_value_changed(devlink, + DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT); } } @@ -4021,8 +4021,8 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) mutex_init(&dev->persist->interface_state_mutex); mutex_init(&dev->persist->pci_status_mutex); - ret = devlink_params_register(devlink, mlx4_devlink_params, - ARRAY_SIZE(mlx4_devlink_params)); + ret = devl_params_register(devlink, mlx4_devlink_params, + ARRAY_SIZE(mlx4_devlink_params)); if (ret) goto err_devlink_unregister; mlx4_devlink_set_params_init_values(devlink); @@ -4037,8 +4037,8 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) return 0; err_params_unregister: - devlink_params_unregister(devlink, mlx4_devlink_params, - ARRAY_SIZE(mlx4_devlink_params)); + devl_params_unregister(devlink, mlx4_devlink_params, + ARRAY_SIZE(mlx4_devlink_params)); err_devlink_unregister: kfree(dev->persist); err_devlink_free: @@ -4181,8 +4181,8 @@ static void mlx4_remove_one(struct pci_dev *pdev) pci_release_regions(pdev); mlx4_pci_disable_device(dev); - devlink_params_unregister(devlink, mlx4_devlink_params, - ARRAY_SIZE(mlx4_devlink_params)); + devl_params_unregister(devlink, mlx4_devlink_params, + ARRAY_SIZE(mlx4_devlink_params)); kfree(dev->persist); devl_unlock(devlink); devlink_free(devlink); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c index 2b444fb12388..17ae9b4ec794 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c @@ -114,9 +114,9 @@ static bool is_eth_enabled(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(priv_to_devlink(dev), - DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, - &val); + err = devl_param_driverinit_value_get(priv_to_devlink(dev), + DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, + &val); return err ? false : val.vbool; } @@ -147,9 +147,9 @@ static bool is_vnet_enabled(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(priv_to_devlink(dev), - DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET, - &val); + err = devl_param_driverinit_value_get(priv_to_devlink(dev), + DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET, + &val); return err ? false : val.vbool; } @@ -221,9 +221,9 @@ static bool is_ib_enabled(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(priv_to_devlink(dev), - DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA, - &val); + err = devl_param_driverinit_value_get(priv_to_devlink(dev), + DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA, + &val); return err ? false : val.vbool; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index 2d2fcb518172..ed4b79aeecd1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -602,26 +602,26 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink) union devlink_param_value value; value.vbool = MLX5_CAP_GEN(dev, roce); - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, + value); #ifdef CONFIG_MLX5_ESWITCH value.vu32 = ESW_OFFLOADS_DEFAULT_NUM_GROUPS; - devlink_param_driverinit_value_set(devlink, - MLX5_DEVLINK_PARAM_ID_ESW_LARGE_GROUP_NUM, - value); + devl_param_driverinit_value_set(devlink, + MLX5_DEVLINK_PARAM_ID_ESW_LARGE_GROUP_NUM, + value); #endif value.vu32 = MLX5_COMP_EQ_SIZE; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE, + value); value.vu32 = MLX5_NUM_ASYNC_EQE; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE, + value); } static const struct devlink_param mlx5_devlink_eth_params[] = { @@ -638,15 +638,15 @@ static int mlx5_devlink_eth_params_register(struct devlink *devlink) if (!mlx5_eth_supported(dev)) return 0; - err = devlink_params_register(devlink, mlx5_devlink_eth_params, - ARRAY_SIZE(mlx5_devlink_eth_params)); + err = devl_params_register(devlink, mlx5_devlink_eth_params, + ARRAY_SIZE(mlx5_devlink_eth_params)); if (err) return err; value.vbool = true; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH, + value); return 0; } @@ -657,8 +657,8 @@ static void mlx5_devlink_eth_params_unregister(struct devlink *devlink) if (!mlx5_eth_supported(dev)) return; - devlink_params_unregister(devlink, mlx5_devlink_eth_params, - ARRAY_SIZE(mlx5_devlink_eth_params)); + devl_params_unregister(devlink, mlx5_devlink_eth_params, + ARRAY_SIZE(mlx5_devlink_eth_params)); } static int mlx5_devlink_enable_rdma_validate(struct devlink *devlink, u32 id, @@ -686,15 +686,15 @@ static int mlx5_devlink_rdma_params_register(struct devlink *devlink) if (!IS_ENABLED(CONFIG_MLX5_INFINIBAND)) return 0; - err = devlink_params_register(devlink, mlx5_devlink_rdma_params, - ARRAY_SIZE(mlx5_devlink_rdma_params)); + err = devl_params_register(devlink, mlx5_devlink_rdma_params, + ARRAY_SIZE(mlx5_devlink_rdma_params)); if (err) return err; value.vbool = true; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA, + value); return 0; } @@ -703,8 +703,8 @@ static void mlx5_devlink_rdma_params_unregister(struct devlink *devlink) if (!IS_ENABLED(CONFIG_MLX5_INFINIBAND)) return; - devlink_params_unregister(devlink, mlx5_devlink_rdma_params, - ARRAY_SIZE(mlx5_devlink_rdma_params)); + devl_params_unregister(devlink, mlx5_devlink_rdma_params, + ARRAY_SIZE(mlx5_devlink_rdma_params)); } static const struct devlink_param mlx5_devlink_vnet_params[] = { @@ -721,15 +721,15 @@ static int mlx5_devlink_vnet_params_register(struct devlink *devlink) if (!mlx5_vnet_supported(dev)) return 0; - err = devlink_params_register(devlink, mlx5_devlink_vnet_params, - ARRAY_SIZE(mlx5_devlink_vnet_params)); + err = devl_params_register(devlink, mlx5_devlink_vnet_params, + ARRAY_SIZE(mlx5_devlink_vnet_params)); if (err) return err; value.vbool = true; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET, + value); return 0; } @@ -740,8 +740,8 @@ static void mlx5_devlink_vnet_params_unregister(struct devlink *devlink) if (!mlx5_vnet_supported(dev)) return; - devlink_params_unregister(devlink, mlx5_devlink_vnet_params, - ARRAY_SIZE(mlx5_devlink_vnet_params)); + devl_params_unregister(devlink, mlx5_devlink_vnet_params, + ARRAY_SIZE(mlx5_devlink_vnet_params)); } static int mlx5_devlink_auxdev_params_register(struct devlink *devlink) @@ -814,15 +814,15 @@ static int mlx5_devlink_max_uc_list_params_register(struct devlink *devlink) if (!MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list_wr_supported)) return 0; - err = devlink_params_register(devlink, mlx5_devlink_max_uc_list_params, - ARRAY_SIZE(mlx5_devlink_max_uc_list_params)); + err = devl_params_register(devlink, mlx5_devlink_max_uc_list_params, + ARRAY_SIZE(mlx5_devlink_max_uc_list_params)); if (err) return err; value.vu32 = 1 << MLX5_CAP_GEN(dev, log_max_current_uc_list); - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + value); return 0; } @@ -834,8 +834,8 @@ mlx5_devlink_max_uc_list_params_unregister(struct devlink *devlink) if (!MLX5_CAP_GEN_MAX(dev, log_max_current_uc_list_wr_supported)) return; - devlink_params_unregister(devlink, mlx5_devlink_max_uc_list_params, - ARRAY_SIZE(mlx5_devlink_max_uc_list_params)); + devl_params_unregister(devlink, mlx5_devlink_max_uc_list_params, + ARRAY_SIZE(mlx5_devlink_max_uc_list_params)); } #define MLX5_TRAP_DROP(_id, _group_id) \ @@ -886,8 +886,8 @@ int mlx5_devlink_params_register(struct devlink *devlink) struct mlx5_core_dev *dev = devlink_priv(devlink); int err; - err = devlink_params_register(devlink, mlx5_devlink_params, - ARRAY_SIZE(mlx5_devlink_params)); + err = devl_params_register(devlink, mlx5_devlink_params, + ARRAY_SIZE(mlx5_devlink_params)); if (err) return err; @@ -909,8 +909,8 @@ int mlx5_devlink_params_register(struct devlink *devlink) max_uc_list_err: mlx5_devlink_auxdev_params_unregister(devlink); auxdev_reg_err: - devlink_params_unregister(devlink, mlx5_devlink_params, - ARRAY_SIZE(mlx5_devlink_params)); + devl_params_unregister(devlink, mlx5_devlink_params, + ARRAY_SIZE(mlx5_devlink_params)); return err; } @@ -918,6 +918,6 @@ void mlx5_devlink_params_unregister(struct devlink *devlink) { mlx5_devlink_max_uc_list_params_unregister(devlink); mlx5_devlink_auxdev_params_unregister(devlink); - devlink_params_unregister(devlink, mlx5_devlink_params, - ARRAY_SIZE(mlx5_devlink_params)); + devl_params_unregister(devlink, mlx5_devlink_params, + ARRAY_SIZE(mlx5_devlink_params)); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c index 8f7580fec193..9b44557e7271 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c @@ -629,9 +629,9 @@ static u16 async_eq_depth_devlink_param_get(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE, - &val); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_EVENT_EQ_SIZE, + &val); if (!err) return val.vu32; mlx5_core_dbg(dev, "Failed to get param. using default. err = %d\n", err); @@ -874,9 +874,9 @@ static u16 comp_eq_depth_devlink_param_get(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE, - &val); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_IO_EQ_SIZE, + &val); if (!err) return val.vu32; mlx5_core_dbg(dev, "Failed to get param. using default. err = %d\n", err); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index d809c9192496..0be01d702049 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -1190,9 +1190,9 @@ static void mlx5_eswitch_get_devlink_param(struct mlx5_eswitch *esw) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(devlink, - MLX5_DEVLINK_PARAM_ID_ESW_LARGE_GROUP_NUM, - &val); + err = devl_param_driverinit_value_get(devlink, + MLX5_DEVLINK_PARAM_ID_ESW_LARGE_GROUP_NUM, + &val); if (!err) { esw->params.large_group_num = val.vu32; } else { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 65cd6c393c0a..8823f20d2122 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -484,9 +484,9 @@ static int max_uc_list_get_devlink_param(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - &val); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + &val); if (!err) return val.vu32; mlx5_core_dbg(dev, "Failed to get param. err = %d\n", err); @@ -499,9 +499,9 @@ bool mlx5_is_roce_on(struct mlx5_core_dev *dev) union devlink_param_value val; int err; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, - &val); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, + &val); if (!err) return val.vbool; diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index f952a6518ba9..f8623e8388c8 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -1243,9 +1243,9 @@ static int mlxsw_core_fw_rev_validate(struct mlxsw_core *mlxsw_core, return 0; /* Don't check if devlink 'fw_load_policy' param is 'flash' */ - err = devlink_param_driverinit_value_get(priv_to_devlink(mlxsw_core), - DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, - &value); + err = devl_param_driverinit_value_get(priv_to_devlink(mlxsw_core), + DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, + &value); if (err) return err; if (value.vu8 == DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH) @@ -1316,20 +1316,22 @@ static int mlxsw_core_fw_params_register(struct mlxsw_core *mlxsw_core) union devlink_param_value value; int err; - err = devlink_params_register(devlink, mlxsw_core_fw_devlink_params, - ARRAY_SIZE(mlxsw_core_fw_devlink_params)); + err = devl_params_register(devlink, mlxsw_core_fw_devlink_params, + ARRAY_SIZE(mlxsw_core_fw_devlink_params)); if (err) return err; value.vu8 = DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER; - devlink_param_driverinit_value_set(devlink, DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY, + value); return 0; } static void mlxsw_core_fw_params_unregister(struct mlxsw_core *mlxsw_core) { - devlink_params_unregister(priv_to_devlink(mlxsw_core), mlxsw_core_fw_devlink_params, - ARRAY_SIZE(mlxsw_core_fw_devlink_params)); + devl_params_unregister(priv_to_devlink(mlxsw_core), mlxsw_core_fw_devlink_params, + ARRAY_SIZE(mlxsw_core_fw_devlink_params)); } static void *__dl_port(struct devlink_port *devlink_port) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index 3d15d3387aa2..b0bdb9640ebf 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -3898,23 +3898,23 @@ static int mlxsw_sp2_params_register(struct mlxsw_core *mlxsw_core) union devlink_param_value value; int err; - err = devlink_params_register(devlink, mlxsw_sp2_devlink_params, - ARRAY_SIZE(mlxsw_sp2_devlink_params)); + err = devl_params_register(devlink, mlxsw_sp2_devlink_params, + ARRAY_SIZE(mlxsw_sp2_devlink_params)); if (err) return err; value.vu32 = 0; - devlink_param_driverinit_value_set(devlink, - MLXSW_DEVLINK_PARAM_ID_ACL_REGION_REHASH_INTERVAL, - value); + devl_param_driverinit_value_set(devlink, + MLXSW_DEVLINK_PARAM_ID_ACL_REGION_REHASH_INTERVAL, + value); return 0; } static void mlxsw_sp2_params_unregister(struct mlxsw_core *mlxsw_core) { - devlink_params_unregister(priv_to_devlink(mlxsw_core), - mlxsw_sp2_devlink_params, - ARRAY_SIZE(mlxsw_sp2_devlink_params)); + devl_params_unregister(priv_to_devlink(mlxsw_core), + mlxsw_sp2_devlink_params, + ARRAY_SIZE(mlxsw_sp2_devlink_params)); } static void mlxsw_sp_ptp_transmitted(struct mlxsw_core *mlxsw_core, diff --git a/drivers/net/ethernet/netronome/nfp/devlink_param.c b/drivers/net/ethernet/netronome/nfp/devlink_param.c index db297ee4d7ad..a655f9e69a7b 100644 --- a/drivers/net/ethernet/netronome/nfp/devlink_param.c +++ b/drivers/net/ethernet/netronome/nfp/devlink_param.c @@ -233,8 +233,8 @@ int nfp_devlink_params_register(struct nfp_pf *pf) if (err <= 0) return err; - return devlink_params_register(devlink, nfp_devlink_params, - ARRAY_SIZE(nfp_devlink_params)); + return devl_params_register(devlink, nfp_devlink_params, + ARRAY_SIZE(nfp_devlink_params)); } void nfp_devlink_params_unregister(struct nfp_pf *pf) @@ -245,6 +245,6 @@ void nfp_devlink_params_unregister(struct nfp_pf *pf) if (err <= 0) return; - devlink_params_unregister(priv_to_devlink(pf), nfp_devlink_params, - ARRAY_SIZE(nfp_devlink_params)); + devl_params_unregister(priv_to_devlink(pf), nfp_devlink_params, + ARRAY_SIZE(nfp_devlink_params)); } diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index abfe788d558f..cbe4972ba104 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -754,11 +754,11 @@ int nfp_net_pci_probe(struct nfp_pf *pf) if (err) goto err_devlink_unreg; + devl_lock(devlink); err = nfp_devlink_params_register(pf); if (err) goto err_shared_buf_unreg; - devl_lock(devlink); pf->ddir = nfp_net_debugfs_device_add(pf->pdev); /* Allocate the vnics and do basic init */ @@ -791,9 +791,9 @@ err_free_vnics: nfp_net_pf_free_vnics(pf); err_clean_ddir: nfp_net_debugfs_dir_clean(&pf->ddir); - devl_unlock(devlink); nfp_devlink_params_unregister(pf); err_shared_buf_unreg: + devl_unlock(devlink); nfp_shared_buf_unregister(pf); err_devlink_unreg: cancel_work_sync(&pf->port_refresh_work); @@ -821,9 +821,10 @@ void nfp_net_pci_remove(struct nfp_pf *pf) /* stop app first, to avoid double free of ctrl vNIC's ddir */ nfp_net_debugfs_dir_clean(&pf->ddir); + nfp_devlink_params_unregister(pf); + devl_unlock(devlink); - nfp_devlink_params_unregister(pf); nfp_shared_buf_unregister(pf); nfp_net_pf_free_irqs(pf); diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c index 738784fda117..f88095b0f836 100644 --- a/drivers/net/netdevsim/dev.c +++ b/drivers/net/netdevsim/dev.c @@ -527,13 +527,13 @@ static void nsim_devlink_set_params_init_values(struct nsim_dev *nsim_dev, union devlink_param_value value; value.vu32 = nsim_dev->max_macs; - devlink_param_driverinit_value_set(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - value); + devl_param_driverinit_value_set(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + value); value.vbool = nsim_dev->test1; - devlink_param_driverinit_value_set(devlink, - NSIM_DEVLINK_PARAM_ID_TEST1, - value); + devl_param_driverinit_value_set(devlink, + NSIM_DEVLINK_PARAM_ID_TEST1, + value); } static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink) @@ -542,14 +542,14 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink) union devlink_param_value saved_value; int err; - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_MAX_MACS, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + DEVLINK_PARAM_GENERIC_ID_MAX_MACS, + &saved_value); if (!err) nsim_dev->max_macs = saved_value.vu32; - err = devlink_param_driverinit_value_get(devlink, - NSIM_DEVLINK_PARAM_ID_TEST1, - &saved_value); + err = devl_param_driverinit_value_get(devlink, + NSIM_DEVLINK_PARAM_ID_TEST1, + &saved_value); if (!err) nsim_dev->test1 = saved_value.vbool; } @@ -1564,8 +1564,8 @@ int nsim_drv_probe(struct nsim_bus_dev *nsim_bus_dev) if (err) goto err_dl_unregister; - err = devlink_params_register(devlink, nsim_devlink_params, - ARRAY_SIZE(nsim_devlink_params)); + err = devl_params_register(devlink, nsim_devlink_params, + ARRAY_SIZE(nsim_devlink_params)); if (err) goto err_resource_unregister; nsim_devlink_set_params_init_values(nsim_dev, devlink); @@ -1630,8 +1630,8 @@ err_traps_exit: err_dummy_region_exit: nsim_dev_dummy_region_exit(nsim_dev); err_params_unregister: - devlink_params_unregister(devlink, nsim_devlink_params, - ARRAY_SIZE(nsim_devlink_params)); + devl_params_unregister(devlink, nsim_devlink_params, + ARRAY_SIZE(nsim_devlink_params)); err_resource_unregister: devl_resources_unregister(devlink); err_dl_unregister: @@ -1678,8 +1678,8 @@ void nsim_drv_remove(struct nsim_bus_dev *nsim_bus_dev) nsim_bpf_dev_exit(nsim_dev); nsim_dev_debugfs_exit(nsim_dev); - devlink_params_unregister(devlink, nsim_devlink_params, - ARRAY_SIZE(nsim_devlink_params)); + devl_params_unregister(devlink, nsim_devlink_params, + ARRAY_SIZE(nsim_devlink_params)); devl_resources_unregister(devlink); devl_unregister(devlink); kfree(nsim_dev->vfconfigs); diff --git a/include/net/devlink.h b/include/net/devlink.h index e0d773dfa637..ab654cf552b8 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1767,17 +1767,23 @@ void devl_resource_occ_get_unregister(struct devlink *devlink, void devlink_resource_occ_get_unregister(struct devlink *devlink, u64 resource_id); +int devl_params_register(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count); int devlink_params_register(struct devlink *devlink, const struct devlink_param *params, size_t params_count); +void devl_params_unregister(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count); void devlink_params_unregister(struct devlink *devlink, const struct devlink_param *params, size_t params_count); -int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, - union devlink_param_value *init_val); -void devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, - union devlink_param_value init_val); -void devlink_param_value_changed(struct devlink *devlink, u32 param_id); +int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, + union devlink_param_value *init_val); +void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, + union devlink_param_value init_val); +void devl_param_value_changed(struct devlink *devlink, u32 param_id); struct devlink_region *devl_region_create(struct devlink *devlink, const struct devlink_region_ops *ops, u32 region_max_snapshots, diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 512ed4ccbdc7..bd4c5d2dd612 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -10832,7 +10832,7 @@ static void devlink_param_unregister(struct devlink *devlink, } /** - * devlink_params_register - register configuration parameters + * devl_params_register - register configuration parameters * * @devlink: devlink * @params: configuration parameters array @@ -10840,13 +10840,15 @@ static void devlink_param_unregister(struct devlink *devlink, * * Register the configuration parameters supported by the driver. */ -int devlink_params_register(struct devlink *devlink, - const struct devlink_param *params, - size_t params_count) +int devl_params_register(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count) { const struct devlink_param *param = params; int i, err; + lockdep_assert_held(&devlink->lock); + for (i = 0; i < params_count; i++, param++) { err = devlink_param_register(devlink, param); if (err) @@ -10862,29 +10864,54 @@ rollback: devlink_param_unregister(devlink, param); return err; } +EXPORT_SYMBOL_GPL(devl_params_register); + +int devlink_params_register(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count) +{ + int err; + + devl_lock(devlink); + err = devl_params_register(devlink, params, params_count); + devl_unlock(devlink); + return err; +} EXPORT_SYMBOL_GPL(devlink_params_register); /** - * devlink_params_unregister - unregister configuration parameters + * devl_params_unregister - unregister configuration parameters * @devlink: devlink * @params: configuration parameters to unregister * @params_count: number of parameters provided */ -void devlink_params_unregister(struct devlink *devlink, - const struct devlink_param *params, - size_t params_count) +void devl_params_unregister(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count) { const struct devlink_param *param = params; int i; + lockdep_assert_held(&devlink->lock); + for (i = 0; i < params_count; i++, param++) devlink_param_unregister(devlink, param); } +EXPORT_SYMBOL_GPL(devl_params_unregister); + +void devlink_params_unregister(struct devlink *devlink, + const struct devlink_param *params, + size_t params_count) +{ + devl_lock(devlink); + devl_params_unregister(devlink, params, params_count); + devl_unlock(devlink); +} EXPORT_SYMBOL_GPL(devlink_params_unregister); /** - * devlink_param_driverinit_value_get - get configuration parameter - * value for driver initializing + * devl_param_driverinit_value_get - get configuration parameter + * value for driver initializing * * @devlink: devlink * @param_id: parameter ID @@ -10893,11 +10920,13 @@ EXPORT_SYMBOL_GPL(devlink_params_unregister); * This function should be used by the driver to get driverinit * configuration for initialization after reload command. */ -int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, - union devlink_param_value *init_val) +int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, + union devlink_param_value *init_val) { struct devlink_param_item *param_item; + lockdep_assert_held(&devlink->lock); + if (WARN_ON(!devlink_reload_supported(devlink->ops))) return -EOPNOTSUPP; @@ -10919,12 +10948,12 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id, return 0; } -EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_get); +EXPORT_SYMBOL_GPL(devl_param_driverinit_value_get); /** - * devlink_param_driverinit_value_set - set value of configuration - * parameter for driverinit - * configuration mode + * devl_param_driverinit_value_set - set value of configuration + * parameter for driverinit + * configuration mode * * @devlink: devlink * @param_id: parameter ID @@ -10933,8 +10962,8 @@ EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_get); * This function should be used by the driver to set driverinit * configuration mode default value. */ -void devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, - union devlink_param_value init_val) +void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, + union devlink_param_value init_val) { struct devlink_param_item *param_item; @@ -10954,12 +10983,12 @@ void devlink_param_driverinit_value_set(struct devlink *devlink, u32 param_id, devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); } -EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_set); +EXPORT_SYMBOL_GPL(devl_param_driverinit_value_set); /** - * devlink_param_value_changed - notify devlink on a parameter's value - * change. Should be called by the driver - * right after the change. + * devl_param_value_changed - notify devlink on a parameter's value + * change. Should be called by the driver + * right after the change. * * @devlink: devlink * @param_id: parameter ID @@ -10968,7 +10997,7 @@ EXPORT_SYMBOL_GPL(devlink_param_driverinit_value_set); * change, excluding driverinit configuration mode. * For driverinit configuration mode driver should use the function */ -void devlink_param_value_changed(struct devlink *devlink, u32 param_id) +void devl_param_value_changed(struct devlink *devlink, u32 param_id) { struct devlink_param_item *param_item; @@ -10977,7 +11006,7 @@ void devlink_param_value_changed(struct devlink *devlink, u32 param_id) devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); } -EXPORT_SYMBOL_GPL(devlink_param_value_changed); +EXPORT_SYMBOL_GPL(devl_param_value_changed); /** * devl_region_create - create a new address region -- cgit v1.2.3 From d8afe2f8a92d2aac3df645772f6ee61b0b2fc147 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Wed, 25 Jan 2023 10:52:30 -0800 Subject: netpoll: Remove 4s sleep during carrier detection This patch removes the msleep(4s) during netpoll_setup() if the carrier appears instantly. Here are some scenarios where this workaround is counter-productive in modern ages: Servers which have BMC communicating over NC-SI via the same NIC as gets used for netconsole. BMC will keep the PHY up, hence the carrier appearing instantly. The link is fibre, SERDES getting sync could happen within 0.1Hz, and the carrier also appears instantly. Other than that, if a driver is reporting instant carrier and then losing it, this is probably a driver bug. Reported-by: Michael van der Westhuizen Signed-off-by: Breno Leitao Link: https://lore.kernel.org/r/20230125185230.3574681-1-leitao@debian.org Signed-off-by: Jakub Kicinski --- net/core/netpoll.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) (limited to 'net') diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 9be762e1d042..a089b704b986 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -682,7 +682,7 @@ int netpoll_setup(struct netpoll *np) } if (!netif_running(ndev)) { - unsigned long atmost, atleast; + unsigned long atmost; np_info(np, "device %s not up yet, forcing it\n", np->dev_name); @@ -694,7 +694,6 @@ int netpoll_setup(struct netpoll *np) } rtnl_unlock(); - atleast = jiffies + HZ/10; atmost = jiffies + carrier_timeout * HZ; while (!netif_carrier_ok(ndev)) { if (time_after(jiffies, atmost)) { @@ -704,15 +703,6 @@ int netpoll_setup(struct netpoll *np) msleep(1); } - /* If carrier appears to come up instantly, we don't - * trust it and pause so that we don't pump all our - * queued console messages into the bitbucket. - */ - - if (time_before(jiffies, atleast)) { - np_notice(np, "carrier detect appears untrustworthy, waiting 4 seconds\n"); - msleep(4000); - } rtnl_lock(); } -- cgit v1.2.3 From 9bc114504b07207d671593f6f6d787d55dcf91bd Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 25 Jan 2023 11:29:22 +0100 Subject: ieee802154: Add support for user beaconing requests Parse user requests for sending beacons, start sending beacons at a regular pace. If needed, the pace can be updated with a new request. The process can also be interrupted at any moment. The page and channel must be changed beforehands if needed. Interval orders above 14 are reserved to tell a device it must answer BEACON_REQ coming from another device as part of an active scan procedure and this is not yet supported. A netlink "beacon request" structure is created to list the requirements. Mac layers may now implement the ->send_beacons() and ->stop_beacons() hooks. Co-developed-by: David Girault Signed-off-by: David Girault Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230125102923.135465-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/net/cfg802154.h | 23 ++++++++++++ include/net/nl802154.h | 3 ++ net/ieee802154/nl802154.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++ net/ieee802154/nl802154.h | 1 + net/ieee802154/rdev-ops.h | 28 ++++++++++++++ net/ieee802154/trace.h | 21 +++++++++++ 6 files changed, 169 insertions(+) (limited to 'net') diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index 0b0f81a945b6..0c2778a836db 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -19,6 +19,7 @@ struct wpan_phy; struct wpan_phy_cca; struct cfg802154_scan_request; +struct cfg802154_beacon_request; #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL struct ieee802154_llsec_device_key; @@ -72,6 +73,10 @@ struct cfg802154_ops { struct cfg802154_scan_request *request); int (*abort_scan)(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev); + int (*send_beacons)(struct wpan_phy *wpan_phy, + struct cfg802154_beacon_request *request); + int (*stop_beacons)(struct wpan_phy *wpan_phy, + struct wpan_dev *wpan_dev); #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL void (*get_llsec_table)(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, @@ -314,6 +319,24 @@ struct cfg802154_scan_request { struct wpan_phy *wpan_phy; }; +/** + * struct cfg802154_beacon_request - Beacon request descriptor + * + * @interval: interval n between sendings, in multiple order of the super frame + * duration: aBaseSuperframeDuration * (2^n) unless the interval + * order is greater or equal to 15, in this case beacons won't be + * passively sent out at a fixed rate but instead inform the device + * that it should answer beacon requests as part of active scan + * procedures + * @wpan_dev: the concerned wpan device + * @wpan_phy: the wpan phy this was for + */ +struct cfg802154_beacon_request { + u8 interval; + struct wpan_dev *wpan_dev; + struct wpan_phy *wpan_phy; +}; + /** * struct cfg802154_mac_pkt - MAC packet descriptor (beacon/command) * @node: MAC packets to process list member diff --git a/include/net/nl802154.h b/include/net/nl802154.h index c267fa1c5aac..8cd9d141f5af 100644 --- a/include/net/nl802154.h +++ b/include/net/nl802154.h @@ -76,6 +76,8 @@ enum nl802154_commands { NL802154_CMD_TRIGGER_SCAN, NL802154_CMD_ABORT_SCAN, NL802154_CMD_SCAN_DONE, + NL802154_CMD_SEND_BEACONS, + NL802154_CMD_STOP_BEACONS, /* add new commands above here */ @@ -144,6 +146,7 @@ enum nl802154_attrs { NL802154_ATTR_SCAN_MEAN_PRF, NL802154_ATTR_SCAN_DURATION, NL802154_ATTR_SCAN_DONE_REASON, + NL802154_ATTR_BEACON_INTERVAL, /* add attributes here, update the policy in nl802154.c */ diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index 0b7a9f16b3b6..8661907599e1 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -227,6 +227,7 @@ static const struct nla_policy nl802154_policy[NL802154_ATTR_MAX+1] = { [NL802154_ATTR_SCAN_MEAN_PRF] = { .type = NLA_U8 }, [NL802154_ATTR_SCAN_DURATION] = { .type = NLA_U8 }, [NL802154_ATTR_SCAN_DONE_REASON] = { .type = NLA_U8 }, + [NL802154_ATTR_BEACON_INTERVAL] = { .type = NLA_U8 }, #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL [NL802154_ATTR_SEC_ENABLED] = { .type = NLA_U8, }, @@ -1587,6 +1588,82 @@ static int nl802154_abort_scan(struct sk_buff *skb, struct genl_info *info) return rdev_abort_scan(rdev, wpan_dev); } +static int +nl802154_send_beacons(struct sk_buff *skb, struct genl_info *info) +{ + struct cfg802154_registered_device *rdev = info->user_ptr[0]; + struct net_device *dev = info->user_ptr[1]; + struct wpan_dev *wpan_dev = dev->ieee802154_ptr; + struct wpan_phy *wpan_phy = &rdev->wpan_phy; + struct cfg802154_beacon_request *request; + int err; + + /* Only coordinators can send beacons */ + if (wpan_dev->iftype != NL802154_IFTYPE_COORD) + return -EOPNOTSUPP; + + if (wpan_dev->pan_id == cpu_to_le16(IEEE802154_PANID_BROADCAST)) { + pr_err("Device is not part of any PAN\n"); + return -EPERM; + } + + request = kzalloc(sizeof(*request), GFP_KERNEL); + if (!request) + return -ENOMEM; + + request->wpan_dev = wpan_dev; + request->wpan_phy = wpan_phy; + + if (info->attrs[NL802154_ATTR_BEACON_INTERVAL]) { + request->interval = nla_get_u8(info->attrs[NL802154_ATTR_BEACON_INTERVAL]); + if (request->interval > IEEE802154_MAX_SCAN_DURATION) { + pr_err("Interval is out of range\n"); + err = -EINVAL; + goto free_request; + } + } else { + /* Use maximum duration order by default */ + request->interval = IEEE802154_MAX_SCAN_DURATION; + } + + if (wpan_dev->netdev) + dev_hold(wpan_dev->netdev); + + err = rdev_send_beacons(rdev, request); + if (err) { + pr_err("Failure starting sending beacons (%d)\n", err); + goto free_device; + } + + return 0; + +free_device: + if (wpan_dev->netdev) + dev_put(wpan_dev->netdev); +free_request: + kfree(request); + + return err; +} + +void nl802154_beaconing_done(struct wpan_dev *wpan_dev) +{ + if (wpan_dev->netdev) + dev_put(wpan_dev->netdev); +} +EXPORT_SYMBOL_GPL(nl802154_beaconing_done); + +static int +nl802154_stop_beacons(struct sk_buff *skb, struct genl_info *info) +{ + struct cfg802154_registered_device *rdev = info->user_ptr[0]; + struct net_device *dev = info->user_ptr[1]; + struct wpan_dev *wpan_dev = dev->ieee802154_ptr; + + /* Resources are released in the notification helper above */ + return rdev_stop_beacons(rdev, wpan_dev); +} + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL static const struct nla_policy nl802154_dev_addr_policy[NL802154_DEV_ADDR_ATTR_MAX + 1] = { [NL802154_DEV_ADDR_ATTR_PAN_ID] = { .type = NLA_U16 }, @@ -2693,6 +2770,22 @@ static const struct genl_ops nl802154_ops[] = { NL802154_FLAG_CHECK_NETDEV_UP | NL802154_FLAG_NEED_RTNL, }, + { + .cmd = NL802154_CMD_SEND_BEACONS, + .doit = nl802154_send_beacons, + .flags = GENL_ADMIN_PERM, + .internal_flags = NL802154_FLAG_NEED_NETDEV | + NL802154_FLAG_CHECK_NETDEV_UP | + NL802154_FLAG_NEED_RTNL, + }, + { + .cmd = NL802154_CMD_STOP_BEACONS, + .doit = nl802154_stop_beacons, + .flags = GENL_ADMIN_PERM, + .internal_flags = NL802154_FLAG_NEED_NETDEV | + NL802154_FLAG_CHECK_NETDEV_UP | + NL802154_FLAG_NEED_RTNL, + }, #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL { .cmd = NL802154_CMD_SET_SEC_PARAMS, diff --git a/net/ieee802154/nl802154.h b/net/ieee802154/nl802154.h index cfa7134be747..d69d950f9a6a 100644 --- a/net/ieee802154/nl802154.h +++ b/net/ieee802154/nl802154.h @@ -9,5 +9,6 @@ int nl802154_scan_event(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, int nl802154_scan_started(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev); int nl802154_scan_done(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, enum nl802154_scan_done_reasons reason); +void nl802154_beaconing_done(struct wpan_dev *wpan_dev); #endif /* __IEEE802154_NL802154_H */ diff --git a/net/ieee802154/rdev-ops.h b/net/ieee802154/rdev-ops.h index e171d74c3251..5eaae15c610e 100644 --- a/net/ieee802154/rdev-ops.h +++ b/net/ieee802154/rdev-ops.h @@ -237,6 +237,34 @@ static inline int rdev_abort_scan(struct cfg802154_registered_device *rdev, return ret; } +static inline int rdev_send_beacons(struct cfg802154_registered_device *rdev, + struct cfg802154_beacon_request *request) +{ + int ret; + + if (!rdev->ops->send_beacons) + return -EOPNOTSUPP; + + trace_802154_rdev_send_beacons(&rdev->wpan_phy, request); + ret = rdev->ops->send_beacons(&rdev->wpan_phy, request); + trace_802154_rdev_return_int(&rdev->wpan_phy, ret); + return ret; +} + +static inline int rdev_stop_beacons(struct cfg802154_registered_device *rdev, + struct wpan_dev *wpan_dev) +{ + int ret; + + if (!rdev->ops->stop_beacons) + return -EOPNOTSUPP; + + trace_802154_rdev_stop_beacons(&rdev->wpan_phy, wpan_dev); + ret = rdev->ops->stop_beacons(&rdev->wpan_phy, wpan_dev); + trace_802154_rdev_return_int(&rdev->wpan_phy, ret); + return ret; +} + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL /* TODO this is already a nl802154, so move into ieee802154 */ static inline void diff --git a/net/ieee802154/trace.h b/net/ieee802154/trace.h index e5405f737ded..e5d8439b9e45 100644 --- a/net/ieee802154/trace.h +++ b/net/ieee802154/trace.h @@ -315,6 +315,22 @@ TRACE_EVENT(802154_rdev_trigger_scan, WPAN_PHY_PR_ARG, __entry->page, __entry->channels, __entry->duration) ); +TRACE_EVENT(802154_rdev_send_beacons, + TP_PROTO(struct wpan_phy *wpan_phy, + struct cfg802154_beacon_request *request), + TP_ARGS(wpan_phy, request), + TP_STRUCT__entry( + WPAN_PHY_ENTRY + __field(u8, interval) + ), + TP_fast_assign( + WPAN_PHY_ASSIGN; + __entry->interval = request->interval; + ), + TP_printk(WPAN_PHY_PR_FMT ", sending beacons (interval order: %d)", + WPAN_PHY_PR_ARG, __entry->interval) +); + DECLARE_EVENT_CLASS(802154_wdev_template, TP_PROTO(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev), TP_ARGS(wpan_phy, wpan_dev), @@ -335,6 +351,11 @@ DEFINE_EVENT(802154_wdev_template, 802154_rdev_abort_scan, TP_ARGS(wpan_phy, wpan_dev) ); +DEFINE_EVENT(802154_wdev_template, 802154_rdev_stop_beacons, + TP_PROTO(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev), + TP_ARGS(wpan_phy, wpan_dev) +); + TRACE_EVENT(802154_rdev_return_int, TP_PROTO(struct wpan_phy *wpan_phy, int ret), TP_ARGS(wpan_phy, ret), -- cgit v1.2.3 From 3accf4762734a69ebd03cba989249c78ac7dfc7e Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 25 Jan 2023 11:29:23 +0100 Subject: mac802154: Handle basic beaconing Implement the core hooks in order to provide the softMAC layer support for sending beacons. Coordinators may be requested to send beacons in a beacon enabled PAN in order for the other devices around to self discover the available PANs automatically. Changing the channels is prohibited while a beacon operation is ongoing. The implementation uses a workqueue triggered at a certain interval depending on the symbol duration for the current channel and the interval order provided. Sending beacons in response to a BEACON_REQ frame (ie. answering active scans) is not yet supported. This initial patchset has no security support (llsec). Co-developed-by: David Girault Signed-off-by: David Girault Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20230125102923.135465-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/net/ieee802154_netdev.h | 16 +++++ net/ieee802154/header_ops.c | 24 +++++++ net/mac802154/cfg.c | 31 ++++++++- net/mac802154/ieee802154_i.h | 18 +++++ net/mac802154/iface.c | 3 + net/mac802154/llsec.c | 5 +- net/mac802154/main.c | 1 + net/mac802154/scan.c | 151 ++++++++++++++++++++++++++++++++++++++++ 8 files changed, 246 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/net/ieee802154_netdev.h b/include/net/ieee802154_netdev.h index 2f2196049a86..da8a3e648c7a 100644 --- a/include/net/ieee802154_netdev.h +++ b/include/net/ieee802154_netdev.h @@ -129,6 +129,13 @@ enum ieee802154_frame_version { IEEE802154_MULTIPURPOSE_STD = IEEE802154_2003_STD, }; +enum ieee802154_addressing_mode { + IEEE802154_NO_ADDRESSING, + IEEE802154_RESERVED, + IEEE802154_SHORT_ADDRESSING, + IEEE802154_EXTENDED_ADDRESSING, +}; + struct ieee802154_hdr { struct ieee802154_hdr_fc fc; u8 seq; @@ -137,6 +144,11 @@ struct ieee802154_hdr { struct ieee802154_sechdr sec; }; +struct ieee802154_beacon_frame { + struct ieee802154_hdr mhr; + struct ieee802154_beacon_hdr mac_pl; +}; + /* pushes hdr onto the skb. fields of hdr->fc that can be calculated from * the contents of hdr will be, and the actual value of those bits in * hdr->fc will be ignored. this includes the INTRA_PAN bit and the frame @@ -162,6 +174,10 @@ int ieee802154_hdr_peek_addrs(const struct sk_buff *skb, */ int ieee802154_hdr_peek(const struct sk_buff *skb, struct ieee802154_hdr *hdr); +/* pushes a beacon frame into an skb */ +int ieee802154_beacon_push(struct sk_buff *skb, + struct ieee802154_beacon_frame *beacon); + int ieee802154_max_payload(const struct ieee802154_hdr *hdr); static inline int diff --git a/net/ieee802154/header_ops.c b/net/ieee802154/header_ops.c index af337cf62764..35d384dfe29d 100644 --- a/net/ieee802154/header_ops.c +++ b/net/ieee802154/header_ops.c @@ -120,6 +120,30 @@ ieee802154_hdr_push(struct sk_buff *skb, struct ieee802154_hdr *hdr) } EXPORT_SYMBOL_GPL(ieee802154_hdr_push); +int ieee802154_beacon_push(struct sk_buff *skb, + struct ieee802154_beacon_frame *beacon) +{ + struct ieee802154_beacon_hdr *mac_pl = &beacon->mac_pl; + struct ieee802154_hdr *mhr = &beacon->mhr; + int ret; + + skb_reserve(skb, sizeof(*mhr)); + ret = ieee802154_hdr_push(skb, mhr); + if (ret < 0) + return ret; + + skb_reset_mac_header(skb); + skb->mac_len = ret; + + skb_put_data(skb, mac_pl, sizeof(*mac_pl)); + + if (mac_pl->pend_short_addr_count || mac_pl->pend_ext_addr_count) + return -EOPNOTSUPP; + + return 0; +} +EXPORT_SYMBOL_GPL(ieee802154_beacon_push); + static int ieee802154_hdr_get_addr(const u8 *buf, int mode, bool omit_pan, struct ieee802154_addr *addr) diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c index 187cebcaf233..5c3cb019f751 100644 --- a/net/mac802154/cfg.c +++ b/net/mac802154/cfg.c @@ -114,8 +114,8 @@ ieee802154_set_channel(struct wpan_phy *wpan_phy, u8 page, u8 channel) wpan_phy->current_channel == channel) return 0; - /* Refuse to change channels during a scanning operation */ - if (mac802154_is_scanning(local)) + /* Refuse to change channels during scanning or beaconing */ + if (mac802154_is_scanning(local) || mac802154_is_beaconing(local)) return -EBUSY; ret = drv_set_channel(local, page, channel); @@ -290,6 +290,31 @@ static int mac802154_abort_scan(struct wpan_phy *wpan_phy, return mac802154_abort_scan_locked(local, sdata); } +static int mac802154_send_beacons(struct wpan_phy *wpan_phy, + struct cfg802154_beacon_request *request) +{ + struct ieee802154_sub_if_data *sdata; + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(request->wpan_dev); + + ASSERT_RTNL(); + + return mac802154_send_beacons_locked(sdata, request); +} + +static int mac802154_stop_beacons(struct wpan_phy *wpan_phy, + struct wpan_dev *wpan_dev) +{ + struct ieee802154_local *local = wpan_phy_priv(wpan_phy); + struct ieee802154_sub_if_data *sdata; + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(wpan_dev); + + ASSERT_RTNL(); + + return mac802154_stop_beacons_locked(local, sdata); +} + #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL static void ieee802154_get_llsec_table(struct wpan_phy *wpan_phy, @@ -499,6 +524,8 @@ const struct cfg802154_ops mac802154_config_ops = { .set_ackreq_default = ieee802154_set_ackreq_default, .trigger_scan = mac802154_trigger_scan, .abort_scan = mac802154_abort_scan, + .send_beacons = mac802154_send_beacons, + .stop_beacons = mac802154_stop_beacons, #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL .get_llsec_table = ieee802154_get_llsec_table, .lock_llsec_table = ieee802154_lock_llsec_table, diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h index 0e4db967bd1d..63bab99ed368 100644 --- a/net/mac802154/ieee802154_i.h +++ b/net/mac802154/ieee802154_i.h @@ -23,6 +23,7 @@ enum ieee802154_ongoing { IEEE802154_IS_SCANNING = BIT(0), + IEEE802154_IS_BEACONING = BIT(1), }; /* mac802154 device private data */ @@ -60,6 +61,12 @@ struct ieee802154_local { struct cfg802154_scan_request __rcu *scan_req; struct delayed_work scan_work; + /* Beaconing */ + unsigned int beacon_interval; + struct ieee802154_beacon_frame beacon; + struct cfg802154_beacon_request __rcu *beacon_req; + struct delayed_work beacon_work; + /* Asynchronous tasks */ struct list_head rx_beacon_list; struct work_struct rx_beacon_work; @@ -257,6 +264,17 @@ static inline bool mac802154_is_scanning(struct ieee802154_local *local) return test_bit(IEEE802154_IS_SCANNING, &local->ongoing); } +void mac802154_beacon_worker(struct work_struct *work); +int mac802154_send_beacons_locked(struct ieee802154_sub_if_data *sdata, + struct cfg802154_beacon_request *request); +int mac802154_stop_beacons_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata); + +static inline bool mac802154_is_beaconing(struct ieee802154_local *local) +{ + return test_bit(IEEE802154_IS_BEACONING, &local->ongoing); +} + /* interface handling */ int ieee802154_iface_init(void); void ieee802154_iface_exit(void); diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c index a5958d323ea3..9d59caeb74e0 100644 --- a/net/mac802154/iface.c +++ b/net/mac802154/iface.c @@ -305,6 +305,9 @@ static int mac802154_slave_close(struct net_device *dev) if (mac802154_is_scanning(local)) mac802154_abort_scan_locked(local, sdata); + if (mac802154_is_beaconing(local)) + mac802154_stop_beacons_locked(local, sdata); + netif_stop_queue(dev); local->open_count--; diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c index 55550ead2ced..8d2eabc71bbe 100644 --- a/net/mac802154/llsec.c +++ b/net/mac802154/llsec.c @@ -707,7 +707,10 @@ int mac802154_llsec_encrypt(struct mac802154_llsec *sec, struct sk_buff *skb) hlen = ieee802154_hdr_pull(skb, &hdr); - if (hlen < 0 || hdr.fc.type != IEEE802154_FC_TYPE_DATA) + /* TODO: control frames security support */ + if (hlen < 0 || + (hdr.fc.type != IEEE802154_FC_TYPE_DATA && + hdr.fc.type != IEEE802154_FC_TYPE_BEACON)) return -EINVAL; if (!hdr.fc.security_enabled || diff --git a/net/mac802154/main.c b/net/mac802154/main.c index b1111279e06d..ee23e234b998 100644 --- a/net/mac802154/main.c +++ b/net/mac802154/main.c @@ -99,6 +99,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops) INIT_WORK(&local->sync_tx_work, ieee802154_xmit_sync_worker); INIT_DELAYED_WORK(&local->scan_work, mac802154_scan_worker); INIT_WORK(&local->rx_beacon_work, mac802154_rx_beacon_worker); + INIT_DELAYED_WORK(&local->beacon_work, mac802154_beacon_worker); /* init supported flags with 802.15.4 default ranges */ phy->supported.max_minbe = 8; diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c index 56056b9c93c1..cfbe20b1ec5e 100644 --- a/net/mac802154/scan.c +++ b/net/mac802154/scan.c @@ -16,6 +16,11 @@ #include "driver-ops.h" #include "../ieee802154/nl802154.h" +#define IEEE802154_BEACON_MHR_SZ 13 +#define IEEE802154_BEACON_PL_SZ 4 +#define IEEE802154_BEACON_SKB_SZ (IEEE802154_BEACON_MHR_SZ + \ + IEEE802154_BEACON_PL_SZ) + /* mac802154_scan_cleanup_locked() must be called upon scan completion or abort. * - Completions are asynchronous, not locked by the rtnl and decided by the * scan worker. @@ -286,3 +291,149 @@ int mac802154_process_beacon(struct ieee802154_local *local, return 0; } + +static int mac802154_transmit_beacon(struct ieee802154_local *local, + struct wpan_dev *wpan_dev) +{ + struct cfg802154_beacon_request *beacon_req; + struct ieee802154_sub_if_data *sdata; + struct sk_buff *skb; + int ret; + + /* Update the sequence number */ + local->beacon.mhr.seq = atomic_inc_return(&wpan_dev->bsn) & 0xFF; + + skb = alloc_skb(IEEE802154_BEACON_SKB_SZ, GFP_KERNEL); + if (!skb) + return -ENOBUFS; + + rcu_read_lock(); + beacon_req = rcu_dereference(local->beacon_req); + if (unlikely(!beacon_req)) { + rcu_read_unlock(); + kfree_skb(skb); + return -EINVAL; + } + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(beacon_req->wpan_dev); + skb->dev = sdata->dev; + + rcu_read_unlock(); + + ret = ieee802154_beacon_push(skb, &local->beacon); + if (ret) { + kfree_skb(skb); + return ret; + } + + return ieee802154_subif_start_xmit(skb, sdata->dev); +} + +void mac802154_beacon_worker(struct work_struct *work) +{ + struct ieee802154_local *local = + container_of(work, struct ieee802154_local, beacon_work.work); + struct cfg802154_beacon_request *beacon_req; + struct ieee802154_sub_if_data *sdata; + struct wpan_dev *wpan_dev; + int ret; + + rcu_read_lock(); + beacon_req = rcu_dereference(local->beacon_req); + if (unlikely(!beacon_req)) { + rcu_read_unlock(); + return; + } + + sdata = IEEE802154_WPAN_DEV_TO_SUB_IF(beacon_req->wpan_dev); + + /* Wait an arbitrary amount of time in case we cannot use the device */ + if (local->suspended || !ieee802154_sdata_running(sdata)) { + rcu_read_unlock(); + queue_delayed_work(local->mac_wq, &local->beacon_work, + msecs_to_jiffies(1000)); + return; + } + + wpan_dev = beacon_req->wpan_dev; + + rcu_read_unlock(); + + dev_dbg(&sdata->dev->dev, "Sending beacon\n"); + ret = mac802154_transmit_beacon(local, wpan_dev); + if (ret) + dev_err(&sdata->dev->dev, + "Beacon could not be transmitted (%d)\n", ret); + + if (local->beacon_interval >= 0) + queue_delayed_work(local->mac_wq, &local->beacon_work, + local->beacon_interval); +} + +int mac802154_stop_beacons_locked(struct ieee802154_local *local, + struct ieee802154_sub_if_data *sdata) +{ + struct wpan_dev *wpan_dev = &sdata->wpan_dev; + struct cfg802154_beacon_request *request; + + ASSERT_RTNL(); + + if (!mac802154_is_beaconing(local)) + return -ESRCH; + + clear_bit(IEEE802154_IS_BEACONING, &local->ongoing); + cancel_delayed_work(&local->beacon_work); + request = rcu_replace_pointer(local->beacon_req, NULL, 1); + if (!request) + return 0; + kfree_rcu(request); + + nl802154_beaconing_done(wpan_dev); + + return 0; +} + +int mac802154_send_beacons_locked(struct ieee802154_sub_if_data *sdata, + struct cfg802154_beacon_request *request) +{ + struct ieee802154_local *local = sdata->local; + + ASSERT_RTNL(); + + if (mac802154_is_beaconing(local)) + mac802154_stop_beacons_locked(local, sdata); + + /* Store beaconing parameters */ + rcu_assign_pointer(local->beacon_req, request); + + set_bit(IEEE802154_IS_BEACONING, &local->ongoing); + + memset(&local->beacon, 0, sizeof(local->beacon)); + local->beacon.mhr.fc.type = IEEE802154_FC_TYPE_BEACON; + local->beacon.mhr.fc.security_enabled = 0; + local->beacon.mhr.fc.frame_pending = 0; + local->beacon.mhr.fc.ack_request = 0; + local->beacon.mhr.fc.intra_pan = 0; + local->beacon.mhr.fc.dest_addr_mode = IEEE802154_NO_ADDRESSING; + local->beacon.mhr.fc.version = IEEE802154_2003_STD; + local->beacon.mhr.fc.source_addr_mode = IEEE802154_EXTENDED_ADDRESSING; + atomic_set(&request->wpan_dev->bsn, -1); + local->beacon.mhr.source.mode = IEEE802154_ADDR_LONG; + local->beacon.mhr.source.pan_id = cpu_to_le16(request->wpan_dev->pan_id); + local->beacon.mhr.source.extended_addr = cpu_to_le64(request->wpan_dev->extended_addr); + local->beacon.mac_pl.beacon_order = request->interval; + local->beacon.mac_pl.superframe_order = request->interval; + local->beacon.mac_pl.final_cap_slot = 0xf; + local->beacon.mac_pl.battery_life_ext = 0; + /* TODO: Fill this field depending on the coordinator capacity */ + local->beacon.mac_pl.pan_coordinator = 1; + local->beacon.mac_pl.assoc_permit = 1; + + /* Start the beacon work */ + local->beacon_interval = + mac802154_scan_get_channel_time(request->interval, + request->wpan_phy->symbol_duration); + queue_delayed_work(local->mac_wq, &local->beacon_work, 0); + + return 0; +} -- cgit v1.2.3 From bf3849755ac606f2a04808b6b706a16867d1e1b8 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sat, 28 Jan 2023 01:06:20 +0100 Subject: bpf: Use ARG_CONST_SIZE_OR_ZERO for 3rd argument of bpf_tcp_raw_gen_syncookie_ipv{4,6}() These functions already check that th_len < sizeof(*th), and propagating the lower bound (th_len > 0) may be challenging in complex code, e.g. as is the case with xdp_synproxy test on s390x [1]. Switch to ARG_CONST_SIZE_OR_ZERO in order to make the verifier accept code where it cannot prove that th_len > 0. [1] https://lore.kernel.org/bpf/CAEf4Bzb3uiSHtUbgVWmkWuJ5Sw1UZd4c_iuS4QXtUkXmTTtXuQ@mail.gmail.com/ Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230128000650.1516334-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- net/core/filter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/core/filter.c b/net/core/filter.c index d8f9b53f3db6..0039cf16713e 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7536,7 +7536,7 @@ static const struct bpf_func_proto bpf_tcp_raw_gen_syncookie_ipv4_proto = { .arg1_type = ARG_PTR_TO_FIXED_SIZE_MEM, .arg1_size = sizeof(struct iphdr), .arg2_type = ARG_PTR_TO_MEM, - .arg3_type = ARG_CONST_SIZE, + .arg3_type = ARG_CONST_SIZE_OR_ZERO, }; BPF_CALL_3(bpf_tcp_raw_gen_syncookie_ipv6, struct ipv6hdr *, iph, @@ -7568,7 +7568,7 @@ static const struct bpf_func_proto bpf_tcp_raw_gen_syncookie_ipv6_proto = { .arg1_type = ARG_PTR_TO_FIXED_SIZE_MEM, .arg1_size = sizeof(struct ipv6hdr), .arg2_type = ARG_PTR_TO_MEM, - .arg3_type = ARG_CONST_SIZE, + .arg3_type = ARG_CONST_SIZE_OR_ZERO, }; BPF_CALL_2(bpf_tcp_raw_check_syncookie_ipv4, struct iphdr *, iph, -- cgit v1.2.3 From be6b5c10ecc4014446e5c807d6a69c5a7cc1c497 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sat, 28 Jan 2023 01:06:33 +0100 Subject: selftests/bpf: Add a sign-extension test for kfuncs s390x ABI requires the caller to zero- or sign-extend the arguments. eBPF already deals with zero-extension (by definition of its ABI), but not with sign-extension. Add a test to cover that potentially problematic area. Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230128000650.1516334-15-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- net/bpf/test_run.c | 9 +++++++++ tools/testing/selftests/bpf/prog_tests/kfunc_call.c | 1 + tools/testing/selftests/bpf/progs/kfunc_call_test.c | 18 ++++++++++++++++++ 3 files changed, 28 insertions(+) (limited to 'net') diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 8da0d73b368e..7dbefa4fd2eb 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -550,6 +550,14 @@ struct sock * noinline bpf_kfunc_call_test3(struct sock *sk) return sk; } +long noinline bpf_kfunc_call_test4(signed char a, short b, int c, long d) +{ + /* Provoke the compiler to assume that the caller has sign-extended a, + * b and c on platforms where this is required (e.g. s390x). + */ + return (long)a + (long)b + (long)c + d; +} + struct prog_test_member1 { int a; }; @@ -746,6 +754,7 @@ BTF_SET8_START(test_sk_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_kfunc_call_test1) BTF_ID_FLAGS(func, bpf_kfunc_call_test2) BTF_ID_FLAGS(func, bpf_kfunc_call_test3) +BTF_ID_FLAGS(func, bpf_kfunc_call_test4) BTF_ID_FLAGS(func, bpf_kfunc_call_test_acquire, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE) diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c index 5af1ee8f0e6e..bb4cd82a788a 100644 --- a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c +++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c @@ -72,6 +72,7 @@ static struct kfunc_test_params kfunc_tests[] = { /* success cases */ TC_TEST(kfunc_call_test1, 12), TC_TEST(kfunc_call_test2, 3), + TC_TEST(kfunc_call_test4, -1234), TC_TEST(kfunc_call_test_ref_btf_id, 0), TC_TEST(kfunc_call_test_get_mem, 42), SYSCALL_TEST(kfunc_syscall_test, 0), diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test.c b/tools/testing/selftests/bpf/progs/kfunc_call_test.c index f636e50be259..d91c58d06d38 100644 --- a/tools/testing/selftests/bpf/progs/kfunc_call_test.c +++ b/tools/testing/selftests/bpf/progs/kfunc_call_test.c @@ -3,6 +3,7 @@ #include #include +extern long bpf_kfunc_call_test4(signed char a, short b, int c, long d) __ksym; extern int bpf_kfunc_call_test2(struct sock *sk, __u32 a, __u32 b) __ksym; extern __u64 bpf_kfunc_call_test1(struct sock *sk, __u32 a, __u64 b, __u32 c, __u64 d) __ksym; @@ -17,6 +18,23 @@ extern void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym; extern int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size) __ksym; extern int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) __ksym; +SEC("tc") +int kfunc_call_test4(struct __sk_buff *skb) +{ + struct bpf_sock *sk = skb->sk; + long tmp; + + if (!sk) + return -1; + + sk = bpf_sk_fullsock(sk); + if (!sk) + return -1; + + tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000); + return (tmp >> 32) + tmp; +} + SEC("tc") int kfunc_call_test2(struct __sk_buff *skb) { -- cgit v1.2.3 From 7d7e9169a3ecdc14a4921b2130066ce26952c9e1 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 27 Jan 2023 16:50:40 +0100 Subject: devlink: move devlink reload notifications back in between _down() and _up() calls This effectively reverts commit 05a7f4a8dff1 ("devlink: Break parameter notification sequence to be before/after unload/load driver"). Cited commit resolved a problem in mlx5 params implementation, when param notification code accessed memory previously freed during reload. Now, when the params can be registered and unregistered when devlink instance is registered, mlx5 code unregisters the problematic param during devlink reload. The fix is therefore no longer needed. Current behavior is a it problematic, as it sends DEL notifications even in potential case when reload_down() call fails which might confuse userspace notifications listener. So move the reload notifications back where they were originally in between reload_down() and reload_up() calls. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index bd4c5d2dd612..24e20861a28b 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -4235,12 +4235,11 @@ static void devlink_param_notify(struct devlink *devlink, struct devlink_param_item *param_item, enum devlink_command cmd); -static void devlink_ns_change_notify(struct devlink *devlink, - struct net *dest_net, struct net *curr_net, - bool new) +static void devlink_reload_netns_change(struct devlink *devlink, + struct net *curr_net, + struct net *dest_net) { struct devlink_param_item *param_item; - enum devlink_command cmd; /* Userspace needs to be notified about devlink objects * removed from original and entering new network namespace. @@ -4248,18 +4247,19 @@ static void devlink_ns_change_notify(struct devlink *devlink, * reload process so the notifications are generated separatelly. */ - if (!dest_net || net_eq(dest_net, curr_net)) - return; + list_for_each_entry(param_item, &devlink->param_list, list) + devlink_param_notify(devlink, 0, param_item, + DEVLINK_CMD_PARAM_DEL); + devlink_notify(devlink, DEVLINK_CMD_DEL); - if (new) - devlink_notify(devlink, DEVLINK_CMD_NEW); + move_netdevice_notifier_net(curr_net, dest_net, + &devlink->netdevice_nb); + write_pnet(&devlink->_net, dest_net); - cmd = new ? DEVLINK_CMD_PARAM_NEW : DEVLINK_CMD_PARAM_DEL; + devlink_notify(devlink, DEVLINK_CMD_NEW); list_for_each_entry(param_item, &devlink->param_list, list) - devlink_param_notify(devlink, 0, param_item, cmd); - - if (!new) - devlink_notify(devlink, DEVLINK_CMD_DEL); + devlink_param_notify(devlink, 0, param_item, + DEVLINK_CMD_PARAM_NEW); } static void devlink_reload_failed_set(struct devlink *devlink, @@ -4341,24 +4341,19 @@ int devlink_reload(struct devlink *devlink, struct net *dest_net, memcpy(remote_reload_stats, devlink->stats.remote_reload_stats, sizeof(remote_reload_stats)); - curr_net = devlink_net(devlink); - devlink_ns_change_notify(devlink, dest_net, curr_net, false); err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack); if (err) return err; - if (dest_net && !net_eq(dest_net, curr_net)) { - move_netdevice_notifier_net(curr_net, dest_net, - &devlink->netdevice_nb); - write_pnet(&devlink->_net, dest_net); - } + curr_net = devlink_net(devlink); + if (dest_net && !net_eq(dest_net, curr_net)) + devlink_reload_netns_change(devlink, curr_net, dest_net); err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack); devlink_reload_failed_set(devlink, !!err); if (err) return err; - devlink_ns_change_notify(devlink, dest_net, curr_net, true); WARN_ON(!(*actions_performed & BIT(action))); /* Catch driver on updating the remote action within devlink reload */ WARN_ON(memcmp(remote_reload_stats, devlink->stats.remote_reload_stats, -- cgit v1.2.3 From a131315a47bbb5e89a3330cac69fe6ab4836f762 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 27 Jan 2023 16:50:41 +0100 Subject: devlink: send objects notifications during devlink reload Currently, the notifications are only sent for params. People who introduced other objects forgot to add the reload notifications here. To make sure all notifications happen according to existing comment, benefit from existence of devlink_notify_register/unregister() helpers and use them in reload code. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 20 ++------------------ 1 file changed, 2 insertions(+), 18 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 24e20861a28b..4f78ef5a46af 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -4230,36 +4230,20 @@ static struct net *devlink_netns_get(struct sk_buff *skb, return net; } -static void devlink_param_notify(struct devlink *devlink, - unsigned int port_index, - struct devlink_param_item *param_item, - enum devlink_command cmd); - static void devlink_reload_netns_change(struct devlink *devlink, struct net *curr_net, struct net *dest_net) { - struct devlink_param_item *param_item; - /* Userspace needs to be notified about devlink objects * removed from original and entering new network namespace. * The rest of the devlink objects are re-created during * reload process so the notifications are generated separatelly. */ - - list_for_each_entry(param_item, &devlink->param_list, list) - devlink_param_notify(devlink, 0, param_item, - DEVLINK_CMD_PARAM_DEL); - devlink_notify(devlink, DEVLINK_CMD_DEL); - + devlink_notify_unregister(devlink); move_netdevice_notifier_net(curr_net, dest_net, &devlink->netdevice_nb); write_pnet(&devlink->_net, dest_net); - - devlink_notify(devlink, DEVLINK_CMD_NEW); - list_for_each_entry(param_item, &devlink->param_list, list) - devlink_param_notify(devlink, 0, param_item, - DEVLINK_CMD_PARAM_NEW); + devlink_notify_register(devlink); } static void devlink_reload_failed_set(struct devlink *devlink, -- cgit v1.2.3 From fb8421a94c5613fee86e192bab0892ecb1d56e4c Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 27 Jan 2023 16:50:42 +0100 Subject: devlink: remove devlink features Devlink features were introduced to disallow devlink reload calls of userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without devlink instance lock held. However, with recent changes that converted devlink reload to be performed under devlink instance lock, this is redundant so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally only when device didn't act as multi port slave. Move the multi port check into mlx5_devlink_reload_down() callback alongside with the other checks preventing the device from reload in certain states. Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 1 - .../ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c | 1 - .../ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c | 1 - drivers/net/ethernet/intel/ice/ice_devlink.c | 1 - drivers/net/ethernet/mellanox/mlx4/main.c | 1 - drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 9 +++++---- drivers/net/ethernet/mellanox/mlxsw/core.c | 1 - drivers/net/netdevsim/dev.c | 1 - include/net/devlink.h | 2 +- net/devlink/core.c | 19 ------------------- net/devlink/devl_internal.h | 1 - net/devlink/leftover.c | 3 --- 12 files changed, 6 insertions(+), 35 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index 26913dc816d3..8b3e7697390f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -1303,7 +1303,6 @@ int bnxt_dl_register(struct bnxt *bp) if (rc) goto err_dl_port_unreg; - devlink_set_features(dl, DEVLINK_F_RELOAD); out: devlink_register(dl); return 0; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c index 3d3b69605423..9a939c0b217f 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c @@ -114,7 +114,6 @@ int hclge_devlink_init(struct hclge_dev *hdev) priv->hdev = hdev; hdev->devlink = devlink; - devlink_set_features(devlink, DEVLINK_F_RELOAD); devlink_register(devlink); return 0; } diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c index a6c3c5e8f0ab..1b535142c65a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c @@ -116,7 +116,6 @@ int hclgevf_devlink_init(struct hclgevf_dev *hdev) priv->hdev = hdev; hdev->devlink = devlink; - devlink_set_features(devlink, DEVLINK_F_RELOAD); devlink_register(devlink); return 0; } diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c index ce753d23aba9..88497363fc4c 100644 --- a/drivers/net/ethernet/intel/ice/ice_devlink.c +++ b/drivers/net/ethernet/intel/ice/ice_devlink.c @@ -1376,7 +1376,6 @@ void ice_devlink_register(struct ice_pf *pf) { struct devlink *devlink = priv_to_devlink(pf); - devlink_set_features(devlink, DEVLINK_F_RELOAD); devlink_register(devlink); } diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index 6152f77dcfd8..277738c50c56 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -4031,7 +4031,6 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) goto err_params_unregister; pci_save_state(pdev); - devlink_set_features(devlink, DEVLINK_F_RELOAD); devl_unlock(devlink); devlink_register(devlink); return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c index 95a69544a685..63fb7912b032 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -156,6 +156,11 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change, return -EOPNOTSUPP; } + if (mlx5_core_is_mp_slave(dev)) { + NL_SET_ERR_MSG_MOD(extack, "reload is unsupported for multi port slave"); + return -EOPNOTSUPP; + } + if (pci_num_vf(pdev)) { NL_SET_ERR_MSG_MOD(extack, "reload while VFs are present is unfavorable"); } @@ -744,7 +749,6 @@ void mlx5_devlink_traps_unregister(struct devlink *devlink) int mlx5_devlink_params_register(struct devlink *devlink) { - struct mlx5_core_dev *dev = devlink_priv(devlink); int err; err = devl_params_register(devlink, mlx5_devlink_params, @@ -762,9 +766,6 @@ int mlx5_devlink_params_register(struct devlink *devlink) if (err) goto max_uc_list_err; - if (!mlx5_core_is_mp_slave(dev)) - devlink_set_features(devlink, DEVLINK_F_RELOAD); - return 0; max_uc_list_err: diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index f8623e8388c8..42422a106433 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -2285,7 +2285,6 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info, } if (!reload) { - devlink_set_features(devlink, DEVLINK_F_RELOAD); devl_unlock(devlink); devlink_register(devlink); } diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c index f88095b0f836..6045bece2654 100644 --- a/drivers/net/netdevsim/dev.c +++ b/drivers/net/netdevsim/dev.c @@ -1609,7 +1609,6 @@ int nsim_drv_probe(struct nsim_bus_dev *nsim_bus_dev) goto err_hwstats_exit; nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY; - devlink_set_features(devlink, DEVLINK_F_RELOAD); devl_unlock(devlink); return 0; diff --git a/include/net/devlink.h b/include/net/devlink.h index ab654cf552b8..2e85a5970a32 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1645,7 +1645,7 @@ static inline struct devlink *devlink_alloc(const struct devlink_ops *ops, { return devlink_alloc_ns(ops, priv_size, &init_net, dev); } -void devlink_set_features(struct devlink *devlink, u64 features); + int devl_register(struct devlink *devlink); void devl_unregister(struct devlink *devlink); void devlink_register(struct devlink *devlink); diff --git a/net/devlink/core.c b/net/devlink/core.c index 6c0e2fc57e45..aeffd1b8206d 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -125,23 +125,6 @@ next: goto retry; } -/** - * devlink_set_features - Set devlink supported features - * - * @devlink: devlink - * @features: devlink support features - * - * This interface allows us to set reload ops separatelly from - * the devlink_alloc. - */ -void devlink_set_features(struct devlink *devlink, u64 features) -{ - WARN_ON(features & DEVLINK_F_RELOAD && - !devlink_reload_supported(devlink->ops)); - devlink->features = features; -} -EXPORT_SYMBOL_GPL(devlink_set_features); - /** * devl_register - Register devlink instance * @devlink: devlink @@ -303,7 +286,6 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net) * all devlink instances from this namespace into init_net. */ devlinks_xa_for_each_registered_get(net, index, devlink) { - WARN_ON(!(devlink->features & DEVLINK_F_RELOAD)); devl_lock(devlink); err = 0; if (devl_is_registered(devlink)) @@ -313,7 +295,6 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net) &actions_performed, NULL); devl_unlock(devlink); devlink_put(devlink); - if (err && err != -EOPNOTSUPP) pr_warn("Failed to reload devlink instance into init_net\n"); } diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index d0d889038138..ba161de4120e 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -38,7 +38,6 @@ struct devlink { struct list_head trap_policer_list; struct list_head linecard_list; const struct devlink_ops *ops; - u64 features; struct xarray snapshot_ids; struct devlink_dev_stats stats; struct device *dev; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 4f78ef5a46af..92210587d349 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -4387,9 +4387,6 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) u32 actions_performed; int err; - if (!(devlink->features & DEVLINK_F_RELOAD)) - return -EOPNOTSUPP; - err = devlink_resources_validate(devlink, NULL, info); if (err) { NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed"); -- cgit v1.2.3 From 8395406b3495235d73c7aa86ef8df97830e036d6 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 19 Dec 2022 14:22:12 +0000 Subject: rxrpc: Fix trace string Fix a trace string to indicate that it's discarding the local endpoint for a preallocated peer, not a preallocated connection. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- include/trace/events/rxrpc.h | 2 +- net/rxrpc/call_accept.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 283db0ea3db4..31524d605319 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -163,7 +163,7 @@ EM(rxrpc_local_put_for_use, "PUT for-use ") \ EM(rxrpc_local_put_kill_conn, "PUT conn-kil") \ EM(rxrpc_local_put_peer, "PUT peer ") \ - EM(rxrpc_local_put_prealloc_conn, "PUT conn-pre") \ + EM(rxrpc_local_put_prealloc_peer, "PUT peer-pre") \ EM(rxrpc_local_put_release_sock, "PUT rel-sock") \ EM(rxrpc_local_stop, "STOP ") \ EM(rxrpc_local_stopped, "STOPPED ") \ diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 3e8689fdc437..0f5a1d77b890 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -195,7 +195,7 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx) tail = b->peer_backlog_tail; while (CIRC_CNT(head, tail, size) > 0) { struct rxrpc_peer *peer = b->peer_backlog[tail]; - rxrpc_put_local(peer->local, rxrpc_local_put_prealloc_conn); + rxrpc_put_local(peer->local, rxrpc_local_put_prealloc_peer); kfree(peer); tail = (tail + 1) & (size - 1); } -- cgit v1.2.3 From 8338304c2719fb7aec8e4b5dd9fa684b719db06e Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 30 Jan 2023 16:43:06 +0100 Subject: mac802154: Avoid superfluous endianness handling When compiling scan.c with C=1, Sparse complains with: sparse: expected unsigned short [usertype] val sparse: got restricted __le16 [usertype] pan_id sparse: sparse: cast from restricted __le16 sparse: expected unsigned long long [usertype] val sparse: got restricted __le64 [usertype] extended_addr sparse: sparse: cast from restricted __le64 The tool is right, both pan_id and extended_addr already are rightfully defined as being __le16 and __le64 on both sides of the operations and do not require extra endianness handling. Fixes: 3accf4762734 ("mac802154: Handle basic beaconing") Reported-by: kernel test robot Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230130154306.114265-1-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/mac802154/scan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c index cfbe20b1ec5e..8f98efec7753 100644 --- a/net/mac802154/scan.c +++ b/net/mac802154/scan.c @@ -419,8 +419,8 @@ int mac802154_send_beacons_locked(struct ieee802154_sub_if_data *sdata, local->beacon.mhr.fc.source_addr_mode = IEEE802154_EXTENDED_ADDRESSING; atomic_set(&request->wpan_dev->bsn, -1); local->beacon.mhr.source.mode = IEEE802154_ADDR_LONG; - local->beacon.mhr.source.pan_id = cpu_to_le16(request->wpan_dev->pan_id); - local->beacon.mhr.source.extended_addr = cpu_to_le64(request->wpan_dev->extended_addr); + local->beacon.mhr.source.pan_id = request->wpan_dev->pan_id; + local->beacon.mhr.source.extended_addr = request->wpan_dev->extended_addr; local->beacon.mac_pl.beacon_order = request->interval; local->beacon.mac_pl.superframe_order = request->interval; local->beacon.mac_pl.final_cap_slot = 0xf; -- cgit v1.2.3 From 223f59016fa2b6d01814dc53ace1c146857ba236 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 12 Oct 2022 22:06:52 +0100 Subject: rxrpc: Convert call->recvmsg_lock to a spinlock Convert call->recvmsg_lock to a spinlock as it's only ever write-locked. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/af_rxrpc.c | 2 +- net/rxrpc/ar-internal.h | 2 +- net/rxrpc/call_object.c | 4 ++-- net/rxrpc/recvmsg.c | 12 ++++++------ 4 files changed, 10 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index ebbd4a1c3f86..102f5cbff91a 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -786,7 +786,7 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol, INIT_LIST_HEAD(&rx->sock_calls); INIT_LIST_HEAD(&rx->to_be_accepted); INIT_LIST_HEAD(&rx->recvmsg_q); - rwlock_init(&rx->recvmsg_lock); + spin_lock_init(&rx->recvmsg_lock); rwlock_init(&rx->call_lock); memset(&rx->srx, 0, sizeof(rx->srx)); diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 433060cade03..808c08cb2ae5 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -149,7 +149,7 @@ struct rxrpc_sock { struct list_head sock_calls; /* List of calls owned by this socket */ struct list_head to_be_accepted; /* calls awaiting acceptance */ struct list_head recvmsg_q; /* Calls awaiting recvmsg's attention */ - rwlock_t recvmsg_lock; /* Lock for recvmsg_q */ + spinlock_t recvmsg_lock; /* Lock for recvmsg_q */ struct key *key; /* security for this socket */ struct key *securities; /* list of server security descriptors */ struct rb_root calls; /* User ID -> call mapping */ diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index f3c9f0201c15..0012589f2aad 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -560,7 +560,7 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call) rxrpc_put_call_slot(call); /* Make sure we don't get any more notifications */ - write_lock(&rx->recvmsg_lock); + spin_lock(&rx->recvmsg_lock); if (!list_empty(&call->recvmsg_link)) { _debug("unlinking once-pending call %p { e=%lx f=%lx }", @@ -573,7 +573,7 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call) call->recvmsg_link.next = NULL; call->recvmsg_link.prev = NULL; - write_unlock(&rx->recvmsg_lock); + spin_unlock(&rx->recvmsg_lock); if (put) rxrpc_put_call(call, rxrpc_call_put_unnotify); diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index dd54ceee7bcc..b7545fdc0401 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -40,12 +40,12 @@ void rxrpc_notify_socket(struct rxrpc_call *call) call->notify_rx(sk, call, call->user_call_ID); spin_unlock(&call->notify_lock); } else { - write_lock(&rx->recvmsg_lock); + spin_lock(&rx->recvmsg_lock); if (list_empty(&call->recvmsg_link)) { rxrpc_get_call(call, rxrpc_call_get_notify_socket); list_add_tail(&call->recvmsg_link, &rx->recvmsg_q); } - write_unlock(&rx->recvmsg_lock); + spin_unlock(&rx->recvmsg_lock); if (!sock_flag(sk, SOCK_DEAD)) { _debug("call %ps", sk->sk_data_ready); @@ -335,14 +335,14 @@ try_again: /* Find the next call and dequeue it if we're not just peeking. If we * do dequeue it, that comes with a ref that we will need to release. */ - write_lock(&rx->recvmsg_lock); + spin_lock(&rx->recvmsg_lock); l = rx->recvmsg_q.next; call = list_entry(l, struct rxrpc_call, recvmsg_link); if (!(flags & MSG_PEEK)) list_del_init(&call->recvmsg_link); else rxrpc_get_call(call, rxrpc_call_get_recvmsg); - write_unlock(&rx->recvmsg_lock); + spin_unlock(&rx->recvmsg_lock); call_debug_id = call->debug_id; trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_dequeue, 0); @@ -431,9 +431,9 @@ error_unlock_call: error_requeue_call: if (!(flags & MSG_PEEK)) { - write_lock(&rx->recvmsg_lock); + spin_lock(&rx->recvmsg_lock); list_add(&call->recvmsg_link, &rx->recvmsg_q); - write_unlock(&rx->recvmsg_lock); + spin_unlock(&rx->recvmsg_lock); trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_requeue, 0); } else { rxrpc_put_call(call, rxrpc_call_put_recvmsg); -- cgit v1.2.3 From af094824f20b454ee23b7b5a860b3ba58f4e6938 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 17 Oct 2022 08:54:57 +0100 Subject: rxrpc: Allow a delay to be injected into packet reception If CONFIG_AF_RXRPC_DEBUG_RX_DELAY=y, then a delay is injected between packets and errors being received and them being made available to the processing code, thereby allowing the RTT to be artificially increased. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/Kconfig | 9 +++++++++ net/rxrpc/ar-internal.h | 6 ++++++ net/rxrpc/io_thread.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++- net/rxrpc/local_object.c | 6 ++++++ net/rxrpc/misc.c | 7 +++++++ net/rxrpc/sysctl.c | 17 ++++++++++++++++- 6 files changed, 91 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/rxrpc/Kconfig b/net/rxrpc/Kconfig index 7ae023b37a83..a20986806fea 100644 --- a/net/rxrpc/Kconfig +++ b/net/rxrpc/Kconfig @@ -36,6 +36,15 @@ config AF_RXRPC_INJECT_LOSS Say Y here to inject packet loss by discarding some received and some transmitted packets. +config AF_RXRPC_INJECT_RX_DELAY + bool "Inject delay into packet reception" + depends on SYSCTL + help + Say Y here to inject a delay into packet reception, allowing an + extended RTT time to be modelled. The delay can be configured using + /proc/sys/net/rxrpc/rxrpc_inject_rx_delay, setting a number of + milliseconds up to 0.5s (note that the granularity is actually in + jiffies). config AF_RXRPC_DEBUG bool "RxRPC dynamic debugging" diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 808c08cb2ae5..bfae4a87626f 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -285,6 +285,9 @@ struct rxrpc_local { struct completion io_thread_ready; /* Indication that the I/O thread started */ struct rxrpc_sock *service; /* Service(s) listening on this endpoint */ struct rw_semaphore defrag_sem; /* control re-enablement of IP DF bit */ +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + struct sk_buff_head rx_delay_queue; /* Delay injection queue */ +#endif struct sk_buff_head rx_queue; /* Received packets */ struct list_head conn_attend_q; /* Conns requiring immediate attention */ struct list_head call_attend_q; /* Calls requiring immediate attention */ @@ -1109,6 +1112,9 @@ extern unsigned long rxrpc_idle_ack_delay; extern unsigned int rxrpc_rx_window_size; extern unsigned int rxrpc_rx_mtu; extern unsigned int rxrpc_rx_jumbo_max; +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY +extern unsigned long rxrpc_inject_rx_delay; +#endif /* * net_ns.c diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c index 9e9dfb2fc559..4a3a08a0e2cd 100644 --- a/net/rxrpc/io_thread.c +++ b/net/rxrpc/io_thread.c @@ -25,6 +25,7 @@ static int rxrpc_input_packet_on_conn(struct rxrpc_connection *conn, */ int rxrpc_encap_rcv(struct sock *udp_sk, struct sk_buff *skb) { + struct sk_buff_head *rx_queue; struct rxrpc_local *local = rcu_dereference_sk_user_data(udp_sk); if (unlikely(!local)) { @@ -36,7 +37,16 @@ int rxrpc_encap_rcv(struct sock *udp_sk, struct sk_buff *skb) skb->mark = RXRPC_SKB_MARK_PACKET; rxrpc_new_skb(skb, rxrpc_skb_new_encap_rcv); - skb_queue_tail(&local->rx_queue, skb); + rx_queue = &local->rx_queue; +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + if (rxrpc_inject_rx_delay || + !skb_queue_empty(&local->rx_delay_queue)) { + skb->tstamp = ktime_add_ms(skb->tstamp, rxrpc_inject_rx_delay); + rx_queue = &local->rx_delay_queue; + } +#endif + + skb_queue_tail(rx_queue, skb); rxrpc_wake_up_io_thread(local); return 0; } @@ -407,6 +417,9 @@ int rxrpc_io_thread(void *data) struct rxrpc_local *local = data; struct rxrpc_call *call; struct sk_buff *skb; +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + ktime_t now; +#endif bool should_stop; complete(&local->io_thread_ready); @@ -481,6 +494,17 @@ int rxrpc_io_thread(void *data) continue; } + /* Inject a delay into packets if requested. */ +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + now = ktime_get_real(); + while ((skb = skb_peek(&local->rx_delay_queue))) { + if (ktime_before(now, skb->tstamp)) + break; + skb = skb_dequeue(&local->rx_delay_queue); + skb_queue_tail(&local->rx_queue, skb); + } +#endif + if (!skb_queue_empty(&local->rx_queue)) { spin_lock_irq(&local->rx_queue.lock); skb_queue_splice_tail_init(&local->rx_queue, &rx_queue); @@ -502,6 +526,28 @@ int rxrpc_io_thread(void *data) if (should_stop) break; + +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + skb = skb_peek(&local->rx_delay_queue); + if (skb) { + unsigned long timeout; + ktime_t tstamp = skb->tstamp; + ktime_t now = ktime_get_real(); + s64 delay_ns = ktime_to_ns(ktime_sub(tstamp, now)); + + if (delay_ns <= 0) { + __set_current_state(TASK_RUNNING); + continue; + } + + timeout = nsecs_to_jiffies(delay_ns); + timeout = max(timeout, 1UL); + schedule_timeout(timeout); + __set_current_state(TASK_RUNNING); + continue; + } +#endif + schedule(); } diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index b8eaca5d9f22..07d83a4e5841 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -110,6 +110,9 @@ static struct rxrpc_local *rxrpc_alloc_local(struct net *net, INIT_HLIST_NODE(&local->link); init_rwsem(&local->defrag_sem); init_completion(&local->io_thread_ready); +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + skb_queue_head_init(&local->rx_delay_queue); +#endif skb_queue_head_init(&local->rx_queue); INIT_LIST_HEAD(&local->conn_attend_q); INIT_LIST_HEAD(&local->call_attend_q); @@ -434,6 +437,9 @@ void rxrpc_destroy_local(struct rxrpc_local *local) /* At this point, there should be no more packets coming in to the * local endpoint. */ +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + rxrpc_purge_queue(&local->rx_delay_queue); +#endif rxrpc_purge_queue(&local->rx_queue); rxrpc_purge_client_connections(local); } diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c index 056c428d8bf3..825b81183046 100644 --- a/net/rxrpc/misc.c +++ b/net/rxrpc/misc.c @@ -53,3 +53,10 @@ unsigned int rxrpc_rx_mtu = 5692; * sender that we're willing to handle. */ unsigned int rxrpc_rx_jumbo_max = 4; + +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY +/* + * The delay to inject into packet reception. + */ +unsigned long rxrpc_inject_rx_delay; +#endif diff --git a/net/rxrpc/sysctl.c b/net/rxrpc/sysctl.c index cde3224a5cd2..ecaeb4ecfb58 100644 --- a/net/rxrpc/sysctl.c +++ b/net/rxrpc/sysctl.c @@ -17,6 +17,9 @@ static const unsigned int n_65535 = 65535; static const unsigned int n_max_acks = 255; static const unsigned long one_jiffy = 1; static const unsigned long max_jiffies = MAX_JIFFY_OFFSET; +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY +static const unsigned long max_500 = 500; +#endif /* * RxRPC operating parameters. @@ -63,6 +66,19 @@ static struct ctl_table rxrpc_sysctl_table[] = { .extra2 = (void *)&max_jiffies, }, + /* Values used in milliseconds */ +#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY + { + .procname = "inject_rx_delay", + .data = &rxrpc_inject_rx_delay, + .maxlen = sizeof(unsigned long), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + .extra1 = (void *)SYSCTL_LONG_ZERO, + .extra2 = (void *)&max_500, + }, +#endif + /* Non-time values */ { .procname = "reap_client_conns", @@ -109,7 +125,6 @@ static struct ctl_table rxrpc_sysctl_table[] = { .extra1 = (void *)SYSCTL_ONE, .extra2 = (void *)&four, }, - { } }; -- cgit v1.2.3 From 84e28aa513af814807a5e9a0e5f3cab773946f3c Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 17 Oct 2022 10:55:41 +0100 Subject: rxrpc: Generate extra pings for RTT during heavy-receive call When doing a call that has a single transmitted data packet and a massive amount of received data packets, we only ping for one RTT sample, which means we don't get a good reading on it. Fix this by converting occasional IDLE ACKs into PING ACKs to elicit a response. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- include/trace/events/rxrpc.h | 3 ++- net/rxrpc/call_event.c | 15 ++++++++++++--- net/rxrpc/output.c | 7 +++++-- 3 files changed, 19 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index cdcadb1345dc..450b8f345814 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -360,11 +360,12 @@ EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \ EM(rxrpc_propose_ack_input_data, "DataIn ") \ EM(rxrpc_propose_ack_input_data_hole, "DataInH") \ - EM(rxrpc_propose_ack_ping_for_check_life, "ChkLife") \ EM(rxrpc_propose_ack_ping_for_keepalive, "KeepAlv") \ EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \ EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \ + EM(rxrpc_propose_ack_ping_for_old_rtt, "OldRtt ") \ EM(rxrpc_propose_ack_ping_for_params, "Params ") \ + EM(rxrpc_propose_ack_ping_for_rtt, "Rtt ") \ EM(rxrpc_propose_ack_processing_op, "ProcOp ") \ EM(rxrpc_propose_ack_respond_to_ack, "Rsp2Ack") \ EM(rxrpc_propose_ack_respond_to_ping, "Rsp2Png") \ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 1abdef15debc..cf9799be4286 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -498,9 +498,18 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_send_ACK(call, RXRPC_ACK_IDLE, 0, rxrpc_propose_ack_rx_idle); - if (atomic_read(&call->ackr_nr_unacked) > 2) - rxrpc_send_ACK(call, RXRPC_ACK_IDLE, 0, - rxrpc_propose_ack_input_data); + if (atomic_read(&call->ackr_nr_unacked) > 2) { + if (call->peer->rtt_count < 3) + rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, + rxrpc_propose_ack_ping_for_rtt); + else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), + ktime_get_real())) + rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, + rxrpc_propose_ack_ping_for_old_rtt); + else + rxrpc_send_ACK(call, RXRPC_ACK_IDLE, 0, + rxrpc_propose_ack_input_data); + } /* Make sure the timer is restarted */ if (!__rxrpc_call_is_complete(call)) { diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index a9746be29634..98b5d0db7761 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -253,12 +253,15 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, len); ret = do_udp_sendmsg(conn->local->socket, &msg, len); call->peer->last_tx_at = ktime_get_seconds(); - if (ret < 0) + if (ret < 0) { trace_rxrpc_tx_fail(call->debug_id, serial, ret, rxrpc_tx_point_call_ack); - else + } else { trace_rxrpc_tx_packet(call->debug_id, &txb->wire, rxrpc_tx_point_call_ack); + if (txb->wire.flags & RXRPC_REQUEST_ACK) + call->peer->rtt_last_req = ktime_get_real(); + } rxrpc_tx_backoff(call, ret); if (!__rxrpc_call_is_complete(call)) { -- cgit v1.2.3 From 5bbf953382bec6d3b7003e9389668c1d0863db31 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 17 Oct 2022 11:44:22 +0100 Subject: rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked call->ackr_window doesn't need to be atomic as ACK generation and ACK transmission are now done in the same thread, so drop the atomic64 handling and split it into two separate members. Similarly, call->ackr_nr_unacked doesn't need to be atomic now either. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- include/trace/events/rxrpc.h | 10 ++++++---- net/rxrpc/ar-internal.h | 5 +++-- net/rxrpc/call_event.c | 2 +- net/rxrpc/call_object.c | 3 ++- net/rxrpc/input.c | 14 +++++++------- net/rxrpc/output.c | 13 +++++-------- net/rxrpc/proc.c | 4 +--- net/rxrpc/recvmsg.c | 6 +++--- 8 files changed, 28 insertions(+), 29 deletions(-) (limited to 'net') diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 450b8f345814..e51a84f349d8 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -1152,7 +1152,8 @@ TRACE_EVENT(rxrpc_receive, __field(enum rxrpc_receive_trace, why) __field(rxrpc_serial_t, serial) __field(rxrpc_seq_t, seq) - __field(u64, window) + __field(rxrpc_seq_t, window) + __field(rxrpc_seq_t, wtop) ), TP_fast_assign( @@ -1160,7 +1161,8 @@ TRACE_EVENT(rxrpc_receive, __entry->why = why; __entry->serial = serial; __entry->seq = seq; - __entry->window = atomic64_read(&call->ackr_window); + __entry->window = call->ackr_window; + __entry->wtop = call->ackr_wtop; ), TP_printk("c=%08x %s r=%08x q=%08x w=%08x-%08x", @@ -1168,8 +1170,8 @@ TRACE_EVENT(rxrpc_receive, __print_symbolic(__entry->why, rxrpc_receive_traces), __entry->serial, __entry->seq, - lower_32_bits(__entry->window), - upper_32_bits(__entry->window)) + __entry->window, + __entry->wtop) ); TRACE_EVENT(rxrpc_recvmsg, diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index bfae4a87626f..2ca99688f7f0 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -692,8 +692,9 @@ struct rxrpc_call { /* Receive-phase ACK management (ACKs we send). */ u8 ackr_reason; /* reason to ACK */ rxrpc_serial_t ackr_serial; /* serial of packet being ACK'd */ - atomic64_t ackr_window; /* Base (in LSW) and top (in MSW) of SACK window */ - atomic_t ackr_nr_unacked; /* Number of unacked packets */ + rxrpc_seq_t ackr_window; /* Base of SACK window */ + rxrpc_seq_t ackr_wtop; /* Base of SACK window */ + unsigned int ackr_nr_unacked; /* Number of unacked packets */ atomic_t ackr_nr_consumed; /* Number of packets needing hard ACK */ struct { #define RXRPC_SACK_SIZE 256 diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index cf9799be4286..e363f21a2014 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -498,7 +498,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_send_ACK(call, RXRPC_ACK_IDLE, 0, rxrpc_propose_ack_rx_idle); - if (atomic_read(&call->ackr_nr_unacked) > 2) { + if (call->ackr_nr_unacked > 2) { if (call->peer->rtt_count < 3) rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, rxrpc_propose_ack_ping_for_rtt); diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 0012589f2aad..6eaffb0d8fdc 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -167,7 +167,8 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, call->tx_total_len = -1; call->next_rx_timo = 20 * HZ; call->next_req_timo = 1 * HZ; - atomic64_set(&call->ackr_window, 0x100000001ULL); + call->ackr_window = 1; + call->ackr_wtop = 1; memset(&call->sock_node, 0xed, sizeof(call->sock_node)); diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 367927a99881..7e65c7d5bff0 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -338,7 +338,8 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call, rxrpc_serial_t serial) static void rxrpc_input_update_ack_window(struct rxrpc_call *call, rxrpc_seq_t window, rxrpc_seq_t wtop) { - atomic64_set_release(&call->ackr_window, ((u64)wtop) << 32 | window); + call->ackr_window = window; + call->ackr_wtop = wtop; } /* @@ -367,9 +368,8 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb, struct rxrpc_skb_priv *sp = rxrpc_skb(skb); struct sk_buff *oos; rxrpc_serial_t serial = sp->hdr.serial; - u64 win = atomic64_read(&call->ackr_window); - rxrpc_seq_t window = lower_32_bits(win); - rxrpc_seq_t wtop = upper_32_bits(win); + rxrpc_seq_t window = call->ackr_window; + rxrpc_seq_t wtop = call->ackr_wtop; rxrpc_seq_t wlimit = window + call->rx_winsize - 1; rxrpc_seq_t seq = sp->hdr.seq; bool last = sp->hdr.flags & RXRPC_LAST_PACKET; @@ -419,7 +419,7 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb, else if (!skb_queue_empty(&call->rx_oos_queue)) ack_reason = RXRPC_ACK_DELAY; else - atomic_inc_return(&call->ackr_nr_unacked); + call->ackr_nr_unacked++; window++; if (after(window, wtop)) @@ -567,8 +567,8 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_serial_t serial = sp->hdr.serial; rxrpc_seq_t seq0 = sp->hdr.seq; - _enter("{%llx,%x},{%u,%x}", - atomic64_read(&call->ackr_window), call->rx_highest_seq, + _enter("{%x,%x,%x},{%u,%x}", + call->ackr_window, call->ackr_wtop, call->rx_highest_seq, skb->len, seq0); if (__rxrpc_call_is_complete(call)) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 98b5d0db7761..b6bd5e6ccb4c 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -86,20 +86,18 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn, unsigned int qsize; rxrpc_seq_t window, wtop, wrap_point, ix, first; int rsize; - u64 wtmp; u32 mtu, jmax; u8 *ackp = txb->acks; u8 sack_buffer[sizeof(call->ackr_sack_table)] __aligned(8); - atomic_set(&call->ackr_nr_unacked, 0); + call->ackr_nr_unacked = 0; atomic_set(&call->ackr_nr_consumed, 0); rxrpc_inc_stat(call->rxnet, stat_tx_ack_fill); /* Barrier against rxrpc_input_data(). */ retry: - wtmp = atomic64_read_acquire(&call->ackr_window); - window = lower_32_bits(wtmp); - wtop = upper_32_bits(wtmp); + window = call->ackr_window; + wtop = call->ackr_wtop; txb->ack.firstPacket = htonl(window); txb->ack.nAcks = 0; @@ -111,9 +109,8 @@ retry: */ memcpy(sack_buffer, call->ackr_sack_table, sizeof(sack_buffer)); wrap_point = window + RXRPC_SACK_SIZE - 1; - wtmp = atomic64_read_acquire(&call->ackr_window); - window = lower_32_bits(wtmp); - wtop = upper_32_bits(wtmp); + window = call->ackr_window; + wtop = call->ackr_wtop; if (after(wtop, wrap_point)) { cond_resched(); goto retry; diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index 750158a085cd..682636d3b060 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -55,7 +55,6 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) unsigned long timeout = 0; rxrpc_seq_t acks_hard_ack; char lbuff[50], rbuff[50]; - u64 wtmp; if (v == &rxnet->calls) { seq_puts(seq, @@ -83,7 +82,6 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) } acks_hard_ack = READ_ONCE(call->acks_hard_ack); - wtmp = atomic64_read_acquire(&call->ackr_window); seq_printf(seq, "UDP %-47.47s %-47.47s %4x %08x %08x %s %3u" " %-8.8s %08x %08x %08x %02x %08x %02x %08x %02x %06lx\n", @@ -98,7 +96,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) call->abort_code, call->debug_id, acks_hard_ack, READ_ONCE(call->tx_top) - acks_hard_ack, - lower_32_bits(wtmp), upper_32_bits(wtmp) - lower_32_bits(wtmp), + call->ackr_window, call->ackr_wtop - call->ackr_window, call->rx_serial, call->cong_cwnd, timeout); diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index b7545fdc0401..50d263a6359d 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -95,7 +95,7 @@ static int rxrpc_recvmsg_term(struct rxrpc_call *call, struct msghdr *msg) } trace_rxrpc_recvdata(call, rxrpc_recvmsg_terminal, - lower_32_bits(atomic64_read(&call->ackr_window)) - 1, + call->ackr_window - 1, call->rx_pkt_offset, call->rx_pkt_len, ret); return ret; } @@ -175,13 +175,13 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call, rx_pkt_len = call->rx_pkt_len; if (rxrpc_call_has_failed(call)) { - seq = lower_32_bits(atomic64_read(&call->ackr_window)) - 1; + seq = call->ackr_window - 1; ret = -EIO; goto done; } if (test_bit(RXRPC_CALL_RECVMSG_READ_ALL, &call->flags)) { - seq = lower_32_bits(atomic64_read(&call->ackr_window)) - 1; + seq = call->ackr_window - 1; ret = 1; goto done; } -- cgit v1.2.3 From f21e93485bcbfa2753d1447b6198604a2c3d57be Mon Sep 17 00:00:00 2001 From: David Howells Date: Sun, 16 Oct 2022 08:01:32 +0100 Subject: rxrpc: Simplify ACK handling Now that general ACK transmission is done from the same thread as incoming DATA packet wrangling, there's no possibility that the SACK table will be being updated by the latter whilst the former is trying to copy it to an ACK. This means that we can safely rotate the SACK table whilst updating it without having to take a lock, rather than keeping all the bits inside it in fixed place and copying and then rotating it in the transmitter. Therefore, simplify SACK handing by keeping track of starting point in the ring and rotate slots down as we consume them. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- include/trace/events/rxrpc.h | 36 ++++++++++++++++++++++++++++++++++ net/rxrpc/ar-internal.h | 1 + net/rxrpc/input.c | 46 ++++++++++++++++++++++---------------------- net/rxrpc/output.c | 46 +++++++++++++------------------------------- 4 files changed, 73 insertions(+), 56 deletions(-) (limited to 'net') diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index e51a84f349d8..b6adec9111e1 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -422,6 +422,13 @@ EM(RXRPC_ACK_IDLE, "IDL") \ E_(RXRPC_ACK__INVALID, "-?-") +#define rxrpc_sack_traces \ + EM(rxrpc_sack_advance, "ADV") \ + EM(rxrpc_sack_fill, "FIL") \ + EM(rxrpc_sack_nack, "NAK") \ + EM(rxrpc_sack_none, "---") \ + E_(rxrpc_sack_oos, "OOS") + #define rxrpc_completions \ EM(RXRPC_CALL_SUCCEEDED, "Succeeded") \ EM(RXRPC_CALL_REMOTELY_ABORTED, "RemoteAbort") \ @@ -497,6 +504,7 @@ enum rxrpc_recvmsg_trace { rxrpc_recvmsg_traces } __mode(byte); enum rxrpc_req_ack_trace { rxrpc_req_ack_traces } __mode(byte); enum rxrpc_rtt_rx_trace { rxrpc_rtt_rx_traces } __mode(byte); enum rxrpc_rtt_tx_trace { rxrpc_rtt_tx_traces } __mode(byte); +enum rxrpc_sack_trace { rxrpc_sack_traces } __mode(byte); enum rxrpc_skb_trace { rxrpc_skb_traces } __mode(byte); enum rxrpc_timer_trace { rxrpc_timer_traces } __mode(byte); enum rxrpc_tx_point { rxrpc_tx_points } __mode(byte); @@ -531,6 +539,7 @@ rxrpc_recvmsg_traces; rxrpc_req_ack_traces; rxrpc_rtt_rx_traces; rxrpc_rtt_tx_traces; +rxrpc_sack_traces; rxrpc_skb_traces; rxrpc_timer_traces; rxrpc_tx_points; @@ -1929,6 +1938,33 @@ TRACE_EVENT(rxrpc_call_poked, __entry->call_debug_id) ); +TRACE_EVENT(rxrpc_sack, + TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, + unsigned int sack, enum rxrpc_sack_trace what), + + TP_ARGS(call, seq, sack, what), + + TP_STRUCT__entry( + __field(unsigned int, call_debug_id) + __field(rxrpc_seq_t, seq) + __field(unsigned int, sack) + __field(enum rxrpc_sack_trace, what) + ), + + TP_fast_assign( + __entry->call_debug_id = call->debug_id; + __entry->seq = seq; + __entry->sack = sack; + __entry->what = what; + ), + + TP_printk("c=%08x q=%08x %s k=%x", + __entry->call_debug_id, + __entry->seq, + __print_symbolic(__entry->what, rxrpc_sack_traces), + __entry->sack) + ); + #undef EM #undef E_ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 2ca99688f7f0..2b1d0d3ca064 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -691,6 +691,7 @@ struct rxrpc_call { /* Receive-phase ACK management (ACKs we send). */ u8 ackr_reason; /* reason to ACK */ + u16 ackr_sack_base; /* Starting slot in SACK table ring */ rxrpc_serial_t ackr_serial; /* serial of packet being ACK'd */ rxrpc_seq_t ackr_window; /* Base of SACK window */ rxrpc_seq_t ackr_wtop; /* Base of SACK window */ diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 7e65c7d5bff0..d68848fce51f 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -368,6 +368,7 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb, struct rxrpc_skb_priv *sp = rxrpc_skb(skb); struct sk_buff *oos; rxrpc_serial_t serial = sp->hdr.serial; + unsigned int sack = call->ackr_sack_base; rxrpc_seq_t window = call->ackr_window; rxrpc_seq_t wtop = call->ackr_wtop; rxrpc_seq_t wlimit = window + call->rx_winsize - 1; @@ -410,9 +411,6 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb, /* Queue the packet. */ if (seq == window) { - rxrpc_seq_t reset_from; - bool reset_sack = false; - if (sp->hdr.flags & RXRPC_REQUEST_ACK) ack_reason = RXRPC_ACK_REQUESTED; /* Send an immediate ACK if we fill in a hole */ @@ -422,8 +420,14 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb, call->ackr_nr_unacked++; window++; - if (after(window, wtop)) + if (after(window, wtop)) { + trace_rxrpc_sack(call, seq, sack, rxrpc_sack_none); wtop = window; + } else { + trace_rxrpc_sack(call, seq, sack, rxrpc_sack_advance); + sack = (sack + 1) % RXRPC_SACK_SIZE; + } + rxrpc_get_skb(skb, rxrpc_skb_get_to_recvmsg); @@ -440,43 +444,39 @@ static void rxrpc_input_data_one(struct rxrpc_call *call, struct sk_buff *skb, __skb_unlink(oos, &call->rx_oos_queue); last = osp->hdr.flags & RXRPC_LAST_PACKET; seq = osp->hdr.seq; - if (!reset_sack) { - reset_from = seq; - reset_sack = true; - } + call->ackr_sack_table[sack] = 0; + trace_rxrpc_sack(call, seq, sack, rxrpc_sack_fill); + sack = (sack + 1) % RXRPC_SACK_SIZE; window++; rxrpc_input_queue_data(call, oos, window, wtop, - rxrpc_receive_queue_oos); + rxrpc_receive_queue_oos); } spin_unlock(&call->recvmsg_queue.lock); - if (reset_sack) { - do { - call->ackr_sack_table[reset_from % RXRPC_SACK_SIZE] = 0; - } while (reset_from++, before(reset_from, window)); - } + call->ackr_sack_base = sack; } else { - bool keep = false; + unsigned int slot; ack_reason = RXRPC_ACK_OUT_OF_SEQUENCE; - if (!call->ackr_sack_table[seq % RXRPC_SACK_SIZE]) { - call->ackr_sack_table[seq % RXRPC_SACK_SIZE] = 1; - keep = 1; + slot = seq - window; + sack = (sack + slot) % RXRPC_SACK_SIZE; + + if (call->ackr_sack_table[sack % RXRPC_SACK_SIZE]) { + ack_reason = RXRPC_ACK_DUPLICATE; + goto send_ack; } + call->ackr_sack_table[sack % RXRPC_SACK_SIZE] |= 1; + trace_rxrpc_sack(call, seq, sack, rxrpc_sack_oos); + if (after(seq + 1, wtop)) { wtop = seq + 1; rxrpc_input_update_ack_window(call, window, wtop); } - if (!keep) { - ack_reason = RXRPC_ACK_DUPLICATE; - goto send_ack; - } - skb_queue_walk(&call->rx_oos_queue, oos) { struct rxrpc_skb_priv *osp = rxrpc_skb(oos); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index b6bd5e6ccb4c..c69c31470fa8 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -83,56 +83,36 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn, struct rxrpc_txbuf *txb) { struct rxrpc_ackinfo ackinfo; - unsigned int qsize; - rxrpc_seq_t window, wtop, wrap_point, ix, first; + unsigned int qsize, sack, wrap, to; + rxrpc_seq_t window, wtop; int rsize; u32 mtu, jmax; u8 *ackp = txb->acks; - u8 sack_buffer[sizeof(call->ackr_sack_table)] __aligned(8); call->ackr_nr_unacked = 0; atomic_set(&call->ackr_nr_consumed, 0); rxrpc_inc_stat(call->rxnet, stat_tx_ack_fill); + clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags); - /* Barrier against rxrpc_input_data(). */ -retry: window = call->ackr_window; wtop = call->ackr_wtop; + sack = call->ackr_sack_base % RXRPC_SACK_SIZE; txb->ack.firstPacket = htonl(window); - txb->ack.nAcks = 0; + txb->ack.nAcks = wtop - window; if (after(wtop, window)) { - /* Try to copy the SACK ring locklessly. We can use the copy, - * only if the now-current top of the window didn't go past the - * previously read base - otherwise we can't know whether we - * have old data or new data. - */ - memcpy(sack_buffer, call->ackr_sack_table, sizeof(sack_buffer)); - wrap_point = window + RXRPC_SACK_SIZE - 1; - window = call->ackr_window; - wtop = call->ackr_wtop; - if (after(wtop, wrap_point)) { - cond_resched(); - goto retry; - } - - /* The buffer is maintained as a ring with an invariant mapping - * between bit position and sequence number, so we'll probably - * need to rotate it. - */ - txb->ack.nAcks = wtop - window; - ix = window % RXRPC_SACK_SIZE; - first = sizeof(sack_buffer) - ix; + wrap = RXRPC_SACK_SIZE - sack; + to = min_t(unsigned int, txb->ack.nAcks, RXRPC_SACK_SIZE); - if (ix + txb->ack.nAcks <= RXRPC_SACK_SIZE) { - memcpy(txb->acks, sack_buffer + ix, txb->ack.nAcks); + if (sack + txb->ack.nAcks <= RXRPC_SACK_SIZE) { + memcpy(txb->acks, call->ackr_sack_table + sack, txb->ack.nAcks); } else { - memcpy(txb->acks, sack_buffer + ix, first); - memcpy(txb->acks + first, sack_buffer, - txb->ack.nAcks - first); + memcpy(txb->acks, call->ackr_sack_table + sack, wrap); + memcpy(txb->acks + wrap, call->ackr_sack_table, + to - wrap); } - ackp += txb->ack.nAcks; + ackp += to; } else if (before(wtop, window)) { pr_warn("ack window backward %x %x", window, wtop); } else if (txb->ack.reason == RXRPC_ACK_DELAY) { -- cgit v1.2.3 From b30d61f4b12899101f754238f02069ff4d6161c2 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 17 Oct 2022 22:48:58 +0100 Subject: rxrpc: Don't lock call->tx_lock to access call->tx_buffer call->tx_buffer is now only accessed within the I/O thread (->tx_sendmsg is the way sendmsg passes packets to the I/O thread) so there's no need to lock around it. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/txbuf.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/net/rxrpc/txbuf.c b/net/rxrpc/txbuf.c index d2cf2aac3adb..d43be8512386 100644 --- a/net/rxrpc/txbuf.c +++ b/net/rxrpc/txbuf.c @@ -110,12 +110,8 @@ void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *call) _enter("%x/%x/%x", call->tx_bottom, call->acks_hard_ack, call->tx_top); - for (;;) { - spin_lock(&call->tx_lock); - txb = list_first_entry_or_null(&call->tx_buffer, - struct rxrpc_txbuf, call_link); - if (!txb) - break; + while ((txb = list_first_entry_or_null(&call->tx_buffer, + struct rxrpc_txbuf, call_link))) { hard_ack = smp_load_acquire(&call->acks_hard_ack); if (before(hard_ack, txb->seq)) break; @@ -128,15 +124,11 @@ void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *call) trace_rxrpc_txqueue(call, rxrpc_txqueue_dequeue); - spin_unlock(&call->tx_lock); - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_rotated); if (after(call->acks_hard_ack, call->tx_bottom + 128)) wake = true; } - spin_unlock(&call->tx_lock); - if (wake) wake_up(&call->waitq); } -- cgit v1.2.3 From e7f40f4a701b67ada4a843bcc253711d8a34b1f1 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 17 Oct 2022 22:52:06 +0100 Subject: rxrpc: Remove local->defrag_sem We no longer need local->defrag_sem as all DATA packet transmission is now done from one thread, so remove it. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/ar-internal.h | 1 - net/rxrpc/local_object.c | 1 - net/rxrpc/output.c | 7 ------- 3 files changed, 9 deletions(-) (limited to 'net') diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 2b1d0d3ca064..9e19688b0e06 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -284,7 +284,6 @@ struct rxrpc_local { struct task_struct *io_thread; struct completion io_thread_ready; /* Indication that the I/O thread started */ struct rxrpc_sock *service; /* Service(s) listening on this endpoint */ - struct rw_semaphore defrag_sem; /* control re-enablement of IP DF bit */ #ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY struct sk_buff_head rx_delay_queue; /* Delay injection queue */ #endif diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index 07d83a4e5841..7d910aee4f8c 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -108,7 +108,6 @@ static struct rxrpc_local *rxrpc_alloc_local(struct net *net, local->net = net; local->rxnet = rxrpc_net(net); INIT_HLIST_NODE(&local->link); - init_rwsem(&local->defrag_sem); init_completion(&local->io_thread_ready); #ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY skb_queue_head_init(&local->rx_delay_queue); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index c69c31470fa8..6b2022240076 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -409,8 +409,6 @@ dont_set_request_ack: if (txb->len >= call->peer->maxdata) goto send_fragmentable; - down_read(&conn->local->defrag_sem); - txb->last_sent = ktime_get_real(); if (txb->wire.flags & RXRPC_REQUEST_ACK) rtt_slot = rxrpc_begin_rtt_probe(call, serial, rxrpc_rtt_tx_data); @@ -425,7 +423,6 @@ dont_set_request_ack: ret = do_udp_sendmsg(conn->local->socket, &msg, len); conn->peer->last_tx_at = ktime_get_seconds(); - up_read(&conn->local->defrag_sem); if (ret < 0) { rxrpc_inc_stat(call->rxnet, stat_tx_data_send_fail); rxrpc_cancel_rtt_probe(call, serial, rtt_slot); @@ -486,8 +483,6 @@ send_fragmentable: /* attempt to send this message with fragmentation enabled */ _debug("send fragment"); - down_write(&conn->local->defrag_sem); - txb->last_sent = ktime_get_real(); if (txb->wire.flags & RXRPC_REQUEST_ACK) rtt_slot = rxrpc_begin_rtt_probe(call, serial, rxrpc_rtt_tx_data); @@ -519,8 +514,6 @@ send_fragmentable: rxrpc_tx_point_call_data_frag); } rxrpc_tx_backoff(call, ret); - - up_write(&conn->local->defrag_sem); goto done; } -- cgit v1.2.3 From f20fe3ff82b321287ba59cab12e4f34eec9a7e10 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 11 Nov 2022 22:01:04 +0000 Subject: rxrpc: Show consumed and freed packets as non-dropped in dropwatch Set a reason when freeing a packet that has been consumed such that dropwatch doesn't complain that it has been dropped. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/skbuff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c index ebe0c75e7b07..944320e65ea8 100644 --- a/net/rxrpc/skbuff.c +++ b/net/rxrpc/skbuff.c @@ -63,7 +63,7 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) if (skb) { int n = atomic_dec_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, why); - kfree_skb(skb); + kfree_skb_reason(skb, SKB_CONSUMED); } } @@ -78,6 +78,6 @@ void rxrpc_purge_queue(struct sk_buff_head *list) int n = atomic_dec_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, rxrpc_skb_put_purge); - kfree_skb(skb); + kfree_skb_reason(skb, SKB_CONSUMED); } } -- cgit v1.2.3 From 550130a0ce303f7cd754c7067b0a971ca179db63 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 16 Nov 2022 16:42:02 +0000 Subject: rxrpc: Kill service bundle Now that the bundle->channel_lock has been eliminated, we don't need the dummy service bundle anymore. It's purpose was purely to provide the channel_lock for service connections. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/conn_service.c | 7 ------- 1 file changed, 7 deletions(-) (limited to 'net') diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c index f30323de82bd..89ac05a711a4 100644 --- a/net/rxrpc/conn_service.c +++ b/net/rxrpc/conn_service.c @@ -8,11 +8,6 @@ #include #include "ar-internal.h" -static struct rxrpc_bundle rxrpc_service_dummy_bundle = { - .ref = REFCOUNT_INIT(1), - .debug_id = UINT_MAX, -}; - /* * Find a service connection under RCU conditions. * @@ -132,8 +127,6 @@ struct rxrpc_connection *rxrpc_prealloc_service_connection(struct rxrpc_net *rxn */ conn->state = RXRPC_CONN_SERVICE_PREALLOC; refcount_set(&conn->ref, 2); - conn->bundle = rxrpc_get_bundle(&rxrpc_service_dummy_bundle, - rxrpc_bundle_get_service_conn); atomic_inc(&rxnet->nr_conns); write_lock(&rxnet->conn_lock); -- cgit v1.2.3 From dac7f50a45216d652887fb92d6cd3b7ca7f006ea Mon Sep 17 00:00:00 2001 From: Alok Tiwari Date: Tue, 17 Jan 2023 07:45:49 -0800 Subject: netfilter: nf_tables: NULL pointer dereference in nf_tables_updobj() static analyzer detect null pointer dereference case for 'type' function __nft_obj_type_get() can return NULL value which require to handle if type is NULL pointer return -ENOENT. This is a theoretical issue, since an existing object has a type, but better add this failsafe check. Signed-off-by: Alok Tiwari Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'net') diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 974b95dece1d..2abf473c8f67 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7023,6 +7023,9 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info, return -EOPNOTSUPP; type = __nft_obj_type_get(objtype); + if (WARN_ON_ONCE(!type)) + return -ENOENT; + nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla); return nf_tables_updobj(&ctx, type, nla[NFTA_OBJ_DATA], obj); -- cgit v1.2.3 From 1fb7696ac6c3ac470dd002e639af80e7e170b25f Mon Sep 17 00:00:00 2001 From: Yang Yingliang Date: Thu, 19 Jan 2023 15:51:25 +0800 Subject: netfilter: nf_tables: fix wrong pointer passed to PTR_ERR() It should be 'chain' passed to PTR_ERR() in the error path after calling nft_chain_lookup() in nf_tables_delrule(). Fixes: f80a612dd77c ("netfilter: nf_tables: add support to destroy operation") Signed-off-by: Yang Yingliang Reviewed-by: Simon Horman Acked-by: Fernando Fernandez Mancera Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_tables_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 2abf473c8f67..d73edbd4eec4 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3724,7 +3724,7 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info, chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], genmask); if (IS_ERR(chain)) { - if (PTR_ERR(rule) == -ENOENT && + if (PTR_ERR(chain) == -ENOENT && NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYRULE) return 0; -- cgit v1.2.3 From 28af0f009dded735f8d42cc85e95f9c3db212124 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 23 Jan 2023 13:04:33 +0100 Subject: netfilter: conntrack: udp: fix seen-reply test IPS_SEEN_REPLY_BIT is only useful for test_bit() api. Fixes: 4883ec512c17 ("netfilter: conntrack: avoid reload of ct->status") Reported-by: Roi Dayan Signed-off-by: Florian Westphal Reviewed-by: Roi Dayan Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_proto_udp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c index 6b9206635b24..0030fbe8885c 100644 --- a/net/netfilter/nf_conntrack_proto_udp.c +++ b/net/netfilter/nf_conntrack_proto_udp.c @@ -104,7 +104,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, /* If we've seen traffic both ways, this is some kind of UDP * stream. Set Assured. */ - if (status & IPS_SEEN_REPLY_BIT) { + if (status & IPS_SEEN_REPLY) { unsigned long extra = timeouts[UDP_CT_UNREPLIED]; bool stream = false; -- cgit v1.2.3 From f6477ec62fda59561826b29339a04097bfb46a80 Mon Sep 17 00:00:00 2001 From: Gavrilov Ilia Date: Mon, 23 Jan 2023 14:31:54 +0000 Subject: netfilter: conntrack: remote a return value of the 'seq_print_acct' function. The static 'seq_print_acct' function always returns 0. Change the return value to 'void' and remove unnecessary checks. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1ca9e41770cb ("netfilter: Remove uses of seq_ return values") Signed-off-by: Ilia.Gavrilov Reviewed-by: Leon Romanovsky Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_standalone.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c index 460294bd4b60..57f6724c99a7 100644 --- a/net/netfilter/nf_conntrack_standalone.c +++ b/net/netfilter/nf_conntrack_standalone.c @@ -275,7 +275,7 @@ static const char* l4proto_name(u16 proto) return "unknown"; } -static unsigned int +static void seq_print_acct(struct seq_file *s, const struct nf_conn *ct, int dir) { struct nf_conn_acct *acct; @@ -283,14 +283,12 @@ seq_print_acct(struct seq_file *s, const struct nf_conn *ct, int dir) acct = nf_conn_acct_find(ct); if (!acct) - return 0; + return; counter = acct->counter; seq_printf(s, "packets=%llu bytes=%llu ", (unsigned long long)atomic64_read(&counter[dir].packets), (unsigned long long)atomic64_read(&counter[dir].bytes)); - - return 0; } /* return 0 on success, 1 in case of error */ @@ -342,8 +340,7 @@ static int ct_seq_show(struct seq_file *s, void *v) if (seq_has_overflowed(s)) goto release; - if (seq_print_acct(s, ct, IP_CT_DIR_ORIGINAL)) - goto release; + seq_print_acct(s, ct, IP_CT_DIR_ORIGINAL); if (!(test_bit(IPS_SEEN_REPLY_BIT, &ct->status))) seq_puts(s, "[UNREPLIED] "); @@ -352,8 +349,7 @@ static int ct_seq_show(struct seq_file *s, void *v) ct_show_zone(s, ct, NF_CT_ZONE_DIR_REPL); - if (seq_print_acct(s, ct, IP_CT_DIR_REPLY)) - goto release; + seq_print_acct(s, ct, IP_CT_DIR_REPLY); if (test_bit(IPS_HW_OFFLOAD_BIT, &ct->status)) seq_puts(s, "[HW_OFFLOAD] "); -- cgit v1.2.3 From c3a4fd5718ea6756f2289f4d89468b54ad3d02aa Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 31 Jan 2023 10:06:11 +0100 Subject: devlink: rename devlink_nl_instance_iter_dump() to "dumpit" To have the name of the function consistent with the struct cb name, rename devlink_nl_instance_iter_dump() to devlink_nl_instance_iter_dumpit(). Signed-off-by: Jiri Pirko Reviewed-by: Jacob Keller Reviewed-by: Leon Romanovsky Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 4 ++-- net/devlink/leftover.c | 32 ++++++++++++++++---------------- net/devlink/netlink.c | 4 ++-- 3 files changed, 20 insertions(+), 20 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index ba161de4120e..dd4366c68b96 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -128,8 +128,8 @@ devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs); void devlink_notify_unregister(struct devlink *devlink); void devlink_notify_register(struct devlink *devlink); -int devlink_nl_instance_iter_dump(struct sk_buff *msg, - struct netlink_callback *cb); +int devlink_nl_instance_iter_dumpit(struct sk_buff *msg, + struct netlink_callback *cb); static inline struct devlink_nl_dump_state * devlink_dump_state(struct netlink_callback *cb) diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 92210587d349..1461eec423ff 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -8898,14 +8898,14 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { .cmd = DEVLINK_CMD_PORT_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_port_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, /* can be retrieved by unprivileged users */ }, @@ -8919,7 +8919,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_RATE_GET, .doit = devlink_nl_cmd_rate_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .internal_flags = DEVLINK_NL_FLAG_NEED_RATE, /* can be retrieved by unprivileged users */ }, @@ -8967,7 +8967,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_LINECARD_GET, .doit = devlink_nl_cmd_linecard_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD, /* can be retrieved by unprivileged users */ }, @@ -8981,14 +8981,14 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_SB_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { .cmd = DEVLINK_CMD_SB_POOL_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_pool_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { @@ -9001,7 +9001,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_SB_PORT_POOL_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_port_pool_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, /* can be retrieved by unprivileged users */ }, @@ -9016,7 +9016,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_SB_TC_POOL_BIND_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_sb_tc_pool_bind_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .internal_flags = DEVLINK_NL_FLAG_NEED_PORT, /* can be retrieved by unprivileged users */ }, @@ -9097,7 +9097,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_PARAM_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_param_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { @@ -9125,7 +9125,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_REGION_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_region_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .flags = GENL_ADMIN_PERM, }, { @@ -9151,14 +9151,14 @@ const struct genl_small_ops devlink_nl_ops[56] = { .cmd = DEVLINK_CMD_INFO_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_info_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { .cmd = DEVLINK_CMD_HEALTH_REPORTER_GET, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = devlink_nl_cmd_health_reporter_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT, /* can be retrieved by unprivileged users */ }, @@ -9213,7 +9213,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_TRAP_GET, .doit = devlink_nl_cmd_trap_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { @@ -9224,7 +9224,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_TRAP_GROUP_GET, .doit = devlink_nl_cmd_trap_group_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { @@ -9235,7 +9235,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_TRAP_POLICER_GET, .doit = devlink_nl_cmd_trap_policer_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { @@ -9246,7 +9246,7 @@ const struct genl_small_ops devlink_nl_ops[56] = { { .cmd = DEVLINK_CMD_SELFTESTS_GET, .doit = devlink_nl_cmd_selftests_get_doit, - .dumpit = devlink_nl_instance_iter_dump, + .dumpit = devlink_nl_instance_iter_dumpit, /* can be retrieved by unprivileged users */ }, { diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 3f44633af01c..11666edf5cd2 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -196,8 +196,8 @@ static const struct devlink_gen_cmd *devl_gen_cmds[] = { [DEVLINK_CMD_SELFTESTS_GET] = &devl_gen_selftests, }; -int devlink_nl_instance_iter_dump(struct sk_buff *msg, - struct netlink_callback *cb) +int devlink_nl_instance_iter_dumpit(struct sk_buff *msg, + struct netlink_callback *cb) { const struct genl_dumpit_info *info = genl_dumpit_info(cb); struct devlink_nl_dump_state *state = devlink_dump_state(cb); -- cgit v1.2.3 From f87445953d4c1cbcaa110f95dfd64193756f7353 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 31 Jan 2023 10:06:12 +0100 Subject: devlink: remove "gen" from struct devlink_gen_cmd name No need to have "gen" inside name of the structure for devlink commands. Remove it. Signed-off-by: Jiri Pirko Reviewed-by: Leon Romanovsky Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 36 ++++++++++++++++++------------------ net/devlink/leftover.c | 32 ++++++++++++++++---------------- net/devlink/netlink.c | 4 ++-- 3 files changed, 36 insertions(+), 36 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index dd4366c68b96..3910db5547fe 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -115,7 +115,7 @@ struct devlink_nl_dump_state { }; }; -struct devlink_gen_cmd { +struct devlink_cmd { int (*dump_one)(struct sk_buff *msg, struct devlink *devlink, struct netlink_callback *cb); }; @@ -139,22 +139,22 @@ devlink_dump_state(struct netlink_callback *cb) return (struct devlink_nl_dump_state *)cb->ctx; } -/* gen cmds */ -extern const struct devlink_gen_cmd devl_gen_inst; -extern const struct devlink_gen_cmd devl_gen_port; -extern const struct devlink_gen_cmd devl_gen_sb; -extern const struct devlink_gen_cmd devl_gen_sb_pool; -extern const struct devlink_gen_cmd devl_gen_sb_port_pool; -extern const struct devlink_gen_cmd devl_gen_sb_tc_pool_bind; -extern const struct devlink_gen_cmd devl_gen_selftests; -extern const struct devlink_gen_cmd devl_gen_param; -extern const struct devlink_gen_cmd devl_gen_region; -extern const struct devlink_gen_cmd devl_gen_info; -extern const struct devlink_gen_cmd devl_gen_health_reporter; -extern const struct devlink_gen_cmd devl_gen_trap; -extern const struct devlink_gen_cmd devl_gen_trap_group; -extern const struct devlink_gen_cmd devl_gen_trap_policer; -extern const struct devlink_gen_cmd devl_gen_linecard; +/* Commands */ +extern const struct devlink_cmd devl_gen_inst; +extern const struct devlink_cmd devl_gen_port; +extern const struct devlink_cmd devl_gen_sb; +extern const struct devlink_cmd devl_gen_sb_pool; +extern const struct devlink_cmd devl_gen_sb_port_pool; +extern const struct devlink_cmd devl_gen_sb_tc_pool_bind; +extern const struct devlink_cmd devl_gen_selftests; +extern const struct devlink_cmd devl_gen_param; +extern const struct devlink_cmd devl_gen_region; +extern const struct devlink_cmd devl_gen_info; +extern const struct devlink_cmd devl_gen_health_reporter; +extern const struct devlink_cmd devl_gen_trap; +extern const struct devlink_cmd devl_gen_trap_group; +extern const struct devlink_cmd devl_gen_trap_policer; +extern const struct devlink_cmd devl_gen_linecard; /* Ports */ int devlink_port_netdevice_event(struct notifier_block *nb, @@ -182,7 +182,7 @@ struct devlink_linecard * devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info); /* Rates */ -extern const struct devlink_gen_cmd devl_gen_rate_get; +extern const struct devlink_cmd devl_gen_rate_get; struct devlink_rate * devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 1461eec423ff..16cb5975de1a 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1236,7 +1236,7 @@ devlink_nl_cmd_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_gen_cmd devl_gen_rate_get = { +const struct devlink_cmd devl_gen_rate_get = { .dump_one = devlink_nl_cmd_rate_get_dump_one, }; @@ -1303,7 +1303,7 @@ devlink_nl_cmd_get_dump_one(struct sk_buff *msg, struct devlink *devlink, cb->nlh->nlmsg_seq, NLM_F_MULTI); } -const struct devlink_gen_cmd devl_gen_inst = { +const struct devlink_cmd devl_gen_inst = { .dump_one = devlink_nl_cmd_get_dump_one, }; @@ -1359,7 +1359,7 @@ devlink_nl_cmd_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_gen_cmd devl_gen_port = { +const struct devlink_cmd devl_gen_port = { .dump_one = devlink_nl_cmd_port_get_dump_one, }; @@ -2137,7 +2137,7 @@ static int devlink_nl_cmd_linecard_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_gen_cmd devl_gen_linecard = { +const struct devlink_cmd devl_gen_linecard = { .dump_one = devlink_nl_cmd_linecard_get_dump_one, }; @@ -2392,7 +2392,7 @@ devlink_nl_cmd_sb_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_gen_cmd devl_gen_sb = { +const struct devlink_cmd devl_gen_sb = { .dump_one = devlink_nl_cmd_sb_get_dump_one, }; @@ -2530,7 +2530,7 @@ devlink_nl_cmd_sb_pool_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_gen_cmd devl_gen_sb_pool = { +const struct devlink_cmd devl_gen_sb_pool = { .dump_one = devlink_nl_cmd_sb_pool_get_dump_one, }; @@ -2738,7 +2738,7 @@ devlink_nl_cmd_sb_port_pool_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_gen_cmd devl_gen_sb_port_pool = { +const struct devlink_cmd devl_gen_sb_port_pool = { .dump_one = devlink_nl_cmd_sb_port_pool_get_dump_one, }; @@ -2973,7 +2973,7 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_gen_cmd devl_gen_sb_tc_pool_bind = { +const struct devlink_cmd devl_gen_sb_tc_pool_bind = { .dump_one = devlink_nl_cmd_sb_tc_pool_bind_get_dump_one, }; @@ -4785,7 +4785,7 @@ devlink_nl_cmd_selftests_get_dump_one(struct sk_buff *msg, cb->extack); } -const struct devlink_gen_cmd devl_gen_selftests = { +const struct devlink_cmd devl_gen_selftests = { .dump_one = devlink_nl_cmd_selftests_get_dump_one, }; @@ -5271,7 +5271,7 @@ devlink_nl_cmd_param_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_gen_cmd devl_gen_param = { +const struct devlink_cmd devl_gen_param = { .dump_one = devlink_nl_cmd_param_get_dump_one, }; @@ -5978,7 +5978,7 @@ devlink_nl_cmd_region_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return 0; } -const struct devlink_gen_cmd devl_gen_region = { +const struct devlink_cmd devl_gen_region = { .dump_one = devlink_nl_cmd_region_get_dump_one, }; @@ -6625,7 +6625,7 @@ devlink_nl_cmd_info_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_gen_cmd devl_gen_info = { +const struct devlink_cmd devl_gen_info = { .dump_one = devlink_nl_cmd_info_get_dump_one, }; @@ -7793,7 +7793,7 @@ devlink_nl_cmd_health_reporter_get_dump_one(struct sk_buff *msg, return 0; } -const struct devlink_gen_cmd devl_gen_health_reporter = { +const struct devlink_cmd devl_gen_health_reporter = { .dump_one = devlink_nl_cmd_health_reporter_get_dump_one, }; @@ -8311,7 +8311,7 @@ devlink_nl_cmd_trap_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_gen_cmd devl_gen_trap = { +const struct devlink_cmd devl_gen_trap = { .dump_one = devlink_nl_cmd_trap_get_dump_one, }; @@ -8524,7 +8524,7 @@ devlink_nl_cmd_trap_group_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_gen_cmd devl_gen_trap_group = { +const struct devlink_cmd devl_gen_trap_group = { .dump_one = devlink_nl_cmd_trap_group_get_dump_one, }; @@ -8817,7 +8817,7 @@ devlink_nl_cmd_trap_policer_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_gen_cmd devl_gen_trap_policer = { +const struct devlink_cmd devl_gen_trap_policer = { .dump_one = devlink_nl_cmd_trap_policer_get_dump_one, }; diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 11666edf5cd2..33ed3984f3cb 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -177,7 +177,7 @@ static void devlink_nl_post_doit(const struct genl_split_ops *ops, devlink_put(devlink); } -static const struct devlink_gen_cmd *devl_gen_cmds[] = { +static const struct devlink_cmd *devl_gen_cmds[] = { [DEVLINK_CMD_GET] = &devl_gen_inst, [DEVLINK_CMD_PORT_GET] = &devl_gen_port, [DEVLINK_CMD_SB_GET] = &devl_gen_sb, @@ -201,7 +201,7 @@ int devlink_nl_instance_iter_dumpit(struct sk_buff *msg, { const struct genl_dumpit_info *info = genl_dumpit_info(cb); struct devlink_nl_dump_state *state = devlink_dump_state(cb); - const struct devlink_gen_cmd *cmd; + const struct devlink_cmd *cmd; struct devlink *devlink; int err = 0; -- cgit v1.2.3 From 8589ba4e642aa257fa46ee82c975921e19db47d5 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 31 Jan 2023 10:06:13 +0100 Subject: devlink: rename and reorder instances of struct devlink_cmd In order to maintain naming consistency, rename and reorder all usages of struct struct devlink_cmd in the following way: 1) Remove "gen" and replace it with "cmd" to match the struct name 2) Order devl_cmds[] and the header file to match the order of enum devlink_command 3) Move devl_cmd_rate_get among the peers 4) Remove "inst" for DEVLINK_CMD_GET 5) Add "_get" suffix to all to match DEVLINK_CMD_*_GET (only rate had it done correctly) Signed-off-by: Jiri Pirko Reviewed-by: Leon Romanovsky Reviewed-by: Jacob Keller Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 33 ++++++++++++++++----------------- net/devlink/leftover.c | 32 ++++++++++++++++---------------- net/devlink/netlink.c | 36 ++++++++++++++++++------------------ 3 files changed, 50 insertions(+), 51 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 3910db5547fe..bdd7ad25c7e8 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -140,21 +140,22 @@ devlink_dump_state(struct netlink_callback *cb) } /* Commands */ -extern const struct devlink_cmd devl_gen_inst; -extern const struct devlink_cmd devl_gen_port; -extern const struct devlink_cmd devl_gen_sb; -extern const struct devlink_cmd devl_gen_sb_pool; -extern const struct devlink_cmd devl_gen_sb_port_pool; -extern const struct devlink_cmd devl_gen_sb_tc_pool_bind; -extern const struct devlink_cmd devl_gen_selftests; -extern const struct devlink_cmd devl_gen_param; -extern const struct devlink_cmd devl_gen_region; -extern const struct devlink_cmd devl_gen_info; -extern const struct devlink_cmd devl_gen_health_reporter; -extern const struct devlink_cmd devl_gen_trap; -extern const struct devlink_cmd devl_gen_trap_group; -extern const struct devlink_cmd devl_gen_trap_policer; -extern const struct devlink_cmd devl_gen_linecard; +extern const struct devlink_cmd devl_cmd_get; +extern const struct devlink_cmd devl_cmd_port_get; +extern const struct devlink_cmd devl_cmd_sb_get; +extern const struct devlink_cmd devl_cmd_sb_pool_get; +extern const struct devlink_cmd devl_cmd_sb_port_pool_get; +extern const struct devlink_cmd devl_cmd_sb_tc_pool_bind_get; +extern const struct devlink_cmd devl_cmd_param_get; +extern const struct devlink_cmd devl_cmd_region_get; +extern const struct devlink_cmd devl_cmd_info_get; +extern const struct devlink_cmd devl_cmd_health_reporter_get; +extern const struct devlink_cmd devl_cmd_trap_get; +extern const struct devlink_cmd devl_cmd_trap_group_get; +extern const struct devlink_cmd devl_cmd_trap_policer_get; +extern const struct devlink_cmd devl_cmd_rate_get; +extern const struct devlink_cmd devl_cmd_linecard_get; +extern const struct devlink_cmd devl_cmd_selftests_get; /* Ports */ int devlink_port_netdevice_event(struct notifier_block *nb, @@ -182,8 +183,6 @@ struct devlink_linecard * devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info); /* Rates */ -extern const struct devlink_cmd devl_gen_rate_get; - struct devlink_rate * devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info); struct devlink_rate * diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 16cb5975de1a..056d9ca14a3d 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1236,7 +1236,7 @@ devlink_nl_cmd_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_cmd devl_gen_rate_get = { +const struct devlink_cmd devl_cmd_rate_get = { .dump_one = devlink_nl_cmd_rate_get_dump_one, }; @@ -1303,7 +1303,7 @@ devlink_nl_cmd_get_dump_one(struct sk_buff *msg, struct devlink *devlink, cb->nlh->nlmsg_seq, NLM_F_MULTI); } -const struct devlink_cmd devl_gen_inst = { +const struct devlink_cmd devl_cmd_get = { .dump_one = devlink_nl_cmd_get_dump_one, }; @@ -1359,7 +1359,7 @@ devlink_nl_cmd_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_cmd devl_gen_port = { +const struct devlink_cmd devl_cmd_port_get = { .dump_one = devlink_nl_cmd_port_get_dump_one, }; @@ -2137,7 +2137,7 @@ static int devlink_nl_cmd_linecard_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_cmd devl_gen_linecard = { +const struct devlink_cmd devl_cmd_linecard_get = { .dump_one = devlink_nl_cmd_linecard_get_dump_one, }; @@ -2392,7 +2392,7 @@ devlink_nl_cmd_sb_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_cmd devl_gen_sb = { +const struct devlink_cmd devl_cmd_sb_get = { .dump_one = devlink_nl_cmd_sb_get_dump_one, }; @@ -2530,7 +2530,7 @@ devlink_nl_cmd_sb_pool_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_cmd devl_gen_sb_pool = { +const struct devlink_cmd devl_cmd_sb_pool_get = { .dump_one = devlink_nl_cmd_sb_pool_get_dump_one, }; @@ -2738,7 +2738,7 @@ devlink_nl_cmd_sb_port_pool_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_cmd devl_gen_sb_port_pool = { +const struct devlink_cmd devl_cmd_sb_port_pool_get = { .dump_one = devlink_nl_cmd_sb_port_pool_get_dump_one, }; @@ -2973,7 +2973,7 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_cmd devl_gen_sb_tc_pool_bind = { +const struct devlink_cmd devl_cmd_sb_tc_pool_bind_get = { .dump_one = devlink_nl_cmd_sb_tc_pool_bind_get_dump_one, }; @@ -4785,7 +4785,7 @@ devlink_nl_cmd_selftests_get_dump_one(struct sk_buff *msg, cb->extack); } -const struct devlink_cmd devl_gen_selftests = { +const struct devlink_cmd devl_cmd_selftests_get = { .dump_one = devlink_nl_cmd_selftests_get_dump_one, }; @@ -5271,7 +5271,7 @@ devlink_nl_cmd_param_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_cmd devl_gen_param = { +const struct devlink_cmd devl_cmd_param_get = { .dump_one = devlink_nl_cmd_param_get_dump_one, }; @@ -5978,7 +5978,7 @@ devlink_nl_cmd_region_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return 0; } -const struct devlink_cmd devl_gen_region = { +const struct devlink_cmd devl_cmd_region_get = { .dump_one = devlink_nl_cmd_region_get_dump_one, }; @@ -6625,7 +6625,7 @@ devlink_nl_cmd_info_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_cmd devl_gen_info = { +const struct devlink_cmd devl_cmd_info_get = { .dump_one = devlink_nl_cmd_info_get_dump_one, }; @@ -7793,7 +7793,7 @@ devlink_nl_cmd_health_reporter_get_dump_one(struct sk_buff *msg, return 0; } -const struct devlink_cmd devl_gen_health_reporter = { +const struct devlink_cmd devl_cmd_health_reporter_get = { .dump_one = devlink_nl_cmd_health_reporter_get_dump_one, }; @@ -8311,7 +8311,7 @@ devlink_nl_cmd_trap_get_dump_one(struct sk_buff *msg, struct devlink *devlink, return err; } -const struct devlink_cmd devl_gen_trap = { +const struct devlink_cmd devl_cmd_trap_get = { .dump_one = devlink_nl_cmd_trap_get_dump_one, }; @@ -8524,7 +8524,7 @@ devlink_nl_cmd_trap_group_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_cmd devl_gen_trap_group = { +const struct devlink_cmd devl_cmd_trap_group_get = { .dump_one = devlink_nl_cmd_trap_group_get_dump_one, }; @@ -8817,7 +8817,7 @@ devlink_nl_cmd_trap_policer_get_dump_one(struct sk_buff *msg, return err; } -const struct devlink_cmd devl_gen_trap_policer = { +const struct devlink_cmd devl_cmd_trap_policer_get = { .dump_one = devlink_nl_cmd_trap_policer_get_dump_one, }; diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 33ed3984f3cb..7a332eb70f70 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -177,23 +177,23 @@ static void devlink_nl_post_doit(const struct genl_split_ops *ops, devlink_put(devlink); } -static const struct devlink_cmd *devl_gen_cmds[] = { - [DEVLINK_CMD_GET] = &devl_gen_inst, - [DEVLINK_CMD_PORT_GET] = &devl_gen_port, - [DEVLINK_CMD_SB_GET] = &devl_gen_sb, - [DEVLINK_CMD_SB_POOL_GET] = &devl_gen_sb_pool, - [DEVLINK_CMD_SB_PORT_POOL_GET] = &devl_gen_sb_port_pool, - [DEVLINK_CMD_SB_TC_POOL_BIND_GET] = &devl_gen_sb_tc_pool_bind, - [DEVLINK_CMD_PARAM_GET] = &devl_gen_param, - [DEVLINK_CMD_REGION_GET] = &devl_gen_region, - [DEVLINK_CMD_INFO_GET] = &devl_gen_info, - [DEVLINK_CMD_HEALTH_REPORTER_GET] = &devl_gen_health_reporter, - [DEVLINK_CMD_RATE_GET] = &devl_gen_rate_get, - [DEVLINK_CMD_TRAP_GET] = &devl_gen_trap, - [DEVLINK_CMD_TRAP_GROUP_GET] = &devl_gen_trap_group, - [DEVLINK_CMD_TRAP_POLICER_GET] = &devl_gen_trap_policer, - [DEVLINK_CMD_LINECARD_GET] = &devl_gen_linecard, - [DEVLINK_CMD_SELFTESTS_GET] = &devl_gen_selftests, +static const struct devlink_cmd *devl_cmds[] = { + [DEVLINK_CMD_GET] = &devl_cmd_get, + [DEVLINK_CMD_PORT_GET] = &devl_cmd_port_get, + [DEVLINK_CMD_SB_GET] = &devl_cmd_sb_get, + [DEVLINK_CMD_SB_POOL_GET] = &devl_cmd_sb_pool_get, + [DEVLINK_CMD_SB_PORT_POOL_GET] = &devl_cmd_sb_port_pool_get, + [DEVLINK_CMD_SB_TC_POOL_BIND_GET] = &devl_cmd_sb_tc_pool_bind_get, + [DEVLINK_CMD_PARAM_GET] = &devl_cmd_param_get, + [DEVLINK_CMD_REGION_GET] = &devl_cmd_region_get, + [DEVLINK_CMD_INFO_GET] = &devl_cmd_info_get, + [DEVLINK_CMD_HEALTH_REPORTER_GET] = &devl_cmd_health_reporter_get, + [DEVLINK_CMD_TRAP_GET] = &devl_cmd_trap_get, + [DEVLINK_CMD_TRAP_GROUP_GET] = &devl_cmd_trap_group_get, + [DEVLINK_CMD_TRAP_POLICER_GET] = &devl_cmd_trap_policer_get, + [DEVLINK_CMD_RATE_GET] = &devl_cmd_rate_get, + [DEVLINK_CMD_LINECARD_GET] = &devl_cmd_linecard_get, + [DEVLINK_CMD_SELFTESTS_GET] = &devl_cmd_selftests_get, }; int devlink_nl_instance_iter_dumpit(struct sk_buff *msg, @@ -205,7 +205,7 @@ int devlink_nl_instance_iter_dumpit(struct sk_buff *msg, struct devlink *devlink; int err = 0; - cmd = devl_gen_cmds[info->op.cmd]; + cmd = devl_cmds[info->op.cmd]; while ((devlink = devlinks_xa_find_get(sock_net(msg->sk), &state->instance))) { -- cgit v1.2.3 From 400031e05adfcef9e80eca80bdfc3f4b63658be4 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Wed, 1 Feb 2023 11:30:15 -0600 Subject: bpf: Add __bpf_kfunc tag to all kfuncs Now that we have the __bpf_kfunc tag, we should use add it to all existing kfuncs to ensure that they'll never be elided in LTO builds. Signed-off-by: David Vernet Signed-off-by: Daniel Borkmann Acked-by: Stanislav Fomichev Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com --- kernel/bpf/cpumask.c | 60 +++++++++++----------- kernel/bpf/helpers.c | 38 +++++++------- kernel/cgroup/rstat.c | 4 +- kernel/kexec_core.c | 3 +- kernel/trace/bpf_trace.c | 8 +-- net/bpf/test_run.c | 55 ++++++++++---------- net/core/xdp.c | 5 +- net/ipv4/tcp_bbr.c | 16 +++--- net/ipv4/tcp_cong.c | 10 ++-- net/ipv4/tcp_cubic.c | 12 ++--- net/ipv4/tcp_dctcp.c | 12 ++--- net/netfilter/nf_conntrack_bpf.c | 20 ++++---- net/netfilter/nf_nat_bpf.c | 6 +-- net/xfrm/xfrm_interface_bpf.c | 7 +-- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 2 +- 15 files changed, 130 insertions(+), 128 deletions(-) (limited to 'net') diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c index 6bbb67dfc998..52b981512a35 100644 --- a/kernel/bpf/cpumask.c +++ b/kernel/bpf/cpumask.c @@ -48,7 +48,7 @@ __diag_ignore_all("-Wmissing-prototypes", * bpf_cpumask_create() allocates memory using the BPF memory allocator, and * will not block. It may return NULL if no memory is available. */ -struct bpf_cpumask *bpf_cpumask_create(void) +__bpf_kfunc struct bpf_cpumask *bpf_cpumask_create(void) { struct bpf_cpumask *cpumask; @@ -74,7 +74,7 @@ struct bpf_cpumask *bpf_cpumask_create(void) * must either be embedded in a map as a kptr, or freed with * bpf_cpumask_release(). */ -struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask) +__bpf_kfunc struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask) { refcount_inc(&cpumask->usage); return cpumask; @@ -90,7 +90,7 @@ struct bpf_cpumask *bpf_cpumask_acquire(struct bpf_cpumask *cpumask) * kptr, or freed with bpf_cpumask_release(). This function may return NULL if * no BPF cpumask was found in the specified map value. */ -struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumaskp) +__bpf_kfunc struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumaskp) { struct bpf_cpumask *cpumask; @@ -116,7 +116,7 @@ struct bpf_cpumask *bpf_cpumask_kptr_get(struct bpf_cpumask **cpumaskp) * reference of the BPF cpumask has been released, it is subsequently freed in * an RCU callback in the BPF memory allocator. */ -void bpf_cpumask_release(struct bpf_cpumask *cpumask) +__bpf_kfunc void bpf_cpumask_release(struct bpf_cpumask *cpumask) { if (!cpumask) return; @@ -135,7 +135,7 @@ void bpf_cpumask_release(struct bpf_cpumask *cpumask) * Find the index of the first nonzero bit of the cpumask. A struct bpf_cpumask * pointer may be safely passed to this function. */ -u32 bpf_cpumask_first(const struct cpumask *cpumask) +__bpf_kfunc u32 bpf_cpumask_first(const struct cpumask *cpumask) { return cpumask_first(cpumask); } @@ -148,7 +148,7 @@ u32 bpf_cpumask_first(const struct cpumask *cpumask) * Find the index of the first unset bit of the cpumask. A struct bpf_cpumask * pointer may be safely passed to this function. */ -u32 bpf_cpumask_first_zero(const struct cpumask *cpumask) +__bpf_kfunc u32 bpf_cpumask_first_zero(const struct cpumask *cpumask) { return cpumask_first_zero(cpumask); } @@ -158,7 +158,7 @@ u32 bpf_cpumask_first_zero(const struct cpumask *cpumask) * @cpu: The CPU to be set in the cpumask. * @cpumask: The BPF cpumask in which a bit is being set. */ -void bpf_cpumask_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) +__bpf_kfunc void bpf_cpumask_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) { if (!cpu_valid(cpu)) return; @@ -171,7 +171,7 @@ void bpf_cpumask_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) * @cpu: The CPU to be cleared from the cpumask. * @cpumask: The BPF cpumask in which a bit is being cleared. */ -void bpf_cpumask_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) +__bpf_kfunc void bpf_cpumask_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) { if (!cpu_valid(cpu)) return; @@ -188,7 +188,7 @@ void bpf_cpumask_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) * * true - @cpu is set in the cpumask * * false - @cpu was not set in the cpumask, or @cpu is an invalid cpu. */ -bool bpf_cpumask_test_cpu(u32 cpu, const struct cpumask *cpumask) +__bpf_kfunc bool bpf_cpumask_test_cpu(u32 cpu, const struct cpumask *cpumask) { if (!cpu_valid(cpu)) return false; @@ -205,7 +205,7 @@ bool bpf_cpumask_test_cpu(u32 cpu, const struct cpumask *cpumask) * * true - @cpu is set in the cpumask * * false - @cpu was not set in the cpumask, or @cpu is invalid. */ -bool bpf_cpumask_test_and_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) +__bpf_kfunc bool bpf_cpumask_test_and_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) { if (!cpu_valid(cpu)) return false; @@ -223,7 +223,7 @@ bool bpf_cpumask_test_and_set_cpu(u32 cpu, struct bpf_cpumask *cpumask) * * true - @cpu is set in the cpumask * * false - @cpu was not set in the cpumask, or @cpu is invalid. */ -bool bpf_cpumask_test_and_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) +__bpf_kfunc bool bpf_cpumask_test_and_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) { if (!cpu_valid(cpu)) return false; @@ -235,7 +235,7 @@ bool bpf_cpumask_test_and_clear_cpu(u32 cpu, struct bpf_cpumask *cpumask) * bpf_cpumask_setall() - Set all of the bits in a BPF cpumask. * @cpumask: The BPF cpumask having all of its bits set. */ -void bpf_cpumask_setall(struct bpf_cpumask *cpumask) +__bpf_kfunc void bpf_cpumask_setall(struct bpf_cpumask *cpumask) { cpumask_setall((struct cpumask *)cpumask); } @@ -244,7 +244,7 @@ void bpf_cpumask_setall(struct bpf_cpumask *cpumask) * bpf_cpumask_clear() - Clear all of the bits in a BPF cpumask. * @cpumask: The BPF cpumask being cleared. */ -void bpf_cpumask_clear(struct bpf_cpumask *cpumask) +__bpf_kfunc void bpf_cpumask_clear(struct bpf_cpumask *cpumask) { cpumask_clear((struct cpumask *)cpumask); } @@ -261,9 +261,9 @@ void bpf_cpumask_clear(struct bpf_cpumask *cpumask) * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -bool bpf_cpumask_and(struct bpf_cpumask *dst, - const struct cpumask *src1, - const struct cpumask *src2) +__bpf_kfunc bool bpf_cpumask_and(struct bpf_cpumask *dst, + const struct cpumask *src1, + const struct cpumask *src2) { return cpumask_and((struct cpumask *)dst, src1, src2); } @@ -276,9 +276,9 @@ bool bpf_cpumask_and(struct bpf_cpumask *dst, * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -void bpf_cpumask_or(struct bpf_cpumask *dst, - const struct cpumask *src1, - const struct cpumask *src2) +__bpf_kfunc void bpf_cpumask_or(struct bpf_cpumask *dst, + const struct cpumask *src1, + const struct cpumask *src2) { cpumask_or((struct cpumask *)dst, src1, src2); } @@ -291,9 +291,9 @@ void bpf_cpumask_or(struct bpf_cpumask *dst, * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -void bpf_cpumask_xor(struct bpf_cpumask *dst, - const struct cpumask *src1, - const struct cpumask *src2) +__bpf_kfunc void bpf_cpumask_xor(struct bpf_cpumask *dst, + const struct cpumask *src1, + const struct cpumask *src2) { cpumask_xor((struct cpumask *)dst, src1, src2); } @@ -309,7 +309,7 @@ void bpf_cpumask_xor(struct bpf_cpumask *dst, * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -bool bpf_cpumask_equal(const struct cpumask *src1, const struct cpumask *src2) +__bpf_kfunc bool bpf_cpumask_equal(const struct cpumask *src1, const struct cpumask *src2) { return cpumask_equal(src1, src2); } @@ -325,7 +325,7 @@ bool bpf_cpumask_equal(const struct cpumask *src1, const struct cpumask *src2) * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -bool bpf_cpumask_intersects(const struct cpumask *src1, const struct cpumask *src2) +__bpf_kfunc bool bpf_cpumask_intersects(const struct cpumask *src1, const struct cpumask *src2) { return cpumask_intersects(src1, src2); } @@ -341,7 +341,7 @@ bool bpf_cpumask_intersects(const struct cpumask *src1, const struct cpumask *sr * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -bool bpf_cpumask_subset(const struct cpumask *src1, const struct cpumask *src2) +__bpf_kfunc bool bpf_cpumask_subset(const struct cpumask *src1, const struct cpumask *src2) { return cpumask_subset(src1, src2); } @@ -356,7 +356,7 @@ bool bpf_cpumask_subset(const struct cpumask *src1, const struct cpumask *src2) * * A struct bpf_cpumask pointer may be safely passed to @cpumask. */ -bool bpf_cpumask_empty(const struct cpumask *cpumask) +__bpf_kfunc bool bpf_cpumask_empty(const struct cpumask *cpumask) { return cpumask_empty(cpumask); } @@ -371,7 +371,7 @@ bool bpf_cpumask_empty(const struct cpumask *cpumask) * * A struct bpf_cpumask pointer may be safely passed to @cpumask. */ -bool bpf_cpumask_full(const struct cpumask *cpumask) +__bpf_kfunc bool bpf_cpumask_full(const struct cpumask *cpumask) { return cpumask_full(cpumask); } @@ -383,7 +383,7 @@ bool bpf_cpumask_full(const struct cpumask *cpumask) * * A struct bpf_cpumask pointer may be safely passed to @src. */ -void bpf_cpumask_copy(struct bpf_cpumask *dst, const struct cpumask *src) +__bpf_kfunc void bpf_cpumask_copy(struct bpf_cpumask *dst, const struct cpumask *src) { cpumask_copy((struct cpumask *)dst, src); } @@ -398,7 +398,7 @@ void bpf_cpumask_copy(struct bpf_cpumask *dst, const struct cpumask *src) * * A struct bpf_cpumask pointer may be safely passed to @src. */ -u32 bpf_cpumask_any(const struct cpumask *cpumask) +__bpf_kfunc u32 bpf_cpumask_any(const struct cpumask *cpumask) { return cpumask_any(cpumask); } @@ -415,7 +415,7 @@ u32 bpf_cpumask_any(const struct cpumask *cpumask) * * struct bpf_cpumask pointers may be safely passed to @src1 and @src2. */ -u32 bpf_cpumask_any_and(const struct cpumask *src1, const struct cpumask *src2) +__bpf_kfunc u32 bpf_cpumask_any_and(const struct cpumask *src1, const struct cpumask *src2) { return cpumask_any_and(src1, src2); } diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 458db2db2f81..2dae44581922 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1776,7 +1776,7 @@ __diag_push(); __diag_ignore_all("-Wmissing-prototypes", "Global functions as their definitions will be in vmlinux BTF"); -void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign) +__bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign) { struct btf_struct_meta *meta = meta__ign; u64 size = local_type_id__k; @@ -1790,7 +1790,7 @@ void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign) return p; } -void bpf_obj_drop_impl(void *p__alloc, void *meta__ign) +__bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign) { struct btf_struct_meta *meta = meta__ign; void *p = p__alloc; @@ -1811,12 +1811,12 @@ static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *hea tail ? list_add_tail(n, h) : list_add(n, h); } -void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node) +__bpf_kfunc void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node) { return __bpf_list_add(node, head, false); } -void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node) +__bpf_kfunc void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node) { return __bpf_list_add(node, head, true); } @@ -1834,12 +1834,12 @@ static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tai return (struct bpf_list_node *)n; } -struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) +__bpf_kfunc struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) { return __bpf_list_del(head, false); } -struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) +__bpf_kfunc struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) { return __bpf_list_del(head, true); } @@ -1850,7 +1850,7 @@ struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) * bpf_task_release(). * @p: The task on which a reference is being acquired. */ -struct task_struct *bpf_task_acquire(struct task_struct *p) +__bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p) { return get_task_struct(p); } @@ -1861,7 +1861,7 @@ struct task_struct *bpf_task_acquire(struct task_struct *p) * released by calling bpf_task_release(). * @p: The task on which a reference is being acquired. */ -struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p) +__bpf_kfunc struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p) { /* For the time being this function returns NULL, as it's not currently * possible to safely acquire a reference to a task with RCU protection @@ -1913,7 +1913,7 @@ struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p) * be released by calling bpf_task_release(). * @pp: A pointer to a task kptr on which a reference is being acquired. */ -struct task_struct *bpf_task_kptr_get(struct task_struct **pp) +__bpf_kfunc struct task_struct *bpf_task_kptr_get(struct task_struct **pp) { /* We must return NULL here until we have clarity on how to properly * leverage RCU for ensuring a task's lifetime. See the comment above @@ -1926,7 +1926,7 @@ struct task_struct *bpf_task_kptr_get(struct task_struct **pp) * bpf_task_release - Release the reference acquired on a task. * @p: The task on which a reference is being released. */ -void bpf_task_release(struct task_struct *p) +__bpf_kfunc void bpf_task_release(struct task_struct *p) { if (!p) return; @@ -1941,7 +1941,7 @@ void bpf_task_release(struct task_struct *p) * calling bpf_cgroup_release(). * @cgrp: The cgroup on which a reference is being acquired. */ -struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) +__bpf_kfunc struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) { cgroup_get(cgrp); return cgrp; @@ -1953,7 +1953,7 @@ struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) * be released by calling bpf_cgroup_release(). * @cgrpp: A pointer to a cgroup kptr on which a reference is being acquired. */ -struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp) +__bpf_kfunc struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp) { struct cgroup *cgrp; @@ -1985,7 +1985,7 @@ struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp) * drops to 0. * @cgrp: The cgroup on which a reference is being released. */ -void bpf_cgroup_release(struct cgroup *cgrp) +__bpf_kfunc void bpf_cgroup_release(struct cgroup *cgrp) { if (!cgrp) return; @@ -2000,7 +2000,7 @@ void bpf_cgroup_release(struct cgroup *cgrp) * @cgrp: The cgroup for which we're performing a lookup. * @level: The level of ancestor to look up. */ -struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) +__bpf_kfunc struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) { struct cgroup *ancestor; @@ -2019,7 +2019,7 @@ struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) * stored in a map, or released with bpf_task_release(). * @pid: The pid of the task being looked up. */ -struct task_struct *bpf_task_from_pid(s32 pid) +__bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid) { struct task_struct *p; @@ -2032,22 +2032,22 @@ struct task_struct *bpf_task_from_pid(s32 pid) return p; } -void *bpf_cast_to_kern_ctx(void *obj) +__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj) { return obj; } -void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) +__bpf_kfunc void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) { return obj__ign; } -void bpf_rcu_read_lock(void) +__bpf_kfunc void bpf_rcu_read_lock(void) { rcu_read_lock(); } -void bpf_rcu_read_unlock(void) +__bpf_kfunc void bpf_rcu_read_unlock(void) { rcu_read_unlock(); } diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 793ecff29038..831f1f472bb8 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -26,7 +26,7 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu) * rstat_cpu->updated_children list. See the comment on top of * cgroup_rstat_cpu definition for details. */ -void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) +__bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) { raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); unsigned long flags; @@ -231,7 +231,7 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) * * This function may block. */ -void cgroup_rstat_flush(struct cgroup *cgrp) +__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) { might_sleep(); diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 969e8f52f7da..b1cf259854ca 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -6,6 +6,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include #include #include @@ -975,7 +976,7 @@ void __noclone __crash_kexec(struct pt_regs *regs) } STACK_FRAME_NON_STANDARD(__crash_kexec); -void crash_kexec(struct pt_regs *regs) +__bpf_kfunc void crash_kexec(struct pt_regs *regs) { int old_cpu, this_cpu; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index b1eff2efd3b4..ff1458e541a8 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1236,7 +1236,7 @@ __diag_ignore_all("-Wmissing-prototypes", * Return: a bpf_key pointer with a valid key pointer if the key is found, a * NULL pointer otherwise. */ -struct bpf_key *bpf_lookup_user_key(u32 serial, u64 flags) +__bpf_kfunc struct bpf_key *bpf_lookup_user_key(u32 serial, u64 flags) { key_ref_t key_ref; struct bpf_key *bkey; @@ -1285,7 +1285,7 @@ struct bpf_key *bpf_lookup_user_key(u32 serial, u64 flags) * Return: a bpf_key pointer with an invalid key pointer set from the * pre-determined ID on success, a NULL pointer otherwise */ -struct bpf_key *bpf_lookup_system_key(u64 id) +__bpf_kfunc struct bpf_key *bpf_lookup_system_key(u64 id) { struct bpf_key *bkey; @@ -1309,7 +1309,7 @@ struct bpf_key *bpf_lookup_system_key(u64 id) * Decrement the reference count of the key inside *bkey*, if the pointer * is valid, and free *bkey*. */ -void bpf_key_put(struct bpf_key *bkey) +__bpf_kfunc void bpf_key_put(struct bpf_key *bkey) { if (bkey->has_ref) key_put(bkey->key); @@ -1329,7 +1329,7 @@ void bpf_key_put(struct bpf_key *bkey) * * Return: 0 on success, a negative value on error. */ -int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr, +__bpf_kfunc int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr, struct bpf_dynptr_kern *sig_ptr, struct bpf_key *trusted_keyring) { diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 7dbefa4fd2eb..af9827c4b351 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -484,7 +484,7 @@ out: __diag_push(); __diag_ignore_all("-Wmissing-prototypes", "Global functions as their definitions will be in vmlinux BTF"); -int noinline bpf_fentry_test1(int a) +__bpf_kfunc int bpf_fentry_test1(int a) { return a + 1; } @@ -529,23 +529,23 @@ int noinline bpf_fentry_test8(struct bpf_fentry_test_t *arg) return (long)arg->a; } -int noinline bpf_modify_return_test(int a, int *b) +__bpf_kfunc int bpf_modify_return_test(int a, int *b) { *b += 1; return a + *b; } -u64 noinline bpf_kfunc_call_test1(struct sock *sk, u32 a, u64 b, u32 c, u64 d) +__bpf_kfunc u64 bpf_kfunc_call_test1(struct sock *sk, u32 a, u64 b, u32 c, u64 d) { return a + b + c + d; } -int noinline bpf_kfunc_call_test2(struct sock *sk, u32 a, u32 b) +__bpf_kfunc int bpf_kfunc_call_test2(struct sock *sk, u32 a, u32 b) { return a + b; } -struct sock * noinline bpf_kfunc_call_test3(struct sock *sk) +__bpf_kfunc struct sock *bpf_kfunc_call_test3(struct sock *sk) { return sk; } @@ -582,21 +582,21 @@ static struct prog_test_ref_kfunc prog_test_struct = { .cnt = REFCOUNT_INIT(1), }; -noinline struct prog_test_ref_kfunc * +__bpf_kfunc struct prog_test_ref_kfunc * bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) { refcount_inc(&prog_test_struct.cnt); return &prog_test_struct; } -noinline struct prog_test_member * +__bpf_kfunc struct prog_test_member * bpf_kfunc_call_memb_acquire(void) { WARN_ON_ONCE(1); return NULL; } -noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) +__bpf_kfunc void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) { if (!p) return; @@ -604,11 +604,11 @@ noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) refcount_dec(&p->cnt); } -noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p) +__bpf_kfunc void bpf_kfunc_call_memb_release(struct prog_test_member *p) { } -noinline void bpf_kfunc_call_memb1_release(struct prog_test_member1 *p) +__bpf_kfunc void bpf_kfunc_call_memb1_release(struct prog_test_member1 *p) { WARN_ON_ONCE(1); } @@ -621,12 +621,14 @@ static int *__bpf_kfunc_call_test_get_mem(struct prog_test_ref_kfunc *p, const i return (int *)p; } -noinline int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size) +__bpf_kfunc int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, + const int rdwr_buf_size) { return __bpf_kfunc_call_test_get_mem(p, rdwr_buf_size); } -noinline int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) +__bpf_kfunc int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, + const int rdonly_buf_size) { return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size); } @@ -636,16 +638,17 @@ noinline int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, * Acquire functions must return struct pointers, so these ones are * failing. */ -noinline int *bpf_kfunc_call_test_acq_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) +__bpf_kfunc int *bpf_kfunc_call_test_acq_rdonly_mem(struct prog_test_ref_kfunc *p, + const int rdonly_buf_size) { return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size); } -noinline void bpf_kfunc_call_int_mem_release(int *p) +__bpf_kfunc void bpf_kfunc_call_int_mem_release(int *p) { } -noinline struct prog_test_ref_kfunc * +__bpf_kfunc struct prog_test_ref_kfunc * bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b) { struct prog_test_ref_kfunc *p = READ_ONCE(*pp); @@ -694,47 +697,47 @@ struct prog_test_fail3 { char arr2[]; }; -noinline void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb) +__bpf_kfunc void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb) { } -noinline void bpf_kfunc_call_test_pass1(struct prog_test_pass1 *p) +__bpf_kfunc void bpf_kfunc_call_test_pass1(struct prog_test_pass1 *p) { } -noinline void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) +__bpf_kfunc void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) { } -noinline void bpf_kfunc_call_test_fail1(struct prog_test_fail1 *p) +__bpf_kfunc void bpf_kfunc_call_test_fail1(struct prog_test_fail1 *p) { } -noinline void bpf_kfunc_call_test_fail2(struct prog_test_fail2 *p) +__bpf_kfunc void bpf_kfunc_call_test_fail2(struct prog_test_fail2 *p) { } -noinline void bpf_kfunc_call_test_fail3(struct prog_test_fail3 *p) +__bpf_kfunc void bpf_kfunc_call_test_fail3(struct prog_test_fail3 *p) { } -noinline void bpf_kfunc_call_test_mem_len_pass1(void *mem, int mem__sz) +__bpf_kfunc void bpf_kfunc_call_test_mem_len_pass1(void *mem, int mem__sz) { } -noinline void bpf_kfunc_call_test_mem_len_fail1(void *mem, int len) +__bpf_kfunc void bpf_kfunc_call_test_mem_len_fail1(void *mem, int len) { } -noinline void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len) +__bpf_kfunc void bpf_kfunc_call_test_mem_len_fail2(u64 *mem, int len) { } -noinline void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) +__bpf_kfunc void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) { } -noinline void bpf_kfunc_call_test_destructive(void) +__bpf_kfunc void bpf_kfunc_call_test_destructive(void) { } diff --git a/net/core/xdp.c b/net/core/xdp.c index a5a7ecf6391c..787fb9f92b36 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -4,6 +4,7 @@ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. */ #include +#include #include #include #include @@ -722,7 +723,7 @@ __diag_ignore_all("-Wmissing-prototypes", * * Returns 0 on success or ``-errno`` on error. */ -int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp) +__bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp) { return -EOPNOTSUPP; } @@ -734,7 +735,7 @@ int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp) * * Returns 0 on success or ``-errno`` on error. */ -int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash) +__bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash) { return -EOPNOTSUPP; } diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index d2c470524e58..146792cd26fe 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -295,7 +295,7 @@ static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain) } /* override sysctl_tcp_min_tso_segs */ -static u32 bbr_min_tso_segs(struct sock *sk) +__bpf_kfunc static u32 bbr_min_tso_segs(struct sock *sk) { return sk->sk_pacing_rate < (bbr_min_tso_rate >> 3) ? 1 : 2; } @@ -328,7 +328,7 @@ static void bbr_save_cwnd(struct sock *sk) bbr->prior_cwnd = max(bbr->prior_cwnd, tcp_snd_cwnd(tp)); } -static void bbr_cwnd_event(struct sock *sk, enum tcp_ca_event event) +__bpf_kfunc static void bbr_cwnd_event(struct sock *sk, enum tcp_ca_event event) { struct tcp_sock *tp = tcp_sk(sk); struct bbr *bbr = inet_csk_ca(sk); @@ -1023,7 +1023,7 @@ static void bbr_update_model(struct sock *sk, const struct rate_sample *rs) bbr_update_gains(sk); } -static void bbr_main(struct sock *sk, const struct rate_sample *rs) +__bpf_kfunc static void bbr_main(struct sock *sk, const struct rate_sample *rs) { struct bbr *bbr = inet_csk_ca(sk); u32 bw; @@ -1035,7 +1035,7 @@ static void bbr_main(struct sock *sk, const struct rate_sample *rs) bbr_set_cwnd(sk, rs, rs->acked_sacked, bw, bbr->cwnd_gain); } -static void bbr_init(struct sock *sk) +__bpf_kfunc static void bbr_init(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct bbr *bbr = inet_csk_ca(sk); @@ -1077,7 +1077,7 @@ static void bbr_init(struct sock *sk) cmpxchg(&sk->sk_pacing_status, SK_PACING_NONE, SK_PACING_NEEDED); } -static u32 bbr_sndbuf_expand(struct sock *sk) +__bpf_kfunc static u32 bbr_sndbuf_expand(struct sock *sk) { /* Provision 3 * cwnd since BBR may slow-start even during recovery. */ return 3; @@ -1086,7 +1086,7 @@ static u32 bbr_sndbuf_expand(struct sock *sk) /* In theory BBR does not need to undo the cwnd since it does not * always reduce cwnd on losses (see bbr_main()). Keep it for now. */ -static u32 bbr_undo_cwnd(struct sock *sk) +__bpf_kfunc static u32 bbr_undo_cwnd(struct sock *sk) { struct bbr *bbr = inet_csk_ca(sk); @@ -1097,7 +1097,7 @@ static u32 bbr_undo_cwnd(struct sock *sk) } /* Entering loss recovery, so save cwnd for when we exit or undo recovery. */ -static u32 bbr_ssthresh(struct sock *sk) +__bpf_kfunc static u32 bbr_ssthresh(struct sock *sk) { bbr_save_cwnd(sk); return tcp_sk(sk)->snd_ssthresh; @@ -1125,7 +1125,7 @@ static size_t bbr_get_info(struct sock *sk, u32 ext, int *attr, return 0; } -static void bbr_set_state(struct sock *sk, u8 new_state) +__bpf_kfunc static void bbr_set_state(struct sock *sk, u8 new_state) { struct bbr *bbr = inet_csk_ca(sk); diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index d3cae40749e8..db8b4b488c31 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -403,7 +403,7 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, * ABC caps N to 2. Slow start exits when cwnd grows over ssthresh and * returns the leftover acks to adjust cwnd in congestion avoidance mode. */ -u32 tcp_slow_start(struct tcp_sock *tp, u32 acked) +__bpf_kfunc u32 tcp_slow_start(struct tcp_sock *tp, u32 acked) { u32 cwnd = min(tcp_snd_cwnd(tp) + acked, tp->snd_ssthresh); @@ -417,7 +417,7 @@ EXPORT_SYMBOL_GPL(tcp_slow_start); /* In theory this is tp->snd_cwnd += 1 / tp->snd_cwnd (or alternative w), * for every packet that was ACKed. */ -void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked) +__bpf_kfunc void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked) { /* If credits accumulated at a higher w, apply them gently now. */ if (tp->snd_cwnd_cnt >= w) { @@ -443,7 +443,7 @@ EXPORT_SYMBOL_GPL(tcp_cong_avoid_ai); /* This is Jacobson's slow start and congestion avoidance. * SIGCOMM '88, p. 328. */ -void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked) +__bpf_kfunc void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked) { struct tcp_sock *tp = tcp_sk(sk); @@ -462,7 +462,7 @@ void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked) EXPORT_SYMBOL_GPL(tcp_reno_cong_avoid); /* Slow start threshold is half the congestion window (min 2) */ -u32 tcp_reno_ssthresh(struct sock *sk) +__bpf_kfunc u32 tcp_reno_ssthresh(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); @@ -470,7 +470,7 @@ u32 tcp_reno_ssthresh(struct sock *sk) } EXPORT_SYMBOL_GPL(tcp_reno_ssthresh); -u32 tcp_reno_undo_cwnd(struct sock *sk) +__bpf_kfunc u32 tcp_reno_undo_cwnd(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 768c10c1f649..0fd78ecb67e7 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -126,7 +126,7 @@ static inline void bictcp_hystart_reset(struct sock *sk) ca->sample_cnt = 0; } -static void cubictcp_init(struct sock *sk) +__bpf_kfunc static void cubictcp_init(struct sock *sk) { struct bictcp *ca = inet_csk_ca(sk); @@ -139,7 +139,7 @@ static void cubictcp_init(struct sock *sk) tcp_sk(sk)->snd_ssthresh = initial_ssthresh; } -static void cubictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event) +__bpf_kfunc static void cubictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event) { if (event == CA_EVENT_TX_START) { struct bictcp *ca = inet_csk_ca(sk); @@ -321,7 +321,7 @@ tcp_friendliness: ca->cnt = max(ca->cnt, 2U); } -static void cubictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked) +__bpf_kfunc static void cubictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked) { struct tcp_sock *tp = tcp_sk(sk); struct bictcp *ca = inet_csk_ca(sk); @@ -338,7 +338,7 @@ static void cubictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked) tcp_cong_avoid_ai(tp, ca->cnt, acked); } -static u32 cubictcp_recalc_ssthresh(struct sock *sk) +__bpf_kfunc static u32 cubictcp_recalc_ssthresh(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); struct bictcp *ca = inet_csk_ca(sk); @@ -355,7 +355,7 @@ static u32 cubictcp_recalc_ssthresh(struct sock *sk) return max((tcp_snd_cwnd(tp) * beta) / BICTCP_BETA_SCALE, 2U); } -static void cubictcp_state(struct sock *sk, u8 new_state) +__bpf_kfunc static void cubictcp_state(struct sock *sk, u8 new_state) { if (new_state == TCP_CA_Loss) { bictcp_reset(inet_csk_ca(sk)); @@ -445,7 +445,7 @@ static void hystart_update(struct sock *sk, u32 delay) } } -static void cubictcp_acked(struct sock *sk, const struct ack_sample *sample) +__bpf_kfunc static void cubictcp_acked(struct sock *sk, const struct ack_sample *sample) { const struct tcp_sock *tp = tcp_sk(sk); struct bictcp *ca = inet_csk_ca(sk); diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index e0a2ca7456ff..bb23bb5b387a 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -75,7 +75,7 @@ static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca) ca->old_delivered_ce = tp->delivered_ce; } -static void dctcp_init(struct sock *sk) +__bpf_kfunc static void dctcp_init(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); @@ -104,7 +104,7 @@ static void dctcp_init(struct sock *sk) INET_ECN_dontxmit(sk); } -static u32 dctcp_ssthresh(struct sock *sk) +__bpf_kfunc static u32 dctcp_ssthresh(struct sock *sk) { struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); @@ -113,7 +113,7 @@ static u32 dctcp_ssthresh(struct sock *sk) return max(tcp_snd_cwnd(tp) - ((tcp_snd_cwnd(tp) * ca->dctcp_alpha) >> 11U), 2U); } -static void dctcp_update_alpha(struct sock *sk, u32 flags) +__bpf_kfunc static void dctcp_update_alpha(struct sock *sk, u32 flags) { const struct tcp_sock *tp = tcp_sk(sk); struct dctcp *ca = inet_csk_ca(sk); @@ -169,7 +169,7 @@ static void dctcp_react_to_loss(struct sock *sk) tp->snd_ssthresh = max(tcp_snd_cwnd(tp) >> 1U, 2U); } -static void dctcp_state(struct sock *sk, u8 new_state) +__bpf_kfunc static void dctcp_state(struct sock *sk, u8 new_state) { if (new_state == TCP_CA_Recovery && new_state != inet_csk(sk)->icsk_ca_state) @@ -179,7 +179,7 @@ static void dctcp_state(struct sock *sk, u8 new_state) */ } -static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) +__bpf_kfunc static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) { struct dctcp *ca = inet_csk_ca(sk); @@ -229,7 +229,7 @@ static size_t dctcp_get_info(struct sock *sk, u32 ext, int *attr, return 0; } -static u32 dctcp_cwnd_undo(struct sock *sk) +__bpf_kfunc static u32 dctcp_cwnd_undo(struct sock *sk) { const struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c index 24002bc61e07..34913521c385 100644 --- a/net/netfilter/nf_conntrack_bpf.c +++ b/net/netfilter/nf_conntrack_bpf.c @@ -249,7 +249,7 @@ __diag_ignore_all("-Wmissing-prototypes", * @opts__sz - Length of the bpf_ct_opts structure * Must be NF_BPF_CT_OPTS_SZ (12) */ -struct nf_conn___init * +__bpf_kfunc struct nf_conn___init * bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple, u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) { @@ -283,7 +283,7 @@ bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple, * @opts__sz - Length of the bpf_ct_opts structure * Must be NF_BPF_CT_OPTS_SZ (12) */ -struct nf_conn * +__bpf_kfunc struct nf_conn * bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple, u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) { @@ -316,7 +316,7 @@ bpf_xdp_ct_lookup(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple, * @opts__sz - Length of the bpf_ct_opts structure * Must be NF_BPF_CT_OPTS_SZ (12) */ -struct nf_conn___init * +__bpf_kfunc struct nf_conn___init * bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple, u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) { @@ -351,7 +351,7 @@ bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple, * @opts__sz - Length of the bpf_ct_opts structure * Must be NF_BPF_CT_OPTS_SZ (12) */ -struct nf_conn * +__bpf_kfunc struct nf_conn * bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple, u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) { @@ -376,7 +376,7 @@ bpf_skb_ct_lookup(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple, * @nfct - Pointer to referenced nf_conn___init object, obtained * using bpf_xdp_ct_alloc or bpf_skb_ct_alloc. */ -struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) +__bpf_kfunc struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) { struct nf_conn *nfct = (struct nf_conn *)nfct_i; int err; @@ -400,7 +400,7 @@ struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) * @nf_conn - Pointer to referenced nf_conn object, obtained using * bpf_xdp_ct_lookup or bpf_skb_ct_lookup. */ -void bpf_ct_release(struct nf_conn *nfct) +__bpf_kfunc void bpf_ct_release(struct nf_conn *nfct) { if (!nfct) return; @@ -417,7 +417,7 @@ void bpf_ct_release(struct nf_conn *nfct) * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. * @timeout - Timeout in msecs. */ -void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout) +__bpf_kfunc void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout) { __nf_ct_set_timeout((struct nf_conn *)nfct, msecs_to_jiffies(timeout)); } @@ -432,7 +432,7 @@ void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout) * bpf_ct_insert_entry, bpf_xdp_ct_lookup, or bpf_skb_ct_lookup. * @timeout - New timeout in msecs. */ -int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout) +__bpf_kfunc int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout) { return __nf_ct_change_timeout(nfct, msecs_to_jiffies(timeout)); } @@ -447,7 +447,7 @@ int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout) * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. * @status - New status value. */ -int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status) +__bpf_kfunc int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status) { return nf_ct_change_status_common((struct nf_conn *)nfct, status); } @@ -462,7 +462,7 @@ int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status) * bpf_ct_insert_entry, bpf_xdp_ct_lookup or bpf_skb_ct_lookup. * @status - New status value. */ -int bpf_ct_change_status(struct nf_conn *nfct, u32 status) +__bpf_kfunc int bpf_ct_change_status(struct nf_conn *nfct, u32 status) { return nf_ct_change_status_common(nfct, status); } diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c index 0fa5a0bbb0ff..141ee7783223 100644 --- a/net/netfilter/nf_nat_bpf.c +++ b/net/netfilter/nf_nat_bpf.c @@ -30,9 +30,9 @@ __diag_ignore_all("-Wmissing-prototypes", * interpreted as select a random port. * @manip - NF_NAT_MANIP_SRC or NF_NAT_MANIP_DST */ -int bpf_ct_set_nat_info(struct nf_conn___init *nfct, - union nf_inet_addr *addr, int port, - enum nf_nat_manip_type manip) +__bpf_kfunc int bpf_ct_set_nat_info(struct nf_conn___init *nfct, + union nf_inet_addr *addr, int port, + enum nf_nat_manip_type manip) { struct nf_conn *ct = (struct nf_conn *)nfct; u16 proto = nf_ct_l3num(ct); diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c index 1ef2162cebcf..d74f3fd20f2b 100644 --- a/net/xfrm/xfrm_interface_bpf.c +++ b/net/xfrm/xfrm_interface_bpf.c @@ -39,8 +39,7 @@ __diag_ignore_all("-Wmissing-prototypes", * @to - Pointer to memory to which the metadata will be copied * Cannot be NULL */ -__used noinline -int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to) +__bpf_kfunc int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to) { struct sk_buff *skb = (struct sk_buff *)skb_ctx; struct xfrm_md_info *info; @@ -62,9 +61,7 @@ int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to) * @from - Pointer to memory from which the metadata will be copied * Cannot be NULL */ -__used noinline -int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx, - const struct bpf_xfrm_info *from) +__bpf_kfunc int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx, const struct bpf_xfrm_info *from) { struct sk_buff *skb = (struct sk_buff *)skb_ctx; struct metadata_dst *md_dst; diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 5085fea3cac5..46500636d8cd 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -59,7 +59,7 @@ bpf_testmod_test_struct_arg_5(void) { return bpf_testmod_test_struct_arg_result; } -noinline void +__bpf_kfunc void bpf_testmod_test_mod_kfunc(int i) { *(int *)this_cpu_ptr(&bpf_testmod_ksym_percpu) = i; -- cgit v1.2.3 From 6aed15e330bfec6a423f40582b2a8b53d9ce1757 Mon Sep 17 00:00:00 2001 From: David Vernet Date: Wed, 1 Feb 2023 11:30:16 -0600 Subject: selftests/bpf: Add testcase for static kfunc with unused arg kfuncs are allowed to be static, or not use one or more of their arguments. For example, bpf_xdp_metadata_rx_hash() in net/core/xdp.c is meant to be implemented by drivers, with the default implementation just returning -EOPNOTSUPP. As described in [0], such kfuncs can have their arguments elided, which can cause BTF encoding to be skipped. The new __bpf_kfunc macro should address this, and this patch adds a selftest which verifies that a static kfunc with at least one unused argument can still be encoded and invoked by a BPF program. Signed-off-by: David Vernet Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230201173016.342758-5-void@manifault.com --- net/bpf/test_run.c | 6 ++++++ tools/testing/selftests/bpf/prog_tests/kfunc_call.c | 1 + tools/testing/selftests/bpf/progs/kfunc_call_test.c | 11 +++++++++++ 3 files changed, 18 insertions(+) (limited to 'net') diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index af9827c4b351..e6f773d12045 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -741,6 +741,11 @@ __bpf_kfunc void bpf_kfunc_call_test_destructive(void) { } +__bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused) +{ + return arg; +} + __diag_pop(); BTF_SET8_START(bpf_test_modify_return_ids) @@ -779,6 +784,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1) BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2) BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE) +BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) BTF_SET8_END(test_sk_check_kfunc_ids) static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c index bb4cd82a788a..a543742cd7bd 100644 --- a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c +++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c @@ -77,6 +77,7 @@ static struct kfunc_test_params kfunc_tests[] = { TC_TEST(kfunc_call_test_get_mem, 42), SYSCALL_TEST(kfunc_syscall_test, 0), SYSCALL_NULL_CTX_TEST(kfunc_syscall_test_null, 0), + TC_TEST(kfunc_call_test_static_unused_arg, 0), }; struct syscall_test_args { diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test.c b/tools/testing/selftests/bpf/progs/kfunc_call_test.c index d91c58d06d38..7daa8f5720b9 100644 --- a/tools/testing/selftests/bpf/progs/kfunc_call_test.c +++ b/tools/testing/selftests/bpf/progs/kfunc_call_test.c @@ -17,6 +17,7 @@ extern void bpf_kfunc_call_test_mem_len_pass1(void *mem, int len) __ksym; extern void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym; extern int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size) __ksym; extern int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) __ksym; +extern u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused) __ksym; SEC("tc") int kfunc_call_test4(struct __sk_buff *skb) @@ -181,4 +182,14 @@ int kfunc_call_test_get_mem(struct __sk_buff *skb) return ret; } +SEC("tc") +int kfunc_call_test_static_unused_arg(struct __sk_buff *skb) +{ + + u32 expected = 5, actual; + + actual = bpf_kfunc_call_test_static_unused_arg(expected, 0xdeadbeef); + return actual != expected ? -1 : 0; +} + char _license[] SEC("license") = "GPL"; -- cgit v1.2.3 From bc61761394ce0f0cc35c6fc60426f08d83d0d488 Mon Sep 17 00:00:00 2001 From: Jiapeng Chong Date: Tue, 31 Jan 2023 14:34:56 +0800 Subject: ipv6: ICMPV6: Use swap() instead of open coding it Swap is a function interface that provides exchange function. To avoid code duplication, we can use swap function. ./net/ipv6/icmp.c:344:25-26: WARNING opportunity for swap(). Reported-by: Abaci Robot Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3896 Signed-off-by: Jiapeng Chong Reviewed-by: Simon Horman Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20230131063456.76302-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski --- net/ipv6/icmp.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'net') diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 79c769c0d113..c9346515e24d 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -332,7 +332,6 @@ static void mip6_addr_swap(struct sk_buff *skb, const struct inet6_skb_parm *opt { struct ipv6hdr *iph = ipv6_hdr(skb); struct ipv6_destopt_hao *hao; - struct in6_addr tmp; int off; if (opt->dsthao) { @@ -340,9 +339,7 @@ static void mip6_addr_swap(struct sk_buff *skb, const struct inet6_skb_parm *opt if (likely(off >= 0)) { hao = (struct ipv6_destopt_hao *) (skb_network_header(skb) + off); - tmp = iph->saddr; - iph->saddr = hao->addr; - hao->addr = tmp; + swap(iph->saddr, hao->addr); } } } -- cgit v1.2.3 From 46abd17302ba6be2e06818088e40a568e8f9e7af Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:31 -0500 Subject: bridge: use skb_ip_totlen in br netfilter These 3 places in bridge netfilter are called on RX path after GRO and IPv4 TCP GSO packets may come through, so replace iph tot_len accessing with skb_ip_totlen() in there. Signed-off-by: Xin Long Reviewed-by: Nikolay Aleksandrov Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/bridge/br_netfilter_hooks.c | 2 +- net/bridge/netfilter/nf_conntrack_bridge.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index f20f4373ff40..b67c9c98effa 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -214,7 +214,7 @@ static int br_validate_ipv4(struct net *net, struct sk_buff *skb) if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) goto csum_error; - len = ntohs(iph->tot_len); + len = skb_ip_totlen(skb); if (skb->len < len) { __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS); goto drop; diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index 5c5dd437f1c2..71056ee84773 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -212,7 +212,7 @@ static int nf_ct_br_ip_check(const struct sk_buff *skb) iph->version != 4) return -1; - len = ntohs(iph->tot_len); + len = skb_ip_totlen(skb); if (skb->len < nhoff + len || len < (iph->ihl * 4)) return -1; @@ -256,7 +256,7 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb, if (!pskb_may_pull(skb, sizeof(struct iphdr))) return NF_ACCEPT; - len = ntohs(ip_hdr(skb)->tot_len); + len = skb_ip_totlen(skb); if (pskb_trim_rcsum(skb, len)) return NF_ACCEPT; -- cgit v1.2.3 From ec84c955a0d06cef31664bae328d94be7a3e2f03 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:32 -0500 Subject: openvswitch: use skb_ip_totlen in conntrack IPv4 GSO packets may get processed in ovs_skb_network_trim(), and we need to use skb_ip_totlen() to get iph totlen. Signed-off-by: Xin Long Reviewed-by: Aaron Conole Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/openvswitch/conntrack.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index c8b137649ca4..2172930b1f17 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1103,7 +1103,7 @@ static int ovs_skb_network_trim(struct sk_buff *skb) switch (skb->protocol) { case htons(ETH_P_IP): - len = ntohs(ip_hdr(skb)->tot_len); + len = skb_ip_totlen(skb); break; case htons(ETH_P_IPV6): len = sizeof(struct ipv6hdr) -- cgit v1.2.3 From 043e397e48c58b4442ea5124dc1bdc95367a0a33 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:33 -0500 Subject: net: sched: use skb_ip_totlen and iph_totlen There are 1 action and 1 qdisc that may process IPv4 TCP GSO packets and access iph->tot_len, replace them with skb_ip_totlen() and iph_totlen() accordingly. Note that we don't need to replace the one in tcf_csum_ipv4(), as it will return for TCP GSO packets in tcf_csum_ipv4_tcp(). Signed-off-by: Xin Long Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/sched/act_ct.c | 2 +- net/sched/sch_cake.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 0ca2bb8ed026..d68bb5dbf0dc 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -707,7 +707,7 @@ static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family) switch (family) { case NFPROTO_IPV4: - len = ntohs(ip_hdr(skb)->tot_len); + len = skb_ip_totlen(skb); break; case NFPROTO_IPV6: len = sizeof(struct ipv6hdr) diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index 3ed0c3342189..7970217b565a 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -1209,7 +1209,7 @@ static struct sk_buff *cake_ack_filter(struct cake_sched_data *q, iph_check->daddr != iph->daddr) continue; - seglen = ntohs(iph_check->tot_len) - + seglen = iph_totlen(skb, iph_check) - (4 * iph_check->ihl); } else if (iph_check->version == 6) { ipv6h = (struct ipv6hdr *)iph; -- cgit v1.2.3 From a13fbf5ed5b4fc9095f12e955ca3a59b5507ff01 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:34 -0500 Subject: netfilter: use skb_ip_totlen and iph_totlen There are also quite some places in netfilter that may process IPv4 TCP GSO packets, we need to replace them too. In length_mt(), we have to use u_int32_t/int to accept skb_ip_totlen() return value, otherwise it may overflow and mismatch. This change will also help us add selftest for IPv4 BIG TCP in the following patch. Note that we don't need to replace the one in tcpmss_tg4(), as it will return if there is data after tcphdr in tcpmss_mangle_packet(). The same in mangle_contents() in nf_nat_helper.c, it returns false when skb->len + extra > 65535 in enlarge_skb(). Signed-off-by: Xin Long Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/net/netfilter/nf_tables_ipv4.h | 4 ++-- net/netfilter/ipvs/ip_vs_xmit.c | 2 +- net/netfilter/nf_log_syslog.c | 2 +- net/netfilter/xt_length.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/include/net/netfilter/nf_tables_ipv4.h b/include/net/netfilter/nf_tables_ipv4.h index 112708f7a6b4..947973623dc7 100644 --- a/include/net/netfilter/nf_tables_ipv4.h +++ b/include/net/netfilter/nf_tables_ipv4.h @@ -29,7 +29,7 @@ static inline int __nft_set_pktinfo_ipv4_validate(struct nft_pktinfo *pkt) if (iph->ihl < 5 || iph->version != 4) return -1; - len = ntohs(iph->tot_len); + len = iph_totlen(pkt->skb, iph); thoff = iph->ihl * 4; if (pkt->skb->len < len) return -1; @@ -64,7 +64,7 @@ static inline int nft_set_pktinfo_ipv4_ingress(struct nft_pktinfo *pkt) if (iph->ihl < 5 || iph->version != 4) goto inhdr_error; - len = ntohs(iph->tot_len); + len = iph_totlen(pkt->skb, iph); thoff = iph->ihl * 4; if (pkt->skb->len < len) { __IP_INC_STATS(nft_net(pkt), IPSTATS_MIB_INTRUNCATEDPKTS); diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index 029171379884..80448885c3d7 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -994,7 +994,7 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af, old_dsfield = ipv4_get_dsfield(old_iph); *ttl = old_iph->ttl; if (payload_len) - *payload_len = ntohs(old_iph->tot_len); + *payload_len = skb_ip_totlen(skb); } /* Implement full-functionality option for ECN encapsulation */ diff --git a/net/netfilter/nf_log_syslog.c b/net/netfilter/nf_log_syslog.c index cb894f0d63e9..c66689ad2b49 100644 --- a/net/netfilter/nf_log_syslog.c +++ b/net/netfilter/nf_log_syslog.c @@ -322,7 +322,7 @@ dump_ipv4_packet(struct net *net, struct nf_log_buf *m, /* Max length: 46 "LEN=65535 TOS=0xFF PREC=0xFF TTL=255 ID=65535 " */ nf_log_buf_add(m, "LEN=%u TOS=0x%02X PREC=0x%02X TTL=%u ID=%u ", - ntohs(ih->tot_len), ih->tos & IPTOS_TOS_MASK, + iph_totlen(skb, ih), ih->tos & IPTOS_TOS_MASK, ih->tos & IPTOS_PREC_MASK, ih->ttl, ntohs(ih->id)); /* Max length: 6 "CE DF MF " */ diff --git a/net/netfilter/xt_length.c b/net/netfilter/xt_length.c index 1873da3a945a..b3d623a52885 100644 --- a/net/netfilter/xt_length.c +++ b/net/netfilter/xt_length.c @@ -21,7 +21,7 @@ static bool length_mt(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_length_info *info = par->matchinfo; - u_int16_t pktlen = ntohs(ip_hdr(skb)->tot_len); + u32 pktlen = skb_ip_totlen(skb); return (pktlen >= info->min && pktlen <= info->max) ^ info->invert; } -- cgit v1.2.3 From 7eb072be41ba4d8ecea17092dece50c7375d8980 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:35 -0500 Subject: cipso_ipv4: use iph_set_totlen in skbuff_setattr It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so the iph->tot_len update should use iph_set_totlen(). Note that for these non GSO packets, the new iph tot_len with extra iph option len added may become greater than 65535, the old process will cast it and set iph->tot_len to it, which is a bug. In theory, iph options shouldn't be added for these big packets in here, a fix may be needed here in the future. For now this patch is only to set iph->tot_len to 0 when it happens. Signed-off-by: Xin Long Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/ipv4/cipso_ipv4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c index 6cd3b6c559f0..79ae7204e8ed 100644 --- a/net/ipv4/cipso_ipv4.c +++ b/net/ipv4/cipso_ipv4.c @@ -2222,7 +2222,7 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb, memset((char *)(iph + 1) + buf_len, 0, opt_len - buf_len); if (len_delta != 0) { iph->ihl = 5 + (opt_len >> 2); - iph->tot_len = htons(skb->len); + iph_set_totlen(iph, skb->len); } ip_send_check(iph); -- cgit v1.2.3 From 8e08bb75b60f7f9ed319185cef80188b87d9b43a Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:37 -0500 Subject: packet: add TP_STATUS_GSO_TCP for tp_status Introduce TP_STATUS_GSO_TCP tp_status flag to tell the af_packet user that this is a TCP GSO packet. When parsing IPv4 BIG TCP packets in tcpdump/libpcap, it can use tp_len as the IPv4 packet len when this flag is set, as iph tot_len is set to 0 for IPv4 BIG TCP packets. Signed-off-by: Xin Long Reviewed-by: David Ahern Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/uapi/linux/if_packet.h | 1 + net/packet/af_packet.c | 4 ++++ 2 files changed, 5 insertions(+) (limited to 'net') diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index a8516b3594a4..78c981d6a9d4 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -115,6 +115,7 @@ struct tpacket_auxdata { #define TP_STATUS_BLK_TMO (1 << 5) #define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */ #define TP_STATUS_CSUM_VALID (1 << 7) +#define TP_STATUS_GSO_TCP (1 << 8) /* Tx ring - header status */ #define TP_STATUS_AVAILABLE 0 diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index b5ab98ca2511..8ffb19c643ab 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2296,6 +2296,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, else if (skb->pkt_type != PACKET_OUTGOING && skb_csum_unnecessary(skb)) status |= TP_STATUS_CSUM_VALID; + if (skb_is_gso(skb) && skb_is_gso_tcp(skb)) + status |= TP_STATUS_GSO_TCP; if (snaplen > res) snaplen = res; @@ -3522,6 +3524,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, else if (skb->pkt_type != PACKET_OUTGOING && skb_csum_unnecessary(skb)) aux.tp_status |= TP_STATUS_CSUM_VALID; + if (skb_is_gso(skb) && skb_is_gso_tcp(skb)) + aux.tp_status |= TP_STATUS_GSO_TCP; aux.tp_len = origlen; aux.tp_snaplen = skb->len; -- cgit v1.2.3 From 9eefedd58ae1daece2ba907849a44db2941fb4b0 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:38 -0500 Subject: net: add gso_ipv4_max_size and gro_ipv4_max_size per device This patch introduces gso_ipv4_max_size and gro_ipv4_max_size per device and adds netlink attributes for them, so that IPV4 BIG TCP can be guarded by a separate tunable in the next patch. To not break the old application using "gso/gro_max_size" for IPv4 GSO packets, this patch updates "gso/gro_ipv4_max_size" in netif_set_gso/gro_max_size() if the new size isn't greater than GSO_LEGACY_MAX_SIZE, so that nothing will change even if userspace doesn't realize the new netlink attributes. Signed-off-by: Xin Long Reviewed-by: David Ahern Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 6 ++++++ include/uapi/linux/if_link.h | 3 +++ net/core/dev.c | 4 ++++ net/core/dev.h | 18 ++++++++++++++++++ net/core/rtnetlink.c | 33 +++++++++++++++++++++++++++++++++ 5 files changed, 64 insertions(+) (limited to 'net') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2466afa25078..d5ef4c1fedd2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1964,6 +1964,8 @@ enum netdev_ml_priv_type { * @gso_max_segs: Maximum number of segments that can be passed to the * NIC for GSO * @tso_max_segs: Device (as in HW) limit on the max TSO segment count + * @gso_ipv4_max_size: Maximum size of generic segmentation offload, + * for IPv4. * * @dcbnl_ops: Data Center Bridging netlink ops * @num_tc: Number of traffic classes in the net device @@ -2004,6 +2006,8 @@ enum netdev_ml_priv_type { * keep a list of interfaces to be deleted. * @gro_max_size: Maximum size of aggregated packet in generic * receive offload (GRO) + * @gro_ipv4_max_size: Maximum size of aggregated packet in generic + * receive offload (GRO), for IPv4. * * @dev_addr_shadow: Copy of @dev_addr to catch direct writes. * @linkwatch_dev_tracker: refcount tracker used by linkwatch. @@ -2207,6 +2211,7 @@ struct net_device { */ #define GRO_MAX_SIZE (8 * 65535u) unsigned int gro_max_size; + unsigned int gro_ipv4_max_size; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; @@ -2330,6 +2335,7 @@ struct net_device { u16 gso_max_segs; #define TSO_MAX_SEGS U16_MAX u16 tso_max_segs; + unsigned int gso_ipv4_max_size; #ifdef CONFIG_DCB const struct dcbnl_rtnl_ops *dcbnl_ops; diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 1021a7e47a86..02b87e4c65be 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -374,6 +374,9 @@ enum { IFLA_DEVLINK_PORT, + IFLA_GSO_IPV4_MAX_SIZE, + IFLA_GRO_IPV4_MAX_SIZE, + __IFLA_MAX }; diff --git a/net/core/dev.c b/net/core/dev.c index f72f5c4ee7e2..bb42150a38ec 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3001,6 +3001,8 @@ void netif_set_tso_max_size(struct net_device *dev, unsigned int size) dev->tso_max_size = min(GSO_MAX_SIZE, size); if (size < READ_ONCE(dev->gso_max_size)) netif_set_gso_max_size(dev, size); + if (size < READ_ONCE(dev->gso_ipv4_max_size)) + netif_set_gso_ipv4_max_size(dev, size); } EXPORT_SYMBOL(netif_set_tso_max_size); @@ -10614,6 +10616,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, dev->gso_max_size = GSO_LEGACY_MAX_SIZE; dev->gso_max_segs = GSO_MAX_SEGS; dev->gro_max_size = GRO_LEGACY_MAX_SIZE; + dev->gso_ipv4_max_size = GSO_LEGACY_MAX_SIZE; + dev->gro_ipv4_max_size = GRO_LEGACY_MAX_SIZE; dev->tso_max_size = TSO_LEGACY_MAX_SIZE; dev->tso_max_segs = TSO_MAX_SEGS; dev->upper_level = 1; diff --git a/net/core/dev.h b/net/core/dev.h index 814ed5b7b960..a065b7571441 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -100,6 +100,8 @@ static inline void netif_set_gso_max_size(struct net_device *dev, { /* dev->gso_max_size is read locklessly from sk_setup_caps() */ WRITE_ONCE(dev->gso_max_size, size); + if (size <= GSO_LEGACY_MAX_SIZE) + WRITE_ONCE(dev->gso_ipv4_max_size, size); } static inline void netif_set_gso_max_segs(struct net_device *dev, @@ -114,6 +116,22 @@ static inline void netif_set_gro_max_size(struct net_device *dev, { /* This pairs with the READ_ONCE() in skb_gro_receive() */ WRITE_ONCE(dev->gro_max_size, size); + if (size <= GRO_LEGACY_MAX_SIZE) + WRITE_ONCE(dev->gro_ipv4_max_size, size); +} + +static inline void netif_set_gso_ipv4_max_size(struct net_device *dev, + unsigned int size) +{ + /* dev->gso_ipv4_max_size is read locklessly from sk_setup_caps() */ + WRITE_ONCE(dev->gso_ipv4_max_size, size); +} + +static inline void netif_set_gro_ipv4_max_size(struct net_device *dev, + unsigned int size) +{ + /* This pairs with the READ_ONCE() in skb_gro_receive() */ + WRITE_ONCE(dev->gro_ipv4_max_size, size); } #endif diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 64289bc98887..b9f584955b77 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1074,6 +1074,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */ + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */ + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */ + + nla_total_size(4) /* IFLA_GSO_IPV4_MAX_SIZE */ + + nla_total_size(4) /* IFLA_GRO_IPV4_MAX_SIZE */ + nla_total_size(4) /* IFLA_TSO_MAX_SIZE */ + nla_total_size(4) /* IFLA_TSO_MAX_SEGS */ + nla_total_size(1) /* IFLA_OPERSTATE */ @@ -1807,6 +1809,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) || nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) || nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) || + nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE, dev->gso_ipv4_max_size) || + nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE, dev->gro_ipv4_max_size) || nla_put_u32(skb, IFLA_TSO_MAX_SIZE, dev->tso_max_size) || nla_put_u32(skb, IFLA_TSO_MAX_SEGS, dev->tso_max_segs) || #ifdef CONFIG_RPS @@ -1968,6 +1972,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [IFLA_TSO_MAX_SIZE] = { .type = NLA_REJECT }, [IFLA_TSO_MAX_SEGS] = { .type = NLA_REJECT }, [IFLA_ALLMULTI] = { .type = NLA_REJECT }, + [IFLA_GSO_IPV4_MAX_SIZE] = { .type = NLA_U32 }, + [IFLA_GRO_IPV4_MAX_SIZE] = { .type = NLA_U32 }, }; static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { @@ -2883,6 +2889,29 @@ static int do_setlink(const struct sk_buff *skb, } } + if (tb[IFLA_GSO_IPV4_MAX_SIZE]) { + u32 max_size = nla_get_u32(tb[IFLA_GSO_IPV4_MAX_SIZE]); + + if (max_size > dev->tso_max_size) { + err = -EINVAL; + goto errout; + } + + if (dev->gso_ipv4_max_size ^ max_size) { + netif_set_gso_ipv4_max_size(dev, max_size); + status |= DO_SETLINK_MODIFIED; + } + } + + if (tb[IFLA_GRO_IPV4_MAX_SIZE]) { + u32 gro_max_size = nla_get_u32(tb[IFLA_GRO_IPV4_MAX_SIZE]); + + if (dev->gro_ipv4_max_size ^ gro_max_size) { + netif_set_gro_ipv4_max_size(dev, gro_max_size); + status |= DO_SETLINK_MODIFIED; + } + } + if (tb[IFLA_OPERSTATE]) set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE])); @@ -3325,6 +3354,10 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname, netif_set_gso_max_segs(dev, nla_get_u32(tb[IFLA_GSO_MAX_SEGS])); if (tb[IFLA_GRO_MAX_SIZE]) netif_set_gro_max_size(dev, nla_get_u32(tb[IFLA_GRO_MAX_SIZE])); + if (tb[IFLA_GSO_IPV4_MAX_SIZE]) + netif_set_gso_ipv4_max_size(dev, nla_get_u32(tb[IFLA_GSO_IPV4_MAX_SIZE])); + if (tb[IFLA_GRO_IPV4_MAX_SIZE]) + netif_set_gro_ipv4_max_size(dev, nla_get_u32(tb[IFLA_GRO_IPV4_MAX_SIZE])); return dev; } -- cgit v1.2.3 From b1a78b9b98862cda167b643690e43662ea060625 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 28 Jan 2023 10:58:39 -0500 Subject: net: add support for ipv4 big tcp Similar to Eric's IPv6 BIG TCP, this patch is to enable IPv4 BIG TCP. Firstly, allow sk->sk_gso_max_size to be set to a value greater than GSO_LEGACY_MAX_SIZE by not trimming gso_max_size in sk_trim_gso_size() for IPv4 TCP sockets. Then on TX path, set IP header tot_len to 0 when skb->len > IP_MAX_MTU in __ip_local_out() to allow to send BIG TCP packets, and this implies that skb->len is the length of a IPv4 packet; On RX path, use skb->len as the length of the IPv4 packet when the IP header tot_len is 0 and skb->len > IP_MAX_MTU in ip_rcv_core(). As the API iph_set_totlen() and skb_ip_totlen() are used in __ip_local_out() and ip_rcv_core(), we only need to update these APIs. Also in GRO receive, add the check for ETH_P_IP/IPPROTO_TCP, and allows the merged packet size >= GRO_LEGACY_MAX_SIZE in skb_gro_receive(). In GRO complete, set IP header tot_len to 0 when the merged packet size greater than IP_MAX_MTU in iph_set_totlen() so that it can be processed on RX path. Note that by checking skb_is_gso_tcp() in API iph_totlen(), it makes this implementation safe to use iph->len == 0 indicates IPv4 BIG TCP packets. Signed-off-by: Xin Long Reviewed-by: David Ahern Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/core/gro.c | 12 +++++++----- net/core/sock.c | 26 ++++++++++++++------------ net/ipv4/af_inet.c | 7 ++++--- net/ipv4/ip_input.c | 2 +- net/ipv4/ip_output.c | 2 +- 5 files changed, 27 insertions(+), 22 deletions(-) (limited to 'net') diff --git a/net/core/gro.c b/net/core/gro.c index 506f83d715f8..b15f85546bdd 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -162,16 +162,18 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb) struct sk_buff *lp; int segs; - /* pairs with WRITE_ONCE() in netif_set_gro_max_size() */ - gro_max_size = READ_ONCE(p->dev->gro_max_size); + /* pairs with WRITE_ONCE() in netif_set_gro(_ipv4)_max_size() */ + gro_max_size = p->protocol == htons(ETH_P_IPV6) ? + READ_ONCE(p->dev->gro_max_size) : + READ_ONCE(p->dev->gro_ipv4_max_size); if (unlikely(p->len + len >= gro_max_size || NAPI_GRO_CB(skb)->flush)) return -E2BIG; if (unlikely(p->len + len >= GRO_LEGACY_MAX_SIZE)) { - if (p->protocol != htons(ETH_P_IPV6) || - skb_headroom(p) < sizeof(struct hop_jumbo_hdr) || - ipv6_hdr(p)->nexthdr != IPPROTO_TCP || + if (NAPI_GRO_CB(skb)->proto != IPPROTO_TCP || + (p->protocol == htons(ETH_P_IPV6) && + skb_headroom(p) < sizeof(struct hop_jumbo_hdr)) || p->encapsulation) return -E2BIG; } diff --git a/net/core/sock.c b/net/core/sock.c index 7ba4891460ad..f08b76acde9b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2373,17 +2373,22 @@ void sk_free_unlock_clone(struct sock *sk) } EXPORT_SYMBOL_GPL(sk_free_unlock_clone); -static void sk_trim_gso_size(struct sock *sk) +static u32 sk_dst_gso_max_size(struct sock *sk, struct dst_entry *dst) { - if (sk->sk_gso_max_size <= GSO_LEGACY_MAX_SIZE) - return; + bool is_ipv6 = false; + u32 max_size; + #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family == AF_INET6 && - sk_is_tcp(sk) && - !ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) - return; + is_ipv6 = (sk->sk_family == AF_INET6 && + !ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)); #endif - sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE; + /* pairs with the WRITE_ONCE() in netif_set_gso(_ipv4)_max_size() */ + max_size = is_ipv6 ? READ_ONCE(dst->dev->gso_max_size) : + READ_ONCE(dst->dev->gso_ipv4_max_size); + if (max_size > GSO_LEGACY_MAX_SIZE && !sk_is_tcp(sk)) + max_size = GSO_LEGACY_MAX_SIZE; + + return max_size - (MAX_TCP_HEADER + 1); } void sk_setup_caps(struct sock *sk, struct dst_entry *dst) @@ -2403,10 +2408,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst) sk->sk_route_caps &= ~NETIF_F_GSO_MASK; } else { sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM; - /* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */ - sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size); - sk_trim_gso_size(sk); - sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1); + sk->sk_gso_max_size = sk_dst_gso_max_size(sk, dst); /* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */ max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1); } diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 6c0ec2789943..2f992a323b95 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1485,6 +1485,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb) if (unlikely(ip_fast_csum((u8 *)iph, 5))) goto out; + NAPI_GRO_CB(skb)->proto = proto; id = ntohl(*(__be32 *)&iph->id); flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF)); id >>= 16; @@ -1618,9 +1619,9 @@ int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) int inet_gro_complete(struct sk_buff *skb, int nhoff) { - __be16 newlen = htons(skb->len - nhoff); struct iphdr *iph = (struct iphdr *)(skb->data + nhoff); const struct net_offload *ops; + __be16 totlen = iph->tot_len; int proto = iph->protocol; int err = -ENOSYS; @@ -1629,8 +1630,8 @@ int inet_gro_complete(struct sk_buff *skb, int nhoff) skb_set_inner_network_header(skb, nhoff); } - csum_replace2(&iph->check, iph->tot_len, newlen); - iph->tot_len = newlen; + iph_set_totlen(iph, skb->len - nhoff); + csum_replace2(&iph->check, totlen, iph->tot_len); ops = rcu_dereference(inet_offloads[proto]); if (WARN_ON(!ops || !ops->callbacks.gro_complete)) diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index e880ce77322a..fe9ead9ee863 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -511,7 +511,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff *skb, struct net *net) if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) goto csum_error; - len = ntohs(iph->tot_len); + len = iph_totlen(skb, iph); if (skb->len < len) { drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL; __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 922c87ef1ab5..4e4e308c3230 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -100,7 +100,7 @@ int __ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb) { struct iphdr *iph = ip_hdr(skb); - iph->tot_len = htons(skb->len); + iph_set_totlen(iph, skb->len); ip_send_check(iph); /* if egress device is enslaved to an L3 master device pass the -- cgit v1.2.3 From 62e395f82d04510b0f86e5e603e29412be88596f Mon Sep 17 00:00:00 2001 From: Brian Haley Date: Mon, 30 Jan 2023 12:14:28 -0500 Subject: neighbor: fix proxy_delay usage when it is zero When set to zero, the neighbor sysctl proxy_delay value does not cause an immediate reply for ARP/ND requests as expected, it instead causes a random delay between [0, U32_MAX). Looking at this comment from __get_random_u32_below() explains the reason: /* * This function is technically undefined for ceil == 0, and in fact * for the non-underscored constant version in the header, we build bug * on that. But for the non-constant case, it's convenient to have that * evaluate to being a straight call to get_random_u32(), so that * get_random_u32_inclusive() can work over its whole range without * undefined behavior. */ Added helper function that does not call get_random_u32_below() if proxy_delay is zero and just uses the current value of jiffies instead, causing pneigh_enqueue() to respond immediately. Also added definition of proxy_delay to ip-sysctl.txt since it was missing. Signed-off-by: Brian Haley Link: https://lore.kernel.org/r/20230130171428.367111-1-haleyb.dev@gmail.com Signed-off-by: Jakub Kicinski --- Documentation/networking/ip-sysctl.rst | 8 ++++++++ net/core/neighbour.c | 14 ++++++++++++-- 2 files changed, 20 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 39df3cfb3bcb..87dd1c5283e6 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -1592,6 +1592,14 @@ proxy_arp_pvlan - BOOLEAN Hewlett-Packard call it Source-Port filtering or port-isolation. Ericsson call it MAC-Forced Forwarding (RFC Draft). +proxy_delay - INTEGER + Delay proxy response. + + Delay response to a neighbor solicitation when proxy_arp + or proxy_ndp is enabled. A random value between [0, proxy_delay) + will be chosen, setting to zero means reply with no delay. + Value in jiffies. Defaults to 80. + shared_media - BOOLEAN Send(router) or accept(host) RFC1620 shared media redirects. Overrides secure_redirects. diff --git a/net/core/neighbour.c b/net/core/neighbour.c index f00a79fc301b..57258110bccd 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1662,11 +1662,21 @@ static void neigh_proxy_process(struct timer_list *t) spin_unlock(&tbl->proxy_queue.lock); } +static unsigned long neigh_proxy_delay(struct neigh_parms *p) +{ + /* If proxy_delay is zero, do not call get_random_u32_below() + * as it is undefined behavior. + */ + unsigned long proxy_delay = NEIGH_VAR(p, PROXY_DELAY); + + return proxy_delay ? + jiffies + get_random_u32_below(proxy_delay) : jiffies; +} + void pneigh_enqueue(struct neigh_table *tbl, struct neigh_parms *p, struct sk_buff *skb) { - unsigned long sched_next = jiffies + - get_random_u32_below(NEIGH_VAR(p, PROXY_DELAY)); + unsigned long sched_next = neigh_proxy_delay(p); if (p->qlen > NEIGH_VAR(p, PROXY_QLEN)) { kfree_skb(skb); -- cgit v1.2.3 From 028fb19c6ba743ed308ba99ac325afa968795e0f Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 31 Jan 2023 15:31:57 +0200 Subject: netlink: provide an ability to set default extack message In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwrite the filled message. Instead of adding multiple "if (!extack->_msg)" checks before any NL_SET_ERR_MSG() call, which appears after call to the driver, let's add new macro to common code. [1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal Reviewed-by: Simon Horman Reviewed-by: Nikolay Aleksandrov Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netlink.h | 10 ++++++++++ net/bridge/br_switchdev.c | 10 ++++------ net/dsa/master.c | 4 +--- net/dsa/slave.c | 4 +--- net/xfrm/xfrm_device.c | 5 ++++- 5 files changed, 20 insertions(+), 13 deletions(-) (limited to 'net') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index fa4d86da0ec7..c43ac7690eca 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -130,6 +130,16 @@ struct netlink_ext_ack { #define NL_SET_ERR_MSG_FMT_MOD(extack, fmt, args...) \ NL_SET_ERR_MSG_FMT((extack), KBUILD_MODNAME ": " fmt, ##args) +#define NL_SET_ERR_MSG_WEAK(extack, msg) do { \ + if ((extack) && !(extack)->_msg) \ + NL_SET_ERR_MSG((extack), msg); \ +} while (0) + +#define NL_SET_ERR_MSG_WEAK_MOD(extack, msg) do { \ + if ((extack) && !(extack)->_msg) \ + NL_SET_ERR_MSG_MOD((extack), msg); \ +} while (0) + #define NL_SET_BAD_ATTR_POLICY(extack, attr, pol) do { \ if ((extack)) { \ (extack)->bad_attr = (attr); \ diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c index 7eb6fd5bb917..de18e9c1d7a7 100644 --- a/net/bridge/br_switchdev.c +++ b/net/bridge/br_switchdev.c @@ -104,9 +104,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p, return 0; if (err) { - if (extack && !extack->_msg) - NL_SET_ERR_MSG_MOD(extack, - "bridge flag offload is not supported"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "bridge flag offload is not supported"); return -EOPNOTSUPP; } @@ -115,9 +114,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p, err = switchdev_port_attr_set(p->dev, &attr, extack); if (err) { - if (extack && !extack->_msg) - NL_SET_ERR_MSG_MOD(extack, - "error setting offload flag on port"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "error setting offload flag on port"); return err; } diff --git a/net/dsa/master.c b/net/dsa/master.c index 26d90140d271..1507b8cdb360 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -464,9 +464,7 @@ int dsa_master_lag_setup(struct net_device *lag_dev, struct dsa_port *cpu_dp, err = dsa_port_lag_join(cpu_dp, lag_dev, uinfo, extack); if (err) { - if (extack && !extack->_msg) - NL_SET_ERR_MSG_MOD(extack, - "CPU port failed to join LAG"); + NL_SET_ERR_MSG_WEAK_MOD(extack, "CPU port failed to join LAG"); goto out_master_teardown; } diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 6014ac3aad34..26c458f50ac6 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -2692,9 +2692,7 @@ static int dsa_slave_changeupper(struct net_device *dev, if (!err) dsa_bridge_mtu_normalization(dp); if (err == -EOPNOTSUPP) { - if (extack && !extack->_msg) - NL_SET_ERR_MSG_MOD(extack, - "Offloading not supported"); + NL_SET_ERR_MSG_WEAK_MOD(extack, "Offloading not supported"); err = 0; } err = notifier_from_errno(err); diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 562b9d951598..95f1436bf6a2 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -325,8 +325,10 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, * authors to do not return -EOPNOTSUPP in packet offload mode. */ WARN_ON(err == -EOPNOTSUPP && is_packet_offload); - if (err != -EOPNOTSUPP || is_packet_offload) + if (err != -EOPNOTSUPP || is_packet_offload) { + NL_SET_ERR_MSG_WEAK(extack, "Device failed to offload this state"); return err; + } } return 0; @@ -388,6 +390,7 @@ int xfrm_dev_policy_add(struct net *net, struct xfrm_policy *xp, xdo->type = XFRM_DEV_OFFLOAD_UNSPECIFIED; xdo->dir = 0; netdev_put(dev, &xdo->dev_tracker); + NL_SET_ERR_MSG_WEAK(extack, "Device failed to offload this policy"); return err; } -- cgit v1.2.3 From 52cf89f78c01bf39973f3e70d366921d70faff7a Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Tue, 31 Jan 2023 16:05:11 -0300 Subject: net/sched: transition act_pedit to rcu and percpu stats The software pedit action didn't get the same love as some of the other actions and it's still using spinlocks and shared stats in the datapath. Transition the action to rcu and percpu stats as this improves the action's performance dramatically on multiple cpu deployments. Reviewed-by: Jamal Hadi Salim Signed-off-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: Paolo Abeni --- include/net/tc_act/tc_pedit.h | 81 ++++++++++++++++++----- net/sched/act_pedit.c | 148 +++++++++++++++++++++++++----------------- 2 files changed, 153 insertions(+), 76 deletions(-) (limited to 'net') diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h index 3e02709a1df6..83fe39931781 100644 --- a/include/net/tc_act/tc_pedit.h +++ b/include/net/tc_act/tc_pedit.h @@ -4,22 +4,29 @@ #include #include +#include struct tcf_pedit_key_ex { enum pedit_header_type htype; enum pedit_cmd cmd; }; -struct tcf_pedit { - struct tc_action common; - unsigned char tcfp_nkeys; - unsigned char tcfp_flags; - u32 tcfp_off_max_hint; +struct tcf_pedit_parms { struct tc_pedit_key *tcfp_keys; struct tcf_pedit_key_ex *tcfp_keys_ex; + u32 tcfp_off_max_hint; + unsigned char tcfp_nkeys; + unsigned char tcfp_flags; + struct rcu_head rcu; +}; + +struct tcf_pedit { + struct tc_action common; + struct tcf_pedit_parms __rcu *parms; }; #define to_pedit(a) ((struct tcf_pedit *)a) +#define to_pedit_parms(a) (rcu_dereference(to_pedit(a)->parms)) static inline bool is_tcf_pedit(const struct tc_action *a) { @@ -32,37 +39,81 @@ static inline bool is_tcf_pedit(const struct tc_action *a) static inline int tcf_pedit_nkeys(const struct tc_action *a) { - return to_pedit(a)->tcfp_nkeys; + struct tcf_pedit_parms *parms; + int nkeys; + + rcu_read_lock(); + parms = to_pedit_parms(a); + nkeys = parms->tcfp_nkeys; + rcu_read_unlock(); + + return nkeys; } static inline u32 tcf_pedit_htype(const struct tc_action *a, int index) { - if (to_pedit(a)->tcfp_keys_ex) - return to_pedit(a)->tcfp_keys_ex[index].htype; + u32 htype = TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; + struct tcf_pedit_parms *parms; + + rcu_read_lock(); + parms = to_pedit_parms(a); + if (parms->tcfp_keys_ex) + htype = parms->tcfp_keys_ex[index].htype; + rcu_read_unlock(); - return TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; + return htype; } static inline u32 tcf_pedit_cmd(const struct tc_action *a, int index) { - if (to_pedit(a)->tcfp_keys_ex) - return to_pedit(a)->tcfp_keys_ex[index].cmd; + struct tcf_pedit_parms *parms; + u32 cmd = __PEDIT_CMD_MAX; - return __PEDIT_CMD_MAX; + rcu_read_lock(); + parms = to_pedit_parms(a); + if (parms->tcfp_keys_ex) + cmd = parms->tcfp_keys_ex[index].cmd; + rcu_read_unlock(); + + return cmd; } static inline u32 tcf_pedit_mask(const struct tc_action *a, int index) { - return to_pedit(a)->tcfp_keys[index].mask; + struct tcf_pedit_parms *parms; + u32 mask; + + rcu_read_lock(); + parms = to_pedit_parms(a); + mask = parms->tcfp_keys[index].mask; + rcu_read_unlock(); + + return mask; } static inline u32 tcf_pedit_val(const struct tc_action *a, int index) { - return to_pedit(a)->tcfp_keys[index].val; + struct tcf_pedit_parms *parms; + u32 val; + + rcu_read_lock(); + parms = to_pedit_parms(a); + val = parms->tcfp_keys[index].val; + rcu_read_unlock(); + + return val; } static inline u32 tcf_pedit_offset(const struct tc_action *a, int index) { - return to_pedit(a)->tcfp_keys[index].off; + struct tcf_pedit_parms *parms; + u32 off; + + rcu_read_lock(); + parms = to_pedit_parms(a); + off = parms->tcfp_keys[index].off; + rcu_read_unlock(); + + return off; } #endif /* __NET_TC_PED_H */ diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index a0378e9f0121..991541094278 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -134,6 +134,17 @@ nla_failure: return -EINVAL; } +static void tcf_pedit_cleanup_rcu(struct rcu_head *head) +{ + struct tcf_pedit_parms *parms = + container_of(head, struct tcf_pedit_parms, rcu); + + kfree(parms->tcfp_keys_ex); + kfree(parms->tcfp_keys); + + kfree(parms); +} + static int tcf_pedit_init(struct net *net, struct nlattr *nla, struct nlattr *est, struct tc_action **a, struct tcf_proto *tp, u32 flags, @@ -141,10 +152,9 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, { struct tc_action_net *tn = net_generic(net, act_pedit_ops.net_id); bool bind = flags & TCA_ACT_FLAGS_BIND; - struct nlattr *tb[TCA_PEDIT_MAX + 1]; struct tcf_chain *goto_ch = NULL; - struct tc_pedit_key *keys = NULL; - struct tcf_pedit_key_ex *keys_ex; + struct tcf_pedit_parms *oparms, *nparms; + struct nlattr *tb[TCA_PEDIT_MAX + 1]; struct tc_pedit *parm; struct nlattr *pattr; struct tcf_pedit *p; @@ -181,18 +191,25 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, return -EINVAL; } - keys_ex = tcf_pedit_keys_ex_parse(tb[TCA_PEDIT_KEYS_EX], parm->nkeys); - if (IS_ERR(keys_ex)) - return PTR_ERR(keys_ex); + nparms = kzalloc(sizeof(*nparms), GFP_KERNEL); + if (!nparms) + return -ENOMEM; + + nparms->tcfp_keys_ex = + tcf_pedit_keys_ex_parse(tb[TCA_PEDIT_KEYS_EX], parm->nkeys); + if (IS_ERR(nparms->tcfp_keys_ex)) { + ret = PTR_ERR(nparms->tcfp_keys_ex); + goto out_free; + } index = parm->index; err = tcf_idr_check_alloc(tn, &index, a, bind); if (!err) { - ret = tcf_idr_create(tn, index, est, a, - &act_pedit_ops, bind, false, flags); + ret = tcf_idr_create_from_flags(tn, index, est, a, + &act_pedit_ops, bind, flags); if (ret) { tcf_idr_cleanup(tn, index); - goto out_free; + goto out_free_ex; } ret = ACT_P_CREATED; } else if (err > 0) { @@ -204,7 +221,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, } } else { ret = err; - goto out_free; + goto out_free_ex; } err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); @@ -212,48 +229,50 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, ret = err; goto out_release; } - p = to_pedit(*a); - spin_lock_bh(&p->tcf_lock); - if (ret == ACT_P_CREATED || - (p->tcfp_nkeys && p->tcfp_nkeys != parm->nkeys)) { - keys = kmalloc(ksize, GFP_ATOMIC); - if (!keys) { - spin_unlock_bh(&p->tcf_lock); - ret = -ENOMEM; - goto put_chain; - } - kfree(p->tcfp_keys); - p->tcfp_keys = keys; - p->tcfp_nkeys = parm->nkeys; + nparms->tcfp_off_max_hint = 0; + nparms->tcfp_flags = parm->flags; + nparms->tcfp_nkeys = parm->nkeys; + + nparms->tcfp_keys = kmalloc(ksize, GFP_KERNEL); + if (!nparms->tcfp_keys) { + ret = -ENOMEM; + goto put_chain; } - memcpy(p->tcfp_keys, parm->keys, ksize); - p->tcfp_off_max_hint = 0; - for (i = 0; i < p->tcfp_nkeys; ++i) { - u32 cur = p->tcfp_keys[i].off; + + memcpy(nparms->tcfp_keys, parm->keys, ksize); + + for (i = 0; i < nparms->tcfp_nkeys; ++i) { + u32 cur = nparms->tcfp_keys[i].off; /* sanitize the shift value for any later use */ - p->tcfp_keys[i].shift = min_t(size_t, BITS_PER_TYPE(int) - 1, - p->tcfp_keys[i].shift); + nparms->tcfp_keys[i].shift = min_t(size_t, + BITS_PER_TYPE(int) - 1, + nparms->tcfp_keys[i].shift); /* The AT option can read a single byte, we can bound the actual * value with uchar max. */ - cur += (0xff & p->tcfp_keys[i].offmask) >> p->tcfp_keys[i].shift; + cur += (0xff & nparms->tcfp_keys[i].offmask) >> nparms->tcfp_keys[i].shift; /* Each key touches 4 bytes starting from the computed offset */ - p->tcfp_off_max_hint = max(p->tcfp_off_max_hint, cur + 4); + nparms->tcfp_off_max_hint = + max(nparms->tcfp_off_max_hint, cur + 4); } - p->tcfp_flags = parm->flags; + p = to_pedit(*a); + + spin_lock_bh(&p->tcf_lock); goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); + oparms = rcu_replace_pointer(p->parms, nparms, 1); + spin_unlock_bh(&p->tcf_lock); - kfree(p->tcfp_keys_ex); - p->tcfp_keys_ex = keys_ex; + if (oparms) + call_rcu(&oparms->rcu, tcf_pedit_cleanup_rcu); - spin_unlock_bh(&p->tcf_lock); if (goto_ch) tcf_chain_put_by_act(goto_ch); + return ret; put_chain: @@ -261,19 +280,22 @@ put_chain: tcf_chain_put_by_act(goto_ch); out_release: tcf_idr_release(*a, bind); +out_free_ex: + kfree(nparms->tcfp_keys_ex); out_free: - kfree(keys_ex); + kfree(nparms); return ret; - } static void tcf_pedit_cleanup(struct tc_action *a) { struct tcf_pedit *p = to_pedit(a); - struct tc_pedit_key *keys = p->tcfp_keys; + struct tcf_pedit_parms *parms; - kfree(keys); - kfree(p->tcfp_keys_ex); + parms = rcu_dereference_protected(p->parms, 1); + + if (parms) + call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu); } static bool offset_valid(struct sk_buff *skb, int offset) @@ -325,28 +347,30 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, struct tcf_result *res) { struct tcf_pedit *p = to_pedit(a); + struct tcf_pedit_parms *parms; u32 max_offset; int i; - spin_lock(&p->tcf_lock); + parms = rcu_dereference_bh(p->parms); max_offset = (skb_transport_header_was_set(skb) ? skb_transport_offset(skb) : skb_network_offset(skb)) + - p->tcfp_off_max_hint; + parms->tcfp_off_max_hint; if (skb_ensure_writable(skb, min(skb->len, max_offset))) - goto unlock; + goto done; tcf_lastuse_update(&p->tcf_tm); + tcf_action_update_bstats(&p->common, skb); - if (p->tcfp_nkeys > 0) { - struct tc_pedit_key *tkey = p->tcfp_keys; - struct tcf_pedit_key_ex *tkey_ex = p->tcfp_keys_ex; + if (parms->tcfp_nkeys > 0) { + struct tc_pedit_key *tkey = parms->tcfp_keys; + struct tcf_pedit_key_ex *tkey_ex = parms->tcfp_keys_ex; enum pedit_header_type htype = TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; enum pedit_cmd cmd = TCA_PEDIT_KEY_EX_CMD_SET; - for (i = p->tcfp_nkeys; i > 0; i--, tkey++) { + for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { u32 *ptr, hdata; int offset = tkey->off; int hoffset; @@ -422,11 +446,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, } bad: + spin_lock(&p->tcf_lock); p->tcf_qstats.overlimits++; -done: - bstats_update(&p->tcf_bstats, skb); -unlock: spin_unlock(&p->tcf_lock); +done: return p->tcf_action; } @@ -445,30 +468,33 @@ static int tcf_pedit_dump(struct sk_buff *skb, struct tc_action *a, { unsigned char *b = skb_tail_pointer(skb); struct tcf_pedit *p = to_pedit(a); + struct tcf_pedit_parms *parms; struct tc_pedit *opt; struct tcf_t t; int s; - s = struct_size(opt, keys, p->tcfp_nkeys); + spin_lock_bh(&p->tcf_lock); + parms = rcu_dereference_protected(p->parms, 1); + s = struct_size(opt, keys, parms->tcfp_nkeys); - /* netlink spinlocks held above us - must use ATOMIC */ opt = kzalloc(s, GFP_ATOMIC); - if (unlikely(!opt)) + if (unlikely(!opt)) { + spin_unlock_bh(&p->tcf_lock); return -ENOBUFS; + } - spin_lock_bh(&p->tcf_lock); - memcpy(opt->keys, p->tcfp_keys, flex_array_size(opt, keys, p->tcfp_nkeys)); + memcpy(opt->keys, parms->tcfp_keys, + flex_array_size(opt, keys, parms->tcfp_nkeys)); opt->index = p->tcf_index; - opt->nkeys = p->tcfp_nkeys; - opt->flags = p->tcfp_flags; + opt->nkeys = parms->tcfp_nkeys; + opt->flags = parms->tcfp_flags; opt->action = p->tcf_action; opt->refcnt = refcount_read(&p->tcf_refcnt) - ref; opt->bindcnt = atomic_read(&p->tcf_bindcnt) - bind; - if (p->tcfp_keys_ex) { - if (tcf_pedit_key_ex_dump(skb, - p->tcfp_keys_ex, - p->tcfp_nkeys)) + if (parms->tcfp_keys_ex) { + if (tcf_pedit_key_ex_dump(skb, parms->tcfp_keys_ex, + parms->tcfp_nkeys)) goto nla_put_failure; if (nla_put(skb, TCA_PEDIT_PARMS_EX, s, opt)) -- cgit v1.2.3 From 95b069382351826c0ae37938070aa82dbeaf288d Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Tue, 31 Jan 2023 16:05:12 -0300 Subject: net/sched: simplify tcf_pedit_act Remove the check for a negative number of keys as this cannot ever happen Reviewed-by: Jamal Hadi Salim Reviewed-by: Simon Horman Signed-off-by: Pedro Tammela Signed-off-by: Paolo Abeni --- net/sched/act_pedit.c | 137 ++++++++++++++++++++++++-------------------------- 1 file changed, 67 insertions(+), 70 deletions(-) (limited to 'net') diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 991541094278..c42fcc47dd6d 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -346,8 +346,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { + enum pedit_header_type htype = TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; + enum pedit_cmd cmd = TCA_PEDIT_KEY_EX_CMD_SET; struct tcf_pedit *p = to_pedit(a); + struct tcf_pedit_key_ex *tkey_ex; struct tcf_pedit_parms *parms; + struct tc_pedit_key *tkey; u32 max_offset; int i; @@ -363,88 +367,81 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, tcf_lastuse_update(&p->tcf_tm); tcf_action_update_bstats(&p->common, skb); - if (parms->tcfp_nkeys > 0) { - struct tc_pedit_key *tkey = parms->tcfp_keys; - struct tcf_pedit_key_ex *tkey_ex = parms->tcfp_keys_ex; - enum pedit_header_type htype = - TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK; - enum pedit_cmd cmd = TCA_PEDIT_KEY_EX_CMD_SET; - - for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { - u32 *ptr, hdata; - int offset = tkey->off; - int hoffset; - u32 val; - int rc; - - if (tkey_ex) { - htype = tkey_ex->htype; - cmd = tkey_ex->cmd; - - tkey_ex++; - } + tkey = parms->tcfp_keys; + tkey_ex = parms->tcfp_keys_ex; - rc = pedit_skb_hdr_offset(skb, htype, &hoffset); - if (rc) { - pr_info("tc action pedit bad header type specified (0x%x)\n", - htype); - goto bad; - } + for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { + int offset = tkey->off; + u32 *ptr, hdata; + int hoffset; + u32 val; + int rc; - if (tkey->offmask) { - u8 *d, _d; - - if (!offset_valid(skb, hoffset + tkey->at)) { - pr_info("tc action pedit 'at' offset %d out of bounds\n", - hoffset + tkey->at); - goto bad; - } - d = skb_header_pointer(skb, hoffset + tkey->at, - sizeof(_d), &_d); - if (!d) - goto bad; - offset += (*d & tkey->offmask) >> tkey->shift; - } + if (tkey_ex) { + htype = tkey_ex->htype; + cmd = tkey_ex->cmd; - if (offset % 4) { - pr_info("tc action pedit offset must be on 32 bit boundaries\n"); - goto bad; - } + tkey_ex++; + } - if (!offset_valid(skb, hoffset + offset)) { - pr_info("tc action pedit offset %d out of bounds\n", - hoffset + offset); - goto bad; - } + rc = pedit_skb_hdr_offset(skb, htype, &hoffset); + if (rc) { + pr_info("tc action pedit bad header type specified (0x%x)\n", + htype); + goto bad; + } - ptr = skb_header_pointer(skb, hoffset + offset, - sizeof(hdata), &hdata); - if (!ptr) - goto bad; - /* just do it, baby */ - switch (cmd) { - case TCA_PEDIT_KEY_EX_CMD_SET: - val = tkey->val; - break; - case TCA_PEDIT_KEY_EX_CMD_ADD: - val = (*ptr + tkey->val) & ~tkey->mask; - break; - default: - pr_info("tc action pedit bad command (%d)\n", - cmd); + if (tkey->offmask) { + u8 *d, _d; + + if (!offset_valid(skb, hoffset + tkey->at)) { + pr_info("tc action pedit 'at' offset %d out of bounds\n", + hoffset + tkey->at); goto bad; } + d = skb_header_pointer(skb, hoffset + tkey->at, + sizeof(_d), &_d); + if (!d) + goto bad; + offset += (*d & tkey->offmask) >> tkey->shift; + } - *ptr = ((*ptr & tkey->mask) ^ val); - if (ptr == &hdata) - skb_store_bits(skb, hoffset + offset, ptr, 4); + if (offset % 4) { + pr_info("tc action pedit offset must be on 32 bit boundaries\n"); + goto bad; } - goto done; - } else { - WARN(1, "pedit BUG: index %d\n", p->tcf_index); + if (!offset_valid(skb, hoffset + offset)) { + pr_info("tc action pedit offset %d out of bounds\n", + hoffset + offset); + goto bad; + } + + ptr = skb_header_pointer(skb, hoffset + offset, + sizeof(hdata), &hdata); + if (!ptr) + goto bad; + /* just do it, baby */ + switch (cmd) { + case TCA_PEDIT_KEY_EX_CMD_SET: + val = tkey->val; + break; + case TCA_PEDIT_KEY_EX_CMD_ADD: + val = (*ptr + tkey->val) & ~tkey->mask; + break; + default: + pr_info("tc action pedit bad command (%d)\n", + cmd); + goto bad; + } + + *ptr = ((*ptr & tkey->mask) ^ val); + if (ptr == &hdata) + skb_store_bits(skb, hoffset + offset, ptr, 4); } + goto done; + bad: spin_lock(&p->tcf_lock); p->tcf_qstats.overlimits++; -- cgit v1.2.3 From e4d0fe71f59dc5137a2793ff7560730d80d1e1f4 Mon Sep 17 00:00:00 2001 From: Julian Anastasov Date: Wed, 1 Feb 2023 19:56:53 +0200 Subject: ipvs: avoid kfree_rcu without 2nd arg Avoid possible synchronize_rcu() as part from the kfree_rcu() call when 2nd arg is not provided. Signed-off-by: Julian Anastasov Signed-off-by: Pablo Neira Ayuso --- include/net/ip_vs.h | 1 + net/netfilter/ipvs/ip_vs_est.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index c6c61100d244..6d71a5ff52df 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -461,6 +461,7 @@ void ip_vs_stats_free(struct ip_vs_stats *stats); /* Multiple chains processed in same tick */ struct ip_vs_est_tick_data { + struct rcu_head rcu_head; struct hlist_head chains[IPVS_EST_TICK_CHAINS]; DECLARE_BITMAP(present, IPVS_EST_TICK_CHAINS); DECLARE_BITMAP(full, IPVS_EST_TICK_CHAINS); diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c index ce2a1549b304..c5970ba416ae 100644 --- a/net/netfilter/ipvs/ip_vs_est.c +++ b/net/netfilter/ipvs/ip_vs_est.c @@ -549,7 +549,7 @@ void ip_vs_stop_estimator(struct netns_ipvs *ipvs, struct ip_vs_stats *stats) __set_bit(row, kd->avail); if (!kd->tick_len[row]) { RCU_INIT_POINTER(kd->ticks[row], NULL); - kfree_rcu(td); + kfree_rcu(td, rcu_head); } kd->est_count--; if (kd->est_count) { -- cgit v1.2.3 From b18ea3d9d214dfb23b0b6bd2acc121cb0d0fa2c5 Mon Sep 17 00:00:00 2001 From: Bo Liu Date: Wed, 1 Feb 2023 03:14:38 -0500 Subject: net: dsa: Use sysfs_emit() to instead of sprintf() Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu Link: https://lore.kernel.org/r/20230201081438.3151-1-liubo03@inspur.com Signed-off-by: Paolo Abeni --- net/dsa/master.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/dsa/master.c b/net/dsa/master.c index 1507b8cdb360..22d3f16b0e6d 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -299,7 +299,7 @@ static ssize_t tagging_show(struct device *d, struct device_attribute *attr, struct net_device *dev = to_net_dev(d); struct dsa_port *cpu_dp = dev->dsa_ptr; - return sprintf(buf, "%s\n", + return sysfs_emit(buf, "%s\n", dsa_tag_protocol_to_str(cpu_dp->tag_ops)); } -- cgit v1.2.3 From 2a30b2bd01c23a7eeace3a3f82c2817227099805 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Wed, 25 Jan 2023 06:54:07 +0100 Subject: can: gw: give feedback on missing CGW_FLAGS_CAN_IIF_TX_OK flag To send CAN traffic back to the incoming interface a special flag has to be set. When creating a routing job for identical interfaces without this flag the rule is created but has no effect. This patch adds an error return value in the case that the CAN interfaces are identical but the CGW_FLAGS_CAN_IIF_TX_OK flag was not set. Reported-by: Jannik Hartung Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20230125055407.2053-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- net/can/gw.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'net') diff --git a/net/can/gw.c b/net/can/gw.c index 23a3d89cad81..37528826935e 100644 --- a/net/can/gw.c +++ b/net/can/gw.c @@ -1139,6 +1139,13 @@ static int cgw_create_job(struct sk_buff *skb, struct nlmsghdr *nlh, if (gwj->dst.dev->type != ARPHRD_CAN) goto out; + /* is sending the skb back to the incoming interface intended? */ + if (gwj->src.dev == gwj->dst.dev && + !(gwj->flags & CGW_FLAGS_CAN_IIF_TX_OK)) { + err = -EINVAL; + goto out; + } + ASSERT_RTNL(); err = cgw_register_filter(net, gwj); -- cgit v1.2.3 From c6adf659a8ba85913e16a571d5a9bcd17d3d1234 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Wed, 4 Jan 2023 21:18:44 +0100 Subject: can: isotp: check CAN address family in isotp_bind() Add missing check to block non-AF_CAN binds. Syzbot created some code which matched the right sockaddr struct size but used AF_XDP (0x2C) instead of AF_CAN (0x1D) in the address family field: bind$xdp(r2, &(0x7f0000000540)={0x2c, 0x0, r4, 0x0, r2}, 0x10) ^^^^ This has no funtional impact but the userspace should be notified about the wrong address family field content. Link: https://syzkaller.appspot.com/text?tag=CrashLog&x=11ff9d8c480000 Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20230104201844.13168-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- net/can/isotp.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'net') diff --git a/net/can/isotp.c b/net/can/isotp.c index 608f8c24ae46..a18450ffae01 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -1225,6 +1225,9 @@ static int isotp_bind(struct socket *sock, struct sockaddr *uaddr, int len) if (len < ISOTP_MIN_NAMELEN) return -EINVAL; + if (addr->can_family != AF_CAN) + return -EINVAL; + /* sanitize tx CAN identifier */ if (tx_id & CAN_EFF_FLAG) tx_id &= (CAN_EFF_FLAG | CAN_EFF_MASK); -- cgit v1.2.3 From d3d854fd6a1d97157f790604e07f6386e8df8fe4 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 1 Feb 2023 11:24:17 +0100 Subject: netdev-genl: create a simple family for netdev stuff Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it. It allows us to reap all the benefits of Netlink specs, one can use the generic client to issue the commands: $ ./cli.py --spec netdev.yaml --dump dev_get [{'ifindex': 1, 'xdp-features': set()}, {'ifindex': 2, 'xdp-features': {'basic', 'ndo-xmit', 'redirect'}}, {'ifindex': 3, 'xdp-features': {'rx-sg'}}] the generic python library does not have flags-by-name support, yet, but we also don't have to carry strings in the messages, as user space can get the names from the spec. Acked-by: Jesper Dangaard Brouer Co-developed-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Co-developed-by: Kumar Kartikeya Dwivedi Signed-off-by: Kumar Kartikeya Dwivedi Co-developed-by: Marek Majtyka Signed-off-by: Marek Majtyka Signed-off-by: Jakub Kicinski Link: https://lore.kernel.org/r/327ad9c9868becbe1e601b580c962549c8cd81f2.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- Documentation/netlink/specs/netdev.yaml | 100 ++++++++++++++++++ include/linux/netdevice.h | 3 + include/net/xdp.h | 3 + include/uapi/linux/netdev.h | 59 +++++++++++ net/core/Makefile | 3 +- net/core/dev.c | 1 + net/core/netdev-genl-gen.c | 48 +++++++++ net/core/netdev-genl-gen.h | 23 ++++ net/core/netdev-genl.c | 179 ++++++++++++++++++++++++++++++++ tools/include/uapi/linux/netdev.h | 59 +++++++++++ 10 files changed, 477 insertions(+), 1 deletion(-) create mode 100644 Documentation/netlink/specs/netdev.yaml create mode 100644 include/uapi/linux/netdev.h create mode 100644 net/core/netdev-genl-gen.c create mode 100644 net/core/netdev-genl-gen.h create mode 100644 net/core/netdev-genl.c create mode 100644 tools/include/uapi/linux/netdev.h (limited to 'net') diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml new file mode 100644 index 000000000000..b4dcdae54ffd --- /dev/null +++ b/Documentation/netlink/specs/netdev.yaml @@ -0,0 +1,100 @@ +name: netdev + +doc: + netdev configuration over generic netlink. + +definitions: + - + type: flags + name: xdp-act + entries: + - + name: basic + doc: + XDP feautues set supported by all drivers + (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) + - + name: redirect + doc: + The netdev supports XDP_REDIRECT + - + name: ndo-xmit + doc: + This feature informs if netdev implements ndo_xdp_xmit callback. + - + name: xsk-zerocopy + doc: + This feature informs if netdev supports AF_XDP in zero copy mode. + - + name: hw-offload + doc: + This feature informs if netdev supports XDP hw oflloading. + - + name: rx-sg + doc: + This feature informs if netdev implements non-linear XDP buffer + support in the driver napi callback. + - + name: ndo-xmit-sg + doc: + This feature informs if netdev implements non-linear XDP buffer + support in ndo_xdp_xmit callback. + +attribute-sets: + - + name: dev + attributes: + - + name: ifindex + doc: netdev ifindex + type: u32 + value: 1 + checks: + min: 1 + - + name: pad + type: pad + - + name: xdp-features + doc: Bitmask of enabled xdp-features. + type: u64 + enum: xdp-act + enum-as-flags: true + +operations: + list: + - + name: dev-get + doc: Get / dump information about a netdev. + value: 1 + attribute-set: dev + do: + request: + attributes: + - ifindex + reply: &dev-all + attributes: + - ifindex + - xdp-features + dump: + reply: *dev-all + - + name: dev-add-ntf + doc: Notification about device appearing. + notify: dev-get + mcgrp: mgmt + - + name: dev-del-ntf + doc: Notification about device disappearing. + notify: dev-get + mcgrp: mgmt + - + name: dev-change-ntf + doc: Notification about device configuration being changed. + notify: dev-get + mcgrp: mgmt + +mcast-groups: + list: + - + name: mgmt diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2466afa25078..0f7967591288 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -47,6 +47,7 @@ #include #include #include +#include #include #include #include @@ -2055,6 +2056,7 @@ struct net_device { /* Read-mostly cache-line for fast-path access */ unsigned int flags; + xdp_features_t xdp_features; unsigned long long priv_flags; const struct net_device_ops *netdev_ops; const struct xdp_metadata_ops *xdp_metadata_ops; @@ -2839,6 +2841,7 @@ enum netdev_cmd { NETDEV_OFFLOAD_XSTATS_DISABLE, NETDEV_OFFLOAD_XSTATS_REPORT_USED, NETDEV_OFFLOAD_XSTATS_REPORT_DELTA, + NETDEV_XDP_FEAT_CHANGE, }; const char *netdev_cmd_to_name(enum netdev_cmd cmd); diff --git a/include/net/xdp.h b/include/net/xdp.h index 91292aa13bc0..8d1c86914f4c 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -7,6 +7,7 @@ #define __LINUX_NET_XDP_H__ #include /* skb_shared_info */ +#include /** * DOC: XDP RX-queue information @@ -43,6 +44,8 @@ enum xdp_mem_type { MEM_TYPE_MAX, }; +typedef u32 xdp_features_t; + /* XDP flags for ndo_xdp_xmit */ #define XDP_XMIT_FLUSH (1U << 0) /* doorbell signal consumer */ #define XDP_XMIT_FLAGS_MASK XDP_XMIT_FLUSH diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h new file mode 100644 index 000000000000..9ee459872600 --- /dev/null +++ b/include/uapi/linux/netdev.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN uapi header */ + +#ifndef _UAPI_LINUX_NETDEV_H +#define _UAPI_LINUX_NETDEV_H + +#define NETDEV_FAMILY_NAME "netdev" +#define NETDEV_FAMILY_VERSION 1 + +/** + * enum netdev_xdp_act + * @NETDEV_XDP_ACT_BASIC: XDP feautues set supported by all drivers + * (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) + * @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT + * @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements + * ndo_xdp_xmit callback. + * @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP + * in zero copy mode. + * @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw + * oflloading. + * @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear + * XDP buffer support in the driver napi callback. + * @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements + * non-linear XDP buffer support in ndo_xdp_xmit callback. + */ +enum netdev_xdp_act { + NETDEV_XDP_ACT_BASIC = 1, + NETDEV_XDP_ACT_REDIRECT = 2, + NETDEV_XDP_ACT_NDO_XMIT = 4, + NETDEV_XDP_ACT_XSK_ZEROCOPY = 8, + NETDEV_XDP_ACT_HW_OFFLOAD = 16, + NETDEV_XDP_ACT_RX_SG = 32, + NETDEV_XDP_ACT_NDO_XMIT_SG = 64, +}; + +enum { + NETDEV_A_DEV_IFINDEX = 1, + NETDEV_A_DEV_PAD, + NETDEV_A_DEV_XDP_FEATURES, + + __NETDEV_A_DEV_MAX, + NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) +}; + +enum { + NETDEV_CMD_DEV_GET = 1, + NETDEV_CMD_DEV_ADD_NTF, + NETDEV_CMD_DEV_DEL_NTF, + NETDEV_CMD_DEV_CHANGE_NTF, + + __NETDEV_CMD_MAX, + NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1) +}; + +#define NETDEV_MCGRP_MGMT "mgmt" + +#endif /* _UAPI_LINUX_NETDEV_H */ diff --git a/net/core/Makefile b/net/core/Makefile index 10edd66a8a37..8f367813bc68 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -12,7 +12,8 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o obj-y += dev.o dev_addr_lists.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \ - fib_notifier.o xdp.o flow_offload.o gro.o + fib_notifier.o xdp.o flow_offload.o gro.o \ + netdev-genl.o netdev-genl-gen.o obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o diff --git a/net/core/dev.c b/net/core/dev.c index f72f5c4ee7e2..9ac0eeb2c8cd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1614,6 +1614,7 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd) N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO) N(PRE_CHANGEADDR) N(OFFLOAD_XSTATS_ENABLE) N(OFFLOAD_XSTATS_DISABLE) N(OFFLOAD_XSTATS_REPORT_USED) N(OFFLOAD_XSTATS_REPORT_DELTA) + N(XDP_FEAT_CHANGE) } #undef N return "UNKNOWN_NETDEV_EVENT"; diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c new file mode 100644 index 000000000000..48812ec843f5 --- /dev/null +++ b/net/core/netdev-genl-gen.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: BSD-3-Clause +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN kernel source */ + +#include +#include + +#include "netdev-genl-gen.h" + +#include + +/* NETDEV_CMD_DEV_GET - do */ +static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1] = { + [NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1), +}; + +/* Ops table for netdev */ +static const struct genl_split_ops netdev_nl_ops[2] = { + { + .cmd = NETDEV_CMD_DEV_GET, + .doit = netdev_nl_dev_get_doit, + .policy = netdev_dev_get_nl_policy, + .maxattr = NETDEV_A_DEV_IFINDEX, + .flags = GENL_CMD_CAP_DO, + }, + { + .cmd = NETDEV_CMD_DEV_GET, + .dumpit = netdev_nl_dev_get_dumpit, + .flags = GENL_CMD_CAP_DUMP, + }, +}; + +static const struct genl_multicast_group netdev_nl_mcgrps[] = { + [NETDEV_NLGRP_MGMT] = { "mgmt", }, +}; + +struct genl_family netdev_nl_family __ro_after_init = { + .name = NETDEV_FAMILY_NAME, + .version = NETDEV_FAMILY_VERSION, + .netnsok = true, + .parallel_ops = true, + .module = THIS_MODULE, + .split_ops = netdev_nl_ops, + .n_split_ops = ARRAY_SIZE(netdev_nl_ops), + .mcgrps = netdev_nl_mcgrps, + .n_mcgrps = ARRAY_SIZE(netdev_nl_mcgrps), +}; diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h new file mode 100644 index 000000000000..b16dc7e026bb --- /dev/null +++ b/net/core/netdev-genl-gen.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN kernel header */ + +#ifndef _LINUX_NETDEV_GEN_H +#define _LINUX_NETDEV_GEN_H + +#include +#include + +#include + +int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info); +int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); + +enum { + NETDEV_NLGRP_MGMT, +}; + +extern struct genl_family netdev_nl_family; + +#endif /* _LINUX_NETDEV_GEN_H */ diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c new file mode 100644 index 000000000000..a4270fafdf11 --- /dev/null +++ b/net/core/netdev-genl.c @@ -0,0 +1,179 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include + +#include "netdev-genl-gen.h" + +static int +netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp, + u32 portid, u32 seq, int flags, u32 cmd) +{ + void *hdr; + + hdr = genlmsg_put(rsp, portid, seq, &netdev_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) || + nla_put_u64_64bit(rsp, NETDEV_A_DEV_XDP_FEATURES, + netdev->xdp_features, NETDEV_A_DEV_PAD)) { + genlmsg_cancel(rsp, hdr); + return -EINVAL; + } + + genlmsg_end(rsp, hdr); + + return 0; +} + +static void +netdev_genl_dev_notify(struct net_device *netdev, int cmd) +{ + struct sk_buff *ntf; + + if (!genl_has_listeners(&netdev_nl_family, dev_net(netdev), + NETDEV_NLGRP_MGMT)) + return; + + ntf = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!ntf) + return; + + if (netdev_nl_dev_fill(netdev, ntf, 0, 0, 0, cmd)) { + nlmsg_free(ntf); + return; + } + + genlmsg_multicast_netns(&netdev_nl_family, dev_net(netdev), ntf, + 0, NETDEV_NLGRP_MGMT, GFP_KERNEL); +} + +int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct net_device *netdev; + struct sk_buff *rsp; + u32 ifindex; + int err; + + if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX)) + return -EINVAL; + + ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]); + + rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!rsp) + return -ENOMEM; + + rtnl_lock(); + + netdev = __dev_get_by_index(genl_info_net(info), ifindex); + if (netdev) + err = netdev_nl_dev_fill(netdev, rsp, info->snd_portid, + info->snd_seq, 0, info->genlhdr->cmd); + else + err = -ENODEV; + + rtnl_unlock(); + + if (err) + goto err_free_msg; + + return genlmsg_reply(rsp, info); + +err_free_msg: + nlmsg_free(rsp); + return err; +} + +int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct net *net = sock_net(skb->sk); + struct net_device *netdev; + int idx = 0, s_idx; + int h, s_h; + int err; + + s_h = cb->args[0]; + s_idx = cb->args[1]; + + rtnl_lock(); + + for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { + struct hlist_head *head; + + idx = 0; + head = &net->dev_index_head[h]; + hlist_for_each_entry(netdev, head, index_hlist) { + if (idx < s_idx) + goto cont; + err = netdev_nl_dev_fill(netdev, skb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, 0, + NETDEV_CMD_DEV_GET); + if (err < 0) + break; +cont: + idx++; + } + } + + rtnl_unlock(); + + if (err != -EMSGSIZE) + return err; + + cb->args[1] = idx; + cb->args[0] = h; + cb->seq = net->dev_base_seq; + + return skb->len; +} + +static int netdev_genl_netdevice_event(struct notifier_block *nb, + unsigned long event, void *ptr) +{ + struct net_device *netdev = netdev_notifier_info_to_dev(ptr); + + switch (event) { + case NETDEV_REGISTER: + netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_ADD_NTF); + break; + case NETDEV_UNREGISTER: + netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_DEL_NTF); + break; + case NETDEV_XDP_FEAT_CHANGE: + netdev_genl_dev_notify(netdev, NETDEV_CMD_DEV_CHANGE_NTF); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block netdev_genl_nb = { + .notifier_call = netdev_genl_netdevice_event, +}; + +static int __init netdev_genl_init(void) +{ + int err; + + err = register_netdevice_notifier(&netdev_genl_nb); + if (err) + return err; + + err = genl_register_family(&netdev_nl_family); + if (err) + goto err_unreg_ntf; + + return 0; + +err_unreg_ntf: + unregister_netdevice_notifier(&netdev_genl_nb); + return err; +} + +subsys_initcall(netdev_genl_init); diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h new file mode 100644 index 000000000000..9ee459872600 --- /dev/null +++ b/tools/include/uapi/linux/netdev.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* Do not edit directly, auto-generated from: */ +/* Documentation/netlink/specs/netdev.yaml */ +/* YNL-GEN uapi header */ + +#ifndef _UAPI_LINUX_NETDEV_H +#define _UAPI_LINUX_NETDEV_H + +#define NETDEV_FAMILY_NAME "netdev" +#define NETDEV_FAMILY_VERSION 1 + +/** + * enum netdev_xdp_act + * @NETDEV_XDP_ACT_BASIC: XDP feautues set supported by all drivers + * (XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX) + * @NETDEV_XDP_ACT_REDIRECT: The netdev supports XDP_REDIRECT + * @NETDEV_XDP_ACT_NDO_XMIT: This feature informs if netdev implements + * ndo_xdp_xmit callback. + * @NETDEV_XDP_ACT_XSK_ZEROCOPY: This feature informs if netdev supports AF_XDP + * in zero copy mode. + * @NETDEV_XDP_ACT_HW_OFFLOAD: This feature informs if netdev supports XDP hw + * oflloading. + * @NETDEV_XDP_ACT_RX_SG: This feature informs if netdev implements non-linear + * XDP buffer support in the driver napi callback. + * @NETDEV_XDP_ACT_NDO_XMIT_SG: This feature informs if netdev implements + * non-linear XDP buffer support in ndo_xdp_xmit callback. + */ +enum netdev_xdp_act { + NETDEV_XDP_ACT_BASIC = 1, + NETDEV_XDP_ACT_REDIRECT = 2, + NETDEV_XDP_ACT_NDO_XMIT = 4, + NETDEV_XDP_ACT_XSK_ZEROCOPY = 8, + NETDEV_XDP_ACT_HW_OFFLOAD = 16, + NETDEV_XDP_ACT_RX_SG = 32, + NETDEV_XDP_ACT_NDO_XMIT_SG = 64, +}; + +enum { + NETDEV_A_DEV_IFINDEX = 1, + NETDEV_A_DEV_PAD, + NETDEV_A_DEV_XDP_FEATURES, + + __NETDEV_A_DEV_MAX, + NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) +}; + +enum { + NETDEV_CMD_DEV_GET = 1, + NETDEV_CMD_DEV_ADD_NTF, + NETDEV_CMD_DEV_DEL_NTF, + NETDEV_CMD_DEV_CHANGE_NTF, + + __NETDEV_CMD_MAX, + NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1) +}; + +#define NETDEV_MCGRP_MGMT "mgmt" + +#endif /* _UAPI_LINUX_NETDEV_H */ -- cgit v1.2.3 From 66c0e13ad236c74ea88c7c1518f3cef7f372e3da Mon Sep 17 00:00:00 2001 From: Marek Majtyka Date: Wed, 1 Feb 2023 11:24:18 +0100 Subject: drivers: net: turn on XDP features A summary of the flags being set for various drivers is given below. Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features that can be turned off and on at runtime. This means that these flags may be set and unset under RTNL lock protection by the driver. Hence, READ_ONCE must be used by code loading the flag value. Also, these flags are not used for synchronization against the availability of XDP resources on a device. It is merely a hint, and hence the read may race with the actual teardown of XDP resources on the device. This may change in the future, e.g. operations taking a reference on the XDP resources of the driver, and in turn inhibiting turning off this flag. However, for now, it can only be used as a hint to check whether device supports becoming a redirection target. Turn 'hw-offload' feature flag on for: - netronome (nfp) - netdevsim. Turn 'native' and 'zerocopy' features flags on for: - intel (i40e, ice, ixgbe, igc) - mellanox (mlx5). - stmmac - netronome (nfp) Turn 'native' features flags on for: - amazon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2, enetc) - funeth - intel (igb) - marvell (mvneta, mvpp2, octeontx2) - mellanox (mlx4) - mtk_eth_soc - qlogic (qede) - sfc - socionext (netsec) - ti (cpsw) - tap - tsnep - veth - xen - virtio_net. Turn 'basic' (tx, pass, aborted and drop) features flags on for: - netronome (nfp) - cavium (thunder) - hyperv. Turn 'redirect_target' feature flag on for: - amanzon (ena) - broadcom (bnxt) - freescale (dpaa, dpaa2) - intel (i40e, ice, igb, ixgbe) - ti (cpsw) - marvell (mvneta, mvpp2) - sfc - socionext (netsec) - qlogic (qede) - mellanox (mlx5) - tap - veth - virtio_net - xen Reviewed-by: Gerhard Engleder Reviewed-by: Simon Horman Acked-by: Stanislav Fomichev Acked-by: Jakub Kicinski Co-developed-by: Kumar Kartikeya Dwivedi Signed-off-by: Kumar Kartikeya Dwivedi Co-developed-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Marek Majtyka Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 4 ++++ drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 5 +++++ drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++ drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 ++ drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 ++ drivers/net/ethernet/engleder/tsnep_main.c | 4 ++++ drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 4 ++++ drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 4 ++++ drivers/net/ethernet/freescale/enetc/enetc_pf.c | 3 +++ drivers/net/ethernet/fungible/funeth/funeth_main.c | 6 ++++++ drivers/net/ethernet/intel/i40e/i40e_main.c | 10 ++++++++-- drivers/net/ethernet/intel/ice/ice_main.c | 5 +++++ drivers/net/ethernet/intel/igb/igb_main.c | 9 ++++++++- drivers/net/ethernet/intel/igc/igc_main.c | 3 +++ drivers/net/ethernet/intel/igc/igc_xdp.c | 5 +++++ drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 ++++++ drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 + drivers/net/ethernet/marvell/mvneta.c | 3 +++ drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ++++ drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 8 ++++++-- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 6 ++++++ drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++++++++++ drivers/net/ethernet/microsoft/mana/mana_en.c | 2 ++ drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 +++++ drivers/net/ethernet/qlogic/qede/qede_main.c | 3 +++ drivers/net/ethernet/sfc/efx.c | 4 ++++ drivers/net/ethernet/sfc/siena/efx.c | 4 ++++ drivers/net/ethernet/socionext/netsec.c | 3 +++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++ drivers/net/ethernet/ti/cpsw.c | 4 ++++ drivers/net/ethernet/ti/cpsw_new.c | 4 ++++ drivers/net/hyperv/netvsc_drv.c | 2 ++ drivers/net/netdevsim/netdev.c | 1 + drivers/net/tun.c | 5 +++++ drivers/net/veth.c | 4 ++++ drivers/net/virtio_net.c | 4 ++++ drivers/net/xen-netfront.c | 2 ++ include/net/xdp.h | 12 ++++++++++++ net/core/xdp.c | 18 ++++++++++++++++++ 40 files changed, 184 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index e8ad5ea31aff..d3999db7c6a2 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -597,7 +597,9 @@ static int ena_xdp_set(struct net_device *netdev, struct netdev_bpf *bpf) if (rc) return rc; } + xdp_features_set_redirect_target(netdev, false); } else if (old_bpf_prog) { + xdp_features_clear_redirect_target(netdev); rc = ena_destroy_and_free_all_xdp_queues(adapter); if (rc) return rc; @@ -4103,6 +4105,8 @@ static void ena_set_conf_feat_params(struct ena_adapter *adapter, /* Set offload features */ ena_set_dev_offloads(feat, netdev); + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; + adapter->max_mtu = feat->dev_attr.max_mtu; netdev->max_mtu = adapter->max_mtu; netdev->min_mtu = ENA_MIN_MTU; diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c index 06508eebb585..d6d6d5d37ff3 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c @@ -384,6 +384,11 @@ void aq_nic_ndev_init(struct aq_nic_s *self) self->ndev->mtu = aq_nic_cfg->mtu - ETH_HLEN; self->ndev->max_mtu = aq_hw_caps->mtu - ETH_FCS_LEN - ETH_HLEN; + self->ndev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; } void aq_nic_set_tx_ring(struct aq_nic_s *self, unsigned int idx, diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 240a7e8a7652..a1b4356dfb6c 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -13686,6 +13686,9 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) netif_set_tso_max_size(dev, GSO_MAX_SIZE); + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_RX_SG; + #ifdef CONFIG_BNXT_SRIOV init_waitqueue_head(&bp->sriov_cfg_wait); #endif diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 36d5202c0aee..5843c93b1711 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -422,9 +422,11 @@ static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog) if (prog) { bnxt_set_rx_skb_mode(bp, true); + xdp_features_set_redirect_target(dev, true); } else { int rx, tx; + xdp_features_clear_redirect_target(dev); bnxt_set_rx_skb_mode(bp, false); bnxt_get_max_rings(bp, &rx, &tx, true); if (rx > 1) { diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index f2f95493ec89..8b25313c7f6b 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -2218,6 +2218,8 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) netdev->netdev_ops = &nicvf_netdev_ops; netdev->watchdog_timeo = NICVF_TX_TIMEOUT; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC; + /* MTU range: 64 - 9200 */ netdev->min_mtu = NIC_HW_MIN_FRS; netdev->max_mtu = NIC_HW_MAX_FRS; diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c index c3cf427a9409..6982aaa928b5 100644 --- a/drivers/net/ethernet/engleder/tsnep_main.c +++ b/drivers/net/ethernet/engleder/tsnep_main.c @@ -1926,6 +1926,10 @@ static int tsnep_probe(struct platform_device *pdev) netdev->features = NETIF_F_SG; netdev->hw_features = netdev->features | NETIF_F_LOOPBACK; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_NDO_XMIT_SG; + /* carrier off reporting is important to ethtool even BEFORE open */ netif_carrier_off(netdev); diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 027fff9f7db0..9318a2554056 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -244,6 +244,10 @@ static int dpaa_netdev_init(struct net_device *net_dev, net_dev->features |= net_dev->hw_features; net_dev->vlan_features = net_dev->features; + net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + if (is_valid_ether_addr(mac_addr)) { memcpy(net_dev->perm_addr, mac_addr, net_dev->addr_len); eth_hw_addr_set(net_dev, mac_addr); diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 2e79d18fc3c7..746ccfde7255 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -4596,6 +4596,10 @@ static int dpaa2_eth_netdev_init(struct net_device *net_dev) NETIF_F_LLTX | NETIF_F_HW_TC | NETIF_F_TSO; net_dev->gso_max_segs = DPAA2_ETH_ENQUEUE_MAX_FDS; net_dev->hw_features = net_dev->features; + net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY | + NETDEV_XDP_ACT_NDO_XMIT; if (priv->dpni_attrs.vlan_filter_entries) net_dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER; diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index 7facc7d5261e..6b54071d4ecc 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -807,6 +807,9 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev, ndev->hw_features |= NETIF_F_RXHASH; ndev->priv_flags |= IFF_UNICAST_FLT; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) { priv->active_offloads |= ENETC_F_QCI; diff --git a/drivers/net/ethernet/fungible/funeth/funeth_main.c b/drivers/net/ethernet/fungible/funeth/funeth_main.c index b4cce30e526a..df86770731ad 100644 --- a/drivers/net/ethernet/fungible/funeth/funeth_main.c +++ b/drivers/net/ethernet/fungible/funeth/funeth_main.c @@ -1160,6 +1160,11 @@ static int fun_xdp_setup(struct net_device *dev, struct netdev_bpf *xdp) WRITE_ONCE(rxqs[i]->xdp_prog, prog); } + if (prog) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); + dev->max_mtu = prog ? XDP_MAX_MTU : FUN_MAX_MTU; old_prog = xchg(&fp->xdp_prog, prog); if (old_prog) @@ -1765,6 +1770,7 @@ static int fun_create_netdev(struct fun_ethdev *ed, unsigned int portid) netdev->vlan_features = netdev->features & VLAN_FEAT; netdev->mpls_features = netdev->vlan_features; netdev->hw_enc_features = netdev->hw_features; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; netdev->min_mtu = ETH_MIN_MTU; netdev->max_mtu = FUN_MAX_MTU; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 53d0083e35da..8a79cc18c428 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -13339,9 +13339,11 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog, old_prog = xchg(&vsi->xdp_prog, prog); if (need_reset) { - if (!prog) + if (!prog) { + xdp_features_clear_redirect_target(vsi->netdev); /* Wait until ndo_xsk_wakeup completes. */ synchronize_rcu(); + } i40e_reset_and_rebuild(pf, true, true); } @@ -13362,11 +13364,13 @@ static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog, /* Kick start the NAPI context if there is an AF_XDP socket open * on that queue id. This so that receiving will start. */ - if (need_reset && prog) + if (need_reset && prog) { for (i = 0; i < vsi->num_queue_pairs; i++) if (vsi->xdp_rings[i]->xsk_pool) (void)i40e_xsk_wakeup(vsi->netdev, i, XDP_WAKEUP_RX); + xdp_features_set_redirect_target(vsi->netdev, true); + } return 0; } @@ -13783,6 +13787,8 @@ static int i40e_config_netdev(struct i40e_vsi *vsi) netdev->hw_enc_features |= NETIF_F_TSO_MANGLEID; netdev->features &= ~NETIF_F_HW_TC; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; if (vsi->type == I40E_VSI_MAIN) { SET_NETDEV_DEV(netdev, &pf->pdev->dev); diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 26a8910a41ff..074b0e6d0e2d 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -22,6 +22,7 @@ #include "ice_eswitch.h" #include "ice_tc_lib.h" #include "ice_vsi_vlan_ops.h" +#include #define DRV_SUMMARY "Intel(R) Ethernet Connection E800 Series Linux Driver" static const char ice_driver_string[] = DRV_SUMMARY; @@ -2912,11 +2913,13 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed"); } + xdp_features_set_redirect_target(vsi->netdev, false); /* reallocate Rx queues that are used for zero-copy */ xdp_ring_err = ice_realloc_zc_buf(vsi, true); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Rx resources failed"); } else if (ice_is_xdp_ena_vsi(vsi) && !prog) { + xdp_features_clear_redirect_target(vsi->netdev); xdp_ring_err = ice_destroy_xdp_rings(vsi); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Freeing XDP Tx resources failed"); @@ -3459,6 +3462,8 @@ static int ice_cfg_netdev(struct ice_vsi *vsi) np->vsi = vsi; ice_set_netdev_features(netdev); + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 3c0c35ecea10..0e11a082f7a1 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2871,8 +2871,14 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf) bpf_prog_put(old_prog); /* bpf is just replaced, RXQ and MTU are already setup */ - if (!need_reset) + if (!need_reset) { return 0; + } else { + if (prog) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); + } if (running) igb_open(dev); @@ -3317,6 +3323,7 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent) netdev->priv_flags |= IFF_SUPP_NOFCS; netdev->priv_flags |= IFF_UNICAST_FLT; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; /* MTU range: 68 - 9216 */ netdev->min_mtu = ETH_MIN_MTU; diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index e86b15efaeb8..8b572cd2c350 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -6533,6 +6533,9 @@ static int igc_probe(struct pci_dev *pdev, netdev->mpls_features |= NETIF_F_HW_CSUM; netdev->hw_enc_features |= netdev->vlan_features; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; + /* MTU range: 68 - 9216 */ netdev->min_mtu = ETH_MIN_MTU; netdev->max_mtu = MAX_STD_JUMBO_FRAME_SIZE; diff --git a/drivers/net/ethernet/intel/igc/igc_xdp.c b/drivers/net/ethernet/intel/igc/igc_xdp.c index aeeb34e64610..e27af72aada8 100644 --- a/drivers/net/ethernet/intel/igc/igc_xdp.c +++ b/drivers/net/ethernet/intel/igc/igc_xdp.c @@ -29,6 +29,11 @@ int igc_xdp_set_prog(struct igc_adapter *adapter, struct bpf_prog *prog, if (old_prog) bpf_prog_put(old_prog); + if (prog) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); + if (if_running) igc_open(dev); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 43a44c1e1576..af4c12b6059f 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -10301,6 +10301,8 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog) if (err) return -EINVAL; + if (!prog) + xdp_features_clear_redirect_target(dev); } else { for (i = 0; i < adapter->num_rx_queues; i++) { WRITE_ONCE(adapter->rx_ring[i]->xdp_prog, @@ -10321,6 +10323,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog) if (adapter->xdp_ring[i]->xsk_pool) (void)ixgbe_xsk_wakeup(adapter->netdev, i, XDP_WAKEUP_RX); + xdp_features_set_redirect_target(dev, true); } return 0; @@ -11018,6 +11021,9 @@ skip_sriov: netdev->priv_flags |= IFF_UNICAST_FLT; netdev->priv_flags |= IFF_SUPP_NOFCS; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY; + /* MTU range: 68 - 9710 */ netdev->min_mtu = ETH_MIN_MTU; netdev->max_mtu = IXGBE_MAX_JUMBO_FRAME_SIZE - (ETH_HLEN + ETH_FCS_LEN); diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index ea0a230c1153..a44e4bd56142 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -4634,6 +4634,7 @@ static int ixgbevf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) NETIF_F_HW_VLAN_CTAG_TX; netdev->priv_flags |= IFF_UNICAST_FLT; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC; /* MTU range: 68 - 1504 or 9710 */ netdev->min_mtu = ETH_MIN_MTU; diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index f8925cac61e4..dc2989103a77 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -5612,6 +5612,9 @@ static int mvneta_probe(struct platform_device *pdev) NETIF_F_TSO | NETIF_F_RXCSUM; dev->hw_features |= dev->features; dev->vlan_features |= dev->features; + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; netif_set_tso_max_segs(dev, MVNETA_MAX_TSO_SEGS); diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 4da45c5abba5..9b4ecbe4f36d 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -6866,6 +6866,10 @@ static int mvpp2_port_probe(struct platform_device *pdev, dev->vlan_features |= features; netif_set_tso_max_segs(dev, MVPP2_MAX_TSO_SEGS); + + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + dev->priv_flags |= IFF_UNICAST_FLT; /* MTU range: 68 - 9704 */ diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index c1ea60bc2630..179433d0a54a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -2512,10 +2512,13 @@ static int otx2_xdp_setup(struct otx2_nic *pf, struct bpf_prog *prog) /* Network stack and XDP shared same rx queues. * Use separate tx queues for XDP and network stack. */ - if (pf->xdp_prog) + if (pf->xdp_prog) { pf->hw.xdp_queues = pf->hw.rx_queues; - else + xdp_features_set_redirect_target(dev, false); + } else { pf->hw.xdp_queues = 0; + xdp_features_clear_redirect_target(dev); + } pf->hw.tot_tx_queues += pf->hw.xdp_queues; @@ -2878,6 +2881,7 @@ static int otx2_probe(struct pci_dev *pdev, const struct pci_device_id *id) netdev->watchdog_timeo = OTX2_TX_TIMEOUT; netdev->netdev_ops = &otx2_netdev_ops; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; netdev->min_mtu = OTX2_MIN_MTU; netdev->max_mtu = otx2_get_max_mtu(pf); diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 801deac58bf7..ac54b6f2bb5c 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -4447,6 +4447,12 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) register_netdevice_notifier(&mac->device_notifier); } + if (mtk_page_pool_enabled(eth)) + eth->netdev[id]->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_NDO_XMIT_SG; + return 0; free_netdev: diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index af4c4858f397..e11bc0ac880e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -3416,6 +3416,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, priv->rss_hash_fn = ETH_RSS_HASH_TOP; } + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; + /* MTU range: 68 - hw-specific max */ dev->min_mtu = ETH_MIN_MTU; dev->max_mtu = priv->max_mtu; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 0e87432ec6f1..e4996ef04d86 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4780,6 +4780,13 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) if (old_prog) bpf_prog_put(old_prog); + if (reset) { + if (prog) + xdp_features_set_redirect_target(netdev, true); + else + xdp_features_clear_redirect_target(netdev); + } + if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset) goto unlock; @@ -5175,6 +5182,10 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->features |= NETIF_F_HIGHDMA; netdev->features |= NETIF_F_HW_VLAN_STAG_FILTER; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_XSK_ZEROCOPY | + NETDEV_XDP_ACT_RX_SG; + netdev->priv_flags |= IFF_UNICAST_FLT; netif_set_tso_max_size(netdev, GSO_MAX_SIZE); diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index 2f6a048dee90..6120f2b6684f 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -2160,6 +2160,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx, ndev->hw_features |= NETIF_F_RXHASH; ndev->features = ndev->hw_features; ndev->vlan_features = 0; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; err = register_netdev(ndev); if (err) { diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c index 18fc9971f1c8..e4825d885560 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -2529,10 +2529,15 @@ static void nfp_net_netdev_init(struct nfp_net *nn) netdev->features &= ~NETIF_F_HW_VLAN_STAG_RX; nn->dp.ctrl &= ~NFP_NET_CFG_CTRL_RXQINQ; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC; + if (nn->app && nn->app->type->id == NFP_APP_BPF_NIC) + netdev->xdp_features |= NETDEV_XDP_ACT_HW_OFFLOAD; + /* Finalise the netdev setup */ switch (nn->dp.ops->version) { case NFP_NFD_VER_NFD3: netdev->netdev_ops = &nfp_nfd3_netdev_ops; + netdev->xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY; break; case NFP_NFD_VER_NFDK: netdev->netdev_ops = &nfp_nfdk_netdev_ops; diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 953f304b8588..b6d999927e86 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -892,6 +892,9 @@ static void qede_init_ndev(struct qede_dev *edev) ndev->hw_features = hw_features; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + /* MTU range: 46 - 9600 */ ndev->min_mtu = ETH_ZLEN - ETH_HLEN; ndev->max_mtu = QEDE_MAX_JUMBO_PACKET_SIZE; diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c index 0556542d7a6b..18ff8d8cff42 100644 --- a/drivers/net/ethernet/sfc/efx.c +++ b/drivers/net/ethernet/sfc/efx.c @@ -1078,6 +1078,10 @@ static int efx_pci_probe(struct pci_dev *pci_dev, pci_info(pci_dev, "Solarflare NIC detected\n"); + efx->net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + if (!efx->type->is_vf) efx_probe_vpd_strings(efx); diff --git a/drivers/net/ethernet/sfc/siena/efx.c b/drivers/net/ethernet/sfc/siena/efx.c index 60e5b7c8ccf9..a6ef21845224 100644 --- a/drivers/net/ethernet/sfc/siena/efx.c +++ b/drivers/net/ethernet/sfc/siena/efx.c @@ -1048,6 +1048,10 @@ static int efx_pci_probe(struct pci_dev *pci_dev, pci_info(pci_dev, "Solarflare NIC detected\n"); + efx->net_dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + if (!efx->type->is_vf) efx_probe_vpd_strings(efx); diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c index 9b46579b5a10..2d7347b71c41 100644 --- a/drivers/net/ethernet/socionext/netsec.c +++ b/drivers/net/ethernet/socionext/netsec.c @@ -2104,6 +2104,9 @@ static int netsec_probe(struct platform_device *pdev) NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM; ndev->hw_features = ndev->features; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + priv->rx_cksum_offload_flag = true; ret = netsec_register_mdio(priv, phy_addr); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b7e5af58ab75..734d84263fd2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7150,6 +7150,8 @@ int stmmac_dvr_probe(struct device *device, ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; ret = stmmac_tc_init(priv, priv); if (!ret) { diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 13c9c2d6b79b..37f0b62ec5d6 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1458,6 +1458,8 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv) priv_sl2->emac_port = 1; cpsw->slaves[1].ndev = ndev; ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; ndev->netdev_ops = &cpsw_netdev_ops; ndev->ethtool_ops = &cpsw_ethtool_ops; @@ -1635,6 +1637,8 @@ static int cpsw_probe(struct platform_device *pdev) cpsw->slaves[0].ndev = ndev; ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; ndev->netdev_ops = &cpsw_netdev_ops; ndev->ethtool_ops = &cpsw_ethtool_ops; diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c index 83596ec0c7cb..35128dd45ffc 100644 --- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -1405,6 +1405,10 @@ static int cpsw_create_ports(struct cpsw_common *cpsw) ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_NETNS_LOCAL | NETIF_F_HW_TC; + ndev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + ndev->netdev_ops = &cpsw_netdev_ops; ndev->ethtool_ops = &cpsw_ethtool_ops; SET_NETDEV_DEV(ndev, dev); diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index f9b219e6cd58..a9b139bbdb2c 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -2559,6 +2559,8 @@ static int netvsc_probe(struct hv_device *dev, netdev_lockdep_set_classes(net); + net->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; + /* MTU range: 68 - 1500 or 65521 */ net->min_mtu = NETVSC_MTU_MIN; if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2) diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index 6db6a75ff9b9..35fa1ca98671 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -286,6 +286,7 @@ static void nsim_setup(struct net_device *dev) NETIF_F_TSO; dev->hw_features |= NETIF_F_HW_TC; dev->max_mtu = ETH_MAX_MTU; + dev->xdp_features = NETDEV_XDP_ACT_HW_OFFLOAD; } static int nsim_init_netdevsim(struct netdevsim *ns) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a7d17c680f4a..36620afde373 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1401,6 +1401,11 @@ static void tun_net_initialize(struct net_device *dev) eth_hw_addr_random(dev); + /* Currently tun does not support XDP, only tap does. */ + dev->xdp_features = NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; + break; } diff --git a/drivers/net/veth.c b/drivers/net/veth.c index ba3e05832843..1bb54de7124d 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1686,6 +1686,10 @@ static void veth_setup(struct net_device *dev) dev->hw_enc_features = VETH_FEATURES; dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE; netif_set_tso_max_size(dev, GSO_MAX_SIZE); + + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG; } /* diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 7e1a98430190..692dff071782 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3280,7 +3280,10 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog, if (i == 0 && !old_prog) virtnet_clear_guest_offloads(vi); } + if (!old_prog) + xdp_features_set_redirect_target(dev, false); } else { + xdp_features_clear_redirect_target(dev); vi->xdp_enabled = false; } @@ -3910,6 +3913,7 @@ static int virtnet_probe(struct virtio_device *vdev) dev->hw_features |= NETIF_F_GRO_HW; dev->vlan_features = dev->features; + dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT; /* MTU range: 68 - 65535 */ dev->min_mtu = MIN_MTU; diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 12b074286df9..47d54d8ea59d 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1741,6 +1741,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev) * negotiate with the backend regarding supported features. */ netdev->features |= netdev->hw_features; + netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT; netdev->ethtool_ops = &xennet_ethtool_ops; netdev->min_mtu = ETH_MIN_MTU; diff --git a/include/net/xdp.h b/include/net/xdp.h index 8d1c86914f4c..d517bfac937b 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -428,9 +428,21 @@ MAX_XDP_METADATA_KFUNC, #ifdef CONFIG_NET u32 bpf_xdp_metadata_kfunc_id(int id); bool bpf_dev_bound_kfunc_id(u32 btf_id); +void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg); +void xdp_features_clear_redirect_target(struct net_device *dev); #else static inline u32 bpf_xdp_metadata_kfunc_id(int id) { return 0; } static inline bool bpf_dev_bound_kfunc_id(u32 btf_id) { return false; } + +static inline void +xdp_features_set_redirect_target(struct net_device *dev, bool support_sg) +{ +} + +static inline void +xdp_features_clear_redirect_target(struct net_device *dev) +{ +} #endif #endif /* __LINUX_NET_XDP_H__ */ diff --git a/net/core/xdp.c b/net/core/xdp.c index 787fb9f92b36..26483935b7a4 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -774,3 +774,21 @@ static int __init xdp_metadata_init(void) return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &xdp_metadata_kfunc_set); } late_initcall(xdp_metadata_init); + +void xdp_features_set_redirect_target(struct net_device *dev, bool support_sg) +{ + dev->xdp_features |= NETDEV_XDP_ACT_NDO_XMIT; + if (support_sg) + dev->xdp_features |= NETDEV_XDP_ACT_NDO_XMIT_SG; + + call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev); +} +EXPORT_SYMBOL_GPL(xdp_features_set_redirect_target); + +void xdp_features_clear_redirect_target(struct net_device *dev) +{ + dev->xdp_features &= ~(NETDEV_XDP_ACT_NDO_XMIT | + NETDEV_XDP_ACT_NDO_XMIT_SG); + call_netdevice_notifiers(NETDEV_XDP_FEAT_CHANGE, dev); +} +EXPORT_SYMBOL_GPL(xdp_features_clear_redirect_target); -- cgit v1.2.3 From 0ae0cb2bb22eb8cf943fa07137068347e1b918c4 Mon Sep 17 00:00:00 2001 From: Marek Majtyka Date: Wed, 1 Feb 2023 11:24:19 +0100 Subject: xsk: add usage of XDP features flags Change necessary condition check for XSK from ndo functions to xdp features flags. Signed-off-by: Marek Majtyka Signed-off-by: Lorenzo Bianconi Link: https://lore.kernel.org/r/45a98ec67b4556a6a22dfd85df3eb8276beeeb74.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- net/xdp/xsk_buff_pool.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index ed6c71826d31..b2df1e0f8153 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -140,6 +140,10 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool) } } +#define NETDEV_XDP_ACT_ZC (NETDEV_XDP_ACT_BASIC | \ + NETDEV_XDP_ACT_REDIRECT | \ + NETDEV_XDP_ACT_XSK_ZEROCOPY) + int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, u16 queue_id, u16 flags) { @@ -178,8 +182,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool, /* For copy-mode, we are done. */ return 0; - if (!netdev->netdev_ops->ndo_bpf || - !netdev->netdev_ops->ndo_xsk_wakeup) { + if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) { err = -EOPNOTSUPP; goto err_unreg_pool; } -- cgit v1.2.3 From b9d460c9245541b13de2369e79688f8e0acc0c3d Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 1 Feb 2023 11:24:22 +0100 Subject: bpf: devmap: check XDP features in __xdp_enqueue routine Check if the destination device implements ndo_xdp_xmit callback relying on NETDEV_XDP_ACT_NDO_XMIT flags. Moreover, check if the destination device supports XDP non-linear frame in __xdp_enqueue and is_valid_dst routines. This patch allows to perform XDP_REDIRECT on non-linear XDP buffers. Acked-by: Jesper Dangaard Brouer Co-developed-by: Kumar Kartikeya Dwivedi Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Lorenzo Bianconi Link: https://lore.kernel.org/r/26a94c33520c0bfba021b3fbb2cb8c1e69bf53b8.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- kernel/bpf/devmap.c | 16 +++++++++++++--- net/core/filter.c | 13 +++++-------- 2 files changed, 18 insertions(+), 11 deletions(-) (limited to 'net') diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index d01e4c55b376..2675fefc6cb6 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -474,7 +474,11 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf, { int err; - if (!dev->netdev_ops->ndo_xdp_xmit) + if (!(dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT)) + return -EOPNOTSUPP; + + if (unlikely(!(dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT_SG) && + xdp_frame_has_frags(xdpf))) return -EOPNOTSUPP; err = xdp_ok_fwd_dev(dev, xdp_get_frame_len(xdpf)); @@ -532,8 +536,14 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf, static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf) { - if (!obj || - !obj->dev->netdev_ops->ndo_xdp_xmit) + if (!obj) + return false; + + if (!(obj->dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT)) + return false; + + if (unlikely(!(obj->dev->xdp_features & NETDEV_XDP_ACT_NDO_XMIT_SG) && + xdp_frame_has_frags(xdpf))) return false; if (xdp_ok_fwd_dev(obj->dev, xdp_get_frame_len(xdpf))) diff --git a/net/core/filter.c b/net/core/filter.c index 0039cf16713e..2ce06a72a5ba 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4318,16 +4318,13 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); enum bpf_map_type map_type = ri->map_type; - /* XDP_REDIRECT is not fully supported yet for xdp frags since - * not all XDP capable drivers can map non-linear xdp_frame in - * ndo_xdp_xmit. - */ - if (unlikely(xdp_buff_has_frags(xdp) && - map_type != BPF_MAP_TYPE_CPUMAP)) - return -EOPNOTSUPP; + if (map_type == BPF_MAP_TYPE_XSKMAP) { + /* XDP_REDIRECT is not supported AF_XDP yet. */ + if (unlikely(xdp_buff_has_frags(xdp))) + return -EOPNOTSUPP; - if (map_type == BPF_MAP_TYPE_XSKMAP) return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); + } return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), xdp_prog); -- cgit v1.2.3 From 2798e36dc233a409a5d3f26f73029596dc504020 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 1 Feb 2023 17:43:45 +0000 Subject: tcp: add TCP_MINTTL drop reason In the unlikely case incoming packets are dropped because of IP_MINTTL / IPV6_MINHOPCOUNT constraints... Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230201174345.2708943-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/dropreason.h | 6 ++++++ net/ipv4/tcp_ipv4.c | 1 + net/ipv6/tcp_ipv6.c | 3 ++- 3 files changed, 9 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 70539288f995..94bc3d5d8803 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -71,6 +71,7 @@ FN(DUP_FRAG) \ FN(FRAG_REASM_TIMEOUT) \ FN(FRAG_TOO_FAR) \ + FN(TCP_MINTTL) \ FNe(MAX) /** @@ -312,6 +313,11 @@ enum skb_drop_reason { * (/proc/sys/net/ipv4/ipfrag_max_dist) */ SKB_DROP_REASON_FRAG_TOO_FAR, + /** + * @SKB_DROP_REASON_TCP_MINTTL: ipv4 ttl or ipv6 hoplimit below + * the threshold (IP_MINTTL or IPV6_MINHOPCOUNT). + */ + SKB_DROP_REASON_TCP_MINTTL, /** * @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be * used as a real 'reason' diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8320d0ecb13a..ea370afa70ed 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2102,6 +2102,7 @@ process: /* min_ttl can be changed concurrently from do_ip_setsockopt() */ if (unlikely(iph->ttl < READ_ONCE(inet_sk(sk)->min_ttl))) { __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP); + drop_reason = SKB_DROP_REASON_TCP_MINTTL; goto discard_and_relse; } } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 11b736a76bd7..543ee2167720 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1708,8 +1708,9 @@ process: if (static_branch_unlikely(&ip6_min_hopcount)) { /* min_hopcount can be changed concurrently from do_ipv6_setsockopt() */ - if (hdr->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount)) { + if (unlikely(hdr->hop_limit < READ_ONCE(tcp_inet6_sk(sk)->min_hopcount))) { __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP); + drop_reason = SKB_DROP_REASON_TCP_MINTTL; goto discard_and_relse; } } -- cgit v1.2.3 From 0eb5acb16418898c3d813e2c2d59a7ea7763a824 Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 1 Feb 2023 17:30:55 +0100 Subject: netfilter: flowtable: fixup UDP timeout depending on ct state Currently flow_offload_fixup_ct() function assumes that only replied UDP connections can be offloaded and hardcodes UDP_CT_REPLIED timeout value. To enable UDP NEW connection offload in following patches extract the actual connections state from ct->status and set the timeout according to it. Signed-off-by: Vlad Buslov Signed-off-by: David S. Miller --- net/netfilter/nf_flow_table_core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 81c26a96c30b..04bd0ed4d2ae 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -193,8 +193,11 @@ static void flow_offload_fixup_ct(struct nf_conn *ct) timeout -= tn->offload_timeout; } else if (l4num == IPPROTO_UDP) { struct nf_udp_net *tn = nf_udp_pernet(net); + enum udp_conntrack state = + test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ? + UDP_CT_REPLIED : UDP_CT_UNREPLIED; - timeout = tn->timeouts[UDP_CT_REPLIED]; + timeout = tn->timeouts[state]; timeout -= tn->offload_timeout; } else { return; -- cgit v1.2.3 From 8f84780b84d645d6e35467f4a6f3236b20d7f4b2 Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 1 Feb 2023 17:30:56 +0100 Subject: netfilter: flowtable: allow unidirectional rules Modify flow table offload to support unidirectional connections by extending enum nf_flow_flags with new "NF_FLOW_HW_BIDIRECTIONAL" flag. Only offload reply direction when the flag is set. This infrastructure change is necessary to support offloading UDP NEW connections in original direction in following patches in series. Signed-off-by: Vlad Buslov Signed-off-by: David S. Miller --- include/net/netfilter/nf_flow_table.h | 1 + net/netfilter/nf_flow_table_offload.c | 12 ++++++++---- 2 files changed, 9 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index cd982f4a0f50..88ab98ab41d9 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -164,6 +164,7 @@ enum nf_flow_flags { NF_FLOW_HW_DYING, NF_FLOW_HW_DEAD, NF_FLOW_HW_PENDING, + NF_FLOW_HW_BIDIRECTIONAL, }; enum flow_offload_type { diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index 4d9b99abe37d..8b852f10fab4 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -895,8 +895,9 @@ static int flow_offload_rule_add(struct flow_offload_work *offload, ok_count += flow_offload_tuple_add(offload, flow_rule[0], FLOW_OFFLOAD_DIR_ORIGINAL); - ok_count += flow_offload_tuple_add(offload, flow_rule[1], - FLOW_OFFLOAD_DIR_REPLY); + if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags)) + ok_count += flow_offload_tuple_add(offload, flow_rule[1], + FLOW_OFFLOAD_DIR_REPLY); if (ok_count == 0) return -ENOENT; @@ -926,7 +927,8 @@ static void flow_offload_work_del(struct flow_offload_work *offload) { clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status); flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_ORIGINAL); - flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY); + if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags)) + flow_offload_tuple_del(offload, FLOW_OFFLOAD_DIR_REPLY); set_bit(NF_FLOW_HW_DEAD, &offload->flow->flags); } @@ -946,7 +948,9 @@ static void flow_offload_work_stats(struct flow_offload_work *offload) u64 lastused; flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_ORIGINAL, &stats[0]); - flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY, &stats[1]); + if (test_bit(NF_FLOW_HW_BIDIRECTIONAL, &offload->flow->flags)) + flow_offload_tuple_stats(offload, FLOW_OFFLOAD_DIR_REPLY, + &stats[1]); lastused = max_t(u64, stats[0].lastused, stats[1].lastused); offload->flow->timeout = max_t(u64, offload->flow->timeout, -- cgit v1.2.3 From 1a441a9b8be8849957a01413a144f84932c324cb Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 1 Feb 2023 17:30:57 +0100 Subject: netfilter: flowtable: cache info of last offload Modify flow table offload to cache the last ct info status that was passed to the driver offload callbacks by extending enum nf_flow_flags with new "NF_FLOW_HW_ESTABLISHED" flag. Set the flag if ctinfo was 'established' during last act_ct meta actions fill call. This infrastructure change is necessary to optimize promoting of UDP connections from 'new' to 'established' in following patches in this series. Signed-off-by: Vlad Buslov Signed-off-by: David S. Miller --- include/net/netfilter/nf_flow_table.h | 7 ++++--- net/netfilter/nf_flow_table_inet.c | 2 +- net/netfilter/nf_flow_table_offload.c | 6 +++--- net/sched/act_ct.c | 12 +++++++----- 4 files changed, 15 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 88ab98ab41d9..ebb28ec5b6fa 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -57,7 +57,7 @@ struct nf_flowtable_type { struct net_device *dev, enum flow_block_command cmd); int (*action)(struct net *net, - const struct flow_offload *flow, + struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule); void (*free)(struct nf_flowtable *ft); @@ -165,6 +165,7 @@ enum nf_flow_flags { NF_FLOW_HW_DEAD, NF_FLOW_HW_PENDING, NF_FLOW_HW_BIDIRECTIONAL, + NF_FLOW_HW_ESTABLISHED, }; enum flow_offload_type { @@ -313,10 +314,10 @@ void nf_flow_table_offload_flush_cleanup(struct nf_flowtable *flowtable); int nf_flow_table_offload_setup(struct nf_flowtable *flowtable, struct net_device *dev, enum flow_block_command cmd); -int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, +int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule); -int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow, +int nf_flow_rule_route_ipv6(struct net *net, struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule); diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c index 0ccabf3fa6aa..9505f9d188ff 100644 --- a/net/netfilter/nf_flow_table_inet.c +++ b/net/netfilter/nf_flow_table_inet.c @@ -39,7 +39,7 @@ nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb, } static int nf_flow_rule_route_inet(struct net *net, - const struct flow_offload *flow, + struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index 8b852f10fab4..1c26f03fc661 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -679,7 +679,7 @@ nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow, return 0; } -int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, +int nf_flow_rule_route_ipv4(struct net *net, struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { @@ -704,7 +704,7 @@ int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow, } EXPORT_SYMBOL_GPL(nf_flow_rule_route_ipv4); -int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow, +int nf_flow_rule_route_ipv6(struct net *net, struct flow_offload *flow, enum flow_offload_tuple_dir dir, struct nf_flow_rule *flow_rule) { @@ -735,7 +735,7 @@ nf_flow_offload_rule_alloc(struct net *net, { const struct nf_flowtable *flowtable = offload->flowtable; const struct flow_offload_tuple *tuple, *other_tuple; - const struct flow_offload *flow = offload->flow; + struct flow_offload *flow = offload->flow; struct dst_entry *other_dst = NULL; struct nf_flow_rule *flow_rule; int err = -ENOMEM; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index d68bb5dbf0dc..b9d3e338f72d 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -170,11 +170,11 @@ tcf_ct_flow_table_add_action_nat_udp(const struct nf_conntrack_tuple *tuple, static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct, enum ip_conntrack_dir dir, + enum ip_conntrack_info ctinfo, struct flow_action *action) { struct nf_conn_labels *ct_labels; struct flow_action_entry *entry; - enum ip_conntrack_info ctinfo; u32 *act_ct_labels; entry = tcf_ct_flow_table_flow_action_get_next(action); @@ -182,8 +182,6 @@ static void tcf_ct_flow_table_add_action_meta(struct nf_conn *ct, #if IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) entry->ct_metadata.mark = READ_ONCE(ct->mark); #endif - ctinfo = dir == IP_CT_DIR_ORIGINAL ? IP_CT_ESTABLISHED : - IP_CT_ESTABLISHED_REPLY; /* aligns with the CT reference on the SKB nf_ct_set */ entry->ct_metadata.cookie = (unsigned long)ct | ctinfo; entry->ct_metadata.orig_dir = dir == IP_CT_DIR_ORIGINAL; @@ -237,22 +235,26 @@ static int tcf_ct_flow_table_add_action_nat(struct net *net, } static int tcf_ct_flow_table_fill_actions(struct net *net, - const struct flow_offload *flow, + struct flow_offload *flow, enum flow_offload_tuple_dir tdir, struct nf_flow_rule *flow_rule) { struct flow_action *action = &flow_rule->rule->action; int num_entries = action->num_entries; struct nf_conn *ct = flow->ct; + enum ip_conntrack_info ctinfo; enum ip_conntrack_dir dir; int i, err; switch (tdir) { case FLOW_OFFLOAD_DIR_ORIGINAL: dir = IP_CT_DIR_ORIGINAL; + ctinfo = IP_CT_ESTABLISHED; + set_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags); break; case FLOW_OFFLOAD_DIR_REPLY: dir = IP_CT_DIR_REPLY; + ctinfo = IP_CT_ESTABLISHED_REPLY; break; default: return -EOPNOTSUPP; @@ -262,7 +264,7 @@ static int tcf_ct_flow_table_fill_actions(struct net *net, if (err) goto err_nat; - tcf_ct_flow_table_add_action_meta(ct, dir, action); + tcf_ct_flow_table_add_action_meta(ct, dir, ctinfo, action); return 0; err_nat: -- cgit v1.2.3 From d5774cb6c55c8721c2daf57cc5e5345e3af286ea Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 1 Feb 2023 17:30:58 +0100 Subject: net/sched: act_ct: set ctinfo in meta action depending on ct state Currently tcf_ct_flow_table_fill_actions() function assumes that only established connections can be offloaded and always sets ctinfo to either IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on direction without checking actual connection state. To enable UDP NEW connection offload set the ctinfo, metadata cookie and NF_FLOW_HW_ESTABLISHED flow_offload flags bit based on ct->status value. Signed-off-by: Vlad Buslov Signed-off-by: David S. Miller --- net/sched/act_ct.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b9d3e338f72d..2cee3f9f83de 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -249,8 +249,10 @@ static int tcf_ct_flow_table_fill_actions(struct net *net, switch (tdir) { case FLOW_OFFLOAD_DIR_ORIGINAL: dir = IP_CT_DIR_ORIGINAL; - ctinfo = IP_CT_ESTABLISHED; - set_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags); + ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ? + IP_CT_ESTABLISHED : IP_CT_NEW; + if (ctinfo == IP_CT_ESTABLISHED) + set_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags); break; case FLOW_OFFLOAD_DIR_REPLY: dir = IP_CT_DIR_REPLY; -- cgit v1.2.3 From 6a9bad0069cf306f3df6ac53cf02438d4e15f296 Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 1 Feb 2023 17:30:59 +0100 Subject: net/sched: act_ct: offload UDP NEW connections Modify the offload algorithm of UDP connections to the following: - Offload NEW connection as unidirectional. - When connection state changes to ESTABLISHED also update the hardware flow. However, in order to prevent act_ct from spamming offload add wq for every packet coming in reply direction in this state verify whether connection has already been updated to ESTABLISHED in the drivers. If that it the case, then skip flow_table and let conntrack handle such packets which will also allow conntrack to potentially promote the connection to ASSURED. - When connection state changes to ASSURED set the flow_table flow NF_FLOW_HW_BIDIRECTIONAL flag which will cause refresh mechanism to offload the reply direction. All other protocols have their offload algorithm preserved and are always offloaded as bidirectional. Note that this change tries to minimize the load on flow_table add workqueue. First, it tracks the last ctinfo that was offloaded by using new flow 'NF_FLOW_HW_ESTABLISHED' flag and doesn't schedule the refresh for reply direction packets when the offloads have already been updated with current ctinfo. Second, when 'add' task executes on workqueue it always update the offload with current flow state (by checking 'bidirectional' flow flag and obtaining actual ctinfo/cookie through meta action instead of caching any of these from the moment of scheduling the 'add' work) preventing the need from scheduling more updates if state changed concurrently while the 'add' work was pending on workqueue. Signed-off-by: Vlad Buslov Signed-off-by: David S. Miller --- net/sched/act_ct.c | 51 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 39 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 2cee3f9f83de..b126f03c1bb6 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -369,7 +369,7 @@ static void tcf_ct_flow_tc_ifidx(struct flow_offload *entry, static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft, struct nf_conn *ct, - bool tcp) + bool tcp, bool bidirectional) { struct nf_conn_act_ct_ext *act_ct_ext; struct flow_offload *entry; @@ -388,6 +388,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft, ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL; ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL; } + if (bidirectional) + __set_bit(NF_FLOW_HW_BIDIRECTIONAL, &entry->flags); act_ct_ext = nf_conn_act_ct_ext_find(ct); if (act_ct_ext) { @@ -411,26 +413,34 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft, struct nf_conn *ct, enum ip_conntrack_info ctinfo) { - bool tcp = false; - - if ((ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) || - !test_bit(IPS_ASSURED_BIT, &ct->status)) - return; + bool tcp = false, bidirectional = true; switch (nf_ct_protonum(ct)) { case IPPROTO_TCP: - tcp = true; - if (ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED) + if ((ctinfo != IP_CT_ESTABLISHED && + ctinfo != IP_CT_ESTABLISHED_REPLY) || + !test_bit(IPS_ASSURED_BIT, &ct->status) || + ct->proto.tcp.state != TCP_CONNTRACK_ESTABLISHED) return; + + tcp = true; break; case IPPROTO_UDP: + if (!nf_ct_is_confirmed(ct)) + return; + if (!test_bit(IPS_ASSURED_BIT, &ct->status)) + bidirectional = false; break; #ifdef CONFIG_NF_CT_PROTO_GRE case IPPROTO_GRE: { struct nf_conntrack_tuple *tuple; - if (ct->status & IPS_NAT_MASK) + if ((ctinfo != IP_CT_ESTABLISHED && + ctinfo != IP_CT_ESTABLISHED_REPLY) || + !test_bit(IPS_ASSURED_BIT, &ct->status) || + ct->status & IPS_NAT_MASK) return; + tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple; /* No support for GRE v1 */ if (tuple->src.u.gre.key || tuple->dst.u.gre.key) @@ -446,7 +456,7 @@ static void tcf_ct_flow_table_process_conn(struct tcf_ct_flow_table *ct_ft, ct->status & IPS_SEQ_ADJUST) return; - tcf_ct_flow_table_add(ct_ft, ct, tcp); + tcf_ct_flow_table_add(ct_ft, ct, tcp, bidirectional); } static bool @@ -625,13 +635,30 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p, flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]); ct = flow->ct; + if (dir == FLOW_OFFLOAD_DIR_REPLY && + !test_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags)) { + /* Only offload reply direction after connection became + * assured. + */ + if (test_bit(IPS_ASSURED_BIT, &ct->status)) + set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags); + else if (test_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags)) + /* If flow_table flow has already been updated to the + * established state, then don't refresh. + */ + return false; + } + if (tcph && (unlikely(tcph->fin || tcph->rst))) { flow_offload_teardown(flow); return false; } - ctinfo = dir == FLOW_OFFLOAD_DIR_ORIGINAL ? IP_CT_ESTABLISHED : - IP_CT_ESTABLISHED_REPLY; + if (dir == FLOW_OFFLOAD_DIR_ORIGINAL) + ctinfo = test_bit(IPS_SEEN_REPLY_BIT, &ct->status) ? + IP_CT_ESTABLISHED : IP_CT_NEW; + else + ctinfo = IP_CT_ESTABLISHED_REPLY; flow_offload_refresh(nf_ft, flow); nf_conntrack_get(&ct->ct_general); -- cgit v1.2.3 From df25455e5a489764508942b77b77de8f550e92cd Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 1 Feb 2023 17:31:00 +0100 Subject: netfilter: nf_conntrack: allow early drop of offloaded UDP conns Both synchronous early drop algorithm and asynchronous gc worker completely ignore connections with IPS_OFFLOAD_BIT status bit set. With new functionality that enabled UDP NEW connection offload in action CT malicious user can flood the conntrack table with offloaded UDP connections by just sending a single packet per 5tuple because such connections can no longer be deleted by early drop algorithm. To mitigate the issue allow both early drop and gc to consider offloaded UDP connections for deletion. Signed-off-by: Vlad Buslov Signed-off-by: David S. Miller --- net/netfilter/nf_conntrack_core.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index c00858344f02..9a830573480e 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1371,9 +1371,6 @@ static unsigned int early_drop_list(struct net *net, hlist_nulls_for_each_entry_rcu(h, n, head, hnnode) { tmp = nf_ct_tuplehash_to_ctrack(h); - if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) - continue; - if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); continue; @@ -1443,11 +1440,14 @@ static bool gc_worker_skip_ct(const struct nf_conn *ct) static bool gc_worker_can_early_drop(const struct nf_conn *ct) { const struct nf_conntrack_l4proto *l4proto; + u8 protonum = nf_ct_protonum(ct); + if (test_bit(IPS_OFFLOAD_BIT, &ct->status) && protonum != IPPROTO_UDP) + return false; if (!test_bit(IPS_ASSURED_BIT, &ct->status)) return true; - l4proto = nf_ct_l4proto_find(nf_ct_protonum(ct)); + l4proto = nf_ct_l4proto_find(protonum); if (l4proto->can_early_drop && l4proto->can_early_drop(ct)) return true; @@ -1504,7 +1504,8 @@ static void gc_worker(struct work_struct *work) if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) { nf_ct_offload_timeout(tmp); - continue; + if (!nf_conntrack_max95) + continue; } if (expired_count > GC_SCAN_EXPIRED_MAX) { -- cgit v1.2.3 From d795527d50796b18c08462188e555139bdd6f967 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 2 Feb 2023 16:03:54 +0200 Subject: net: dsa: use NL_SET_ERR_MSG_WEAK_MOD() more consistently Now that commit 028fb19c6ba7 ("netlink: provide an ability to set default extack message") provides a weak function that doesn't override an existing extack message provided by the driver, it makes sense to use it also for LAG and HSR offloading, not just for bridge offloading. Also consistently put the message string on a separate line, to reduce line length from 92 to 84 characters. Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20230202140354.3158129-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- net/dsa/slave.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 26c458f50ac6..6957971c2db2 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -2692,7 +2692,8 @@ static int dsa_slave_changeupper(struct net_device *dev, if (!err) dsa_bridge_mtu_normalization(dp); if (err == -EOPNOTSUPP) { - NL_SET_ERR_MSG_WEAK_MOD(extack, "Offloading not supported"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "Offloading not supported"); err = 0; } err = notifier_from_errno(err); @@ -2705,8 +2706,8 @@ static int dsa_slave_changeupper(struct net_device *dev, err = dsa_port_lag_join(dp, info->upper_dev, info->upper_info, extack); if (err == -EOPNOTSUPP) { - NL_SET_ERR_MSG_MOD(info->info.extack, - "Offloading not supported"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "Offloading not supported"); err = 0; } err = notifier_from_errno(err); @@ -2718,8 +2719,8 @@ static int dsa_slave_changeupper(struct net_device *dev, if (info->linking) { err = dsa_port_hsr_join(dp, info->upper_dev); if (err == -EOPNOTSUPP) { - NL_SET_ERR_MSG_MOD(info->info.extack, - "Offloading not supported"); + NL_SET_ERR_MSG_WEAK_MOD(extack, + "Offloading not supported"); err = 0; } err = notifier_from_errno(err); -- cgit v1.2.3 From dbeeca81bd9399fc60ae69ff944836280b4fd094 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:00 +0200 Subject: devlink: Split out dev get and dump code Move devlink dev get and dump callbacks and related dev code to new file dev.c. This file shall include all callbacks that are specific on devlink dev object. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/Makefile | 2 +- net/devlink/dev.c | 99 ++++++++++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 17 ++++++++ net/devlink/leftover.c | 102 +------------------------------------------- 4 files changed, 118 insertions(+), 102 deletions(-) create mode 100644 net/devlink/dev.c (limited to 'net') diff --git a/net/devlink/Makefile b/net/devlink/Makefile index 1b1eeac59cb3..daad4521c61e 100644 --- a/net/devlink/Makefile +++ b/net/devlink/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 -obj-y := leftover.o core.o netlink.o +obj-y := leftover.o core.o netlink.o dev.o diff --git a/net/devlink/dev.c b/net/devlink/dev.c new file mode 100644 index 000000000000..6a26dc7edef6 --- /dev/null +++ b/net/devlink/dev.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2016 Jiri Pirko + */ + +#include +#include "devl_internal.h" + +static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + struct nlattr *dev_stats; + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed)) + goto nla_put_failure; + + dev_stats = nla_nest_start(msg, DEVLINK_ATTR_DEV_STATS); + if (!dev_stats) + goto nla_put_failure; + + if (devlink_reload_stats_put(msg, devlink, false)) + goto dev_stats_nest_cancel; + if (devlink_reload_stats_put(msg, devlink, true)) + goto dev_stats_nest_cancel; + + nla_nest_end(msg, dev_stats); + genlmsg_end(msg, hdr); + return 0; + +dev_stats_nest_cancel: + nla_nest_cancel(msg, dev_stats); +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +void devlink_notify(struct devlink *devlink, enum devlink_command cmd) +{ + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_NEW && cmd != DEVLINK_CMD_DEL); + WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_fill(msg, devlink, cmd, 0, 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, + info->snd_portid, info->snd_seq, 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int +devlink_nl_cmd_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) +{ + return devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI); +} + +const struct devlink_cmd devl_cmd_get = { + .dump_one = devlink_nl_cmd_get_dump_one, +}; diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index bdd7ad25c7e8..60dce85885b8 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -139,6 +139,16 @@ devlink_dump_state(struct netlink_callback *cb) return (struct devlink_nl_dump_state *)cb->ctx; } +static inline int +devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink) +{ + if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name)) + return -EMSGSIZE; + if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(devlink->dev))) + return -EMSGSIZE; + return 0; +} + /* Commands */ extern const struct devlink_cmd devl_cmd_get; extern const struct devlink_cmd devl_cmd_port_get; @@ -157,6 +167,9 @@ extern const struct devlink_cmd devl_cmd_rate_get; extern const struct devlink_cmd devl_cmd_linecard_get; extern const struct devlink_cmd devl_cmd_selftests_get; +/* Notify */ +void devlink_notify(struct devlink *devlink, enum devlink_command cmd); + /* Ports */ int devlink_port_netdevice_event(struct notifier_block *nb, unsigned long event, void *ptr); @@ -166,6 +179,8 @@ devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info); /* Reload */ bool devlink_reload_actions_valid(const struct devlink_ops *ops); +int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, + bool is_remote); int devlink_reload(struct devlink *devlink, struct net *dest_net, enum devlink_reload_action action, enum devlink_reload_limit limit, @@ -188,3 +203,5 @@ devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info); struct devlink_rate * devlink_rate_node_get_from_info(struct devlink *devlink, struct genl_info *info); +/* Devlink nl cmds */ +int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 056d9ca14a3d..54c8ea87e76b 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -596,15 +596,6 @@ devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id) return NULL; } -static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink) -{ - if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name)) - return -EMSGSIZE; - if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(devlink->dev))) - return -EMSGSIZE; - return 0; -} - static int devlink_nl_put_nested_handle(struct sk_buff *msg, struct devlink *devlink) { struct nlattr *nested_attr; @@ -699,7 +690,7 @@ nla_put_failure: return -EMSGSIZE; } -static int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote) +int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote) { struct nlattr *reload_stats_attr, *act_info, *act_stats; int i, j, stat_idx; @@ -762,64 +753,6 @@ nla_put_failure: return -EMSGSIZE; } -static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - struct nlattr *dev_stats; - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed)) - goto nla_put_failure; - - dev_stats = nla_nest_start(msg, DEVLINK_ATTR_DEV_STATS); - if (!dev_stats) - goto nla_put_failure; - - if (devlink_reload_stats_put(msg, devlink, false)) - goto dev_stats_nest_cancel; - if (devlink_reload_stats_put(msg, devlink, true)) - goto dev_stats_nest_cancel; - - nla_nest_end(msg, dev_stats); - genlmsg_end(msg, hdr); - return 0; - -dev_stats_nest_cancel: - nla_nest_cancel(msg, dev_stats); -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void devlink_notify(struct devlink *devlink, enum devlink_command cmd) -{ - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_NEW && cmd != DEVLINK_CMD_DEL); - WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_fill(msg, devlink, cmd, 0, 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - static int devlink_nl_port_attrs_put(struct sk_buff *msg, struct devlink_port *devlink_port) { @@ -1274,39 +1207,6 @@ devlink_rate_is_parent_node(struct devlink_rate *devlink_rate, return false; } -static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, - info->snd_portid, info->snd_seq, 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int -devlink_nl_cmd_get_dump_one(struct sk_buff *msg, struct devlink *devlink, - struct netlink_callback *cb) -{ - return devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI); -} - -const struct devlink_cmd devl_cmd_get = { - .dump_one = devlink_nl_cmd_get_dump_one, -}; - static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb, struct genl_info *info) { -- cgit v1.2.3 From c6ed7d6ef929a4502cc89c90a3c57e8814f4fcc9 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:01 +0200 Subject: devlink: Move devlink dev reload code to dev Move devlink dev reload callback and related code from leftover.c to file dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/dev.c | 417 +++++++++++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 9 +- net/devlink/leftover.c | 422 +------------------------------------------- 3 files changed, 427 insertions(+), 421 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 6a26dc7edef6..0d6c14b89d03 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -5,8 +5,131 @@ */ #include +#include #include "devl_internal.h" +struct devlink_reload_combination { + enum devlink_reload_action action; + enum devlink_reload_limit limit; +}; + +static const struct devlink_reload_combination devlink_reload_invalid_combinations[] = { + { + /* can't reinitialize driver with no down time */ + .action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT, + .limit = DEVLINK_RELOAD_LIMIT_NO_RESET, + }, +}; + +static bool +devlink_reload_combination_is_invalid(enum devlink_reload_action action, + enum devlink_reload_limit limit) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) + if (devlink_reload_invalid_combinations[i].action == action && + devlink_reload_invalid_combinations[i].limit == limit) + return true; + return false; +} + +static bool +devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action) +{ + return test_bit(action, &devlink->ops->reload_actions); +} + +static bool +devlink_reload_limit_is_supported(struct devlink *devlink, enum devlink_reload_limit limit) +{ + return test_bit(limit, &devlink->ops->reload_limits); +} + +static int devlink_reload_stat_put(struct sk_buff *msg, + enum devlink_reload_limit limit, u32 value) +{ + struct nlattr *reload_stats_entry; + + reload_stats_entry = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS_ENTRY); + if (!reload_stats_entry) + return -EMSGSIZE; + + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_STATS_LIMIT, limit) || + nla_put_u32(msg, DEVLINK_ATTR_RELOAD_STATS_VALUE, value)) + goto nla_put_failure; + nla_nest_end(msg, reload_stats_entry); + return 0; + +nla_put_failure: + nla_nest_cancel(msg, reload_stats_entry); + return -EMSGSIZE; +} + +static int +devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote) +{ + struct nlattr *reload_stats_attr, *act_info, *act_stats; + int i, j, stat_idx; + u32 value; + + if (!is_remote) + reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS); + else + reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_REMOTE_RELOAD_STATS); + + if (!reload_stats_attr) + return -EMSGSIZE; + + for (i = 0; i <= DEVLINK_RELOAD_ACTION_MAX; i++) { + if ((!is_remote && + !devlink_reload_action_is_supported(devlink, i)) || + i == DEVLINK_RELOAD_ACTION_UNSPEC) + continue; + act_info = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_INFO); + if (!act_info) + goto nla_put_failure; + + if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_ACTION, i)) + goto action_info_nest_cancel; + act_stats = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_STATS); + if (!act_stats) + goto action_info_nest_cancel; + + for (j = 0; j <= DEVLINK_RELOAD_LIMIT_MAX; j++) { + /* Remote stats are shown even if not locally supported. + * Stats of actions with unspecified limit are shown + * though drivers don't need to register unspecified + * limit. + */ + if ((!is_remote && j != DEVLINK_RELOAD_LIMIT_UNSPEC && + !devlink_reload_limit_is_supported(devlink, j)) || + devlink_reload_combination_is_invalid(i, j)) + continue; + + stat_idx = j * __DEVLINK_RELOAD_ACTION_MAX + i; + if (!is_remote) + value = devlink->stats.reload_stats[stat_idx]; + else + value = devlink->stats.remote_reload_stats[stat_idx]; + if (devlink_reload_stat_put(msg, j, value)) + goto action_stats_nest_cancel; + } + nla_nest_end(msg, act_stats); + nla_nest_end(msg, act_info); + } + nla_nest_end(msg, reload_stats_attr); + return 0; + +action_stats_nest_cancel: + nla_nest_cancel(msg, act_stats); +action_info_nest_cancel: + nla_nest_cancel(msg, act_info); +nla_put_failure: + nla_nest_cancel(msg, reload_stats_attr); + return -EMSGSIZE; +} + static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink, enum devlink_command cmd, u32 portid, u32 seq, int flags) @@ -97,3 +220,297 @@ devlink_nl_cmd_get_dump_one(struct sk_buff *msg, struct devlink *devlink, const struct devlink_cmd devl_cmd_get = { .dump_one = devlink_nl_cmd_get_dump_one, }; + +static void devlink_reload_failed_set(struct devlink *devlink, + bool reload_failed) +{ + if (devlink->reload_failed == reload_failed) + return; + devlink->reload_failed = reload_failed; + devlink_notify(devlink, DEVLINK_CMD_NEW); +} + +bool devlink_is_reload_failed(const struct devlink *devlink) +{ + return devlink->reload_failed; +} +EXPORT_SYMBOL_GPL(devlink_is_reload_failed); + +static void +__devlink_reload_stats_update(struct devlink *devlink, u32 *reload_stats, + enum devlink_reload_limit limit, u32 actions_performed) +{ + unsigned long actions = actions_performed; + int stat_idx; + int action; + + for_each_set_bit(action, &actions, __DEVLINK_RELOAD_ACTION_MAX) { + stat_idx = limit * __DEVLINK_RELOAD_ACTION_MAX + action; + reload_stats[stat_idx]++; + } + devlink_notify(devlink, DEVLINK_CMD_NEW); +} + +static void +devlink_reload_stats_update(struct devlink *devlink, enum devlink_reload_limit limit, + u32 actions_performed) +{ + __devlink_reload_stats_update(devlink, devlink->stats.reload_stats, limit, + actions_performed); +} + +/** + * devlink_remote_reload_actions_performed - Update devlink on reload actions + * performed which are not a direct result of devlink reload call. + * + * This should be called by a driver after performing reload actions in case it was not + * a result of devlink reload call. For example fw_activate was performed as a result + * of devlink reload triggered fw_activate on another host. + * The motivation for this function is to keep data on reload actions performed on this + * function whether it was done due to direct devlink reload call or not. + * + * @devlink: devlink + * @limit: reload limit + * @actions_performed: bitmask of actions performed + */ +void devlink_remote_reload_actions_performed(struct devlink *devlink, + enum devlink_reload_limit limit, + u32 actions_performed) +{ + if (WARN_ON(!actions_performed || + actions_performed & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || + actions_performed >= BIT(__DEVLINK_RELOAD_ACTION_MAX) || + limit > DEVLINK_RELOAD_LIMIT_MAX)) + return; + + __devlink_reload_stats_update(devlink, devlink->stats.remote_reload_stats, limit, + actions_performed); +} +EXPORT_SYMBOL_GPL(devlink_remote_reload_actions_performed); + +static struct net *devlink_netns_get(struct sk_buff *skb, + struct genl_info *info) +{ + struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID]; + struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD]; + struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID]; + struct net *net; + + if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) { + NL_SET_ERR_MSG_MOD(info->extack, "multiple netns identifying attributes specified"); + return ERR_PTR(-EINVAL); + } + + if (netns_pid_attr) { + net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr)); + } else if (netns_fd_attr) { + net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr)); + } else if (netns_id_attr) { + net = get_net_ns_by_id(sock_net(skb->sk), + nla_get_u32(netns_id_attr)); + if (!net) + net = ERR_PTR(-EINVAL); + } else { + WARN_ON(1); + net = ERR_PTR(-EINVAL); + } + if (IS_ERR(net)) { + NL_SET_ERR_MSG_MOD(info->extack, "Unknown network namespace"); + return ERR_PTR(-EINVAL); + } + if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) { + put_net(net); + return ERR_PTR(-EPERM); + } + return net; +} + +static void devlink_reload_netns_change(struct devlink *devlink, + struct net *curr_net, + struct net *dest_net) +{ + /* Userspace needs to be notified about devlink objects + * removed from original and entering new network namespace. + * The rest of the devlink objects are re-created during + * reload process so the notifications are generated separatelly. + */ + devlink_notify_unregister(devlink); + move_netdevice_notifier_net(curr_net, dest_net, + &devlink->netdevice_nb); + write_pnet(&devlink->_net, dest_net); + devlink_notify_register(devlink); +} + +int devlink_reload(struct devlink *devlink, struct net *dest_net, + enum devlink_reload_action action, + enum devlink_reload_limit limit, + u32 *actions_performed, struct netlink_ext_ack *extack) +{ + u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; + struct net *curr_net; + int err; + + memcpy(remote_reload_stats, devlink->stats.remote_reload_stats, + sizeof(remote_reload_stats)); + + err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack); + if (err) + return err; + + curr_net = devlink_net(devlink); + if (dest_net && !net_eq(dest_net, curr_net)) + devlink_reload_netns_change(devlink, curr_net, dest_net); + + err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack); + devlink_reload_failed_set(devlink, !!err); + if (err) + return err; + + WARN_ON(!(*actions_performed & BIT(action))); + /* Catch driver on updating the remote action within devlink reload */ + WARN_ON(memcmp(remote_reload_stats, devlink->stats.remote_reload_stats, + sizeof(remote_reload_stats))); + devlink_reload_stats_update(devlink, limit, *actions_performed); + return 0; +} + +static int +devlink_nl_reload_actions_performed_snd(struct devlink *devlink, u32 actions_performed, + enum devlink_command cmd, struct genl_info *info) +{ + struct sk_buff *msg; + void *hdr; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &devlink_nl_family, 0, cmd); + if (!hdr) + goto free_msg; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (nla_put_bitfield32(msg, DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, actions_performed, + actions_performed)) + goto nla_put_failure; + genlmsg_end(msg, hdr); + + return genlmsg_reply(msg, info); + +nla_put_failure: + genlmsg_cancel(msg, hdr); +free_msg: + nlmsg_free(msg); + return -EMSGSIZE; +} + +int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + enum devlink_reload_action action; + enum devlink_reload_limit limit; + struct net *dest_net = NULL; + u32 actions_performed; + int err; + + err = devlink_resources_validate(devlink, NULL, info); + if (err) { + NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed"); + return err; + } + + if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION]) + action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]); + else + action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT; + + if (!devlink_reload_action_is_supported(devlink, action)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Requested reload action is not supported by the driver"); + return -EOPNOTSUPP; + } + + limit = DEVLINK_RELOAD_LIMIT_UNSPEC; + if (info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) { + struct nla_bitfield32 limits; + u32 limits_selected; + + limits = nla_get_bitfield32(info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]); + limits_selected = limits.value & limits.selector; + if (!limits_selected) { + NL_SET_ERR_MSG_MOD(info->extack, "Invalid limit selected"); + return -EINVAL; + } + for (limit = 0 ; limit <= DEVLINK_RELOAD_LIMIT_MAX ; limit++) + if (limits_selected & BIT(limit)) + break; + /* UAPI enables multiselection, but currently it is not used */ + if (limits_selected != BIT(limit)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Multiselection of limit is not supported"); + return -EOPNOTSUPP; + } + if (!devlink_reload_limit_is_supported(devlink, limit)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Requested limit is not supported by the driver"); + return -EOPNOTSUPP; + } + if (devlink_reload_combination_is_invalid(action, limit)) { + NL_SET_ERR_MSG_MOD(info->extack, + "Requested limit is invalid for this action"); + return -EINVAL; + } + } + if (info->attrs[DEVLINK_ATTR_NETNS_PID] || + info->attrs[DEVLINK_ATTR_NETNS_FD] || + info->attrs[DEVLINK_ATTR_NETNS_ID]) { + dest_net = devlink_netns_get(skb, info); + if (IS_ERR(dest_net)) + return PTR_ERR(dest_net); + } + + err = devlink_reload(devlink, dest_net, action, limit, &actions_performed, info->extack); + + if (dest_net) + put_net(dest_net); + + if (err) + return err; + /* For backward compatibility generate reply only if attributes used by user */ + if (!info->attrs[DEVLINK_ATTR_RELOAD_ACTION] && !info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) + return 0; + + return devlink_nl_reload_actions_performed_snd(devlink, actions_performed, + DEVLINK_CMD_RELOAD, info); +} + +bool devlink_reload_actions_valid(const struct devlink_ops *ops) +{ + const struct devlink_reload_combination *comb; + int i; + + if (!devlink_reload_supported(ops)) { + if (WARN_ON(ops->reload_actions)) + return false; + return true; + } + + if (WARN_ON(!ops->reload_actions || + ops->reload_actions & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || + ops->reload_actions >= BIT(__DEVLINK_RELOAD_ACTION_MAX))) + return false; + + if (WARN_ON(ops->reload_limits & BIT(DEVLINK_RELOAD_LIMIT_UNSPEC) || + ops->reload_limits >= BIT(__DEVLINK_RELOAD_LIMIT_MAX))) + return false; + + for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) { + comb = &devlink_reload_invalid_combinations[i]; + if (ops->reload_actions == BIT(comb->action) && + ops->reload_limits == BIT(comb->limit)) + return false; + } + return true; +} diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 60dce85885b8..cb4a89ac42ec 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -179,8 +179,6 @@ devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info); /* Reload */ bool devlink_reload_actions_valid(const struct devlink_ops *ops); -int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, - bool is_remote); int devlink_reload(struct devlink *devlink, struct net *dest_net, enum devlink_reload_action action, enum devlink_reload_limit limit, @@ -191,6 +189,12 @@ static inline bool devlink_reload_supported(const struct devlink_ops *ops) return ops->reload_down && ops->reload_up; } +/* Resources */ +struct devlink_resource; +int devlink_resources_validate(struct devlink *devlink, + struct devlink_resource *resource, + struct genl_info *info); + /* Line cards */ struct devlink_linecard; @@ -205,3 +209,4 @@ devlink_rate_node_get_from_info(struct devlink *devlink, struct genl_info *info); /* Devlink nl cmds */ int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 54c8ea87e76b..703a03d65df7 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -632,127 +632,6 @@ size_t devlink_nl_port_handle_size(struct devlink_port *devlink_port) + nla_total_size(4); /* DEVLINK_ATTR_PORT_INDEX */ } -struct devlink_reload_combination { - enum devlink_reload_action action; - enum devlink_reload_limit limit; -}; - -static const struct devlink_reload_combination devlink_reload_invalid_combinations[] = { - { - /* can't reinitialize driver with no down time */ - .action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT, - .limit = DEVLINK_RELOAD_LIMIT_NO_RESET, - }, -}; - -static bool -devlink_reload_combination_is_invalid(enum devlink_reload_action action, - enum devlink_reload_limit limit) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) - if (devlink_reload_invalid_combinations[i].action == action && - devlink_reload_invalid_combinations[i].limit == limit) - return true; - return false; -} - -static bool -devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action) -{ - return test_bit(action, &devlink->ops->reload_actions); -} - -static bool -devlink_reload_limit_is_supported(struct devlink *devlink, enum devlink_reload_limit limit) -{ - return test_bit(limit, &devlink->ops->reload_limits); -} - -static int devlink_reload_stat_put(struct sk_buff *msg, - enum devlink_reload_limit limit, u32 value) -{ - struct nlattr *reload_stats_entry; - - reload_stats_entry = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS_ENTRY); - if (!reload_stats_entry) - return -EMSGSIZE; - - if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_STATS_LIMIT, limit) || - nla_put_u32(msg, DEVLINK_ATTR_RELOAD_STATS_VALUE, value)) - goto nla_put_failure; - nla_nest_end(msg, reload_stats_entry); - return 0; - -nla_put_failure: - nla_nest_cancel(msg, reload_stats_entry); - return -EMSGSIZE; -} - -int devlink_reload_stats_put(struct sk_buff *msg, struct devlink *devlink, bool is_remote) -{ - struct nlattr *reload_stats_attr, *act_info, *act_stats; - int i, j, stat_idx; - u32 value; - - if (!is_remote) - reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_STATS); - else - reload_stats_attr = nla_nest_start(msg, DEVLINK_ATTR_REMOTE_RELOAD_STATS); - - if (!reload_stats_attr) - return -EMSGSIZE; - - for (i = 0; i <= DEVLINK_RELOAD_ACTION_MAX; i++) { - if ((!is_remote && - !devlink_reload_action_is_supported(devlink, i)) || - i == DEVLINK_RELOAD_ACTION_UNSPEC) - continue; - act_info = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_INFO); - if (!act_info) - goto nla_put_failure; - - if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_ACTION, i)) - goto action_info_nest_cancel; - act_stats = nla_nest_start(msg, DEVLINK_ATTR_RELOAD_ACTION_STATS); - if (!act_stats) - goto action_info_nest_cancel; - - for (j = 0; j <= DEVLINK_RELOAD_LIMIT_MAX; j++) { - /* Remote stats are shown even if not locally supported. - * Stats of actions with unspecified limit are shown - * though drivers don't need to register unspecified - * limit. - */ - if ((!is_remote && j != DEVLINK_RELOAD_LIMIT_UNSPEC && - !devlink_reload_limit_is_supported(devlink, j)) || - devlink_reload_combination_is_invalid(i, j)) - continue; - - stat_idx = j * __DEVLINK_RELOAD_ACTION_MAX + i; - if (!is_remote) - value = devlink->stats.reload_stats[stat_idx]; - else - value = devlink->stats.remote_reload_stats[stat_idx]; - if (devlink_reload_stat_put(msg, j, value)) - goto action_stats_nest_cancel; - } - nla_nest_end(msg, act_stats); - nla_nest_end(msg, act_info); - } - nla_nest_end(msg, reload_stats_attr); - return 0; - -action_stats_nest_cancel: - nla_nest_cancel(msg, act_stats); -action_info_nest_cancel: - nla_nest_cancel(msg, act_info); -nla_put_failure: - nla_nest_cancel(msg, reload_stats_attr); - return -EMSGSIZE; -} - static int devlink_nl_port_attrs_put(struct sk_buff *msg, struct devlink_port *devlink_port) { @@ -4070,10 +3949,9 @@ static int devlink_nl_cmd_resource_dump(struct sk_buff *skb, return devlink_resource_fill(info, DEVLINK_CMD_RESOURCE_DUMP, 0); } -static int -devlink_resources_validate(struct devlink *devlink, - struct devlink_resource *resource, - struct genl_info *info) +int devlink_resources_validate(struct devlink *devlink, + struct devlink_resource *resource, + struct genl_info *info) { struct list_head *resource_list; int err = 0; @@ -4093,271 +3971,6 @@ devlink_resources_validate(struct devlink *devlink, return err; } -static struct net *devlink_netns_get(struct sk_buff *skb, - struct genl_info *info) -{ - struct nlattr *netns_pid_attr = info->attrs[DEVLINK_ATTR_NETNS_PID]; - struct nlattr *netns_fd_attr = info->attrs[DEVLINK_ATTR_NETNS_FD]; - struct nlattr *netns_id_attr = info->attrs[DEVLINK_ATTR_NETNS_ID]; - struct net *net; - - if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) { - NL_SET_ERR_MSG_MOD(info->extack, "multiple netns identifying attributes specified"); - return ERR_PTR(-EINVAL); - } - - if (netns_pid_attr) { - net = get_net_ns_by_pid(nla_get_u32(netns_pid_attr)); - } else if (netns_fd_attr) { - net = get_net_ns_by_fd(nla_get_u32(netns_fd_attr)); - } else if (netns_id_attr) { - net = get_net_ns_by_id(sock_net(skb->sk), - nla_get_u32(netns_id_attr)); - if (!net) - net = ERR_PTR(-EINVAL); - } else { - WARN_ON(1); - net = ERR_PTR(-EINVAL); - } - if (IS_ERR(net)) { - NL_SET_ERR_MSG_MOD(info->extack, "Unknown network namespace"); - return ERR_PTR(-EINVAL); - } - if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) { - put_net(net); - return ERR_PTR(-EPERM); - } - return net; -} - -static void devlink_reload_netns_change(struct devlink *devlink, - struct net *curr_net, - struct net *dest_net) -{ - /* Userspace needs to be notified about devlink objects - * removed from original and entering new network namespace. - * The rest of the devlink objects are re-created during - * reload process so the notifications are generated separatelly. - */ - devlink_notify_unregister(devlink); - move_netdevice_notifier_net(curr_net, dest_net, - &devlink->netdevice_nb); - write_pnet(&devlink->_net, dest_net); - devlink_notify_register(devlink); -} - -static void devlink_reload_failed_set(struct devlink *devlink, - bool reload_failed) -{ - if (devlink->reload_failed == reload_failed) - return; - devlink->reload_failed = reload_failed; - devlink_notify(devlink, DEVLINK_CMD_NEW); -} - -bool devlink_is_reload_failed(const struct devlink *devlink) -{ - return devlink->reload_failed; -} -EXPORT_SYMBOL_GPL(devlink_is_reload_failed); - -static void -__devlink_reload_stats_update(struct devlink *devlink, u32 *reload_stats, - enum devlink_reload_limit limit, u32 actions_performed) -{ - unsigned long actions = actions_performed; - int stat_idx; - int action; - - for_each_set_bit(action, &actions, __DEVLINK_RELOAD_ACTION_MAX) { - stat_idx = limit * __DEVLINK_RELOAD_ACTION_MAX + action; - reload_stats[stat_idx]++; - } - devlink_notify(devlink, DEVLINK_CMD_NEW); -} - -static void -devlink_reload_stats_update(struct devlink *devlink, enum devlink_reload_limit limit, - u32 actions_performed) -{ - __devlink_reload_stats_update(devlink, devlink->stats.reload_stats, limit, - actions_performed); -} - -/** - * devlink_remote_reload_actions_performed - Update devlink on reload actions - * performed which are not a direct result of devlink reload call. - * - * This should be called by a driver after performing reload actions in case it was not - * a result of devlink reload call. For example fw_activate was performed as a result - * of devlink reload triggered fw_activate on another host. - * The motivation for this function is to keep data on reload actions performed on this - * function whether it was done due to direct devlink reload call or not. - * - * @devlink: devlink - * @limit: reload limit - * @actions_performed: bitmask of actions performed - */ -void devlink_remote_reload_actions_performed(struct devlink *devlink, - enum devlink_reload_limit limit, - u32 actions_performed) -{ - if (WARN_ON(!actions_performed || - actions_performed & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || - actions_performed >= BIT(__DEVLINK_RELOAD_ACTION_MAX) || - limit > DEVLINK_RELOAD_LIMIT_MAX)) - return; - - __devlink_reload_stats_update(devlink, devlink->stats.remote_reload_stats, limit, - actions_performed); -} -EXPORT_SYMBOL_GPL(devlink_remote_reload_actions_performed); - -int devlink_reload(struct devlink *devlink, struct net *dest_net, - enum devlink_reload_action action, - enum devlink_reload_limit limit, - u32 *actions_performed, struct netlink_ext_ack *extack) -{ - u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE]; - struct net *curr_net; - int err; - - memcpy(remote_reload_stats, devlink->stats.remote_reload_stats, - sizeof(remote_reload_stats)); - - err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack); - if (err) - return err; - - curr_net = devlink_net(devlink); - if (dest_net && !net_eq(dest_net, curr_net)) - devlink_reload_netns_change(devlink, curr_net, dest_net); - - err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack); - devlink_reload_failed_set(devlink, !!err); - if (err) - return err; - - WARN_ON(!(*actions_performed & BIT(action))); - /* Catch driver on updating the remote action within devlink reload */ - WARN_ON(memcmp(remote_reload_stats, devlink->stats.remote_reload_stats, - sizeof(remote_reload_stats))); - devlink_reload_stats_update(devlink, limit, *actions_performed); - return 0; -} - -static int -devlink_nl_reload_actions_performed_snd(struct devlink *devlink, u32 actions_performed, - enum devlink_command cmd, struct genl_info *info) -{ - struct sk_buff *msg; - void *hdr; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &devlink_nl_family, 0, cmd); - if (!hdr) - goto free_msg; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (nla_put_bitfield32(msg, DEVLINK_ATTR_RELOAD_ACTIONS_PERFORMED, actions_performed, - actions_performed)) - goto nla_put_failure; - genlmsg_end(msg, hdr); - - return genlmsg_reply(msg, info); - -nla_put_failure: - genlmsg_cancel(msg, hdr); -free_msg: - nlmsg_free(msg); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - enum devlink_reload_action action; - enum devlink_reload_limit limit; - struct net *dest_net = NULL; - u32 actions_performed; - int err; - - err = devlink_resources_validate(devlink, NULL, info); - if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed"); - return err; - } - - if (info->attrs[DEVLINK_ATTR_RELOAD_ACTION]) - action = nla_get_u8(info->attrs[DEVLINK_ATTR_RELOAD_ACTION]); - else - action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT; - - if (!devlink_reload_action_is_supported(devlink, action)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested reload action is not supported by the driver"); - return -EOPNOTSUPP; - } - - limit = DEVLINK_RELOAD_LIMIT_UNSPEC; - if (info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) { - struct nla_bitfield32 limits; - u32 limits_selected; - - limits = nla_get_bitfield32(info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]); - limits_selected = limits.value & limits.selector; - if (!limits_selected) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid limit selected"); - return -EINVAL; - } - for (limit = 0 ; limit <= DEVLINK_RELOAD_LIMIT_MAX ; limit++) - if (limits_selected & BIT(limit)) - break; - /* UAPI enables multiselection, but currently it is not used */ - if (limits_selected != BIT(limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Multiselection of limit is not supported"); - return -EOPNOTSUPP; - } - if (!devlink_reload_limit_is_supported(devlink, limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested limit is not supported by the driver"); - return -EOPNOTSUPP; - } - if (devlink_reload_combination_is_invalid(action, limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested limit is invalid for this action"); - return -EINVAL; - } - } - if (info->attrs[DEVLINK_ATTR_NETNS_PID] || - info->attrs[DEVLINK_ATTR_NETNS_FD] || - info->attrs[DEVLINK_ATTR_NETNS_ID]) { - dest_net = devlink_netns_get(skb, info); - if (IS_ERR(dest_net)) - return PTR_ERR(dest_net); - } - - err = devlink_reload(devlink, dest_net, action, limit, &actions_performed, info->extack); - - if (dest_net) - put_net(dest_net); - - if (err) - return err; - /* For backward compatibility generate reply only if attributes used by user */ - if (!info->attrs[DEVLINK_ATTR_RELOAD_ACTION] && !info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]) - return 0; - - return devlink_nl_reload_actions_performed_snd(devlink, actions_performed, - DEVLINK_CMD_RELOAD, info); -} - static int devlink_nl_flash_update_fill(struct sk_buff *msg, struct devlink *devlink, enum devlink_command cmd, @@ -9157,35 +8770,6 @@ const struct genl_small_ops devlink_nl_ops[56] = { /* -- No new ops here! Use split ops going forward! -- */ }; -bool devlink_reload_actions_valid(const struct devlink_ops *ops) -{ - const struct devlink_reload_combination *comb; - int i; - - if (!devlink_reload_supported(ops)) { - if (WARN_ON(ops->reload_actions)) - return false; - return true; - } - - if (WARN_ON(!ops->reload_actions || - ops->reload_actions & BIT(DEVLINK_RELOAD_ACTION_UNSPEC) || - ops->reload_actions >= BIT(__DEVLINK_RELOAD_ACTION_MAX))) - return false; - - if (WARN_ON(ops->reload_limits & BIT(DEVLINK_RELOAD_LIMIT_UNSPEC) || - ops->reload_limits >= BIT(__DEVLINK_RELOAD_LIMIT_MAX))) - return false; - - for (i = 0; i < ARRAY_SIZE(devlink_reload_invalid_combinations); i++) { - comb = &devlink_reload_invalid_combinations[i]; - if (ops->reload_actions == BIT(comb->action) && - ops->reload_limits == BIT(comb->limit)) - return false; - } - return true; -} - static void devlink_trap_policer_notify(struct devlink *devlink, const struct devlink_trap_policer_item *policer_item, -- cgit v1.2.3 From af2f8c1f82294069c6c02ea7d94c43d7c3e73a36 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:02 +0200 Subject: devlink: Move devlink dev eswitch code to dev Move devlink dev eswitch callbacks and related code from leftover.c to file dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/dev.c | 120 +++++++++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 4 ++ net/devlink/leftover.c | 127 +------------------------------------------- 3 files changed, 126 insertions(+), 125 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 0d6c14b89d03..4de3dcc42022 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -514,3 +514,123 @@ bool devlink_reload_actions_valid(const struct devlink_ops *ops) } return true; } + +static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + const struct devlink_ops *ops = devlink->ops; + enum devlink_eswitch_encap_mode encap_mode; + u8 inline_mode; + void *hdr; + int err = 0; + u16 mode; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + err = devlink_nl_put_handle(msg, devlink); + if (err) + goto nla_put_failure; + + if (ops->eswitch_mode_get) { + err = ops->eswitch_mode_get(devlink, &mode); + if (err) + goto nla_put_failure; + err = nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode); + if (err) + goto nla_put_failure; + } + + if (ops->eswitch_inline_mode_get) { + err = ops->eswitch_inline_mode_get(devlink, &inline_mode); + if (err) + goto nla_put_failure; + err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_INLINE_MODE, + inline_mode); + if (err) + goto nla_put_failure; + } + + if (ops->eswitch_encap_mode_get) { + err = ops->eswitch_encap_mode_get(devlink, &encap_mode); + if (err) + goto nla_put_failure; + err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, encap_mode); + if (err) + goto nla_put_failure; + } + + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return err; +} + +int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_GET, + info->snd_portid, info->snd_seq, 0); + + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + const struct devlink_ops *ops = devlink->ops; + enum devlink_eswitch_encap_mode encap_mode; + u8 inline_mode; + int err = 0; + u16 mode; + + if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) { + if (!ops->eswitch_mode_set) + return -EOPNOTSUPP; + mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]); + err = devlink_rate_nodes_check(devlink, mode, info->extack); + if (err) + return err; + err = ops->eswitch_mode_set(devlink, mode, info->extack); + if (err) + return err; + } + + if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) { + if (!ops->eswitch_inline_mode_set) + return -EOPNOTSUPP; + inline_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]); + err = ops->eswitch_inline_mode_set(devlink, inline_mode, + info->extack); + if (err) + return err; + } + + if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) { + if (!ops->eswitch_encap_mode_set) + return -EOPNOTSUPP; + encap_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]); + err = ops->eswitch_encap_mode_set(devlink, encap_mode, + info->extack); + if (err) + return err; + } + + return 0; +} diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index cb4a89ac42ec..79165050feea 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -202,6 +202,8 @@ struct devlink_linecard * devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info); /* Rates */ +int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, + struct netlink_ext_ack *extack); struct devlink_rate * devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info); struct devlink_rate * @@ -210,3 +212,5 @@ devlink_rate_node_get_from_info(struct devlink *devlink, /* Devlink nl cmds */ int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 703a03d65df7..1fa58e69a7a2 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -2843,85 +2843,8 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb, return -EOPNOTSUPP; } -static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - const struct devlink_ops *ops = devlink->ops; - enum devlink_eswitch_encap_mode encap_mode; - u8 inline_mode; - void *hdr; - int err = 0; - u16 mode; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - err = devlink_nl_put_handle(msg, devlink); - if (err) - goto nla_put_failure; - - if (ops->eswitch_mode_get) { - err = ops->eswitch_mode_get(devlink, &mode); - if (err) - goto nla_put_failure; - err = nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode); - if (err) - goto nla_put_failure; - } - - if (ops->eswitch_inline_mode_get) { - err = ops->eswitch_inline_mode_get(devlink, &inline_mode); - if (err) - goto nla_put_failure; - err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_INLINE_MODE, - inline_mode); - if (err) - goto nla_put_failure; - } - - if (ops->eswitch_encap_mode_get) { - err = ops->eswitch_encap_mode_get(devlink, &encap_mode); - if (err) - goto nla_put_failure; - err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, encap_mode); - if (err) - goto nla_put_failure; - } - - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_GET, - info->snd_portid, info->snd_seq, 0); - - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, - struct netlink_ext_ack *extack) +int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, + struct netlink_ext_ack *extack) { struct devlink_rate *devlink_rate; @@ -2933,52 +2856,6 @@ static int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, return 0; } -static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - const struct devlink_ops *ops = devlink->ops; - enum devlink_eswitch_encap_mode encap_mode; - u8 inline_mode; - int err = 0; - u16 mode; - - if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) { - if (!ops->eswitch_mode_set) - return -EOPNOTSUPP; - mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]); - err = devlink_rate_nodes_check(devlink, mode, info->extack); - if (err) - return err; - err = ops->eswitch_mode_set(devlink, mode, info->extack); - if (err) - return err; - } - - if (info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]) { - if (!ops->eswitch_inline_mode_set) - return -EOPNOTSUPP; - inline_mode = nla_get_u8( - info->attrs[DEVLINK_ATTR_ESWITCH_INLINE_MODE]); - err = ops->eswitch_inline_mode_set(devlink, inline_mode, - info->extack); - if (err) - return err; - } - - if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) { - if (!ops->eswitch_encap_mode_set) - return -EOPNOTSUPP; - encap_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]); - err = ops->eswitch_encap_mode_set(devlink, encap_mode, - info->extack); - if (err) - return err; - } - - return 0; -} - int devlink_dpipe_match_put(struct sk_buff *skb, struct devlink_dpipe_match *match) { -- cgit v1.2.3 From d60191c46ec9379453c674fcaa40531e2c0b2ba4 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:03 +0200 Subject: devlink: Move devlink dev info code to dev Move devlink dev info callbacks, related drivers helpers functions and other related code from leftover.c to dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/dev.c | 246 ++++++++++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 10 ++ net/devlink/leftover.c | 255 -------------------------------------------- 3 files changed, 256 insertions(+), 255 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 4de3dcc42022..dc21d2520835 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -634,3 +634,249 @@ int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info) return 0; } + +int devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn) +{ + if (!req->msg) + return 0; + return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn); +} +EXPORT_SYMBOL_GPL(devlink_info_serial_number_put); + +int devlink_info_board_serial_number_put(struct devlink_info_req *req, + const char *bsn) +{ + if (!req->msg) + return 0; + return nla_put_string(req->msg, DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, + bsn); +} +EXPORT_SYMBOL_GPL(devlink_info_board_serial_number_put); + +static int devlink_info_version_put(struct devlink_info_req *req, int attr, + const char *version_name, + const char *version_value, + enum devlink_info_version_type version_type) +{ + struct nlattr *nest; + int err; + + if (req->version_cb) + req->version_cb(version_name, version_type, + req->version_cb_priv); + + if (!req->msg) + return 0; + + nest = nla_nest_start_noflag(req->msg, attr); + if (!nest) + return -EMSGSIZE; + + err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_NAME, + version_name); + if (err) + goto nla_put_failure; + + err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_VALUE, + version_value); + if (err) + goto nla_put_failure; + + nla_nest_end(req->msg, nest); + + return 0; + +nla_put_failure: + nla_nest_cancel(req->msg, nest); + return err; +} + +int devlink_info_version_fixed_put(struct devlink_info_req *req, + const char *version_name, + const char *version_value) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_FIXED, + version_name, version_value, + DEVLINK_INFO_VERSION_TYPE_NONE); +} +EXPORT_SYMBOL_GPL(devlink_info_version_fixed_put); + +int devlink_info_version_stored_put(struct devlink_info_req *req, + const char *version_name, + const char *version_value) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, + version_name, version_value, + DEVLINK_INFO_VERSION_TYPE_NONE); +} +EXPORT_SYMBOL_GPL(devlink_info_version_stored_put); + +int devlink_info_version_stored_put_ext(struct devlink_info_req *req, + const char *version_name, + const char *version_value, + enum devlink_info_version_type version_type) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, + version_name, version_value, + version_type); +} +EXPORT_SYMBOL_GPL(devlink_info_version_stored_put_ext); + +int devlink_info_version_running_put(struct devlink_info_req *req, + const char *version_name, + const char *version_value) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, + version_name, version_value, + DEVLINK_INFO_VERSION_TYPE_NONE); +} +EXPORT_SYMBOL_GPL(devlink_info_version_running_put); + +int devlink_info_version_running_put_ext(struct devlink_info_req *req, + const char *version_name, + const char *version_value, + enum devlink_info_version_type version_type) +{ + return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, + version_name, version_value, + version_type); +} +EXPORT_SYMBOL_GPL(devlink_info_version_running_put_ext); + +static int devlink_nl_driver_info_get(struct device_driver *drv, + struct devlink_info_req *req) +{ + if (!drv) + return 0; + + if (drv->name[0]) + return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRIVER_NAME, + drv->name); + + return 0; +} + +static int +devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink, + enum devlink_command cmd, u32 portid, + u32 seq, int flags, struct netlink_ext_ack *extack) +{ + struct device *dev = devlink_to_dev(devlink); + struct devlink_info_req req = {}; + void *hdr; + int err; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + err = -EMSGSIZE; + if (devlink_nl_put_handle(msg, devlink)) + goto err_cancel_msg; + + req.msg = msg; + if (devlink->ops->info_get) { + err = devlink->ops->info_get(devlink, &req, extack); + if (err) + goto err_cancel_msg; + } + + err = devlink_nl_driver_info_get(dev->driver, &req); + if (err) + goto err_cancel_msg; + + genlmsg_end(msg, hdr); + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); + return err; +} + +int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, + info->snd_portid, info->snd_seq, 0, + info->extack); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int +devlink_nl_cmd_info_get_dump_one(struct sk_buff *msg, struct devlink *devlink, + struct netlink_callback *cb) +{ + int err; + + err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->extack); + if (err == -EOPNOTSUPP) + err = 0; + return err; +} + +const struct devlink_cmd devl_cmd_info_get = { + .dump_one = devlink_nl_cmd_info_get_dump_one, +}; + +static void __devlink_compat_running_version(struct devlink *devlink, + char *buf, size_t len) +{ + struct devlink_info_req req = {}; + const struct nlattr *nlattr; + struct sk_buff *msg; + int rem, err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + req.msg = msg; + err = devlink->ops->info_get(devlink, &req, NULL); + if (err) + goto free_msg; + + nla_for_each_attr(nlattr, (void *)msg->data, msg->len, rem) { + const struct nlattr *kv; + int rem_kv; + + if (nla_type(nlattr) != DEVLINK_ATTR_INFO_VERSION_RUNNING) + continue; + + nla_for_each_nested(kv, nlattr, rem_kv) { + if (nla_type(kv) != DEVLINK_ATTR_INFO_VERSION_VALUE) + continue; + + strlcat(buf, nla_data(kv), len); + strlcat(buf, " ", len); + } + } +free_msg: + nlmsg_free(msg); +} + +void devlink_compat_running_version(struct devlink *devlink, + char *buf, size_t len) +{ + if (!devlink->ops->info_get) + return; + + devl_lock(devlink); + if (devl_is_registered(devlink)) + __devlink_compat_running_version(devlink, buf, len); + devl_unlock(devlink); +} diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 79165050feea..a0f4cd51603e 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -189,6 +189,15 @@ static inline bool devlink_reload_supported(const struct devlink_ops *ops) return ops->reload_down && ops->reload_up; } +/* Dev info */ +struct devlink_info_req { + struct sk_buff *msg; + void (*version_cb)(const char *version_name, + enum devlink_info_version_type version_type, + void *version_cb_priv); + void *version_cb_priv; +}; + /* Resources */ struct devlink_resource; int devlink_resources_validate(struct devlink *devlink, @@ -214,3 +223,4 @@ int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 1fa58e69a7a2..323a5d533703 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -3976,14 +3976,6 @@ void devlink_flash_update_timeout_notify(struct devlink *devlink, } EXPORT_SYMBOL_GPL(devlink_flash_update_timeout_notify); -struct devlink_info_req { - struct sk_buff *msg; - void (*version_cb)(const char *version_name, - enum devlink_info_version_type version_type, - void *version_cb_priv); - void *version_cb_priv; -}; - struct devlink_flash_component_lookup_ctx { const char *lookup_name; bool lookup_name_found; @@ -5820,205 +5812,6 @@ out_unlock: return err; } -int devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn) -{ - if (!req->msg) - return 0; - return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn); -} -EXPORT_SYMBOL_GPL(devlink_info_serial_number_put); - -int devlink_info_board_serial_number_put(struct devlink_info_req *req, - const char *bsn) -{ - if (!req->msg) - return 0; - return nla_put_string(req->msg, DEVLINK_ATTR_INFO_BOARD_SERIAL_NUMBER, - bsn); -} -EXPORT_SYMBOL_GPL(devlink_info_board_serial_number_put); - -static int devlink_info_version_put(struct devlink_info_req *req, int attr, - const char *version_name, - const char *version_value, - enum devlink_info_version_type version_type) -{ - struct nlattr *nest; - int err; - - if (req->version_cb) - req->version_cb(version_name, version_type, - req->version_cb_priv); - - if (!req->msg) - return 0; - - nest = nla_nest_start_noflag(req->msg, attr); - if (!nest) - return -EMSGSIZE; - - err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_NAME, - version_name); - if (err) - goto nla_put_failure; - - err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_VALUE, - version_value); - if (err) - goto nla_put_failure; - - nla_nest_end(req->msg, nest); - - return 0; - -nla_put_failure: - nla_nest_cancel(req->msg, nest); - return err; -} - -int devlink_info_version_fixed_put(struct devlink_info_req *req, - const char *version_name, - const char *version_value) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_FIXED, - version_name, version_value, - DEVLINK_INFO_VERSION_TYPE_NONE); -} -EXPORT_SYMBOL_GPL(devlink_info_version_fixed_put); - -int devlink_info_version_stored_put(struct devlink_info_req *req, - const char *version_name, - const char *version_value) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, - version_name, version_value, - DEVLINK_INFO_VERSION_TYPE_NONE); -} -EXPORT_SYMBOL_GPL(devlink_info_version_stored_put); - -int devlink_info_version_stored_put_ext(struct devlink_info_req *req, - const char *version_name, - const char *version_value, - enum devlink_info_version_type version_type) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED, - version_name, version_value, - version_type); -} -EXPORT_SYMBOL_GPL(devlink_info_version_stored_put_ext); - -int devlink_info_version_running_put(struct devlink_info_req *req, - const char *version_name, - const char *version_value) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, - version_name, version_value, - DEVLINK_INFO_VERSION_TYPE_NONE); -} -EXPORT_SYMBOL_GPL(devlink_info_version_running_put); - -int devlink_info_version_running_put_ext(struct devlink_info_req *req, - const char *version_name, - const char *version_value, - enum devlink_info_version_type version_type) -{ - return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING, - version_name, version_value, - version_type); -} -EXPORT_SYMBOL_GPL(devlink_info_version_running_put_ext); - -static int devlink_nl_driver_info_get(struct device_driver *drv, - struct devlink_info_req *req) -{ - if (!drv) - return 0; - - if (drv->name[0]) - return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRIVER_NAME, - drv->name); - - return 0; -} - -static int -devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink, - enum devlink_command cmd, u32 portid, - u32 seq, int flags, struct netlink_ext_ack *extack) -{ - struct device *dev = devlink_to_dev(devlink); - struct devlink_info_req req = {}; - void *hdr; - int err; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - err = -EMSGSIZE; - if (devlink_nl_put_handle(msg, devlink)) - goto err_cancel_msg; - - req.msg = msg; - if (devlink->ops->info_get) { - err = devlink->ops->info_get(devlink, &req, extack); - if (err) - goto err_cancel_msg; - } - - err = devlink_nl_driver_info_get(dev->driver, &req); - if (err) - goto err_cancel_msg; - - genlmsg_end(msg, hdr); - return 0; - -err_cancel_msg: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, - info->snd_portid, info->snd_seq, 0, - info->extack); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int -devlink_nl_cmd_info_get_dump_one(struct sk_buff *msg, struct devlink *devlink, - struct netlink_callback *cb) -{ - int err; - - err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->extack); - if (err == -EOPNOTSUPP) - err = 0; - return err; -} - -const struct devlink_cmd devl_cmd_info_get = { - .dump_one = devlink_nl_cmd_info_get_dump_one, -}; - struct devlink_fmsg_item { struct list_head list; int attrtype; @@ -11429,54 +11222,6 @@ devl_trap_policers_unregister(struct devlink *devlink, } EXPORT_SYMBOL_GPL(devl_trap_policers_unregister); -static void __devlink_compat_running_version(struct devlink *devlink, - char *buf, size_t len) -{ - struct devlink_info_req req = {}; - const struct nlattr *nlattr; - struct sk_buff *msg; - int rem, err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - req.msg = msg; - err = devlink->ops->info_get(devlink, &req, NULL); - if (err) - goto free_msg; - - nla_for_each_attr(nlattr, (void *)msg->data, msg->len, rem) { - const struct nlattr *kv; - int rem_kv; - - if (nla_type(nlattr) != DEVLINK_ATTR_INFO_VERSION_RUNNING) - continue; - - nla_for_each_nested(kv, nlattr, rem_kv) { - if (nla_type(kv) != DEVLINK_ATTR_INFO_VERSION_VALUE) - continue; - - strlcat(buf, nla_data(kv), len); - strlcat(buf, " ", len); - } - } -free_msg: - nlmsg_free(msg); -} - -void devlink_compat_running_version(struct devlink *devlink, - char *buf, size_t len) -{ - if (!devlink->ops->info_get) - return; - - devl_lock(devlink); - if (devl_is_registered(devlink)) - __devlink_compat_running_version(devlink, buf, len); - devl_unlock(devlink); -} - int devlink_compat_flash_update(struct devlink *devlink, const char *file_name) { struct devlink_flash_update_params params = {}; -- cgit v1.2.3 From a13aab66cbe02daaed3dde3a6d296ad0a5fab543 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:04 +0200 Subject: devlink: Move devlink dev flash code to dev Move devlink dev flash callbacks, helpers and other related code from leftover.c to dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/dev.c | 271 ++++++++++++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 1 + net/devlink/leftover.c | 271 -------------------------------------------- 3 files changed, 272 insertions(+), 271 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index dc21d2520835..83760e6c8c19 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -833,6 +833,246 @@ const struct devlink_cmd devl_cmd_info_get = { .dump_one = devlink_nl_cmd_info_get_dump_one, }; +static int devlink_nl_flash_update_fill(struct sk_buff *msg, + struct devlink *devlink, + enum devlink_command cmd, + struct devlink_flash_notify *params) +{ + void *hdr; + + hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto nla_put_failure; + + if (cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS) + goto out; + + if (params->status_msg && + nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG, + params->status_msg)) + goto nla_put_failure; + if (params->component && + nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_COMPONENT, + params->component)) + goto nla_put_failure; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE, + params->done, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL, + params->total, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, + params->timeout, DEVLINK_ATTR_PAD)) + goto nla_put_failure; + +out: + genlmsg_end(msg, hdr); + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static void __devlink_flash_update_notify(struct devlink *devlink, + enum devlink_command cmd, + struct devlink_flash_notify *params) +{ + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_FLASH_UPDATE && + cmd != DEVLINK_CMD_FLASH_UPDATE_END && + cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS); + + if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) + return; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_flash_update_fill(msg, devlink, cmd, params); + if (err) + goto out_free_msg; + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), + msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); + return; + +out_free_msg: + nlmsg_free(msg); +} + +static void devlink_flash_update_begin_notify(struct devlink *devlink) +{ + struct devlink_flash_notify params = {}; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE, + ¶ms); +} + +static void devlink_flash_update_end_notify(struct devlink *devlink) +{ + struct devlink_flash_notify params = {}; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE_END, + ¶ms); +} + +void devlink_flash_update_status_notify(struct devlink *devlink, + const char *status_msg, + const char *component, + unsigned long done, + unsigned long total) +{ + struct devlink_flash_notify params = { + .status_msg = status_msg, + .component = component, + .done = done, + .total = total, + }; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE_STATUS, + ¶ms); +} +EXPORT_SYMBOL_GPL(devlink_flash_update_status_notify); + +void devlink_flash_update_timeout_notify(struct devlink *devlink, + const char *status_msg, + const char *component, + unsigned long timeout) +{ + struct devlink_flash_notify params = { + .status_msg = status_msg, + .component = component, + .timeout = timeout, + }; + + __devlink_flash_update_notify(devlink, + DEVLINK_CMD_FLASH_UPDATE_STATUS, + ¶ms); +} +EXPORT_SYMBOL_GPL(devlink_flash_update_timeout_notify); + +struct devlink_flash_component_lookup_ctx { + const char *lookup_name; + bool lookup_name_found; +}; + +static void +devlink_flash_component_lookup_cb(const char *version_name, + enum devlink_info_version_type version_type, + void *version_cb_priv) +{ + struct devlink_flash_component_lookup_ctx *lookup_ctx = version_cb_priv; + + if (version_type != DEVLINK_INFO_VERSION_TYPE_COMPONENT || + lookup_ctx->lookup_name_found) + return; + + lookup_ctx->lookup_name_found = + !strcmp(lookup_ctx->lookup_name, version_name); +} + +static int devlink_flash_component_get(struct devlink *devlink, + struct nlattr *nla_component, + const char **p_component, + struct netlink_ext_ack *extack) +{ + struct devlink_flash_component_lookup_ctx lookup_ctx = {}; + struct devlink_info_req req = {}; + const char *component; + int ret; + + if (!nla_component) + return 0; + + component = nla_data(nla_component); + + if (!devlink->ops->info_get) { + NL_SET_ERR_MSG_ATTR(extack, nla_component, + "component update is not supported by this device"); + return -EOPNOTSUPP; + } + + lookup_ctx.lookup_name = component; + req.version_cb = devlink_flash_component_lookup_cb; + req.version_cb_priv = &lookup_ctx; + + ret = devlink->ops->info_get(devlink, &req, NULL); + if (ret) + return ret; + + if (!lookup_ctx.lookup_name_found) { + NL_SET_ERR_MSG_ATTR(extack, nla_component, + "selected component is not supported by this device"); + return -EINVAL; + } + *p_component = component; + return 0; +} + +int devlink_nl_cmd_flash_update(struct sk_buff *skb, struct genl_info *info) +{ + struct nlattr *nla_overwrite_mask, *nla_file_name; + struct devlink_flash_update_params params = {}; + struct devlink *devlink = info->user_ptr[0]; + const char *file_name; + u32 supported_params; + int ret; + + if (!devlink->ops->flash_update) + return -EOPNOTSUPP; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME)) + return -EINVAL; + + ret = devlink_flash_component_get(devlink, + info->attrs[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT], + ¶ms.component, info->extack); + if (ret) + return ret; + + supported_params = devlink->ops->supported_flash_update_params; + + nla_overwrite_mask = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK]; + if (nla_overwrite_mask) { + struct nla_bitfield32 sections; + + if (!(supported_params & DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK)) { + NL_SET_ERR_MSG_ATTR(info->extack, nla_overwrite_mask, + "overwrite settings are not supported by this device"); + return -EOPNOTSUPP; + } + sections = nla_get_bitfield32(nla_overwrite_mask); + params.overwrite_mask = sections.value & sections.selector; + } + + nla_file_name = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME]; + file_name = nla_data(nla_file_name); + ret = request_firmware(¶ms.fw, file_name, devlink->dev); + if (ret) { + NL_SET_ERR_MSG_ATTR(info->extack, nla_file_name, + "failed to locate the requested firmware file"); + return ret; + } + + devlink_flash_update_begin_notify(devlink); + ret = devlink->ops->flash_update(devlink, ¶ms, info->extack); + devlink_flash_update_end_notify(devlink); + + release_firmware(params.fw); + + return ret; +} + static void __devlink_compat_running_version(struct devlink *devlink, char *buf, size_t len) { @@ -880,3 +1120,34 @@ void devlink_compat_running_version(struct devlink *devlink, __devlink_compat_running_version(devlink, buf, len); devl_unlock(devlink); } + +int devlink_compat_flash_update(struct devlink *devlink, const char *file_name) +{ + struct devlink_flash_update_params params = {}; + int ret; + + devl_lock(devlink); + if (!devl_is_registered(devlink)) { + ret = -ENODEV; + goto out_unlock; + } + + if (!devlink->ops->flash_update) { + ret = -EOPNOTSUPP; + goto out_unlock; + } + + ret = request_firmware(¶ms.fw, file_name, devlink->dev); + if (ret) + goto out_unlock; + + devlink_flash_update_begin_notify(devlink); + ret = devlink->ops->flash_update(devlink, ¶ms, NULL); + devlink_flash_update_end_notify(devlink); + + release_firmware(params.fw); +out_unlock: + devl_unlock(devlink); + + return ret; +} diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index a0f4cd51603e..5fbd757b2f56 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -224,3 +224,4 @@ int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_flash_update(struct sk_buff *skb, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 323a5d533703..ed3fb6133501 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -3848,246 +3848,6 @@ int devlink_resources_validate(struct devlink *devlink, return err; } -static int devlink_nl_flash_update_fill(struct sk_buff *msg, - struct devlink *devlink, - enum devlink_command cmd, - struct devlink_flash_notify *params) -{ - void *hdr; - - hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto nla_put_failure; - - if (cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS) - goto out; - - if (params->status_msg && - nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG, - params->status_msg)) - goto nla_put_failure; - if (params->component && - nla_put_string(msg, DEVLINK_ATTR_FLASH_UPDATE_COMPONENT, - params->component)) - goto nla_put_failure; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE, - params->done, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL, - params->total, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT, - params->timeout, DEVLINK_ATTR_PAD)) - goto nla_put_failure; - -out: - genlmsg_end(msg, hdr); - return 0; - -nla_put_failure: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - -static void __devlink_flash_update_notify(struct devlink *devlink, - enum devlink_command cmd, - struct devlink_flash_notify *params) -{ - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_FLASH_UPDATE && - cmd != DEVLINK_CMD_FLASH_UPDATE_END && - cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS); - - if (!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)) - return; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_flash_update_fill(msg, devlink, cmd, params); - if (err) - goto out_free_msg; - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), - msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); - return; - -out_free_msg: - nlmsg_free(msg); -} - -static void devlink_flash_update_begin_notify(struct devlink *devlink) -{ - struct devlink_flash_notify params = {}; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE, - ¶ms); -} - -static void devlink_flash_update_end_notify(struct devlink *devlink) -{ - struct devlink_flash_notify params = {}; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE_END, - ¶ms); -} - -void devlink_flash_update_status_notify(struct devlink *devlink, - const char *status_msg, - const char *component, - unsigned long done, - unsigned long total) -{ - struct devlink_flash_notify params = { - .status_msg = status_msg, - .component = component, - .done = done, - .total = total, - }; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE_STATUS, - ¶ms); -} -EXPORT_SYMBOL_GPL(devlink_flash_update_status_notify); - -void devlink_flash_update_timeout_notify(struct devlink *devlink, - const char *status_msg, - const char *component, - unsigned long timeout) -{ - struct devlink_flash_notify params = { - .status_msg = status_msg, - .component = component, - .timeout = timeout, - }; - - __devlink_flash_update_notify(devlink, - DEVLINK_CMD_FLASH_UPDATE_STATUS, - ¶ms); -} -EXPORT_SYMBOL_GPL(devlink_flash_update_timeout_notify); - -struct devlink_flash_component_lookup_ctx { - const char *lookup_name; - bool lookup_name_found; -}; - -static void -devlink_flash_component_lookup_cb(const char *version_name, - enum devlink_info_version_type version_type, - void *version_cb_priv) -{ - struct devlink_flash_component_lookup_ctx *lookup_ctx = version_cb_priv; - - if (version_type != DEVLINK_INFO_VERSION_TYPE_COMPONENT || - lookup_ctx->lookup_name_found) - return; - - lookup_ctx->lookup_name_found = - !strcmp(lookup_ctx->lookup_name, version_name); -} - -static int devlink_flash_component_get(struct devlink *devlink, - struct nlattr *nla_component, - const char **p_component, - struct netlink_ext_ack *extack) -{ - struct devlink_flash_component_lookup_ctx lookup_ctx = {}; - struct devlink_info_req req = {}; - const char *component; - int ret; - - if (!nla_component) - return 0; - - component = nla_data(nla_component); - - if (!devlink->ops->info_get) { - NL_SET_ERR_MSG_ATTR(extack, nla_component, - "component update is not supported by this device"); - return -EOPNOTSUPP; - } - - lookup_ctx.lookup_name = component; - req.version_cb = devlink_flash_component_lookup_cb; - req.version_cb_priv = &lookup_ctx; - - ret = devlink->ops->info_get(devlink, &req, NULL); - if (ret) - return ret; - - if (!lookup_ctx.lookup_name_found) { - NL_SET_ERR_MSG_ATTR(extack, nla_component, - "selected component is not supported by this device"); - return -EINVAL; - } - *p_component = component; - return 0; -} - -static int devlink_nl_cmd_flash_update(struct sk_buff *skb, - struct genl_info *info) -{ - struct nlattr *nla_overwrite_mask, *nla_file_name; - struct devlink_flash_update_params params = {}; - struct devlink *devlink = info->user_ptr[0]; - const char *file_name; - u32 supported_params; - int ret; - - if (!devlink->ops->flash_update) - return -EOPNOTSUPP; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME)) - return -EINVAL; - - ret = devlink_flash_component_get(devlink, - info->attrs[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT], - ¶ms.component, info->extack); - if (ret) - return ret; - - supported_params = devlink->ops->supported_flash_update_params; - - nla_overwrite_mask = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_OVERWRITE_MASK]; - if (nla_overwrite_mask) { - struct nla_bitfield32 sections; - - if (!(supported_params & DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK)) { - NL_SET_ERR_MSG_ATTR(info->extack, nla_overwrite_mask, - "overwrite settings are not supported by this device"); - return -EOPNOTSUPP; - } - sections = nla_get_bitfield32(nla_overwrite_mask); - params.overwrite_mask = sections.value & sections.selector; - } - - nla_file_name = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME]; - file_name = nla_data(nla_file_name); - ret = request_firmware(¶ms.fw, file_name, devlink->dev); - if (ret) { - NL_SET_ERR_MSG_ATTR(info->extack, nla_file_name, "failed to locate the requested firmware file"); - return ret; - } - - devlink_flash_update_begin_notify(devlink); - ret = devlink->ops->flash_update(devlink, ¶ms, info->extack); - devlink_flash_update_end_notify(devlink); - - release_firmware(params.fw); - - return ret; -} - static int devlink_nl_selftests_fill(struct sk_buff *msg, struct devlink *devlink, u32 portid, u32 seq, int flags, @@ -11222,37 +10982,6 @@ devl_trap_policers_unregister(struct devlink *devlink, } EXPORT_SYMBOL_GPL(devl_trap_policers_unregister); -int devlink_compat_flash_update(struct devlink *devlink, const char *file_name) -{ - struct devlink_flash_update_params params = {}; - int ret; - - devl_lock(devlink); - if (!devl_is_registered(devlink)) { - ret = -ENODEV; - goto out_unlock; - } - - if (!devlink->ops->flash_update) { - ret = -EOPNOTSUPP; - goto out_unlock; - } - - ret = request_firmware(¶ms.fw, file_name, devlink->dev); - if (ret) - goto out_unlock; - - devlink_flash_update_begin_notify(devlink); - ret = devlink->ops->flash_update(devlink, ¶ms, NULL); - devlink_flash_update_end_notify(devlink); - - release_firmware(params.fw); -out_unlock: - devl_unlock(devlink); - - return ret; -} - int devlink_compat_phys_port_name_get(struct net_device *dev, char *name, size_t len) { -- cgit v1.2.3 From ec4a0ce92e0c4abf833ad2c3b1face0f01432bff Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:05 +0200 Subject: devlink: Move devlink_info_req struct to be local As all users of the struct devlink_info_req are already in dev.c, move this struct from devl_internal.c to be local in dev.c. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/dev.c | 8 ++++++++ net/devlink/devl_internal.h | 9 --------- 2 files changed, 8 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 83760e6c8c19..dcf0935462e8 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -8,6 +8,14 @@ #include #include "devl_internal.h" +struct devlink_info_req { + struct sk_buff *msg; + void (*version_cb)(const char *version_name, + enum devlink_info_version_type version_type, + void *version_cb_priv); + void *version_cb_priv; +}; + struct devlink_reload_combination { enum devlink_reload_action action; enum devlink_reload_limit limit; diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 5fbd757b2f56..a5c29ad20564 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -189,15 +189,6 @@ static inline bool devlink_reload_supported(const struct devlink_ops *ops) return ops->reload_down && ops->reload_up; } -/* Dev info */ -struct devlink_info_req { - struct sk_buff *msg; - void (*version_cb)(const char *version_name, - enum devlink_info_version_type version_type, - void *version_cb_priv); - void *version_cb_priv; -}; - /* Resources */ struct devlink_resource; int devlink_resources_validate(struct devlink *devlink, -- cgit v1.2.3 From 7c976c7cfc70dbe7abfcd7a74cc0baa02b788c77 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 2 Feb 2023 16:47:06 +0200 Subject: devlink: Move devlink dev selftest code to dev Move devlink dev selftest callbacks and related code from leftover.c to file dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/dev.c | 182 +++++++++++++++++++++++++++++++++++++++++++ net/devlink/devl_internal.h | 2 + net/devlink/leftover.c | 183 -------------------------------------------- 3 files changed, 184 insertions(+), 183 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index dcf0935462e8..78d824eda5ec 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -1159,3 +1159,185 @@ out_unlock: return ret; } + +static int +devlink_nl_selftests_fill(struct sk_buff *msg, struct devlink *devlink, + u32 portid, u32 seq, int flags, + struct netlink_ext_ack *extack) +{ + struct nlattr *selftests; + void *hdr; + int err; + int i; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, + DEVLINK_CMD_SELFTESTS_GET); + if (!hdr) + return -EMSGSIZE; + + err = -EMSGSIZE; + if (devlink_nl_put_handle(msg, devlink)) + goto err_cancel_msg; + + selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); + if (!selftests) + goto err_cancel_msg; + + for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; + i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { + if (devlink->ops->selftest_check(devlink, i, extack)) { + err = nla_put_flag(msg, i); + if (err) + goto err_cancel_msg; + } + } + + nla_nest_end(msg, selftests); + genlmsg_end(msg, hdr); + return 0; + +err_cancel_msg: + genlmsg_cancel(msg, hdr); + return err; +} + +int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct sk_buff *msg; + int err; + + if (!devlink->ops->selftest_check) + return -EOPNOTSUPP; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_selftests_fill(msg, devlink, info->snd_portid, + info->snd_seq, 0, info->extack); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int +devlink_nl_cmd_selftests_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) +{ + if (!devlink->ops->selftest_check) + return 0; + + return devlink_nl_selftests_fill(msg, devlink, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, NLM_F_MULTI, + cb->extack); +} + +const struct devlink_cmd devl_cmd_selftests_get = { + .dump_one = devlink_nl_cmd_selftests_get_dump_one, +}; + +static int devlink_selftest_result_put(struct sk_buff *skb, unsigned int id, + enum devlink_selftest_status test_status) +{ + struct nlattr *result_attr; + + result_attr = nla_nest_start(skb, DEVLINK_ATTR_SELFTEST_RESULT); + if (!result_attr) + return -EMSGSIZE; + + if (nla_put_u32(skb, DEVLINK_ATTR_SELFTEST_RESULT_ID, id) || + nla_put_u8(skb, DEVLINK_ATTR_SELFTEST_RESULT_STATUS, + test_status)) + goto nla_put_failure; + + nla_nest_end(skb, result_attr); + return 0; + +nla_put_failure: + nla_nest_cancel(skb, result_attr); + return -EMSGSIZE; +} + +static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ID_MAX + 1] = { + [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG }, +}; + +int devlink_nl_cmd_selftests_run(struct sk_buff *skb, struct genl_info *info) +{ + struct nlattr *tb[DEVLINK_ATTR_SELFTEST_ID_MAX + 1]; + struct devlink *devlink = info->user_ptr[0]; + struct nlattr *attrs, *selftests; + struct sk_buff *msg; + void *hdr; + int err; + int i; + + if (!devlink->ops->selftest_run || !devlink->ops->selftest_check) + return -EOPNOTSUPP; + + if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SELFTESTS)) + return -EINVAL; + + attrs = info->attrs[DEVLINK_ATTR_SELFTESTS]; + + err = nla_parse_nested(tb, DEVLINK_ATTR_SELFTEST_ID_MAX, attrs, + devlink_selftest_nl_policy, info->extack); + if (err < 0) + return err; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = -EMSGSIZE; + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, + &devlink_nl_family, 0, DEVLINK_CMD_SELFTESTS_RUN); + if (!hdr) + goto free_msg; + + if (devlink_nl_put_handle(msg, devlink)) + goto genlmsg_cancel; + + selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); + if (!selftests) + goto genlmsg_cancel; + + for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; + i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { + enum devlink_selftest_status test_status; + + if (nla_get_flag(tb[i])) { + if (!devlink->ops->selftest_check(devlink, i, + info->extack)) { + if (devlink_selftest_result_put(msg, i, + DEVLINK_SELFTEST_STATUS_SKIP)) + goto selftests_nest_cancel; + continue; + } + + test_status = devlink->ops->selftest_run(devlink, i, + info->extack); + if (devlink_selftest_result_put(msg, i, test_status)) + goto selftests_nest_cancel; + } + } + + nla_nest_end(msg, selftests); + genlmsg_end(msg, hdr); + return genlmsg_reply(msg, info); + +selftests_nest_cancel: + nla_nest_cancel(msg, selftests); +genlmsg_cancel: + genlmsg_cancel(msg, hdr); +free_msg: + nlmsg_free(msg); + return err; +} diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index a5c29ad20564..941174e157d4 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -216,3 +216,5 @@ int devlink_nl_cmd_eswitch_get_doit(struct sk_buff *skb, struct genl_info *info) int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_flash_update(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_selftests_run(struct sk_buff *skb, struct genl_info *info); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index ed3fb6133501..97d30ea98b00 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -143,10 +143,6 @@ static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK), }; -static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ID_MAX + 1] = { - [DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG }, -}; - #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \ WARN_ON_ONCE(!(devlink_port)->registered) #define ASSERT_DEVLINK_PORT_NOT_REGISTERED(devlink_port) \ @@ -3848,185 +3844,6 @@ int devlink_resources_validate(struct devlink *devlink, return err; } -static int -devlink_nl_selftests_fill(struct sk_buff *msg, struct devlink *devlink, - u32 portid, u32 seq, int flags, - struct netlink_ext_ack *extack) -{ - struct nlattr *selftests; - void *hdr; - int err; - int i; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, - DEVLINK_CMD_SELFTESTS_GET); - if (!hdr) - return -EMSGSIZE; - - err = -EMSGSIZE; - if (devlink_nl_put_handle(msg, devlink)) - goto err_cancel_msg; - - selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); - if (!selftests) - goto err_cancel_msg; - - for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; - i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { - if (devlink->ops->selftest_check(devlink, i, extack)) { - err = nla_put_flag(msg, i); - if (err) - goto err_cancel_msg; - } - } - - nla_nest_end(msg, selftests); - genlmsg_end(msg, hdr); - return 0; - -err_cancel_msg: - genlmsg_cancel(msg, hdr); - return err; -} - -static int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct sk_buff *msg; - int err; - - if (!devlink->ops->selftest_check) - return -EOPNOTSUPP; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_selftests_fill(msg, devlink, info->snd_portid, - info->snd_seq, 0, info->extack); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int -devlink_nl_cmd_selftests_get_dump_one(struct sk_buff *msg, - struct devlink *devlink, - struct netlink_callback *cb) -{ - if (!devlink->ops->selftest_check) - return 0; - - return devlink_nl_selftests_fill(msg, devlink, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, NLM_F_MULTI, - cb->extack); -} - -const struct devlink_cmd devl_cmd_selftests_get = { - .dump_one = devlink_nl_cmd_selftests_get_dump_one, -}; - -static int devlink_selftest_result_put(struct sk_buff *skb, unsigned int id, - enum devlink_selftest_status test_status) -{ - struct nlattr *result_attr; - - result_attr = nla_nest_start(skb, DEVLINK_ATTR_SELFTEST_RESULT); - if (!result_attr) - return -EMSGSIZE; - - if (nla_put_u32(skb, DEVLINK_ATTR_SELFTEST_RESULT_ID, id) || - nla_put_u8(skb, DEVLINK_ATTR_SELFTEST_RESULT_STATUS, - test_status)) - goto nla_put_failure; - - nla_nest_end(skb, result_attr); - return 0; - -nla_put_failure: - nla_nest_cancel(skb, result_attr); - return -EMSGSIZE; -} - -static int devlink_nl_cmd_selftests_run(struct sk_buff *skb, - struct genl_info *info) -{ - struct nlattr *tb[DEVLINK_ATTR_SELFTEST_ID_MAX + 1]; - struct devlink *devlink = info->user_ptr[0]; - struct nlattr *attrs, *selftests; - struct sk_buff *msg; - void *hdr; - int err; - int i; - - if (!devlink->ops->selftest_run || !devlink->ops->selftest_check) - return -EOPNOTSUPP; - - if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_SELFTESTS)) - return -EINVAL; - - attrs = info->attrs[DEVLINK_ATTR_SELFTESTS]; - - err = nla_parse_nested(tb, DEVLINK_ATTR_SELFTEST_ID_MAX, attrs, - devlink_selftest_nl_policy, info->extack); - if (err < 0) - return err; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = -EMSGSIZE; - hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, - &devlink_nl_family, 0, DEVLINK_CMD_SELFTESTS_RUN); - if (!hdr) - goto free_msg; - - if (devlink_nl_put_handle(msg, devlink)) - goto genlmsg_cancel; - - selftests = nla_nest_start(msg, DEVLINK_ATTR_SELFTESTS); - if (!selftests) - goto genlmsg_cancel; - - for (i = DEVLINK_ATTR_SELFTEST_ID_UNSPEC + 1; - i <= DEVLINK_ATTR_SELFTEST_ID_MAX; i++) { - enum devlink_selftest_status test_status; - - if (nla_get_flag(tb[i])) { - if (!devlink->ops->selftest_check(devlink, i, - info->extack)) { - if (devlink_selftest_result_put(msg, i, - DEVLINK_SELFTEST_STATUS_SKIP)) - goto selftests_nest_cancel; - continue; - } - - test_status = devlink->ops->selftest_run(devlink, i, - info->extack); - if (devlink_selftest_result_put(msg, i, test_status)) - goto selftests_nest_cancel; - } - } - - nla_nest_end(msg, selftests); - genlmsg_end(msg, hdr); - return genlmsg_reply(msg, info); - -selftests_nest_cancel: - nla_nest_cancel(msg, selftests); -genlmsg_cancel: - genlmsg_cancel(msg, hdr); -free_msg: - nlmsg_free(msg); - return err; -} - static const struct devlink_param devlink_param_generic[] = { { .id = DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, -- cgit v1.2.3 From 8d8ebd77f5ede7ff9e3072653221706655924191 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Feb 2023 09:40:58 +0000 Subject: ipv6: raw: add drop reasons Use existing helpers and drop reason codes for RAW input path. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/ipv6/raw.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index ada087b50541..2e1c8060b51a 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -355,17 +355,19 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr, static inline int rawv6_rcv_skb(struct sock *sk, struct sk_buff *skb) { + enum skb_drop_reason reason; + if ((raw6_sk(sk)->checksum || rcu_access_pointer(sk->sk_filter)) && skb_checksum_complete(skb)) { atomic_inc(&sk->sk_drops); - kfree_skb(skb); + kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM); return NET_RX_DROP; } /* Charge it to the socket. */ skb_dst_drop(skb); - if (sock_queue_rcv_skb(sk, skb) < 0) { - kfree_skb(skb); + if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) { + kfree_skb_reason(skb, reason); return NET_RX_DROP; } @@ -386,7 +388,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb) if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) { atomic_inc(&sk->sk_drops); - kfree_skb(skb); + kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY); return NET_RX_DROP; } @@ -410,7 +412,7 @@ int rawv6_rcv(struct sock *sk, struct sk_buff *skb) if (inet->hdrincl) { if (skb_checksum_complete(skb)) { atomic_inc(&sk->sk_drops); - kfree_skb(skb); + kfree_skb_reason(skb, SKB_DROP_REASON_SKB_CSUM); return NET_RX_DROP; } } -- cgit v1.2.3 From 42186e6c00352ce9df9e3f12b1ff82e61978d40b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Feb 2023 09:40:59 +0000 Subject: ipv4: raw: add drop reasons Use existing helpers and drop reason codes for RAW input path. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/ipv4/raw.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 006c1f0ed8b4..9865d15a08df 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -287,11 +287,13 @@ void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info) static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb) { + enum skb_drop_reason reason; + /* Charge it to the socket. */ ipv4_pktinfo_prepare(sk, skb); - if (sock_queue_rcv_skb(sk, skb) < 0) { - kfree_skb(skb); + if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) { + kfree_skb_reason(skb, reason); return NET_RX_DROP; } @@ -302,7 +304,7 @@ int raw_rcv(struct sock *sk, struct sk_buff *skb) { if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) { atomic_inc(&sk->sk_drops); - kfree_skb(skb); + kfree_skb_reason(skb, SKB_DROP_REASON_XFRM_POLICY); return NET_RX_DROP; } nf_reset_ct(skb); -- cgit v1.2.3 From 6579f5bacc2c4cbc5ef6abb45352416939d1f844 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Feb 2023 09:41:00 +0000 Subject: raw: use net_hash_mix() in hash function Some applications seem to rely on RAW sockets. If they use private netns, we can avoid piling all RAW sockets bound to a given protocol into a single bucket. Also place (struct raw_hashinfo).lock into its own cache line to limit false sharing. Alternative would be to have per-netns hashtables, but this seems too expensive for most netns where RAW sockets are not used. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/net/raw.h | 13 +++++++++++-- net/ipv4/raw.c | 13 +++++++------ net/ipv6/raw.c | 4 ++-- 3 files changed, 20 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/include/net/raw.h b/include/net/raw.h index 5e665934ebc7..2c004c20ed99 100644 --- a/include/net/raw.h +++ b/include/net/raw.h @@ -15,6 +15,8 @@ #include #include +#include +#include #include extern struct proto raw_prot; @@ -29,13 +31,20 @@ int raw_local_deliver(struct sk_buff *, int); int raw_rcv(struct sock *, struct sk_buff *); -#define RAW_HTABLE_SIZE MAX_INET_PROTOS +#define RAW_HTABLE_LOG 8 +#define RAW_HTABLE_SIZE (1U << RAW_HTABLE_LOG) struct raw_hashinfo { spinlock_t lock; - struct hlist_nulls_head ht[RAW_HTABLE_SIZE]; + + struct hlist_nulls_head ht[RAW_HTABLE_SIZE] ____cacheline_aligned; }; +static inline u32 raw_hashfunc(const struct net *net, u32 proto) +{ + return hash_32(net_hash_mix(net) ^ proto, RAW_HTABLE_LOG); +} + static inline void raw_hashinfo_init(struct raw_hashinfo *hashinfo) { int i; diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 9865d15a08df..94df935ee0c5 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -93,7 +93,7 @@ int raw_hash_sk(struct sock *sk) struct raw_hashinfo *h = sk->sk_prot->h.raw_hash; struct hlist_nulls_head *hlist; - hlist = &h->ht[inet_sk(sk)->inet_num & (RAW_HTABLE_SIZE - 1)]; + hlist = &h->ht[raw_hashfunc(sock_net(sk), inet_sk(sk)->inet_num)]; spin_lock(&h->lock); __sk_nulls_add_node_rcu(sk, hlist); @@ -160,9 +160,9 @@ static int icmp_filter(const struct sock *sk, const struct sk_buff *skb) * RFC 1122: SHOULD pass TOS value up to the transport layer. * -> It does. And not only TOS, but all IP header. */ -static int raw_v4_input(struct sk_buff *skb, const struct iphdr *iph, int hash) +static int raw_v4_input(struct net *net, struct sk_buff *skb, + const struct iphdr *iph, int hash) { - struct net *net = dev_net(skb->dev); struct hlist_nulls_head *hlist; struct hlist_nulls_node *hnode; int sdif = inet_sdif(skb); @@ -193,9 +193,10 @@ static int raw_v4_input(struct sk_buff *skb, const struct iphdr *iph, int hash) int raw_local_deliver(struct sk_buff *skb, int protocol) { - int hash = protocol & (RAW_HTABLE_SIZE - 1); + struct net *net = dev_net(skb->dev); - return raw_v4_input(skb, ip_hdr(skb), hash); + return raw_v4_input(net, skb, ip_hdr(skb), + raw_hashfunc(net, protocol)); } static void raw_err(struct sock *sk, struct sk_buff *skb, u32 info) @@ -271,7 +272,7 @@ void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info) struct sock *sk; int hash; - hash = protocol & (RAW_HTABLE_SIZE - 1); + hash = raw_hashfunc(net, protocol); hlist = &raw_v4_hashinfo.ht[hash]; rcu_read_lock(); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 2e1c8060b51a..bac9ba747bde 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -152,7 +152,7 @@ static bool ipv6_raw_deliver(struct sk_buff *skb, int nexthdr) saddr = &ipv6_hdr(skb)->saddr; daddr = saddr + 1; - hash = nexthdr & (RAW_HTABLE_SIZE - 1); + hash = raw_hashfunc(net, nexthdr); hlist = &raw_v6_hashinfo.ht[hash]; rcu_read_lock(); sk_nulls_for_each(sk, hnode, hlist) { @@ -338,7 +338,7 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr, struct sock *sk; int hash; - hash = nexthdr & (RAW_HTABLE_SIZE - 1); + hash = raw_hashfunc(net, nexthdr); hlist = &raw_v6_hashinfo.ht[hash]; rcu_read_lock(); sk_nulls_for_each(sk, hnode, hlist) { -- cgit v1.2.3 From b5dd4d6981717f7e2682c0419fe832328c7441cf Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 2 Feb 2023 16:26:39 +0800 Subject: net/smc: llc_conf_mutex refactor, replace it with rw_semaphore llc_conf_mutex was used to protect links and link related configurations in the same link group, for example, add or delete links. However, in most cases, the protected critical area has only read semantics and with no write semantics at all, such as obtaining a usable link or an available rmb_desc. This patch do simply code refactoring, replace mutex with rw_semaphore, replace mutex_lock with down_write and replace mutex_unlock with up_write. Theoretically, this replacement is equivalent, but after this patch, we can distinguish lock granularity according to different semantics of critical areas. Signed-off-by: D. Wythe Signed-off-by: David S. Miller --- net/smc/af_smc.c | 8 ++++---- net/smc/smc_core.c | 20 ++++++++++---------- net/smc/smc_core.h | 2 +- net/smc/smc_llc.c | 18 +++++++++--------- 4 files changed, 24 insertions(+), 24 deletions(-) (limited to 'net') diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 1c0fe9ba5358..6e93ad61d6a2 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -502,7 +502,7 @@ static int smcr_lgr_reg_sndbufs(struct smc_link *link, return -EINVAL; /* protect against parallel smcr_link_reg_buf() */ - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (!smc_link_active(&lgr->lnk[i])) continue; @@ -510,7 +510,7 @@ static int smcr_lgr_reg_sndbufs(struct smc_link *link, if (rc) break; } - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); return rc; } @@ -527,7 +527,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link, /* protect against parallel smc_llc_cli_rkey_exchange() and * parallel smcr_link_reg_buf() */ - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (!smc_link_active(&lgr->lnk[i])) continue; @@ -544,7 +544,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link, } rmb_desc->is_conf_rkey = true; out: - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl); return rc; } diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 7642b16c41d1..22ddce66d3ff 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1106,10 +1106,10 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb, rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY); if (!rc) { /* protect against smc_llc_cli_rkey_exchange() */ - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); smc_llc_do_delete_rkey(lgr, buf_desc); buf_desc->is_conf_rkey = false; - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl); } } @@ -1377,12 +1377,12 @@ static void smc_lgr_free(struct smc_link_group *lgr) int i; if (!lgr->is_smcd) { - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { if (lgr->lnk[i].state != SMC_LNK_UNUSED) smcr_link_clear(&lgr->lnk[i], false); } - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); smc_llc_lgr_clear(lgr); } @@ -1696,12 +1696,12 @@ static void smcr_link_down(struct smc_link *lnk) } else { if (lgr->llc_flow_lcl.type != SMC_LLC_FLOW_NONE) { /* another llc task is ongoing */ - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); wait_event_timeout(lgr->llc_flow_waiter, (list_empty(&lgr->list) || lgr->llc_flow_lcl.type == SMC_LLC_FLOW_NONE), SMC_LLC_WAIT_TIME); - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); } if (!list_empty(&lgr->list)) { smc_llc_send_delete_link(to_lnk, del_link_id, @@ -1761,9 +1761,9 @@ static void smc_link_down_work(struct work_struct *work) if (list_empty(&lgr->list)) return; wake_up_all(&lgr->llc_msg_waiter); - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); smcr_link_down(link); - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); } static int smc_vlan_by_tcpsk_walk(struct net_device *lower_dev, @@ -2247,7 +2247,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr, int i, rc = 0, cnt = 0; /* protect against parallel link reconfiguration */ - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { struct smc_link *lnk = &lgr->lnk[i]; @@ -2260,7 +2260,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr, cnt++; } out: - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); if (!rc && !cnt) rc = -EINVAL; return rc; diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index 285f9bd8e232..bd7198b72dc1 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -298,7 +298,7 @@ struct smc_link_group { /* queue for llc events */ spinlock_t llc_event_q_lock; /* protects llc_event_q */ - struct mutex llc_conf_mutex; + struct rw_semaphore llc_conf_mutex; /* protects lgr reconfig. */ struct work_struct llc_add_link_work; struct work_struct llc_del_link_work; diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c index 524649d0ab65..c0e73e436e98 100644 --- a/net/smc/smc_llc.c +++ b/net/smc/smc_llc.c @@ -1202,12 +1202,12 @@ static void smc_llc_process_cli_add_link(struct smc_link_group *lgr) qentry = smc_llc_flow_qentry_clr(&lgr->llc_flow_lcl); - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); if (smc_llc_is_local_add_link(&qentry->msg)) smc_llc_cli_add_link_invite(qentry->link, qentry); else smc_llc_cli_add_link(qentry->link, qentry); - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); } static int smc_llc_active_link_count(struct smc_link_group *lgr) @@ -1509,13 +1509,13 @@ static void smc_llc_process_srv_add_link(struct smc_link_group *lgr) qentry = smc_llc_flow_qentry_clr(&lgr->llc_flow_lcl); - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); rc = smc_llc_srv_add_link(link, qentry); if (!rc && lgr->type == SMC_LGR_SYMMETRIC) { /* delete any asymmetric link */ smc_llc_delete_asym_link(lgr); } - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); kfree(qentry); } @@ -1582,7 +1582,7 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr) smc_lgr_terminate_sched(lgr); goto out; } - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); /* delete single link */ for (lnk_idx = 0; lnk_idx < SMC_LINKS_PER_LGR_MAX; lnk_idx++) { if (lgr->lnk[lnk_idx].link_id != del_llc->link_num) @@ -1616,7 +1616,7 @@ static void smc_llc_process_cli_delete_link(struct smc_link_group *lgr) smc_lgr_terminate_sched(lgr); } out_unlock: - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); out: kfree(qentry); } @@ -1652,7 +1652,7 @@ static void smc_llc_process_srv_delete_link(struct smc_link_group *lgr) int active_links; int i; - mutex_lock(&lgr->llc_conf_mutex); + down_write(&lgr->llc_conf_mutex); qentry = smc_llc_flow_qentry_clr(&lgr->llc_flow_lcl); lnk = qentry->link; del_llc = &qentry->msg.delete_link; @@ -1708,7 +1708,7 @@ static void smc_llc_process_srv_delete_link(struct smc_link_group *lgr) smc_llc_add_link_local(lnk); } out: - mutex_unlock(&lgr->llc_conf_mutex); + up_write(&lgr->llc_conf_mutex); kfree(qentry); } @@ -2126,7 +2126,7 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc) spin_lock_init(&lgr->llc_flow_lock); init_waitqueue_head(&lgr->llc_flow_waiter); init_waitqueue_head(&lgr->llc_msg_waiter); - mutex_init(&lgr->llc_conf_mutex); + init_rwsem(&lgr->llc_conf_mutex); lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time); } -- cgit v1.2.3 From f6421014e88983c5bb7a25c71c01ae6278a01df9 Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 2 Feb 2023 16:26:40 +0800 Subject: net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() Following is part of Off-CPU graph during frequent SMC-R short-lived processing: process_one_work (51.19%) smc_close_passive_work (28.36%) smcr_buf_unuse (28.34%) rwsem_down_write_slowpath (28.22%) smc_listen_work (22.83%) smc_clc_wait_msg (1.84%) smc_buf_create (20.45%) smcr_buf_map_usable_links rwsem_down_write_slowpath (20.43%) smcr_lgr_reg_rmbs (0.53%) rwsem_down_write_slowpath (0.43%) smc_llc_do_confirm_rkey (0.08%) We can clearly see that during the connection establishment time, waiting time of connections is not on IO, but on llc_conf_mutex. What is more important, the core critical area (smcr_buf_unuse() & smc_buf_create()) only perfroms read semantics on links, we can easily replace it with read semaphore. Signed-off-by: D. Wythe Signed-off-by: David S. Miller --- net/smc/smc_core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 22ddce66d3ff..10a9e094d9d3 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1106,10 +1106,10 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb, rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY); if (!rc) { /* protect against smc_llc_cli_rkey_exchange() */ - down_write(&lgr->llc_conf_mutex); + down_read(&lgr->llc_conf_mutex); smc_llc_do_delete_rkey(lgr, buf_desc); buf_desc->is_conf_rkey = false; - up_write(&lgr->llc_conf_mutex); + up_read(&lgr->llc_conf_mutex); smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl); } } @@ -2247,7 +2247,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr, int i, rc = 0, cnt = 0; /* protect against parallel link reconfiguration */ - down_write(&lgr->llc_conf_mutex); + down_read(&lgr->llc_conf_mutex); for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { struct smc_link *lnk = &lgr->lnk[i]; @@ -2260,7 +2260,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr, cnt++; } out: - up_write(&lgr->llc_conf_mutex); + up_read(&lgr->llc_conf_mutex); if (!rc && !cnt) rc = -EINVAL; return rc; -- cgit v1.2.3 From 4da687448de7b51dcca9d024c6049fa223f5363f Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 2 Feb 2023 16:26:41 +0800 Subject: net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() Unlike smc_buf_create() and smcr_buf_unuse(), smcr_lgr_reg_rmbs() is exclusive when assigned rmb_desc was not registered, although it can be executed in parallel when assigned rmb_desc was registered already and only performs read semtamics on it. Hence, we can not simply replace it with read semaphore. The idea here is that if the assigned rmb_desc was registered already, use read semaphore to protect the critical section, once the assigned rmb_desc was not registered, keep using keep write semaphore still to keep its exclusivity. Thanks to the reusable features of rmb_desc, which allows us to execute in parallel in most cases. Signed-off-by: D. Wythe Signed-off-by: David S. Miller --- net/smc/af_smc.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 6e93ad61d6a2..b163266e581a 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -519,11 +519,26 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link, struct smc_buf_desc *rmb_desc) { struct smc_link_group *lgr = link->lgr; + bool do_slow = false; int i, rc = 0; rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY); if (rc) return rc; + + down_read(&lgr->llc_conf_mutex); + for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) { + if (!smc_link_active(&lgr->lnk[i])) + continue; + if (!rmb_desc->is_reg_mr[link->link_idx]) { + up_read(&lgr->llc_conf_mutex); + goto slow_path; + } + } + /* mr register already */ + goto fast_path; +slow_path: + do_slow = true; /* protect against parallel smc_llc_cli_rkey_exchange() and * parallel smcr_link_reg_buf() */ @@ -535,7 +550,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link, if (rc) goto out; } - +fast_path: /* exchange confirm_rkey msg with peer */ rc = smc_llc_do_confirm_rkey(link, rmb_desc); if (rc) { @@ -544,7 +559,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link, } rmb_desc->is_conf_rkey = true; out: - up_write(&lgr->llc_conf_mutex); + do_slow ? up_write(&lgr->llc_conf_mutex) : up_read(&lgr->llc_conf_mutex); smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl); return rc; } -- cgit v1.2.3 From aff7bfed9097435ea38de919befbe2d7771a3e87 Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 2 Feb 2023 16:26:42 +0800 Subject: net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore It's clear that rmbs_lock and sndbufs_lock are aims to protect the rmbs list or the sndbufs list. During connection establieshment, smc_buf_get_slot() will always be invoked, and it only performs read semantics in rmbs list and sndbufs list. Based on the above considerations, we replace mutex with rw_semaphore. Only smc_buf_get_slot() use down_read() to allow smc_buf_get_slot() run concurrently, other part use down_write() to keep exclusive semantics. Signed-off-by: D. Wythe Signed-off-by: David S. Miller --- net/smc/smc_core.c | 55 +++++++++++++++++++++++++++--------------------------- net/smc/smc_core.h | 4 ++-- net/smc/smc_llc.c | 16 ++++++++-------- 3 files changed, 38 insertions(+), 37 deletions(-) (limited to 'net') diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 10a9e094d9d3..b330a1fa453e 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -854,8 +854,8 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini) lgr->freeing = 0; lgr->vlan_id = ini->vlan_id; refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */ - mutex_init(&lgr->sndbufs_lock); - mutex_init(&lgr->rmbs_lock); + init_rwsem(&lgr->sndbufs_lock); + init_rwsem(&lgr->rmbs_lock); rwlock_init(&lgr->conns_lock); for (i = 0; i < SMC_RMBE_SIZES; i++) { INIT_LIST_HEAD(&lgr->sndbufs[i]); @@ -1098,7 +1098,7 @@ err_out: static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb, struct smc_link_group *lgr) { - struct mutex *lock; /* lock buffer list */ + struct rw_semaphore *lock; /* lock buffer list */ int rc; if (is_rmb && buf_desc->is_conf_rkey && !list_empty(&lgr->list)) { @@ -1118,9 +1118,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb, /* buf registration failed, reuse not possible */ lock = is_rmb ? &lgr->rmbs_lock : &lgr->sndbufs_lock; - mutex_lock(lock); + down_write(lock); list_del(&buf_desc->list); - mutex_unlock(lock); + up_write(lock); smc_buf_free(lgr, is_rmb, buf_desc); } else { @@ -1224,15 +1224,16 @@ static void smcr_buf_unmap_lgr(struct smc_link *lnk) int i; for (i = 0; i < SMC_RMBE_SIZES; i++) { - mutex_lock(&lgr->rmbs_lock); + down_write(&lgr->rmbs_lock); list_for_each_entry_safe(buf_desc, bf, &lgr->rmbs[i], list) smcr_buf_unmap_link(buf_desc, true, lnk); - mutex_unlock(&lgr->rmbs_lock); - mutex_lock(&lgr->sndbufs_lock); + up_write(&lgr->rmbs_lock); + + down_write(&lgr->sndbufs_lock); list_for_each_entry_safe(buf_desc, bf, &lgr->sndbufs[i], list) smcr_buf_unmap_link(buf_desc, false, lnk); - mutex_unlock(&lgr->sndbufs_lock); + up_write(&lgr->sndbufs_lock); } } @@ -1990,19 +1991,19 @@ int smc_uncompress_bufsize(u8 compressed) * buffer size; if not available, return NULL */ static struct smc_buf_desc *smc_buf_get_slot(int compressed_bufsize, - struct mutex *lock, + struct rw_semaphore *lock, struct list_head *buf_list) { struct smc_buf_desc *buf_slot; - mutex_lock(lock); + down_read(lock); list_for_each_entry(buf_slot, buf_list, list) { if (cmpxchg(&buf_slot->used, 0, 1) == 0) { - mutex_unlock(lock); + up_read(lock); return buf_slot; } } - mutex_unlock(lock); + up_read(lock); return NULL; } @@ -2111,13 +2112,13 @@ int smcr_link_reg_buf(struct smc_link *link, struct smc_buf_desc *buf_desc) return 0; } -static int _smcr_buf_map_lgr(struct smc_link *lnk, struct mutex *lock, +static int _smcr_buf_map_lgr(struct smc_link *lnk, struct rw_semaphore *lock, struct list_head *lst, bool is_rmb) { struct smc_buf_desc *buf_desc, *bf; int rc = 0; - mutex_lock(lock); + down_write(lock); list_for_each_entry_safe(buf_desc, bf, lst, list) { if (!buf_desc->used) continue; @@ -2126,7 +2127,7 @@ static int _smcr_buf_map_lgr(struct smc_link *lnk, struct mutex *lock, goto out; } out: - mutex_unlock(lock); + up_write(lock); return rc; } @@ -2159,37 +2160,37 @@ int smcr_buf_reg_lgr(struct smc_link *lnk) int i, rc = 0; /* reg all RMBs for a new link */ - mutex_lock(&lgr->rmbs_lock); + down_write(&lgr->rmbs_lock); for (i = 0; i < SMC_RMBE_SIZES; i++) { list_for_each_entry_safe(buf_desc, bf, &lgr->rmbs[i], list) { if (!buf_desc->used) continue; rc = smcr_link_reg_buf(lnk, buf_desc); if (rc) { - mutex_unlock(&lgr->rmbs_lock); + up_write(&lgr->rmbs_lock); return rc; } } } - mutex_unlock(&lgr->rmbs_lock); + up_write(&lgr->rmbs_lock); if (lgr->buf_type == SMCR_PHYS_CONT_BUFS) return rc; /* reg all vzalloced sndbufs for a new link */ - mutex_lock(&lgr->sndbufs_lock); + down_write(&lgr->sndbufs_lock); for (i = 0; i < SMC_RMBE_SIZES; i++) { list_for_each_entry_safe(buf_desc, bf, &lgr->sndbufs[i], list) { if (!buf_desc->used || !buf_desc->is_vm) continue; rc = smcr_link_reg_buf(lnk, buf_desc); if (rc) { - mutex_unlock(&lgr->sndbufs_lock); + up_write(&lgr->sndbufs_lock); return rc; } } } - mutex_unlock(&lgr->sndbufs_lock); + up_write(&lgr->sndbufs_lock); return rc; } @@ -2309,8 +2310,8 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) struct smc_link_group *lgr = conn->lgr; struct list_head *buf_list; int bufsize, bufsize_short; + struct rw_semaphore *lock; /* lock buffer list */ bool is_dgraded = false; - struct mutex *lock; /* lock buffer list */ int sk_buf_size; if (is_rmb) @@ -2358,9 +2359,9 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb) SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb); SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, bufsize); buf_desc->used = 1; - mutex_lock(lock); + down_write(lock); list_add(&buf_desc->list, buf_list); - mutex_unlock(lock); + up_write(lock); break; /* found */ } @@ -2434,9 +2435,9 @@ int smc_buf_create(struct smc_sock *smc, bool is_smcd) /* create rmb */ rc = __smc_buf_create(smc, is_smcd, true); if (rc) { - mutex_lock(&smc->conn.lgr->sndbufs_lock); + down_write(&smc->conn.lgr->sndbufs_lock); list_del(&smc->conn.sndbuf_desc->list); - mutex_unlock(&smc->conn.lgr->sndbufs_lock); + up_write(&smc->conn.lgr->sndbufs_lock); smc_buf_free(smc->conn.lgr, false, smc->conn.sndbuf_desc); smc->conn.sndbuf_desc = NULL; } diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h index bd7198b72dc1..08b457c2d294 100644 --- a/net/smc/smc_core.h +++ b/net/smc/smc_core.h @@ -252,9 +252,9 @@ struct smc_link_group { unsigned short vlan_id; /* vlan id of link group */ struct list_head sndbufs[SMC_RMBE_SIZES];/* tx buffers */ - struct mutex sndbufs_lock; /* protects tx buffers */ + struct rw_semaphore sndbufs_lock; /* protects tx buffers */ struct list_head rmbs[SMC_RMBE_SIZES]; /* rx buffers */ - struct mutex rmbs_lock; /* protects rx buffers */ + struct rw_semaphore rmbs_lock; /* protects rx buffers */ u8 id[SMC_LGR_ID_SIZE]; /* unique lgr id */ struct delayed_work free_work; /* delayed freeing of an lgr */ diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c index c0e73e436e98..a0840b8c935b 100644 --- a/net/smc/smc_llc.c +++ b/net/smc/smc_llc.c @@ -608,7 +608,7 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext, prim_lnk_idx = link->link_idx; lnk_idx = link_new->link_idx; - mutex_lock(&lgr->rmbs_lock); + down_write(&lgr->rmbs_lock); ext->num_rkeys = lgr->conns_num; if (!ext->num_rkeys) goto out; @@ -628,7 +628,7 @@ static int smc_llc_fill_ext_v2(struct smc_llc_msg_add_link_v2_ext *ext, } len += i * sizeof(ext->rt[0]); out: - mutex_unlock(&lgr->rmbs_lock); + up_write(&lgr->rmbs_lock); return len; } @@ -889,7 +889,7 @@ static int smc_llc_cli_rkey_exchange(struct smc_link *link, int rc = 0; int i; - mutex_lock(&lgr->rmbs_lock); + down_write(&lgr->rmbs_lock); num_rkeys_send = lgr->conns_num; buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst); do { @@ -916,7 +916,7 @@ static int smc_llc_cli_rkey_exchange(struct smc_link *link, break; } while (num_rkeys_send || num_rkeys_recv); - mutex_unlock(&lgr->rmbs_lock); + up_write(&lgr->rmbs_lock); return rc; } @@ -999,14 +999,14 @@ static void smc_llc_save_add_link_rkeys(struct smc_link *link, ext = (struct smc_llc_msg_add_link_v2_ext *)((u8 *)lgr->wr_rx_buf_v2 + SMC_WR_TX_SIZE); max = min_t(u8, ext->num_rkeys, SMC_LLC_RKEYS_PER_MSG_V2); - mutex_lock(&lgr->rmbs_lock); + down_write(&lgr->rmbs_lock); for (i = 0; i < max; i++) { smc_rtoken_set(lgr, link->link_idx, link_new->link_idx, ext->rt[i].rmb_key, ext->rt[i].rmb_vaddr_new, ext->rt[i].rmb_key_new); } - mutex_unlock(&lgr->rmbs_lock); + up_write(&lgr->rmbs_lock); } static void smc_llc_save_add_link_info(struct smc_link *link, @@ -1313,7 +1313,7 @@ static int smc_llc_srv_rkey_exchange(struct smc_link *link, int rc = 0; int i; - mutex_lock(&lgr->rmbs_lock); + down_write(&lgr->rmbs_lock); num_rkeys_send = lgr->conns_num; buf_pos = smc_llc_get_first_rmb(lgr, &buf_lst); do { @@ -1338,7 +1338,7 @@ static int smc_llc_srv_rkey_exchange(struct smc_link *link, smc_llc_flow_qentry_del(&lgr->llc_flow_lcl); } while (num_rkeys_send || num_rkeys_recv); out: - mutex_unlock(&lgr->rmbs_lock); + up_write(&lgr->rmbs_lock); return rc; } -- cgit v1.2.3 From c00041cf1cb82fcc8002454c8c1d80bd7e9b7e3e Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:19 +0100 Subject: net: bridge: Set strict_start_type at two policies Make any attributes newly-added to br_port_policy or vlan_tunnel_policy parsed strictly, to prevent userspace from passing garbage. Note that this patchset only touches the former policy. The latter was adjusted for completeness' sake. There do not appear to be other _deprecated calls with non-NULL policies. Suggested-by: Ido Schimmel Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- net/bridge/br_netlink.c | 2 ++ net/bridge/br_netlink_tunnel.c | 3 +++ 2 files changed, 5 insertions(+) (limited to 'net') diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 4316cc82ae17..a6133d469885 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -858,6 +858,8 @@ static int br_afspec(struct net_bridge *br, } static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = { + [IFLA_BRPORT_UNSPEC] = { .strict_start_type = + IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 }, [IFLA_BRPORT_STATE] = { .type = NLA_U8 }, [IFLA_BRPORT_COST] = { .type = NLA_U32 }, [IFLA_BRPORT_PRIORITY] = { .type = NLA_U16 }, diff --git a/net/bridge/br_netlink_tunnel.c b/net/bridge/br_netlink_tunnel.c index 8914290c75d4..17abf092f7ca 100644 --- a/net/bridge/br_netlink_tunnel.c +++ b/net/bridge/br_netlink_tunnel.c @@ -188,6 +188,9 @@ initvars: } static const struct nla_policy vlan_tunnel_policy[IFLA_BRIDGE_VLAN_TUNNEL_MAX + 1] = { + [IFLA_BRIDGE_VLAN_TUNNEL_UNSPEC] = { + .strict_start_type = IFLA_BRIDGE_VLAN_TUNNEL_FLAGS + 1 + }, [IFLA_BRIDGE_VLAN_TUNNEL_ID] = { .type = NLA_U32 }, [IFLA_BRIDGE_VLAN_TUNNEL_VID] = { .type = NLA_U16 }, [IFLA_BRIDGE_VLAN_TUNNEL_FLAGS] = { .type = NLA_U16 }, -- cgit v1.2.3 From 60977a0c63373bfc596b562b1e34e64ede6ef492 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:20 +0100 Subject: net: bridge: Add extack to br_multicast_new_port_group() Make it possible to set an extack in br_multicast_new_port_group(). Eventually, this function will check for per-port and per-port-vlan MDB maximums, and will use the extack to communicate the reason for the bounce. Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- net/bridge/br_mdb.c | 5 +++-- net/bridge/br_multicast.c | 5 +++-- net/bridge/br_private.h | 3 ++- 3 files changed, 8 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 00e5743647b0..069061366541 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -849,7 +849,7 @@ static int br_mdb_add_group_sg(const struct br_mdb_config *cfg, } p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL, - MCAST_INCLUDE, cfg->rt_protocol); + MCAST_INCLUDE, cfg->rt_protocol, extack); if (unlikely(!p)) { NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (S, G) port group"); return -ENOMEM; @@ -1075,7 +1075,8 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg, } p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL, - cfg->filter_mode, cfg->rt_protocol); + cfg->filter_mode, cfg->rt_protocol, + extack); if (unlikely(!p)) { NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (*, G) port group"); return -ENOMEM; diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index dea1ee1bd095..de67d176838f 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1284,7 +1284,8 @@ struct net_bridge_port_group *br_multicast_new_port_group( unsigned char flags, const unsigned char *src, u8 filter_mode, - u8 rt_protocol) + u8 rt_protocol, + struct netlink_ext_ack *extack) { struct net_bridge_port_group *p; @@ -1387,7 +1388,7 @@ __br_multicast_add_group(struct net_bridge_mcast *brmctx, } p = br_multicast_new_port_group(pmctx->port, group, *pp, 0, src, - filter_mode, RTPROT_KERNEL); + filter_mode, RTPROT_KERNEL, NULL); if (unlikely(!p)) { p = ERR_PTR(-ENOMEM); goto out; diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 15ef7fd508ee..1805c468ae03 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -956,7 +956,8 @@ br_multicast_new_port_group(struct net_bridge_port *port, const struct br_ip *group, struct net_bridge_port_group __rcu *next, unsigned char flags, const unsigned char *src, - u8 filter_mode, u8 rt_protocol); + u8 filter_mode, u8 rt_protocol, + struct netlink_ext_ack *extack); int br_mdb_hash_init(struct net_bridge *br); void br_mdb_hash_fini(struct net_bridge *br); void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp, -- cgit v1.2.3 From 1c85b80b20a13d07ec3a7d746ad52b7972c8c730 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:21 +0100 Subject: net: bridge: Move extack-setting to br_multicast_new_port_group() Now that br_multicast_new_port_group() takes an extack argument, move setting the extack there. The downside is that the error messages end up being less specific (the function cannot distinguish between (S,G) and (*,G) groups). However, the alternative is to check in the caller whether the callee set the extack, and if it didn't, set it. But that is only done when the callee is not exactly known. (E.g. in case of a notifier invocation.) Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- net/bridge/br_mdb.c | 9 +++------ net/bridge/br_multicast.c | 5 ++++- 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 069061366541..139de8ac532c 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -850,10 +850,9 @@ static int br_mdb_add_group_sg(const struct br_mdb_config *cfg, p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL, MCAST_INCLUDE, cfg->rt_protocol, extack); - if (unlikely(!p)) { - NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (S, G) port group"); + if (unlikely(!p)) return -ENOMEM; - } + rcu_assign_pointer(*pp, p); if (!(flags & MDB_PG_FLAGS_PERMANENT) && !cfg->src_entry) mod_timer(&p->timer, @@ -1077,10 +1076,8 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg, p = br_multicast_new_port_group(cfg->p, &cfg->group, *pp, flags, NULL, cfg->filter_mode, cfg->rt_protocol, extack); - if (unlikely(!p)) { - NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new (*, G) port group"); + if (unlikely(!p)) return -ENOMEM; - } err = br_mdb_add_group_srcs(cfg, p, brmctx, extack); if (err) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index de67d176838f..f9f4d54226fd 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1290,8 +1290,10 @@ struct net_bridge_port_group *br_multicast_new_port_group( struct net_bridge_port_group *p; p = kzalloc(sizeof(*p), GFP_ATOMIC); - if (unlikely(!p)) + if (unlikely(!p)) { + NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group"); return NULL; + } p->key.addr = *group; p->key.port = port; @@ -1306,6 +1308,7 @@ struct net_bridge_port_group *br_multicast_new_port_group( if (!br_multicast_is_star_g(group) && rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode, br_sg_port_rht_params)) { + NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group"); kfree(p); return NULL; } -- cgit v1.2.3 From 976b3858dd14914c5a9254535ad7440c99467944 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:22 +0100 Subject: net: bridge: Add br_multicast_del_port_group() Since cleaning up the effects of br_multicast_new_port_group() just consists of delisting and freeing the memory, the function br_mdb_add_group_star_g() inlines the corresponding code. In the following patches, number of per-port and per-port-VLAN MDB entries is going to be maintained, and that counter will have to be updated. Because that logic is going to be hidden in the br_multicast module, introduce a new hook intended to again remove a newly-created group. Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- net/bridge/br_mdb.c | 3 +-- net/bridge/br_multicast.c | 11 +++++++++++ net/bridge/br_private.h | 1 + 3 files changed, 13 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 139de8ac532c..9f22ebfdc518 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -1099,8 +1099,7 @@ static int br_mdb_add_group_star_g(const struct br_mdb_config *cfg, return 0; err_del_port_group: - hlist_del_init(&p->mglist); - kfree(p); + br_multicast_del_port_group(p); return err; } diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index f9f4d54226fd..08da724ebfdd 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1326,6 +1326,17 @@ struct net_bridge_port_group *br_multicast_new_port_group( return p; } +void br_multicast_del_port_group(struct net_bridge_port_group *p) +{ + struct net_bridge_port *port = p->key.port; + + hlist_del_init(&p->mglist); + if (!br_multicast_is_star_g(&p->key.addr)) + rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode, + br_sg_port_rht_params); + kfree(p); +} + void br_multicast_host_join(const struct net_bridge_mcast *brmctx, struct net_bridge_mdb_entry *mp, bool notify) { diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 1805c468ae03..e4069e27b5c6 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -958,6 +958,7 @@ br_multicast_new_port_group(struct net_bridge_port *port, unsigned char flags, const unsigned char *src, u8 filter_mode, u8 rt_protocol, struct netlink_ext_ack *extack); +void br_multicast_del_port_group(struct net_bridge_port_group *p); int br_mdb_hash_init(struct net_bridge *br); void br_mdb_hash_fini(struct net_bridge *br); void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp, -- cgit v1.2.3 From eceb30854f6b7d354ae52551b11aef2e2fa3e82e Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:23 +0100 Subject: net: bridge: Change a cleanup in br_multicast_new_port_group() to goto This function is getting more to clean up in the following patches. Structuring the cleanups in one labeled block will allow reusing the same cleanup from several places. Signed-off-by: Petr Machata Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- net/bridge/br_multicast.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 08da724ebfdd..51b622afdb67 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1309,8 +1309,7 @@ struct net_bridge_port_group *br_multicast_new_port_group( rhashtable_lookup_insert_fast(&port->br->sg_port_tbl, &p->rhnode, br_sg_port_rht_params)) { NL_SET_ERR_MSG_MOD(extack, "Couldn't insert new port group"); - kfree(p); - return NULL; + goto free_out; } rcu_assign_pointer(p->next, next); @@ -1324,6 +1323,10 @@ struct net_bridge_port_group *br_multicast_new_port_group( eth_broadcast_addr(p->eth_addr); return p; + +free_out: + kfree(p); + return NULL; } void br_multicast_del_port_group(struct net_bridge_port_group *p) -- cgit v1.2.3 From d47230a3480a5f6df98c5870ba26843850a600d5 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:24 +0100 Subject: net: bridge: Add a tracepoint for MDB overflows The following patch will add two more maximum MDB allowances to the global one, mcast_hash_max, that exists today. In all these cases, attempts to add MDB entries above the configured maximums through netlink, fail noisily and obviously. Such visibility is missing when adding entries through the control plane traffic, by IGMP or MLD packets. To improve visibility in those cases, add a trace point that reports the violation, including the relevant netdevice (be it a slave or the bridge itself), and the MDB entry parameters: # perf record -e bridge:br_mdb_full & # [...] # perf script | cut -d: -f4- dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10 CC: Steven Rostedt CC: linux-trace-kernel@vger.kernel.org Signed-off-by: Petr Machata Reviewed-by: Steven Rostedt (Google) Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/trace/events/bridge.h | 58 +++++++++++++++++++++++++++++++++++++++++++ net/core/net-traces.c | 1 + 2 files changed, 59 insertions(+) (limited to 'net') diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h index 6b200059c2c5..a6b3a4e409f0 100644 --- a/include/trace/events/bridge.h +++ b/include/trace/events/bridge.h @@ -122,6 +122,64 @@ TRACE_EVENT(br_fdb_update, __entry->flags) ); +TRACE_EVENT(br_mdb_full, + + TP_PROTO(const struct net_device *dev, + const struct br_ip *group), + + TP_ARGS(dev, group), + + TP_STRUCT__entry( + __string(dev, dev->name) + __field(int, af) + __field(u16, vid) + __array(__u8, src, 16) + __array(__u8, grp, 16) + __array(__u8, grpmac, ETH_ALEN) /* For af == 0. */ + ), + + TP_fast_assign( + struct in6_addr *in6; + + __assign_str(dev, dev->name); + __entry->vid = group->vid; + + if (!group->proto) { + __entry->af = 0; + + memset(__entry->src, 0, sizeof(__entry->src)); + memset(__entry->grp, 0, sizeof(__entry->grp)); + memcpy(__entry->grpmac, group->dst.mac_addr, ETH_ALEN); + } else if (group->proto == htons(ETH_P_IP)) { + __entry->af = AF_INET; + + in6 = (struct in6_addr *)__entry->src; + ipv6_addr_set_v4mapped(group->src.ip4, in6); + + in6 = (struct in6_addr *)__entry->grp; + ipv6_addr_set_v4mapped(group->dst.ip4, in6); + + memset(__entry->grpmac, 0, ETH_ALEN); + +#if IS_ENABLED(CONFIG_IPV6) + } else { + __entry->af = AF_INET6; + + in6 = (struct in6_addr *)__entry->src; + *in6 = group->src.ip6; + + in6 = (struct in6_addr *)__entry->grp; + *in6 = group->dst.ip6; + + memset(__entry->grpmac, 0, ETH_ALEN); +#endif + } + ), + + TP_printk("dev %s af %u src %pI6c grp %pI6c/%pM vid %u", + __get_str(dev), __entry->af, __entry->src, __entry->grp, + __entry->grpmac, __entry->vid) +); #endif /* _TRACE_BRIDGE_H */ diff --git a/net/core/net-traces.c b/net/core/net-traces.c index ee7006bbe49b..805b7385dd8d 100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -41,6 +41,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add); EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_external_learn_add); EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete); EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_update); +EXPORT_TRACEPOINT_SYMBOL_GPL(br_mdb_full); #endif #if IS_ENABLED(CONFIG_PAGE_POOL) -- cgit v1.2.3 From b57e8d870d522d905720052e6fd9c3bc9bc5f6fb Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:25 +0100 Subject: net: bridge: Maintain number of MDB entries in net_bridge_mcast_port The MDB maintained by the bridge is limited. When the bridge is configured for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its capacity. In SW datapath, the capacity is configurable through the IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a similar limit exists in the HW datapath for purposes of offloading. In order to prevent the issue of unilateral exhaustion of MDB resources, introduce two parameters in each of two contexts: - Per-port and per-port-VLAN number of MDB entries that the port is member in. - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled) per-port-VLAN maximum permitted number of MDB entries, or 0 for no limit. The per-port multicast context is used for tracking of MDB entries for the port as a whole. This is available for all bridges. The per-port-VLAN multicast context is then only available on VLAN-filtering bridges on VLANs that have multicast snooping on. With these changes in place, it will be possible to configure MDB limit for bridge as a whole, or any one port as a whole, or any single port-VLAN. Note that unlike the global limit, exhaustion of the per-port and per-port-VLAN maximums does not cause disablement of multicast snooping. It is also permitted to configure the local limit larger than hash_max, even though that is not useful. In this patch, introduce only the accounting for number of entries, and the max field itself, but not the means to toggle the max. The next patch introduces the netlink APIs to toggle and read the values. Signed-off-by: Petr Machata Acked-by: Nikolay Aleksandrov Reviewed-by: Ido Schimmel Signed-off-by: David S. Miller --- net/bridge/br_multicast.c | 136 +++++++++++++++++++++++++++++++++++++++++++++- net/bridge/br_private.h | 2 + 2 files changed, 137 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 51b622afdb67..b6aa0bad5817 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -31,6 +31,7 @@ #include #include #endif +#include #include "br_private.h" #include "br_private_mcast_eht.h" @@ -234,6 +235,29 @@ out: return pmctx; } +static struct net_bridge_mcast_port * +br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid) +{ + struct net_bridge_mcast_port *pmctx = NULL; + struct net_bridge_vlan *vlan; + + lockdep_assert_held_once(&port->br->multicast_lock); + + if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) + return NULL; + + /* Take RCU to access the vlan. */ + rcu_read_lock(); + + vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid); + if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx)) + pmctx = &vlan->port_mcast_ctx; + + rcu_read_unlock(); + + return pmctx; +} + /* when snooping we need to check if the contexts should be used * in the following order: * - if pmctx is non-NULL (port), check if it should be used @@ -668,6 +692,86 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src, __br_multicast_del_group_src(src); } +static int +br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx, + struct netlink_ext_ack *extack, + const char *what) +{ + u32 max = READ_ONCE(pmctx->mdb_max_entries); + u32 n = READ_ONCE(pmctx->mdb_n_entries); + + if (max && n >= max) { + NL_SET_ERR_MSG_FMT_MOD(extack, "%s is already in %u groups, and mcast_max_groups=%u", + what, n, max); + return -E2BIG; + } + + WRITE_ONCE(pmctx->mdb_n_entries, n + 1); + return 0; +} + +static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx) +{ + u32 n = READ_ONCE(pmctx->mdb_n_entries); + + WARN_ON_ONCE(n == 0); + WRITE_ONCE(pmctx->mdb_n_entries, n - 1); +} + +static int br_multicast_port_ngroups_inc(struct net_bridge_port *port, + const struct br_ip *group, + struct netlink_ext_ack *extack) +{ + struct net_bridge_mcast_port *pmctx; + int err; + + lockdep_assert_held_once(&port->br->multicast_lock); + + /* Always count on the port context. */ + err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack, + "Port"); + if (err) { + trace_br_mdb_full(port->dev, group); + return err; + } + + /* Only count on the VLAN context if VID is given, and if snooping on + * that VLAN is enabled. + */ + if (!group->vid) + return 0; + + pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid); + if (!pmctx) + return 0; + + err = br_multicast_port_ngroups_inc_one(pmctx, extack, "Port-VLAN"); + if (err) { + trace_br_mdb_full(port->dev, group); + goto dec_one_out; + } + + return 0; + +dec_one_out: + br_multicast_port_ngroups_dec_one(&port->multicast_ctx); + return err; +} + +static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid) +{ + struct net_bridge_mcast_port *pmctx; + + lockdep_assert_held_once(&port->br->multicast_lock); + + if (vid) { + pmctx = br_multicast_port_vid_to_port_ctx(port, vid); + if (pmctx) + br_multicast_port_ngroups_dec_one(pmctx); + } + br_multicast_port_ngroups_dec_one(&port->multicast_ctx); +} + static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc) { struct net_bridge_port_group *pg; @@ -702,6 +806,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp, } else { br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE); } + br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid); hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list); queue_work(system_long_wq, &br->mcast_gc_work); @@ -1165,6 +1270,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br, return mp; if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) { + trace_br_mdb_full(br->dev, group); br_mc_disabled_update(br->dev, false, NULL); br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false); return ERR_PTR(-E2BIG); @@ -1288,11 +1394,16 @@ struct net_bridge_port_group *br_multicast_new_port_group( struct netlink_ext_ack *extack) { struct net_bridge_port_group *p; + int err; + + err = br_multicast_port_ngroups_inc(port, group, extack); + if (err) + return NULL; p = kzalloc(sizeof(*p), GFP_ATOMIC); if (unlikely(!p)) { NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group"); - return NULL; + goto dec_out; } p->key.addr = *group; @@ -1326,18 +1437,22 @@ struct net_bridge_port_group *br_multicast_new_port_group( free_out: kfree(p); +dec_out: + br_multicast_port_ngroups_dec(port, group->vid); return NULL; } void br_multicast_del_port_group(struct net_bridge_port_group *p) { struct net_bridge_port *port = p->key.port; + __u16 vid = p->key.addr.vid; hlist_del_init(&p->mglist); if (!br_multicast_is_star_g(&p->key.addr)) rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode, br_sg_port_rht_params); kfree(p); + br_multicast_port_ngroups_dec(port, vid); } void br_multicast_host_join(const struct net_bridge_mcast *brmctx, @@ -1951,6 +2066,25 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx) br_ip4_multicast_add_router(brmctx, pmctx); br_ip6_multicast_add_router(brmctx, pmctx); } + + if (br_multicast_port_ctx_is_vlan(pmctx)) { + struct net_bridge_port_group *pg; + u32 n = 0; + + /* The mcast_n_groups counter might be wrong. First, + * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries + * are flushed, thus mcast_n_groups after the toggle does not + * reflect the true values. And second, permanent entries added + * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected + * either. Thus we have to refresh the counter. + */ + + hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) { + if (pg->key.addr.vid == pmctx->vlan->vid) + n++; + } + WRITE_ONCE(pmctx->mdb_n_entries, n); + } } void br_multicast_enable_port(struct net_bridge_port *port) diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index e4069e27b5c6..49f411a0a1f1 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -126,6 +126,8 @@ struct net_bridge_mcast_port { struct hlist_node ip6_rlist; #endif /* IS_ENABLED(CONFIG_IPV6) */ unsigned char multicast_router; + u32 mdb_n_entries; + u32 mdb_max_entries; #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */ }; -- cgit v1.2.3 From a1aee20d5db29dc73331067b6a338eb650f0b5f1 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Thu, 2 Feb 2023 18:59:26 +0100 Subject: net: bridge: Add netlink knobs for number / maximum MDB entries The previous patch added accounting for number of MDB entries per port and per port-VLAN, and the logic to verify that these values stay within configured bounds. However it didn't provide means to actually configure those bounds or read the occupancy. This patch does that. Two new netlink attributes are added for the MDB occupancy: IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy. And another two for the maximum number of MDB entries: IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one. Note that the two new IFLA_BRPORT_ attributes prompt bumping of RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough. The new attributes are used like this: # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \ mcast_vlan_snooping 1 mcast_querier 1 # ip link set dev v1 master br # bridge vlan add dev v1 vid 2 # bridge vlan set dev v1 vid 1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1 Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1. # bridge link set dev v1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2 Error: bridge: Port is already in 1 groups, and mcast_max_groups=1. # bridge -d link show 5: v1@v2: mtu 1500 master br [...] [...] mcast_n_groups 1 mcast_max_groups 1 # bridge -d vlan show port vlan-id br 1 PVID Egress Untagged state forwarding mcast_router 1 v1 1 PVID Egress Untagged [...] mcast_n_groups 1 mcast_max_groups 1 2 [...] mcast_n_groups 0 mcast_max_groups 0 Signed-off-by: Petr Machata Acked-by: Nikolay Aleksandrov Reviewed-by: Ido Schimmel Signed-off-by: David S. Miller --- include/uapi/linux/if_bridge.h | 2 ++ include/uapi/linux/if_link.h | 2 ++ net/bridge/br_multicast.c | 15 +++++++++++++++ net/bridge/br_netlink.c | 17 ++++++++++++++++- net/bridge/br_private.h | 6 +++++- net/bridge/br_vlan.c | 11 +++++++---- net/bridge/br_vlan_options.c | 27 ++++++++++++++++++++++++++- net/core/rtnetlink.c | 2 +- 8 files changed, 74 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index d9de241d90f9..d60c456710b3 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -523,6 +523,8 @@ enum { BRIDGE_VLANDB_ENTRY_TUNNEL_INFO, BRIDGE_VLANDB_ENTRY_STATS, BRIDGE_VLANDB_ENTRY_MCAST_ROUTER, + BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, + BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS, __BRIDGE_VLANDB_ENTRY_MAX, }; #define BRIDGE_VLANDB_ENTRY_MAX (__BRIDGE_VLANDB_ENTRY_MAX - 1) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 02b87e4c65be..57ceb788250f 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -567,6 +567,8 @@ enum { IFLA_BRPORT_MCAST_EHT_HOSTS_CNT, IFLA_BRPORT_LOCKED, IFLA_BRPORT_MAB, + IFLA_BRPORT_MCAST_N_GROUPS, + IFLA_BRPORT_MCAST_MAX_GROUPS, __IFLA_BRPORT_MAX }; #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index b6aa0bad5817..96d1fc78dd39 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -772,6 +772,21 @@ static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid) br_multicast_port_ngroups_dec_one(&port->multicast_ctx); } +u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx) +{ + return READ_ONCE(pmctx->mdb_n_entries); +} + +void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max) +{ + WRITE_ONCE(pmctx->mdb_max_entries, max); +} + +u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx) +{ + return READ_ONCE(pmctx->mdb_max_entries); +} + static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc) { struct net_bridge_port_group *pg; diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index a6133d469885..9173e52b89e2 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -202,6 +202,8 @@ static inline size_t br_port_info_size(void) + nla_total_size_64bit(sizeof(u64)) /* IFLA_BRPORT_HOLD_TIMER */ #ifdef CONFIG_BRIDGE_IGMP_SNOOPING + nla_total_size(sizeof(u8)) /* IFLA_BRPORT_MULTICAST_ROUTER */ + + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_MCAST_N_GROUPS */ + + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_MCAST_MAX_GROUPS */ #endif + nla_total_size(sizeof(u16)) /* IFLA_BRPORT_GROUP_FWD_MASK */ + nla_total_size(sizeof(u8)) /* IFLA_BRPORT_MRP_RING_OPEN */ @@ -298,7 +300,11 @@ static int br_port_fill_attrs(struct sk_buff *skb, nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT, p->multicast_eht_hosts_limit) || nla_put_u32(skb, IFLA_BRPORT_MCAST_EHT_HOSTS_CNT, - p->multicast_eht_hosts_cnt)) + p->multicast_eht_hosts_cnt) || + nla_put_u32(skb, IFLA_BRPORT_MCAST_N_GROUPS, + br_multicast_ngroups_get(&p->multicast_ctx)) || + nla_put_u32(skb, IFLA_BRPORT_MCAST_MAX_GROUPS, + br_multicast_ngroups_get_max(&p->multicast_ctx))) return -EMSGSIZE; #endif @@ -883,6 +889,8 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = { [IFLA_BRPORT_MAB] = { .type = NLA_U8 }, [IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 }, [IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 }, + [IFLA_BRPORT_MCAST_N_GROUPS] = { .type = NLA_REJECT }, + [IFLA_BRPORT_MCAST_MAX_GROUPS] = { .type = NLA_U32 }, }; /* Change the state of the port and notify spanning tree */ @@ -1017,6 +1025,13 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[], if (err) return err; } + + if (tb[IFLA_BRPORT_MCAST_MAX_GROUPS]) { + u32 max_groups; + + max_groups = nla_get_u32(tb[IFLA_BRPORT_MCAST_MAX_GROUPS]); + br_multicast_ngroups_set_max(&p->multicast_ctx, max_groups); + } #endif if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) { diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 49f411a0a1f1..cef5f6ea850c 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -978,6 +978,9 @@ void br_multicast_uninit_stats(struct net_bridge *br); void br_multicast_get_stats(const struct net_bridge *br, const struct net_bridge_port *p, struct br_mcast_stats *dest); +u32 br_multicast_ngroups_get(const struct net_bridge_mcast_port *pmctx); +void br_multicast_ngroups_set_max(struct net_bridge_mcast_port *pmctx, u32 max); +u32 br_multicast_ngroups_get_max(const struct net_bridge_mcast_port *pmctx); void br_mdb_init(void); void br_mdb_uninit(void); void br_multicast_host_join(const struct net_bridge_mcast *brmctx, @@ -1761,7 +1764,8 @@ static inline u16 br_vlan_flags(const struct net_bridge_vlan *v, u16 pvid) #ifdef CONFIG_BRIDGE_VLAN_FILTERING bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr, const struct net_bridge_vlan *range_end); -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v); +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v, + const struct net_bridge_port *p); size_t br_vlan_opts_nl_size(void); int br_vlan_process_options(const struct net_bridge *br, const struct net_bridge_port *p, diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index bc75fa1e4666..8a3dbc09ba38 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -1816,6 +1816,7 @@ out_err: /* v_opts is used to dump the options which must be equal in the whole range */ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range, const struct net_bridge_vlan *v_opts, + const struct net_bridge_port *p, u16 flags, bool dump_stats) { @@ -1842,7 +1843,7 @@ static bool br_vlan_fill_vids(struct sk_buff *skb, u16 vid, u16 vid_range, goto out_err; if (v_opts) { - if (!br_vlan_opts_fill(skb, v_opts)) + if (!br_vlan_opts_fill(skb, v_opts, p)) goto out_err; if (dump_stats && !br_vlan_stats_fill(skb, v_opts)) @@ -1925,7 +1926,7 @@ void br_vlan_notify(const struct net_bridge *br, goto out_kfree; } - if (!br_vlan_fill_vids(skb, vid, vid_range, v, flags, false)) + if (!br_vlan_fill_vids(skb, vid, vid_range, v, p, flags, false)) goto out_err; nlmsg_end(skb, nlh); @@ -2030,7 +2031,7 @@ static int br_vlan_dump_dev(const struct net_device *dev, if (!br_vlan_fill_vids(skb, range_start->vid, range_end->vid, range_start, - vlan_flags, dump_stats)) { + p, vlan_flags, dump_stats)) { err = -EMSGSIZE; break; } @@ -2056,7 +2057,7 @@ update_end: else if (!dump_global && !br_vlan_fill_vids(skb, range_start->vid, range_end->vid, range_start, - br_vlan_flags(range_start, pvid), + p, br_vlan_flags(range_start, pvid), dump_stats)) err = -EMSGSIZE; } @@ -2131,6 +2132,8 @@ static const struct nla_policy br_vlan_db_policy[BRIDGE_VLANDB_ENTRY_MAX + 1] = [BRIDGE_VLANDB_ENTRY_STATE] = { .type = NLA_U8 }, [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO] = { .type = NLA_NESTED }, [BRIDGE_VLANDB_ENTRY_MCAST_ROUTER] = { .type = NLA_U8 }, + [BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS] = { .type = NLA_REJECT }, + [BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS] = { .type = NLA_U32 }, }; static int br_vlan_rtm_process_one(struct net_device *dev, diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c index a2724d03278c..e378c2f3a9e2 100644 --- a/net/bridge/br_vlan_options.c +++ b/net/bridge/br_vlan_options.c @@ -48,7 +48,8 @@ bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr, curr_mc_rtr == range_mc_rtr; } -bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v) +bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v, + const struct net_bridge_port *p) { if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) || !__vlan_tun_put(skb, v)) @@ -58,6 +59,12 @@ bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v) if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_MCAST_ROUTER, br_vlan_multicast_router(v))) return false; + if (p && !br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx) && + (nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, + br_multicast_ngroups_get(&v->port_mcast_ctx)) || + nla_put_u32(skb, BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS, + br_multicast_ngroups_get_max(&v->port_mcast_ctx)))) + return false; #endif return true; @@ -70,6 +77,8 @@ size_t br_vlan_opts_nl_size(void) + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_TINFO_ID */ #ifdef CONFIG_BRIDGE_IGMP_SNOOPING + nla_total_size(sizeof(u8)) /* BRIDGE_VLANDB_ENTRY_MCAST_ROUTER */ + + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS */ + + nla_total_size(sizeof(u32)) /* BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS */ #endif + 0; } @@ -212,6 +221,22 @@ static int br_vlan_process_one_opts(const struct net_bridge *br, return err; *changed = true; } + if (tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]) { + u32 val; + + if (!p) { + NL_SET_ERR_MSG_MOD(extack, "Can't set mcast_max_groups for non-port vlans"); + return -EINVAL; + } + if (br_multicast_port_ctx_vlan_disabled(&v->port_mcast_ctx)) { + NL_SET_ERR_MSG_MOD(extack, "Multicast snooping disabled on this VLAN"); + return -EINVAL; + } + + val = nla_get_u32(tb[BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS]); + br_multicast_ngroups_set_max(&v->port_mcast_ctx, val); + *changed = true; + } #endif return 0; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index b9f584955b77..5d8eb57867a9 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -58,7 +58,7 @@ #include "dev.h" #define RTNL_MAX_TYPE 50 -#define RTNL_SLAVE_MAX_TYPE 40 +#define RTNL_SLAVE_MAX_TYPE 42 struct rtnl_link { rtnl_doit_func doit; -- cgit v1.2.3 From 542bcea4be866b14b3a5c8e90773329066656c43 Mon Sep 17 00:00:00 2001 From: Qingfang DENG Date: Fri, 3 Feb 2023 09:16:11 +0800 Subject: net: page_pool: use in_softirq() instead We use BH context only for synchronization, so we don't care if it's actually serving softirq or not. As a side node, in case of threaded NAPI, in_serving_softirq() will return false because it's in process context with BH off, making page_pool_recycle_in_cache() unreachable. Signed-off-by: Qingfang DENG Tested-by: Felix Fietkau Signed-off-by: David S. Miller --- include/net/page_pool.h | 4 ++-- net/core/page_pool.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 813c93499f20..34bf531ffc8d 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -386,7 +386,7 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) static inline void page_pool_ring_lock(struct page_pool *pool) __acquires(&pool->ring.producer_lock) { - if (in_serving_softirq()) + if (in_softirq()) spin_lock(&pool->ring.producer_lock); else spin_lock_bh(&pool->ring.producer_lock); @@ -395,7 +395,7 @@ static inline void page_pool_ring_lock(struct page_pool *pool) static inline void page_pool_ring_unlock(struct page_pool *pool) __releases(&pool->ring.producer_lock) { - if (in_serving_softirq()) + if (in_softirq()) spin_unlock(&pool->ring.producer_lock); else spin_unlock_bh(&pool->ring.producer_lock); diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 9b203d8660e4..193c18799865 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -511,8 +511,8 @@ static void page_pool_return_page(struct page_pool *pool, struct page *page) static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page) { int ret; - /* BH protection not needed if current is serving softirq */ - if (in_serving_softirq()) + /* BH protection not needed if current is softirq */ + if (in_softirq()) ret = ptr_ring_produce(&pool->ring, page); else ret = ptr_ring_produce_bh(&pool->ring, page); @@ -570,7 +570,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, page_pool_dma_sync_for_device(pool, page, dma_sync_size); - if (allow_direct && in_serving_softirq() && + if (allow_direct && in_softirq() && page_pool_recycle_in_cache(page, pool)) return NULL; -- cgit v1.2.3 From 9dde0cd3b10f63bc4100ebadc7e32275baabfa68 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Fri, 3 Feb 2023 13:59:29 +0100 Subject: net: introduce skb_poison_list and use in kfree_skb_list First user of skb_poison_list is in kfree_skb_list_reason, to catch bugs earlier like introduced in commit eedade12f4cb ("net: kfree_skb_list use kmem_cache_free_bulk"). For completeness mentioned bug have been fixed in commit f72ff8b81ebc ("net: fix kfree_skb_list use of skb_mark_not_on_list"). In case of a bug like mentioned commit we would have seen OOPS with: general protection fault, probably for non-canonical address 0xdead000000000870 And content of one the registers e.g. R13: dead000000000800 In this case skb->len is at offset 112 bytes (0x70) why fault happens at 0x800+0x70 = 0x870 Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller --- include/linux/poison.h | 3 +++ include/linux/skbuff.h | 7 +++++++ net/core/skbuff.c | 4 +++- 3 files changed, 13 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/linux/poison.h b/include/linux/poison.h index 2d3249eb0e62..2823f90fdab4 100644 --- a/include/linux/poison.h +++ b/include/linux/poison.h @@ -81,6 +81,9 @@ /********** net/core/page_pool.c **********/ #define PP_SIGNATURE (0x40 + POISON_POINTER_DELTA) +/********** net/core/skbuff.c **********/ +#define SKB_LIST_POISON_NEXT ((void *)(0x800 + POISON_POINTER_DELTA)) + /********** kernel/bpf/ **********/ #define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA)) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5ba12185f43e..1fa95b916342 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1738,6 +1738,13 @@ static inline void skb_mark_not_on_list(struct sk_buff *skb) skb->next = NULL; } +static inline void skb_poison_list(struct sk_buff *skb) +{ +#ifdef CONFIG_DEBUG_NET + skb->next = SKB_LIST_POISON_NEXT; +#endif +} + /* Iterate through singly-linked GSO fragments of an skb. */ #define skb_list_walk_safe(first, skb, next_skb) \ for ((skb) = (first), (next_skb) = (skb) ? (skb)->next : NULL; (skb); \ diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 44a19805c355..624e9e4ec116 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1000,8 +1000,10 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) while (segs) { struct sk_buff *next = segs->next; - if (__kfree_skb_reason(segs, reason)) + if (__kfree_skb_reason(segs, reason)) { + skb_poison_list(segs); kfree_skb_add_bulk(segs, &sa, reason); + } segs = next; } -- cgit v1.2.3 From feb2cf3dcfb930aec2ca65c66d1365543d5ba943 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:52:55 +0200 Subject: net/sched: mqprio: refactor nlattr parsing to a separate function mqprio_init() is quite large and unwieldy to add more code to. Split the netlink attribute parsing to a dedicated function. Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/sch_mqprio.c | 114 +++++++++++++++++++++++++++---------------------- 1 file changed, 63 insertions(+), 51 deletions(-) (limited to 'net') diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index 4c68abaa289b..d2d8a02ded05 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -130,6 +130,67 @@ static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla, return 0; } +static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt, + struct nlattr *opt) +{ + struct mqprio_sched *priv = qdisc_priv(sch); + struct nlattr *tb[TCA_MQPRIO_MAX + 1]; + struct nlattr *attr; + int i, rem, err; + + err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy, + sizeof(*qopt)); + if (err < 0) + return err; + + if (!qopt->hw) + return -EINVAL; + + if (tb[TCA_MQPRIO_MODE]) { + priv->flags |= TC_MQPRIO_F_MODE; + priv->mode = *(u16 *)nla_data(tb[TCA_MQPRIO_MODE]); + } + + if (tb[TCA_MQPRIO_SHAPER]) { + priv->flags |= TC_MQPRIO_F_SHAPER; + priv->shaper = *(u16 *)nla_data(tb[TCA_MQPRIO_SHAPER]); + } + + if (tb[TCA_MQPRIO_MIN_RATE64]) { + if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) + return -EINVAL; + i = 0; + nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64], + rem) { + if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64) + return -EINVAL; + if (i >= qopt->num_tc) + break; + priv->min_rate[i] = *(u64 *)nla_data(attr); + i++; + } + priv->flags |= TC_MQPRIO_F_MIN_RATE; + } + + if (tb[TCA_MQPRIO_MAX_RATE64]) { + if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) + return -EINVAL; + i = 0; + nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64], + rem) { + if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64) + return -EINVAL; + if (i >= qopt->num_tc) + break; + priv->max_rate[i] = *(u64 *)nla_data(attr); + i++; + } + priv->flags |= TC_MQPRIO_F_MAX_RATE; + } + + return 0; +} + static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { @@ -139,9 +200,6 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, struct Qdisc *qdisc; int i, err = -EOPNOTSUPP; struct tc_mqprio_qopt *qopt = NULL; - struct nlattr *tb[TCA_MQPRIO_MAX + 1]; - struct nlattr *attr; - int rem; int len; BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE); @@ -166,55 +224,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); if (len > 0) { - err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy, - sizeof(*qopt)); - if (err < 0) + err = mqprio_parse_nlattr(sch, qopt, opt); + if (err) return err; - - if (!qopt->hw) - return -EINVAL; - - if (tb[TCA_MQPRIO_MODE]) { - priv->flags |= TC_MQPRIO_F_MODE; - priv->mode = *(u16 *)nla_data(tb[TCA_MQPRIO_MODE]); - } - - if (tb[TCA_MQPRIO_SHAPER]) { - priv->flags |= TC_MQPRIO_F_SHAPER; - priv->shaper = *(u16 *)nla_data(tb[TCA_MQPRIO_SHAPER]); - } - - if (tb[TCA_MQPRIO_MIN_RATE64]) { - if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) - return -EINVAL; - i = 0; - nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64], - rem) { - if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64) - return -EINVAL; - if (i >= qopt->num_tc) - break; - priv->min_rate[i] = *(u64 *)nla_data(attr); - i++; - } - priv->flags |= TC_MQPRIO_F_MIN_RATE; - } - - if (tb[TCA_MQPRIO_MAX_RATE64]) { - if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) - return -EINVAL; - i = 0; - nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64], - rem) { - if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64) - return -EINVAL; - if (i >= qopt->num_tc) - break; - priv->max_rate[i] = *(u64 *)nla_data(attr); - i++; - } - priv->flags |= TC_MQPRIO_F_MAX_RATE; - } } /* pre-allocate qdisc, attachment can't fail */ -- cgit v1.2.3 From 5cfb45e2fb71f58e795d96c708c27e2fa13134b7 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:52:56 +0200 Subject: net/sched: mqprio: refactor offloading and unoffloading to dedicated functions Some more logic will be added to mqprio offloading, so split that code up from mqprio_init(), which is already large, and create a new function, mqprio_enable_offload(), similar to taprio_enable_offload(). Also create the opposite function mqprio_disable_offload(). Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/sch_mqprio.c | 102 ++++++++++++++++++++++++++++--------------------- 1 file changed, 59 insertions(+), 43 deletions(-) (limited to 'net') diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index d2d8a02ded05..3579a64da06e 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -27,6 +27,61 @@ struct mqprio_sched { u64 max_rate[TC_QOPT_MAX_QUEUE]; }; +static int mqprio_enable_offload(struct Qdisc *sch, + const struct tc_mqprio_qopt *qopt) +{ + struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt}; + struct mqprio_sched *priv = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + int err, i; + + switch (priv->mode) { + case TC_MQPRIO_MODE_DCB: + if (priv->shaper != TC_MQPRIO_SHAPER_DCB) + return -EINVAL; + break; + case TC_MQPRIO_MODE_CHANNEL: + mqprio.flags = priv->flags; + if (priv->flags & TC_MQPRIO_F_MODE) + mqprio.mode = priv->mode; + if (priv->flags & TC_MQPRIO_F_SHAPER) + mqprio.shaper = priv->shaper; + if (priv->flags & TC_MQPRIO_F_MIN_RATE) + for (i = 0; i < mqprio.qopt.num_tc; i++) + mqprio.min_rate[i] = priv->min_rate[i]; + if (priv->flags & TC_MQPRIO_F_MAX_RATE) + for (i = 0; i < mqprio.qopt.num_tc; i++) + mqprio.max_rate[i] = priv->max_rate[i]; + break; + default: + return -EINVAL; + } + + err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO, + &mqprio); + if (err) + return err; + + priv->hw_offload = mqprio.qopt.hw; + + return 0; +} + +static void mqprio_disable_offload(struct Qdisc *sch) +{ + struct tc_mqprio_qopt_offload mqprio = { { 0 } }; + struct mqprio_sched *priv = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + + switch (priv->mode) { + case TC_MQPRIO_MODE_DCB: + case TC_MQPRIO_MODE_CHANNEL: + dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO, + &mqprio); + break; + } +} + static void mqprio_destroy(struct Qdisc *sch) { struct net_device *dev = qdisc_dev(sch); @@ -41,22 +96,10 @@ static void mqprio_destroy(struct Qdisc *sch) kfree(priv->qdiscs); } - if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) { - struct tc_mqprio_qopt_offload mqprio = { { 0 } }; - - switch (priv->mode) { - case TC_MQPRIO_MODE_DCB: - case TC_MQPRIO_MODE_CHANNEL: - dev->netdev_ops->ndo_setup_tc(dev, - TC_SETUP_QDISC_MQPRIO, - &mqprio); - break; - default: - return; - } - } else { + if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) + mqprio_disable_offload(sch); + else netdev_set_num_tc(dev, 0); - } } static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt) @@ -253,36 +296,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, * supplied and verified mapping */ if (qopt->hw) { - struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt}; - - switch (priv->mode) { - case TC_MQPRIO_MODE_DCB: - if (priv->shaper != TC_MQPRIO_SHAPER_DCB) - return -EINVAL; - break; - case TC_MQPRIO_MODE_CHANNEL: - mqprio.flags = priv->flags; - if (priv->flags & TC_MQPRIO_F_MODE) - mqprio.mode = priv->mode; - if (priv->flags & TC_MQPRIO_F_SHAPER) - mqprio.shaper = priv->shaper; - if (priv->flags & TC_MQPRIO_F_MIN_RATE) - for (i = 0; i < mqprio.qopt.num_tc; i++) - mqprio.min_rate[i] = priv->min_rate[i]; - if (priv->flags & TC_MQPRIO_F_MAX_RATE) - for (i = 0; i < mqprio.qopt.num_tc; i++) - mqprio.max_rate[i] = priv->max_rate[i]; - break; - default: - return -EINVAL; - } - err = dev->netdev_ops->ndo_setup_tc(dev, - TC_SETUP_QDISC_MQPRIO, - &mqprio); + err = mqprio_enable_offload(sch, qopt); if (err) return err; - - priv->hw_offload = mqprio.qopt.hw; } else { netdev_set_num_tc(dev, qopt->num_tc); for (i = 0; i < qopt->num_tc; i++) -- cgit v1.2.3 From d7045f520a74eac28d54fa96cc020c5344baea98 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:52:58 +0200 Subject: net/sched: mqprio: allow reverse TC:TXQ mappings By imposing that the last TXQ of TC i is smaller than the first TXQ of any TC j (j := i+1 .. n), mqprio imposes a strict ordering condition for the TXQ indices (they must increase as TCs increase). Claudiu points out that the complexity of the TXQ count validation is too high for this logic, i.e. instead of iterating over j, it is sufficient that the TXQ indices of TC i and i + 1 are ordered, and that will eventually ensure global ordering. This is true, however it doesn't appear to me that is what the code really intended to do. Instead, based on the comments, it just wanted to check for overlaps (and this isn't how one does that). So the following mqprio configuration, which I had recommended to Vinicius more than once for igb/igc (to account for the fact that on this hardware, lower numbered TXQs have higher dequeue priority than higher ones): num_tc 4 map 0 1 2 3 queues 1@3 1@2 1@1 1@0 is in fact denied today by mqprio. The full story is that in fact, it's only denied with "hw 0"; if hardware offloading is requested, mqprio defers TXQ range overlap validation to the device driver (a strange decision in itself). This is most certainly a bug, but it's not one that has any merit for being fixed on "stable" as far as I can tell. This is because mqprio always rejected a configuration which was in fact valid, and this has shaped the way in which mqprio configuration scripts got built for various hardware (see igb/igc in the link below). Therefore, one could consider it to be merely an improvement for mqprio to allow reverse TC:TXQ mappings. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-9-vladimir.oltean@nxp.com/#25188310 Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230128010719.2182346-6-vladimir.oltean@nxp.com/#25186442 Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Gerhard Engleder Signed-off-by: David S. Miller --- net/sched/sch_mqprio.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index 3579a64da06e..25ab215641a2 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -27,6 +27,14 @@ struct mqprio_sched { u64 max_rate[TC_QOPT_MAX_QUEUE]; }; +/* Returns true if the intervals [a, b) and [c, d) overlap. */ +static bool intervals_overlap(int a, int b, int c, int d) +{ + int left = max(a, c), right = min(b, d); + + return left < right; +} + static int mqprio_enable_offload(struct Qdisc *sch, const struct tc_mqprio_qopt *qopt) { @@ -144,7 +152,10 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt) /* Verify that the offset and counts do not overlap */ for (j = i + 1; j < qopt->num_tc; j++) { - if (last > qopt->offset[j]) + if (intervals_overlap(qopt->offset[i], last, + qopt->offset[j], + qopt->offset[j] + + qopt->count[j])) return -EINVAL; } } -- cgit v1.2.3 From 19278d76915d6b28269e1af1d7b6754c16576572 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:52:59 +0200 Subject: net/sched: mqprio: allow offloading drivers to request queue count validation mqprio_parse_opt() proudly has a comment: /* If hardware offload is requested we will leave it to the device * to either populate the queue counts itself or to validate the * provided queue counts. */ Unfortunately some device drivers did not get this memo, and don't validate the queue counts, or populate them. In case drivers don't want to populate the queue counts themselves, just act upon the requested configuration, it makes sense to introduce a tc capability, and make mqprio query it, so they don't have to do the validation themselves. Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/pkt_sched.h | 4 +++ net/sched/sch_mqprio.c | 81 +++++++++++++++++++++++++++++++------------------ 2 files changed, 56 insertions(+), 29 deletions(-) (limited to 'net') diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 6c5e64e0a0bb..02e3ccfbc7d1 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -160,6 +160,10 @@ struct tc_etf_qopt_offload { s32 queue; }; +struct tc_mqprio_caps { + bool validate_queue_counts:1; +}; + struct tc_mqprio_qopt_offload { /* struct tc_mqprio_qopt must always be the first element */ struct tc_mqprio_qopt qopt; diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index 25ab215641a2..0f04b17588ca 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -35,6 +35,35 @@ static bool intervals_overlap(int a, int b, int c, int d) return left < right; } +static int mqprio_validate_queue_counts(struct net_device *dev, + const struct tc_mqprio_qopt *qopt) +{ + int i, j; + + for (i = 0; i < qopt->num_tc; i++) { + unsigned int last = qopt->offset[i] + qopt->count[i]; + + /* Verify the queue count is in tx range being equal to the + * real_num_tx_queues indicates the last queue is in use. + */ + if (qopt->offset[i] >= dev->real_num_tx_queues || + !qopt->count[i] || + last > dev->real_num_tx_queues) + return -EINVAL; + + /* Verify that the offset and counts do not overlap */ + for (j = i + 1; j < qopt->num_tc; j++) { + if (intervals_overlap(qopt->offset[i], last, + qopt->offset[j], + qopt->offset[j] + + qopt->count[j])) + return -EINVAL; + } + } + + return 0; +} + static int mqprio_enable_offload(struct Qdisc *sch, const struct tc_mqprio_qopt *qopt) { @@ -110,9 +139,10 @@ static void mqprio_destroy(struct Qdisc *sch) netdev_set_num_tc(dev, 0); } -static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt) +static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, + const struct tc_mqprio_caps *caps) { - int i, j; + int i, err; /* Verify num_tc is not out of max range */ if (qopt->num_tc > TC_MAX_QUEUE) @@ -131,35 +161,24 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt) if (qopt->hw > TC_MQPRIO_HW_OFFLOAD_MAX) qopt->hw = TC_MQPRIO_HW_OFFLOAD_MAX; - /* If hardware offload is requested we will leave it to the device - * to either populate the queue counts itself or to validate the - * provided queue counts. If ndo_setup_tc is not present then - * hardware doesn't support offload and we should return an error. + /* If hardware offload is requested, we will leave 3 options to the + * device driver: + * - populate the queue counts itself (and ignore what was requested) + * - validate the provided queue counts by itself (and apply them) + * - request queue count validation here (and apply them) */ - if (qopt->hw) - return dev->netdev_ops->ndo_setup_tc ? 0 : -EINVAL; - - for (i = 0; i < qopt->num_tc; i++) { - unsigned int last = qopt->offset[i] + qopt->count[i]; - - /* Verify the queue count is in tx range being equal to the - * real_num_tx_queues indicates the last queue is in use. - */ - if (qopt->offset[i] >= dev->real_num_tx_queues || - !qopt->count[i] || - last > dev->real_num_tx_queues) - return -EINVAL; - - /* Verify that the offset and counts do not overlap */ - for (j = i + 1; j < qopt->num_tc; j++) { - if (intervals_overlap(qopt->offset[i], last, - qopt->offset[j], - qopt->offset[j] + - qopt->count[j])) - return -EINVAL; - } + if (!qopt->hw || caps->validate_queue_counts) { + err = mqprio_validate_queue_counts(dev, qopt); + if (err) + return err; } + /* If ndo_setup_tc is not present then hardware doesn't support offload + * and we should return an error. + */ + if (qopt->hw && !dev->netdev_ops->ndo_setup_tc) + return -EINVAL; + return 0; } @@ -254,6 +273,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, struct Qdisc *qdisc; int i, err = -EOPNOTSUPP; struct tc_mqprio_qopt *qopt = NULL; + struct tc_mqprio_caps caps; int len; BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE); @@ -272,8 +292,11 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, if (!opt || nla_len(opt) < sizeof(*qopt)) return -EINVAL; + qdisc_offload_query_caps(dev, TC_SETUP_QDISC_MQPRIO, + &caps, sizeof(caps)); + qopt = nla_data(opt); - if (mqprio_parse_opt(dev, qopt)) + if (mqprio_parse_opt(dev, qopt, &caps)) return -EINVAL; len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); -- cgit v1.2.3 From d404959fa23a6fc79ba3989f0491a1e747e30532 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:53:00 +0200 Subject: net/sched: mqprio: add extack messages for queue count validation To make mqprio more user-friendly, create netlink extended ack messages which say exactly what is wrong about the queue counts. This uses the new support for printf-formatted extack messages. Example: $ tc qdisc add dev eno0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 3@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 Error: sch_mqprio: TC 0 queues 3@0 overlap with TC 1 queues 1@1. Signed-off-by: Vladimir Oltean Reviewed-by: Jacob Keller Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/sch_mqprio.c | 36 +++++++++++++++++++++++++++--------- 1 file changed, 27 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index 0f04b17588ca..d2a2dc068408 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -36,28 +36,44 @@ static bool intervals_overlap(int a, int b, int c, int d) } static int mqprio_validate_queue_counts(struct net_device *dev, - const struct tc_mqprio_qopt *qopt) + const struct tc_mqprio_qopt *qopt, + struct netlink_ext_ack *extack) { int i, j; for (i = 0; i < qopt->num_tc; i++) { unsigned int last = qopt->offset[i] + qopt->count[i]; + if (!qopt->count[i]) { + NL_SET_ERR_MSG_FMT_MOD(extack, "No queues for TC %d", + i); + return -EINVAL; + } + /* Verify the queue count is in tx range being equal to the * real_num_tx_queues indicates the last queue is in use. */ if (qopt->offset[i] >= dev->real_num_tx_queues || - !qopt->count[i] || - last > dev->real_num_tx_queues) + last > dev->real_num_tx_queues) { + NL_SET_ERR_MSG_FMT_MOD(extack, + "Queues %d:%d for TC %d exceed the %d TX queues available", + qopt->count[i], qopt->offset[i], + i, dev->real_num_tx_queues); return -EINVAL; + } /* Verify that the offset and counts do not overlap */ for (j = i + 1; j < qopt->num_tc; j++) { if (intervals_overlap(qopt->offset[i], last, qopt->offset[j], qopt->offset[j] + - qopt->count[j])) + qopt->count[j])) { + NL_SET_ERR_MSG_FMT_MOD(extack, + "TC %d queues %d@%d overlap with TC %d queues %d@%d", + i, qopt->count[i], qopt->offset[i], + j, qopt->count[j], qopt->offset[j]); return -EINVAL; + } } } @@ -65,7 +81,8 @@ static int mqprio_validate_queue_counts(struct net_device *dev, } static int mqprio_enable_offload(struct Qdisc *sch, - const struct tc_mqprio_qopt *qopt) + const struct tc_mqprio_qopt *qopt, + struct netlink_ext_ack *extack) { struct tc_mqprio_qopt_offload mqprio = {.qopt = *qopt}; struct mqprio_sched *priv = qdisc_priv(sch); @@ -140,7 +157,8 @@ static void mqprio_destroy(struct Qdisc *sch) } static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, - const struct tc_mqprio_caps *caps) + const struct tc_mqprio_caps *caps, + struct netlink_ext_ack *extack) { int i, err; @@ -168,7 +186,7 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, * - request queue count validation here (and apply them) */ if (!qopt->hw || caps->validate_queue_counts) { - err = mqprio_validate_queue_counts(dev, qopt); + err = mqprio_validate_queue_counts(dev, qopt, extack); if (err) return err; } @@ -296,7 +314,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, &caps, sizeof(caps)); qopt = nla_data(opt); - if (mqprio_parse_opt(dev, qopt, &caps)) + if (mqprio_parse_opt(dev, qopt, &caps, extack)) return -EINVAL; len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt)); @@ -330,7 +348,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt, * supplied and verified mapping */ if (qopt->hw) { - err = mqprio_enable_offload(sch, qopt); + err = mqprio_enable_offload(sch, qopt, extack); if (err) return err; } else { -- cgit v1.2.3 From 1dfe086dd7efb36d3d619a90782c6ca186a1bae9 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:53:01 +0200 Subject: net/sched: taprio: centralize mqprio qopt validation There is a lot of code in taprio which is "borrowed" from mqprio. It makes sense to put a stop to the "borrowing" and start actually reusing code. Because taprio and mqprio are built as part of different kernel modules, code reuse can only take place either by writing it as static inline (limiting), putting it in sch_generic.o (not generic enough), or creating a third auto-selectable kernel module which only holds library code. I opted for the third variant. In a previous change, mqprio gained support for reverse TC:TXQ mappings, something which taprio still denies. Make taprio use the same validation logic so that it supports this configuration as well. The taprio code didn't enforce TXQ overlaps in txtime-assist mode and that looks intentional, even if I've no idea why that might be. Preserve that, but add a comment. There isn't any dedicated MAINTAINERS entry for mqprio, so nothing to update there. Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Reviewed-by: Gerhard Engleder Signed-off-by: David S. Miller --- net/sched/Kconfig | 7 +++ net/sched/Makefile | 1 + net/sched/sch_mqprio.c | 77 ++++----------------------------- net/sched/sch_mqprio_lib.c | 103 +++++++++++++++++++++++++++++++++++++++++++++ net/sched/sch_mqprio_lib.h | 16 +++++++ net/sched/sch_taprio.c | 49 ++++----------------- 6 files changed, 143 insertions(+), 110 deletions(-) create mode 100644 net/sched/sch_mqprio_lib.c create mode 100644 net/sched/sch_mqprio_lib.h (limited to 'net') diff --git a/net/sched/Kconfig b/net/sched/Kconfig index de18a0dda6df..f5acb535413d 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -195,8 +195,14 @@ config NET_SCH_ETF To compile this code as a module, choose M here: the module will be called sch_etf. +config NET_SCH_MQPRIO_LIB + tristate + help + Common library for manipulating mqprio queue configurations. + config NET_SCH_TAPRIO tristate "Time Aware Priority (taprio) Scheduler" + select NET_SCH_MQPRIO_LIB help Say Y here if you want to use the Time Aware Priority (taprio) packet scheduling algorithm. @@ -253,6 +259,7 @@ config NET_SCH_DRR config NET_SCH_MQPRIO tristate "Multi-queue priority scheduler (MQPRIO)" + select NET_SCH_MQPRIO_LIB help Say Y here if you want to use the Multi-queue Priority scheduler. This scheduler allows QOS to be offloaded on NICs that have support diff --git a/net/sched/Makefile b/net/sched/Makefile index dd14ef413fda..7911eec09837 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o obj-$(CONFIG_NET_SCH_ETS) += sch_ets.o obj-$(CONFIG_NET_SCH_MQPRIO) += sch_mqprio.o +obj-$(CONFIG_NET_SCH_MQPRIO_LIB) += sch_mqprio_lib.o obj-$(CONFIG_NET_SCH_SKBPRIO) += sch_skbprio.o obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o obj-$(CONFIG_NET_SCH_QFQ) += sch_qfq.o diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index d2a2dc068408..9303d2a1e840 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -17,6 +17,8 @@ #include #include +#include "sch_mqprio_lib.h" + struct mqprio_sched { struct Qdisc **qdiscs; u16 mode; @@ -27,59 +29,6 @@ struct mqprio_sched { u64 max_rate[TC_QOPT_MAX_QUEUE]; }; -/* Returns true if the intervals [a, b) and [c, d) overlap. */ -static bool intervals_overlap(int a, int b, int c, int d) -{ - int left = max(a, c), right = min(b, d); - - return left < right; -} - -static int mqprio_validate_queue_counts(struct net_device *dev, - const struct tc_mqprio_qopt *qopt, - struct netlink_ext_ack *extack) -{ - int i, j; - - for (i = 0; i < qopt->num_tc; i++) { - unsigned int last = qopt->offset[i] + qopt->count[i]; - - if (!qopt->count[i]) { - NL_SET_ERR_MSG_FMT_MOD(extack, "No queues for TC %d", - i); - return -EINVAL; - } - - /* Verify the queue count is in tx range being equal to the - * real_num_tx_queues indicates the last queue is in use. - */ - if (qopt->offset[i] >= dev->real_num_tx_queues || - last > dev->real_num_tx_queues) { - NL_SET_ERR_MSG_FMT_MOD(extack, - "Queues %d:%d for TC %d exceed the %d TX queues available", - qopt->count[i], qopt->offset[i], - i, dev->real_num_tx_queues); - return -EINVAL; - } - - /* Verify that the offset and counts do not overlap */ - for (j = i + 1; j < qopt->num_tc; j++) { - if (intervals_overlap(qopt->offset[i], last, - qopt->offset[j], - qopt->offset[j] + - qopt->count[j])) { - NL_SET_ERR_MSG_FMT_MOD(extack, - "TC %d queues %d@%d overlap with TC %d queues %d@%d", - i, qopt->count[i], qopt->offset[i], - j, qopt->count[j], qopt->offset[j]); - return -EINVAL; - } - } - } - - return 0; -} - static int mqprio_enable_offload(struct Qdisc *sch, const struct tc_mqprio_qopt *qopt, struct netlink_ext_ack *extack) @@ -160,17 +109,7 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, const struct tc_mqprio_caps *caps, struct netlink_ext_ack *extack) { - int i, err; - - /* Verify num_tc is not out of max range */ - if (qopt->num_tc > TC_MAX_QUEUE) - return -EINVAL; - - /* Verify priority mapping uses valid tcs */ - for (i = 0; i < TC_BITMASK + 1; i++) { - if (qopt->prio_tc_map[i] >= qopt->num_tc) - return -EINVAL; - } + int err; /* Limit qopt->hw to maximum supported offload value. Drivers have * the option of overriding this later if they don't support the a @@ -185,11 +124,11 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt, * - validate the provided queue counts by itself (and apply them) * - request queue count validation here (and apply them) */ - if (!qopt->hw || caps->validate_queue_counts) { - err = mqprio_validate_queue_counts(dev, qopt, extack); - if (err) - return err; - } + err = mqprio_validate_qopt(dev, qopt, + !qopt->hw || caps->validate_queue_counts, + false, extack); + if (err) + return err; /* If ndo_setup_tc is not present then hardware doesn't support offload * and we should return an error. diff --git a/net/sched/sch_mqprio_lib.c b/net/sched/sch_mqprio_lib.c new file mode 100644 index 000000000000..e782b412a000 --- /dev/null +++ b/net/sched/sch_mqprio_lib.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include + +#include "sch_mqprio_lib.h" + +/* Returns true if the intervals [a, b) and [c, d) overlap. */ +static bool intervals_overlap(int a, int b, int c, int d) +{ + int left = max(a, c), right = min(b, d); + + return left < right; +} + +static int mqprio_validate_queue_counts(struct net_device *dev, + const struct tc_mqprio_qopt *qopt, + bool allow_overlapping_txqs, + struct netlink_ext_ack *extack) +{ + int i, j; + + for (i = 0; i < qopt->num_tc; i++) { + unsigned int last = qopt->offset[i] + qopt->count[i]; + + if (!qopt->count[i]) { + NL_SET_ERR_MSG_FMT_MOD(extack, "No queues for TC %d", + i); + return -EINVAL; + } + + /* Verify the queue count is in tx range being equal to the + * real_num_tx_queues indicates the last queue is in use. + */ + if (qopt->offset[i] >= dev->real_num_tx_queues || + last > dev->real_num_tx_queues) { + NL_SET_ERR_MSG_FMT_MOD(extack, + "Queues %d:%d for TC %d exceed the %d TX queues available", + qopt->count[i], qopt->offset[i], + i, dev->real_num_tx_queues); + return -EINVAL; + } + + if (allow_overlapping_txqs) + continue; + + /* Verify that the offset and counts do not overlap */ + for (j = i + 1; j < qopt->num_tc; j++) { + if (intervals_overlap(qopt->offset[i], last, + qopt->offset[j], + qopt->offset[j] + + qopt->count[j])) { + NL_SET_ERR_MSG_FMT_MOD(extack, + "TC %d queues %d@%d overlap with TC %d queues %d@%d", + i, qopt->count[i], qopt->offset[i], + j, qopt->count[j], qopt->offset[j]); + return -EINVAL; + } + } + } + + return 0; +} + +int mqprio_validate_qopt(struct net_device *dev, struct tc_mqprio_qopt *qopt, + bool validate_queue_counts, + bool allow_overlapping_txqs, + struct netlink_ext_ack *extack) +{ + int i, err; + + /* Verify num_tc is not out of max range */ + if (qopt->num_tc > TC_MAX_QUEUE) { + NL_SET_ERR_MSG(extack, + "Number of traffic classes is outside valid range"); + return -EINVAL; + } + + /* Verify priority mapping uses valid tcs */ + for (i = 0; i <= TC_BITMASK; i++) { + if (qopt->prio_tc_map[i] >= qopt->num_tc) { + NL_SET_ERR_MSG(extack, + "Invalid traffic class in priority to traffic class mapping"); + return -EINVAL; + } + } + + if (validate_queue_counts) { + err = mqprio_validate_queue_counts(dev, qopt, + allow_overlapping_txqs, + extack); + if (err) + return err; + } + + return 0; +} +EXPORT_SYMBOL_GPL(mqprio_validate_qopt); + +MODULE_LICENSE("GPL"); diff --git a/net/sched/sch_mqprio_lib.h b/net/sched/sch_mqprio_lib.h new file mode 100644 index 000000000000..353787a25648 --- /dev/null +++ b/net/sched/sch_mqprio_lib.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __SCH_MQPRIO_LIB_H +#define __SCH_MQPRIO_LIB_H + +#include + +struct net_device; +struct netlink_ext_ack; +struct tc_mqprio_qopt; + +int mqprio_validate_qopt(struct net_device *dev, struct tc_mqprio_qopt *qopt, + bool validate_queue_counts, + bool allow_overlapping_txqs, + struct netlink_ext_ack *extack); + +#endif diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index c322a61eaeea..888a29ee1da6 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -26,6 +26,8 @@ #include #include +#include "sch_mqprio_lib.h" + static LIST_HEAD(taprio_list); #define TAPRIO_ALL_GATES_OPEN -1 @@ -924,7 +926,7 @@ static int taprio_parse_mqprio_opt(struct net_device *dev, struct netlink_ext_ack *extack, u32 taprio_flags) { - int i, j; + bool allow_overlapping_txqs = TXTIME_ASSIST_IS_ENABLED(taprio_flags); if (!qopt && !dev->num_tc) { NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary"); @@ -937,52 +939,17 @@ static int taprio_parse_mqprio_opt(struct net_device *dev, if (dev->num_tc) return 0; - /* Verify num_tc is not out of max range */ - if (qopt->num_tc > TC_MAX_QUEUE) { - NL_SET_ERR_MSG(extack, "Number of traffic classes is outside valid range"); - return -EINVAL; - } - /* taprio imposes that traffic classes map 1:n to tx queues */ if (qopt->num_tc > dev->num_tx_queues) { NL_SET_ERR_MSG(extack, "Number of traffic classes is greater than number of HW queues"); return -EINVAL; } - /* Verify priority mapping uses valid tcs */ - for (i = 0; i <= TC_BITMASK; i++) { - if (qopt->prio_tc_map[i] >= qopt->num_tc) { - NL_SET_ERR_MSG(extack, "Invalid traffic class in priority to traffic class mapping"); - return -EINVAL; - } - } - - for (i = 0; i < qopt->num_tc; i++) { - unsigned int last = qopt->offset[i] + qopt->count[i]; - - /* Verify the queue count is in tx range being equal to the - * real_num_tx_queues indicates the last queue is in use. - */ - if (qopt->offset[i] >= dev->num_tx_queues || - !qopt->count[i] || - last > dev->real_num_tx_queues) { - NL_SET_ERR_MSG(extack, "Invalid queue in traffic class to queue mapping"); - return -EINVAL; - } - - if (TXTIME_ASSIST_IS_ENABLED(taprio_flags)) - continue; - - /* Verify that the offset and counts do not overlap */ - for (j = i + 1; j < qopt->num_tc; j++) { - if (last > qopt->offset[j]) { - NL_SET_ERR_MSG(extack, "Detected overlap in the traffic class to queue mapping"); - return -EINVAL; - } - } - } - - return 0; + /* For some reason, in txtime-assist mode, we allow TXQ ranges for + * different TCs to overlap, and just validate the TXQ ranges. + */ + return mqprio_validate_qopt(dev, qopt, true, allow_overlapping_txqs, + extack); } static int taprio_get_start_time(struct Qdisc *sch, -- cgit v1.2.3 From 9dd6ad674cc74a62cc85de6e02cb29d47e9c4eb5 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:53:02 +0200 Subject: net/sched: refactor mqprio qopt reconstruction to a library function The taprio qdisc will need to reconstruct a struct tc_mqprio_qopt from netdev settings once more in a future patch, but this code was already written twice, once in taprio and once in mqprio. Refactor the code to a helper in the common mqprio library. Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/sch_mqprio.c | 10 ++-------- net/sched/sch_mqprio_lib.c | 14 ++++++++++++++ net/sched/sch_mqprio_lib.h | 2 ++ net/sched/sch_taprio.c | 9 +-------- 4 files changed, 19 insertions(+), 16 deletions(-) (limited to 'net') diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c index 9303d2a1e840..48ed87b91086 100644 --- a/net/sched/sch_mqprio.c +++ b/net/sched/sch_mqprio.c @@ -406,7 +406,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb) struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb); struct tc_mqprio_qopt opt = { 0 }; struct Qdisc *qdisc; - unsigned int ntx, tc; + unsigned int ntx; sch->q.qlen = 0; gnet_stats_basic_sync_init(&sch->bstats); @@ -430,15 +430,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb) spin_unlock_bh(qdisc_lock(qdisc)); } - opt.num_tc = netdev_get_num_tc(dev); - memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map)); + mqprio_qopt_reconstruct(dev, &opt); opt.hw = priv->hw_offload; - for (tc = 0; tc < netdev_get_num_tc(dev); tc++) { - opt.count[tc] = dev->tc_to_txq[tc].count; - opt.offset[tc] = dev->tc_to_txq[tc].offset; - } - if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt)) goto nla_put_failure; diff --git a/net/sched/sch_mqprio_lib.c b/net/sched/sch_mqprio_lib.c index e782b412a000..c58a533b8ec5 100644 --- a/net/sched/sch_mqprio_lib.c +++ b/net/sched/sch_mqprio_lib.c @@ -100,4 +100,18 @@ int mqprio_validate_qopt(struct net_device *dev, struct tc_mqprio_qopt *qopt, } EXPORT_SYMBOL_GPL(mqprio_validate_qopt); +void mqprio_qopt_reconstruct(struct net_device *dev, struct tc_mqprio_qopt *qopt) +{ + int tc, num_tc = netdev_get_num_tc(dev); + + qopt->num_tc = num_tc; + memcpy(qopt->prio_tc_map, dev->prio_tc_map, sizeof(qopt->prio_tc_map)); + + for (tc = 0; tc < num_tc; tc++) { + qopt->count[tc] = dev->tc_to_txq[tc].count; + qopt->offset[tc] = dev->tc_to_txq[tc].offset; + } +} +EXPORT_SYMBOL_GPL(mqprio_qopt_reconstruct); + MODULE_LICENSE("GPL"); diff --git a/net/sched/sch_mqprio_lib.h b/net/sched/sch_mqprio_lib.h index 353787a25648..63f725ab8761 100644 --- a/net/sched/sch_mqprio_lib.h +++ b/net/sched/sch_mqprio_lib.h @@ -12,5 +12,7 @@ int mqprio_validate_qopt(struct net_device *dev, struct tc_mqprio_qopt *qopt, bool validate_queue_counts, bool allow_overlapping_txqs, struct netlink_ext_ack *extack); +void mqprio_qopt_reconstruct(struct net_device *dev, + struct tc_mqprio_qopt *qopt); #endif diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 888a29ee1da6..6b3cecbe9f1f 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1948,18 +1948,11 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) struct sched_gate_list *oper, *admin; struct tc_mqprio_qopt opt = { 0 }; struct nlattr *nest, *sched_nest; - unsigned int i; oper = rtnl_dereference(q->oper_sched); admin = rtnl_dereference(q->admin_sched); - opt.num_tc = netdev_get_num_tc(dev); - memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map)); - - for (i = 0; i < netdev_get_num_tc(dev); i++) { - opt.count[i] = dev->tc_to_txq[i].count; - opt.offset[i] = dev->tc_to_txq[i].offset; - } + mqprio_qopt_reconstruct(dev, &opt); nest = nla_nest_start_noflag(skb, TCA_OPTIONS); if (!nest) -- cgit v1.2.3 From 09c794c0a88d959a603ec49b23df8e6bba68e7b7 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:53:03 +0200 Subject: net/sched: taprio: pass mqprio queue configuration to ndo_setup_tc() The taprio qdisc does not currently pass the mqprio queue configuration down to the offloading device driver. So the driver cannot act upon the TXQ counts/offsets per TC, or upon the prio->tc map. It was probably assumed that the driver only wants to offload num_tc (see TC_MQPRIO_HW_OFFLOAD_TCS), which it can get from netdev_get_num_tc(), but there's clearly more to the mqprio configuration than that. I've considered 2 mechanisms to remedy that. First is to pass a struct tc_mqprio_qopt_offload as part of the tc_taprio_qopt_offload. The second is to make taprio actually call TC_SETUP_QDISC_MQPRIO, *in addition to* TC_SETUP_QDISC_TAPRIO. The difference is that in the first case, existing drivers (offloading or not) all ignore taprio's mqprio portion currently, whereas in the second case, we could control whether to call TC_SETUP_QDISC_MQPRIO, based on a new capability. The question is which approach would be better. I'm afraid that calling TC_SETUP_QDISC_MQPRIO unconditionally (not based on a taprio capability bit) would risk introducing regressions. For example, taprio doesn't populate (or validate) qopt->hw, as well as mqprio.flags, mqprio.shaper, mqprio.min_rate, mqprio.max_rate. In comparison, adding a capability is functionally equivalent to just passing the mqprio in a way that drivers can ignore it, except it's slightly more complicated to use it (need to set the capability). Ultimately, what made me go for the "mqprio in taprio" variant was that it's easier for offloading drivers to interpret the mqprio qopt slightly differently when it comes from taprio vs when it comes from mqprio, should that ever become necessary. Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/pkt_sched.h | 1 + net/sched/sch_taprio.c | 1 + 2 files changed, 2 insertions(+) (limited to 'net') diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 02e3ccfbc7d1..ace8be520fb0 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -187,6 +187,7 @@ struct tc_taprio_sched_entry { }; struct tc_taprio_qopt_offload { + struct tc_mqprio_qopt_offload mqprio; u8 enable; ktime_t base_time; u64 cycle_time; diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 6b3cecbe9f1f..aba8a16842c1 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1228,6 +1228,7 @@ static int taprio_enable_offload(struct net_device *dev, return -ENOMEM; } offload->enable = 1; + mqprio_qopt_reconstruct(dev, &offload->mqprio.qopt); taprio_sched_to_offload(dev, sched, offload); for (tc = 0; tc < TC_MAX_QUEUE; tc++) -- cgit v1.2.3 From 522d15ea831f88717084304f105b1d195104880e Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sat, 4 Feb 2023 15:53:04 +0200 Subject: net/sched: taprio: only pass gate mask per TXQ for igc, stmmac, tsnep, am65_cpsw There are 2 classes of in-tree drivers currently: - those who act upon struct tc_taprio_sched_entry :: gate_mask as if it holds a bit mask of TXQs - those who act upon the gate_mask as if it holds a bit mask of TCs When it comes to the standard, IEEE 802.1Q-2018 does say this in the second paragraph of section 8.6.8.4 Enhancements for scheduled traffic: | A gate control list associated with each Port contains an ordered list | of gate operations. Each gate operation changes the transmission gate | state for the gate associated with each of the Port's traffic class | queues and allows associated control operations to be scheduled. In typically obtuse language, it refers to a "traffic class queue" rather than a "traffic class" or a "queue". But careful reading of 802.1Q clarifies that "traffic class" and "queue" are in fact synonymous (see 8.6.6 Queuing frames): | A queue in this context is not necessarily a single FIFO data structure. | A queue is a record of all frames of a given traffic class awaiting | transmission on a given Bridge Port. The structure of this record is not | specified. i.o.w. their definition of "queue" isn't the Linux TX queue. The gate_mask really is input into taprio via its UAPI as a mask of traffic classes, but taprio_sched_to_offload() converts it into a TXQ mask. The breakdown of drivers which handle TC_SETUP_QDISC_TAPRIO is: - hellcreek, felix, sja1105: these are DSA switches, it's not even very clear what TXQs correspond to, other than purely software constructs. Only the mqprio configuration with 8 TCs and 1 TXQ per TC makes sense. So it's fine to convert these to a gate mask per TC. - enetc: I have the hardware and can confirm that the gate mask is per TC, and affects all TXQs (BD rings) configured for that priority. - igc: in igc_save_qbv_schedule(), the gate_mask is clearly interpreted to be per-TXQ. - tsnep: Gerhard Engleder clarifies that even though this hardware supports at most 1 TXQ per TC, the TXQ indices may be different from the TC values themselves, and it is the TXQ indices that matter to this hardware. So keep it per-TXQ as well. - stmmac: I have a GMAC datasheet, and in the EST section it does specify that the gate events are per TXQ rather than per TC. - lan966x: again, this is a switch, and while not a DSA one, the way in which it implements lan966x_mqprio_add() - by only allowing num_tc == NUM_PRIO_QUEUES (8) - makes it clear to me that TXQs are a purely software construct here as well. They seem to map 1:1 with TCs. - am65_cpsw: from looking at am65_cpsw_est_set_sched_cmds(), I get the impression that the fetch_allow variable is treated like a prio_mask. This definitely sounds closer to a per-TC gate mask rather than a per-TXQ one, and TI documentation does seem to recomment an identity mapping between TCs and TXQs. However, Roger Quadros would like to do some testing before making changes, so I'm leaving this driver to operate as it did before, for now. Link with more details at the end. Based on this breakdown, we have 5 drivers with a gate mask per TC and 4 with a gate mask per TXQ. So let's make the gate mask per TXQ the opt-in and the gate mask per TC the default. Benefit from the TC_QUERY_CAPS feature that Jakub suggested we add, and query the device driver before calling the proper ndo_setup_tc(), and figure out if it expects one or the other format. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230202003621.2679603-15-vladimir.oltean@nxp.com/#25193204 Cc: Horatiu Vultur Cc: Siddharth Vadapalli Cc: Roger Quadros Signed-off-by: Vladimir Oltean Acked-by: Kurt Kanzenbach # hellcreek Reviewed-by: Gerhard Engleder Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- drivers/net/ethernet/engleder/tsnep_tc.c | 21 +++++++++++++++++++++ drivers/net/ethernet/intel/igc/igc_main.c | 23 +++++++++++++++++++++++ drivers/net/ethernet/stmicro/stmmac/hwif.h | 5 +++++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++ drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 20 ++++++++++++++++++++ drivers/net/ethernet/ti/am65-cpsw-qos.c | 22 ++++++++++++++++++++++ include/net/pkt_sched.h | 1 + net/sched/sch_taprio.c | 11 ++++++++--- 8 files changed, 102 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/engleder/tsnep_tc.c b/drivers/net/ethernet/engleder/tsnep_tc.c index c4c6e1357317..d083e6684f12 100644 --- a/drivers/net/ethernet/engleder/tsnep_tc.c +++ b/drivers/net/ethernet/engleder/tsnep_tc.c @@ -403,12 +403,33 @@ static int tsnep_taprio(struct tsnep_adapter *adapter, return 0; } +static int tsnep_tc_query_caps(struct tsnep_adapter *adapter, + struct tc_query_caps_base *base) +{ + switch (base->type) { + case TC_SETUP_QDISC_TAPRIO: { + struct tc_taprio_caps *caps = base->caps; + + if (!adapter->gate_control) + return -EOPNOTSUPP; + + caps->gate_mask_per_txq = true; + + return 0; + } + default: + return -EOPNOTSUPP; + } +} + int tsnep_tc_setup(struct net_device *netdev, enum tc_setup_type type, void *type_data) { struct tsnep_adapter *adapter = netdev_priv(netdev); switch (type) { + case TC_QUERY_CAPS: + return tsnep_tc_query_caps(adapter, type_data); case TC_SETUP_QDISC_TAPRIO: return tsnep_taprio(adapter, type_data); default: diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 6ddcbc8b7b6a..cf7f6a5eea3d 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -6205,12 +6205,35 @@ static int igc_tsn_enable_cbs(struct igc_adapter *adapter, return igc_tsn_offload_apply(adapter); } +static int igc_tc_query_caps(struct igc_adapter *adapter, + struct tc_query_caps_base *base) +{ + struct igc_hw *hw = &adapter->hw; + + switch (base->type) { + case TC_SETUP_QDISC_TAPRIO: { + struct tc_taprio_caps *caps = base->caps; + + if (hw->mac.type != igc_i225) + return -EOPNOTSUPP; + + caps->gate_mask_per_txq = true; + + return 0; + } + default: + return -EOPNOTSUPP; + } +} + static int igc_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data) { struct igc_adapter *adapter = netdev_priv(dev); switch (type) { + case TC_QUERY_CAPS: + return igc_tc_query_caps(adapter, type_data); case TC_SETUP_QDISC_TAPRIO: return igc_tsn_enable_qbv_scheduling(adapter, type_data); diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h index 592b4067f9b8..16a7421715cb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/hwif.h +++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h @@ -567,6 +567,7 @@ struct tc_cbs_qopt_offload; struct flow_cls_offload; struct tc_taprio_qopt_offload; struct tc_etf_qopt_offload; +struct tc_query_caps_base; struct stmmac_tc_ops { int (*init)(struct stmmac_priv *priv); @@ -580,6 +581,8 @@ struct stmmac_tc_ops { struct tc_taprio_qopt_offload *qopt); int (*setup_etf)(struct stmmac_priv *priv, struct tc_etf_qopt_offload *qopt); + int (*query_caps)(struct stmmac_priv *priv, + struct tc_query_caps_base *base); }; #define stmmac_tc_init(__priv, __args...) \ @@ -594,6 +597,8 @@ struct stmmac_tc_ops { stmmac_do_callback(__priv, tc, setup_taprio, __args) #define stmmac_tc_setup_etf(__priv, __args...) \ stmmac_do_callback(__priv, tc, setup_etf, __args) +#define stmmac_tc_query_caps(__priv, __args...) \ + stmmac_do_callback(__priv, tc, query_caps, __args) struct stmmac_counters; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 1a5b8dab5e9b..f44e4e4b4f16 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -5992,6 +5992,8 @@ static int stmmac_setup_tc(struct net_device *ndev, enum tc_setup_type type, struct stmmac_priv *priv = netdev_priv(ndev); switch (type) { + case TC_QUERY_CAPS: + return stmmac_tc_query_caps(priv, priv, type_data); case TC_SETUP_BLOCK: return flow_block_cb_setup_simple(type_data, &stmmac_block_cb_list, diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c index 2cfb18cef1d4..9d55226479b4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c @@ -1107,6 +1107,25 @@ static int tc_setup_etf(struct stmmac_priv *priv, return 0; } +static int tc_query_caps(struct stmmac_priv *priv, + struct tc_query_caps_base *base) +{ + switch (base->type) { + case TC_SETUP_QDISC_TAPRIO: { + struct tc_taprio_caps *caps = base->caps; + + if (!priv->dma_cap.estsel) + return -EOPNOTSUPP; + + caps->gate_mask_per_txq = true; + + return 0; + } + default: + return -EOPNOTSUPP; + } +} + const struct stmmac_tc_ops dwmac510_tc_ops = { .init = tc_init, .setup_cls_u32 = tc_setup_cls_u32, @@ -1114,4 +1133,5 @@ const struct stmmac_tc_ops dwmac510_tc_ops = { .setup_cls = tc_setup_cls, .setup_taprio = tc_setup_taprio, .setup_etf = tc_setup_etf, + .query_caps = tc_query_caps, }; diff --git a/drivers/net/ethernet/ti/am65-cpsw-qos.c b/drivers/net/ethernet/ti/am65-cpsw-qos.c index e162771893af..8dc2c3085dcf 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-qos.c +++ b/drivers/net/ethernet/ti/am65-cpsw-qos.c @@ -585,6 +585,26 @@ static int am65_cpsw_setup_taprio(struct net_device *ndev, void *type_data) return am65_cpsw_set_taprio(ndev, type_data); } +static int am65_cpsw_tc_query_caps(struct net_device *ndev, void *type_data) +{ + struct tc_query_caps_base *base = type_data; + + switch (base->type) { + case TC_SETUP_QDISC_TAPRIO: { + struct tc_taprio_caps *caps = base->caps; + + if (!IS_ENABLED(CONFIG_TI_AM65_CPSW_TAS)) + return -EOPNOTSUPP; + + caps->gate_mask_per_txq = true; + + return 0; + } + default: + return -EOPNOTSUPP; + } +} + static int am65_cpsw_qos_clsflower_add_policer(struct am65_cpsw_port *port, struct netlink_ext_ack *extack, struct flow_cls_offload *cls, @@ -765,6 +785,8 @@ int am65_cpsw_qos_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type, void *type_data) { switch (type) { + case TC_QUERY_CAPS: + return am65_cpsw_tc_query_caps(ndev, type_data); case TC_SETUP_QDISC_TAPRIO: return am65_cpsw_setup_taprio(ndev, type_data); case TC_SETUP_BLOCK: diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index ace8be520fb0..fd889fc4912b 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -176,6 +176,7 @@ struct tc_mqprio_qopt_offload { struct tc_taprio_caps { bool supports_queue_max_sdu:1; + bool gate_mask_per_txq:1; }; struct tc_taprio_sched_entry { diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index aba8a16842c1..1c95785932b9 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1170,7 +1170,8 @@ static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask) static void taprio_sched_to_offload(struct net_device *dev, struct sched_gate_list *sched, - struct tc_taprio_qopt_offload *offload) + struct tc_taprio_qopt_offload *offload, + const struct tc_taprio_caps *caps) { struct sched_entry *entry; int i = 0; @@ -1184,7 +1185,11 @@ static void taprio_sched_to_offload(struct net_device *dev, e->command = entry->command; e->interval = entry->interval; - e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask); + if (caps->gate_mask_per_txq) + e->gate_mask = tc_map_to_queue_mask(dev, + entry->gate_mask); + else + e->gate_mask = entry->gate_mask; i++; } @@ -1229,7 +1234,7 @@ static int taprio_enable_offload(struct net_device *dev, } offload->enable = 1; mqprio_qopt_reconstruct(dev, &offload->mqprio.qopt); - taprio_sched_to_offload(dev, sched, offload); + taprio_sched_to_offload(dev, sched, offload, &caps); for (tc = 0; tc < TC_MAX_QUEUE; tc++) offload->max_sdu[tc] = q->max_sdu[tc]; -- cgit v1.2.3 From 584f3742890e966d2f0a1f3c418c9ead70b2d99e Mon Sep 17 00:00:00 2001 From: Pietro Borrello Date: Sat, 4 Feb 2023 17:39:20 +0000 Subject: net: add sock_init_data_uid() Add sock_init_data_uid() to explicitly initialize the socket uid. To initialise the socket uid, sock_init_data() assumes a the struct socket* sock is always embedded in a struct socket_alloc, used to access the corresponding inode uid. This may not be true. Examples are sockets created in tun_chr_open() and tap_open(). Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Pietro Borrello Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/sock.h | 7 ++++++- net/core/sock.c | 15 ++++++++++++--- 2 files changed, 18 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/include/net/sock.h b/include/net/sock.h index dcd72e6285b2..937e842dc930 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1956,7 +1956,12 @@ void sk_common_release(struct sock *sk); * Default socket callbacks and setup code */ -/* Initialise core socket variables */ +/* Initialise core socket variables using an explicit uid. */ +void sock_init_data_uid(struct socket *sock, struct sock *sk, kuid_t uid); + +/* Initialise core socket variables. + * Assumes struct socket *sock is embedded in a struct socket_alloc. + */ void sock_init_data(struct socket *sock, struct sock *sk); /* diff --git a/net/core/sock.c b/net/core/sock.c index f08b76acde9b..208634b01df5 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3383,7 +3383,7 @@ void sk_stop_timer_sync(struct sock *sk, struct timer_list *timer) } EXPORT_SYMBOL(sk_stop_timer_sync); -void sock_init_data(struct socket *sock, struct sock *sk) +void sock_init_data_uid(struct socket *sock, struct sock *sk, kuid_t uid) { sk_init_common(sk); sk->sk_send_head = NULL; @@ -3403,11 +3403,10 @@ void sock_init_data(struct socket *sock, struct sock *sk) sk->sk_type = sock->type; RCU_INIT_POINTER(sk->sk_wq, &sock->wq); sock->sk = sk; - sk->sk_uid = SOCK_INODE(sock)->i_uid; } else { RCU_INIT_POINTER(sk->sk_wq, NULL); - sk->sk_uid = make_kuid(sock_net(sk)->user_ns, 0); } + sk->sk_uid = uid; rwlock_init(&sk->sk_callback_lock); if (sk->sk_kern_sock) @@ -3466,6 +3465,16 @@ void sock_init_data(struct socket *sock, struct sock *sk) refcount_set(&sk->sk_refcnt, 1); atomic_set(&sk->sk_drops, 0); } +EXPORT_SYMBOL(sock_init_data_uid); + +void sock_init_data(struct socket *sock, struct sock *sk) +{ + kuid_t uid = sock ? + SOCK_INODE(sock)->i_uid : + make_kuid(sock_net(sk)->user_ns, 0); + + sock_init_data_uid(sock, sk, uid); +} EXPORT_SYMBOL(sock_init_data); void lock_sock_nested(struct sock *sk, int subclass) -- cgit v1.2.3 From 15ea59a0e9bf0dce546b6fcab5b00af8b35b870d Mon Sep 17 00:00:00 2001 From: Eddy Tao Date: Sun, 5 Feb 2023 09:35:37 +0800 Subject: net: openvswitch: reduce cpu_used_mask memory Use actual CPU number instead of hardcoded value to decide the size of 'cpu_used_mask' in 'struct sw_flow'. Below is the reason. 'struct cpumask cpu_used_mask' is embedded in struct sw_flow. Its size is hardcoded to CONFIG_NR_CPUS bits, which can be 8192 by default, it costs memory and slows down ovs_flow_alloc. To address this: Redefine cpu_used_mask to pointer. Append cpumask_size() bytes after 'stat' to hold cpumask. Initialization cpu_used_mask right after stats_last_writer. APIs like cpumask_next and cpumask_set_cpu never access bits beyond cpu count, cpumask_size() bytes of memory is enough. Signed-off-by: Eddy Tao Acked-by: Eelco Chaudron Link: https://lore.kernel.org/r/OS3P286MB229570CCED618B20355D227AF5D59@OS3P286MB2295.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Jakub Kicinski --- net/openvswitch/flow.c | 9 ++++++--- net/openvswitch/flow.h | 2 +- net/openvswitch/flow_table.c | 8 +++++--- 3 files changed, 12 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index e20d1a973417..416976f70322 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -107,7 +107,8 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags, rcu_assign_pointer(flow->stats[cpu], new_stats); - cpumask_set_cpu(cpu, &flow->cpu_used_mask); + cpumask_set_cpu(cpu, + flow->cpu_used_mask); goto unlock; } } @@ -135,7 +136,8 @@ void ovs_flow_stats_get(const struct sw_flow *flow, memset(ovs_stats, 0, sizeof(*ovs_stats)); /* We open code this to make sure cpu 0 is always considered */ - for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, &flow->cpu_used_mask)) { + for (cpu = 0; cpu < nr_cpu_ids; + cpu = cpumask_next(cpu, flow->cpu_used_mask)) { struct sw_flow_stats *stats = rcu_dereference_ovsl(flow->stats[cpu]); if (stats) { @@ -159,7 +161,8 @@ void ovs_flow_stats_clear(struct sw_flow *flow) int cpu; /* We open code this to make sure cpu 0 is always considered */ - for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, &flow->cpu_used_mask)) { + for (cpu = 0; cpu < nr_cpu_ids; + cpu = cpumask_next(cpu, flow->cpu_used_mask)) { struct sw_flow_stats *stats = ovsl_dereference(flow->stats[cpu]); if (stats) { diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h index 073ab73ffeaa..b5711aff6e76 100644 --- a/net/openvswitch/flow.h +++ b/net/openvswitch/flow.h @@ -229,7 +229,7 @@ struct sw_flow { */ struct sw_flow_key key; struct sw_flow_id id; - struct cpumask cpu_used_mask; + struct cpumask *cpu_used_mask; struct sw_flow_mask *mask; struct sw_flow_actions __rcu *sf_acts; struct sw_flow_stats __rcu *stats[]; /* One for each CPU. First one diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c index 0a0e4c283f02..791504b7f42b 100644 --- a/net/openvswitch/flow_table.c +++ b/net/openvswitch/flow_table.c @@ -79,6 +79,7 @@ struct sw_flow *ovs_flow_alloc(void) return ERR_PTR(-ENOMEM); flow->stats_last_writer = -1; + flow->cpu_used_mask = (struct cpumask *)&flow->stats[nr_cpu_ids]; /* Initialize the default stat node. */ stats = kmem_cache_alloc_node(flow_stats_cache, @@ -91,7 +92,7 @@ struct sw_flow *ovs_flow_alloc(void) RCU_INIT_POINTER(flow->stats[0], stats); - cpumask_set_cpu(0, &flow->cpu_used_mask); + cpumask_set_cpu(0, flow->cpu_used_mask); return flow; err: @@ -115,7 +116,7 @@ static void flow_free(struct sw_flow *flow) flow->sf_acts); /* We open code this to make sure cpu 0 is always considered */ for (cpu = 0; cpu < nr_cpu_ids; - cpu = cpumask_next(cpu, &flow->cpu_used_mask)) { + cpu = cpumask_next(cpu, flow->cpu_used_mask)) { if (flow->stats[cpu]) kmem_cache_free(flow_stats_cache, (struct sw_flow_stats __force *)flow->stats[cpu]); @@ -1196,7 +1197,8 @@ int ovs_flow_init(void) flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow) + (nr_cpu_ids - * sizeof(struct sw_flow_stats *)), + * sizeof(struct sw_flow_stats *)) + + cpumask_size(), 0, 0, NULL); if (flow_cache == NULL) return -ENOMEM; -- cgit v1.2.3 From ca8e4cbff6d5360efc2ced519c4609e02e88cc59 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 6 Feb 2023 11:49:32 +0200 Subject: ethtool: mm: fix get_mm() return code not propagating to user space If ops->get_mm() returns a non-zero error code, we goto out_complete, but there, we return 0. Fix that to propagate the "ret" variable to the caller. If ops->get_mm() succeeds, it will always return 0. Fixes: 2b30f8291a30 ("net: ethtool: add support for MAC Merge layer") Signed-off-by: Vladimir Oltean Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230206094932.446379-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni --- net/ethtool/mm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c index 7e51f7633001..e612856eed8c 100644 --- a/net/ethtool/mm.c +++ b/net/ethtool/mm.c @@ -56,7 +56,7 @@ static int mm_prepare_data(const struct ethnl_req_info *req_base, out_complete: ethnl_ops_complete(dev); - return 0; + return ret; } static int mm_reply_size(const struct ethnl_req_info *req_base, -- cgit v1.2.3 From 115f1a5c42bdad9a9ea356fc0b4a39ec7537947f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 Feb 2023 17:31:00 +0000 Subject: net: add SKB_HEAD_ALIGN() helper We have many places using this expression: SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) Use of SKB_HEAD_ALIGN() will allow to clean them. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Paolo Abeni Reviewed-by: Alexander Duyck Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 8 ++++++++ net/core/skbuff.c | 18 ++++++------------ 2 files changed, 14 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1fa95b916342..c3df3b55da97 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -255,6 +255,14 @@ #define SKB_DATA_ALIGN(X) ALIGN(X, SMP_CACHE_BYTES) #define SKB_WITH_OVERHEAD(X) \ ((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + +/* For X bytes available in skb->head, what is the minimal + * allocation needed, knowing struct skb_shared_info needs + * to be aligned. + */ +#define SKB_HEAD_ALIGN(X) (SKB_DATA_ALIGN(X) + \ + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + #define SKB_MAX_ORDER(X, ORDER) \ SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X)) #define SKB_MAX_HEAD(X) (SKB_MAX_ORDER((X), 0)) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 624e9e4ec116..4abfc3ba6898 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -558,8 +558,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, * aligned memory blocks, unless SLUB/SLAB debug is enabled. * Both skb->head and skb_shared_info are cache line aligned. */ - size = SKB_DATA_ALIGN(size); - size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + size = SKB_HEAD_ALIGN(size); osize = kmalloc_size_roundup(size); data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc); if (unlikely(!data)) @@ -632,8 +631,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, goto skb_success; } - len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - len = SKB_DATA_ALIGN(len); + len = SKB_HEAD_ALIGN(len); if (sk_memalloc_socks()) gfp_mask |= __GFP_MEMALLOC; @@ -732,8 +730,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len, data = page_frag_alloc_1k(&nc->page_small, gfp_mask); pfmemalloc = NAPI_SMALL_PAGE_PFMEMALLOC(nc->page_small); } else { - len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - len = SKB_DATA_ALIGN(len); + len = SKB_HEAD_ALIGN(len); data = page_frag_alloc(&nc->page, len, gfp_mask); pfmemalloc = nc->page.pfmemalloc; @@ -1938,8 +1935,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - size = SKB_DATA_ALIGN(size); - size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + size = SKB_HEAD_ALIGN(size); size = kmalloc_size_roundup(size); data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) @@ -6289,8 +6285,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - size = SKB_DATA_ALIGN(size); - size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + size = SKB_HEAD_ALIGN(size); size = kmalloc_size_roundup(size); data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) @@ -6408,8 +6403,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - size = SKB_DATA_ALIGN(size); - size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + size = SKB_HEAD_ALIGN(size); size = kmalloc_size_roundup(size); data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) -- cgit v1.2.3 From 65998d2bf857b9ae5acc1f3b70892bd1b429ccab Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 Feb 2023 17:31:01 +0000 Subject: net: remove osize variable in __alloc_skb() This is a cleanup patch, to prepare following change. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Paolo Abeni Reviewed-by: Alexander Duyck Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4abfc3ba6898..333f793f9cdb 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -533,7 +533,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, { struct kmem_cache *cache; struct sk_buff *skb; - unsigned int osize; bool pfmemalloc; u8 *data; @@ -559,16 +558,15 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, * Both skb->head and skb_shared_info are cache line aligned. */ size = SKB_HEAD_ALIGN(size); - osize = kmalloc_size_roundup(size); - data = kmalloc_reserve(osize, gfp_mask, node, &pfmemalloc); + size = kmalloc_size_roundup(size); + data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); if (unlikely(!data)) goto nodata; /* kmalloc_size_roundup() might give us more room than requested. * Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ - size = SKB_WITH_OVERHEAD(osize); - prefetchw(data + size); + prefetchw(data + SKB_WITH_OVERHEAD(size)); /* * Only clear those fields we need to clear, not those that we will @@ -576,7 +574,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, * the tail pointer in struct sk_buff! */ memset(skb, 0, offsetof(struct sk_buff, tail)); - __build_skb_around(skb, data, osize); + __build_skb_around(skb, data, size); skb->pfmemalloc = pfmemalloc; if (flags & SKB_ALLOC_FCLONE) { -- cgit v1.2.3 From 5c0e820cbbbe2d1c4cea5cd2bfc1302c123436df Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 Feb 2023 17:31:02 +0000 Subject: net: factorize code in kmalloc_reserve() All kmalloc_reserve() callers have to make the same computation, we can factorize them, to prepare following patch in the series. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Paolo Abeni Reviewed-by: Alexander Duyck Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 333f793f9cdb..c1232837cd0c 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -478,17 +478,20 @@ EXPORT_SYMBOL(napi_build_skb); * may be used. Otherwise, the packet data may be discarded until enough * memory is free */ -static void *kmalloc_reserve(size_t size, gfp_t flags, int node, +static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node, bool *pfmemalloc) { - void *obj; bool ret_pfmemalloc = false; + unsigned int obj_size; + void *obj; + obj_size = SKB_HEAD_ALIGN(*size); + *size = obj_size = kmalloc_size_roundup(obj_size); /* * Try a regular allocation, when that fails and we're not entitled * to the reserves, fail. */ - obj = kmalloc_node_track_caller(size, + obj = kmalloc_node_track_caller(obj_size, flags | __GFP_NOMEMALLOC | __GFP_NOWARN, node); if (obj || !(gfp_pfmemalloc_allowed(flags))) @@ -496,7 +499,7 @@ static void *kmalloc_reserve(size_t size, gfp_t flags, int node, /* Try again but now we are using pfmemalloc reserves */ ret_pfmemalloc = true; - obj = kmalloc_node_track_caller(size, flags, node); + obj = kmalloc_node_track_caller(obj_size, flags, node); out: if (pfmemalloc) @@ -557,9 +560,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, * aligned memory blocks, unless SLUB/SLAB debug is enabled. * Both skb->head and skb_shared_info are cache line aligned. */ - size = SKB_HEAD_ALIGN(size); - size = kmalloc_size_roundup(size); - data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); + data = kmalloc_reserve(&size, gfp_mask, node, &pfmemalloc); if (unlikely(!data)) goto nodata; /* kmalloc_size_roundup() might give us more room than requested. @@ -1933,9 +1934,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - size = SKB_HEAD_ALIGN(size); - size = kmalloc_size_roundup(size); - data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL); + data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) goto nodata; size = SKB_WITH_OVERHEAD(size); @@ -6283,9 +6282,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - size = SKB_HEAD_ALIGN(size); - size = kmalloc_size_roundup(size); - data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL); + data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) return -ENOMEM; size = SKB_WITH_OVERHEAD(size); @@ -6401,9 +6398,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - size = SKB_HEAD_ALIGN(size); - size = kmalloc_size_roundup(size); - data = kmalloc_reserve(size, gfp_mask, NUMA_NO_NODE, NULL); + data = kmalloc_reserve(&size, gfp_mask, NUMA_NO_NODE, NULL); if (!data) return -ENOMEM; size = SKB_WITH_OVERHEAD(size); -- cgit v1.2.3 From bf9f1baa279f0758dc2297080360c5a616843927 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 Feb 2023 17:31:03 +0000 Subject: net: add dedicated kmem_cache for typical/small skb->head Recent removal of ksize() in alloc_skb() increased performance because we no longer read the associated struct page. We have an equivalent cost at kfree_skb() time. kfree(skb->head) has to access a struct page, often cold in cpu caches to get the owning struct kmem_cache. Considering that many allocations are small (at least for TCP ones) we can have our own kmem_cache to avoid the cache line miss. This also saves memory because these small heads are no longer padded to 1024 bytes. CONFIG_SLUB=y $ grep skbuff_small_head /proc/slabinfo skbuff_small_head 2907 2907 640 51 8 : tunables 0 0 0 : slabdata 57 57 0 CONFIG_SLAB=y $ grep skbuff_small_head /proc/slabinfo skbuff_small_head 607 624 640 6 1 : tunables 54 27 8 : slabdata 104 104 5 Notes: - After Kees Cook patches and this one, we might be able to revert commit dbae2b062824 ("net: skb: introduce and use a single page frag cache") because GRO_MAX_HEAD is also small. - This patch is a NOP for CONFIG_SLOB=y builds. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Paolo Abeni Reviewed-by: Alexander Duyck Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 67 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c1232837cd0c..bdb1e015e32b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -89,6 +89,34 @@ static struct kmem_cache *skbuff_fclone_cache __ro_after_init; #ifdef CONFIG_SKB_EXTENSIONS static struct kmem_cache *skbuff_ext_cache __ro_after_init; #endif + +/* skb_small_head_cache and related code is only supported + * for CONFIG_SLAB and CONFIG_SLUB. + * As soon as SLOB is removed from the kernel, we can clean up this. + */ +#if !defined(CONFIG_SLOB) +# define HAVE_SKB_SMALL_HEAD_CACHE 1 +#endif + +#ifdef HAVE_SKB_SMALL_HEAD_CACHE +static struct kmem_cache *skb_small_head_cache __ro_after_init; + +#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER) + +/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two. + * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique + * size, and we can differentiate heads from skb_small_head_cache + * vs system slabs by looking at their size (skb_end_offset()). + */ +#define SKB_SMALL_HEAD_CACHE_SIZE \ + (is_power_of_2(SKB_SMALL_HEAD_SIZE) ? \ + (SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) : \ + SKB_SMALL_HEAD_SIZE) + +#define SKB_SMALL_HEAD_HEADROOM \ + SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE) +#endif /* HAVE_SKB_SMALL_HEAD_CACHE */ + int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS; EXPORT_SYMBOL(sysctl_max_skb_frags); @@ -486,6 +514,23 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node, void *obj; obj_size = SKB_HEAD_ALIGN(*size); +#ifdef HAVE_SKB_SMALL_HEAD_CACHE + if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE && + !(flags & KMALLOC_NOT_NORMAL_BITS)) { + + /* skb_small_head_cache has non power of two size, + * likely forcing SLUB to use order-3 pages. + * We deliberately attempt a NOMEMALLOC allocation only. + */ + obj = kmem_cache_alloc_node(skb_small_head_cache, + flags | __GFP_NOMEMALLOC | __GFP_NOWARN, + node); + if (obj) { + *size = SKB_SMALL_HEAD_CACHE_SIZE; + goto out; + } + } +#endif *size = obj_size = kmalloc_size_roundup(obj_size); /* * Try a regular allocation, when that fails and we're not entitled @@ -805,6 +850,16 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data) return page_pool_return_skb_page(virt_to_page(data)); } +static void skb_kfree_head(void *head, unsigned int end_offset) +{ +#ifdef HAVE_SKB_SMALL_HEAD_CACHE + if (end_offset == SKB_SMALL_HEAD_HEADROOM) + kmem_cache_free(skb_small_head_cache, head); + else +#endif + kfree(head); +} + static void skb_free_head(struct sk_buff *skb) { unsigned char *head = skb->head; @@ -814,7 +869,7 @@ static void skb_free_head(struct sk_buff *skb) return; skb_free_frag(head); } else { - kfree(head); + skb_kfree_head(head, skb_end_offset(skb)); } } @@ -1997,7 +2052,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, return 0; nofrags: - kfree(data); + skb_kfree_head(data, size); nodata: return -ENOMEM; } @@ -4634,6 +4689,13 @@ void __init skb_init(void) 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); +#ifdef HAVE_SKB_SMALL_HEAD_CACHE + skb_small_head_cache = kmem_cache_create("skbuff_small_head", + SKB_SMALL_HEAD_CACHE_SIZE, + 0, + SLAB_HWCACHE_ALIGN | SLAB_PANIC, + NULL); +#endif skb_extensions_init(); } @@ -6298,7 +6360,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, if (skb_cloned(skb)) { /* drop the old head gracefully */ if (skb_orphan_frags(skb, gfp_mask)) { - kfree(data); + skb_kfree_head(data, size); return -ENOMEM; } for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) @@ -6406,7 +6468,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, memcpy((struct skb_shared_info *)(data + size), skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0])); if (skb_orphan_frags(skb, gfp_mask)) { - kfree(data); + skb_kfree_head(data, size); return -ENOMEM; } shinfo = (struct skb_shared_info *)(data + size); @@ -6442,7 +6504,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, /* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */ if (skb_has_frag_list(skb)) kfree_skb_list(skb_shinfo(skb)->frag_list); - kfree(data); + skb_kfree_head(data, size); return -ENOMEM; } skb_release_data(skb, SKB_CONSUMED); -- cgit v1.2.3 From 16d5677ef1041beee18b5709bf5759611ec82875 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 7 Feb 2023 22:11:30 +0000 Subject: rxrpc: Use consume_skb() rather than kfree_skb_reason() Use consume_skb() rather than kfree_skb_reason(). Reported-by: Paolo Abeni Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/skbuff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c index 944320e65ea8..3bcd6ee80396 100644 --- a/net/rxrpc/skbuff.c +++ b/net/rxrpc/skbuff.c @@ -63,7 +63,7 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) if (skb) { int n = atomic_dec_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, why); - kfree_skb_reason(skb, SKB_CONSUMED); + consume_skb(skb); } } @@ -78,6 +78,6 @@ void rxrpc_purge_queue(struct sk_buff_head *list) int n = atomic_dec_return(select_skb_count(skb)); trace_rxrpc_skb(skb, refcount_read(&skb->users), n, rxrpc_skb_put_purge); - kfree_skb_reason(skb, SKB_CONSUMED); + consume_skb(skb); } } -- cgit v1.2.3 From a33395ab85b9b9cff83948a03a1d6d96347935d8 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 29 Nov 2022 12:37:37 +0000 Subject: rxrpc: Fix overwaking on call poking If an rxrpc call is given a poke, it will get woken up unconditionally, even if there's already a poke pending (for which there will have been a wake) or if the call refcount has gone to 0. Fix this by only waking the call if it is still referenced and if it doesn't already have a poke pending. Fixes: 15f661dc95da ("rxrpc: Implement a mechanism to send an event notification to a call") Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/call_object.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 6eaffb0d8fdc..e9f1f49d18c2 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -54,12 +54,14 @@ void rxrpc_poke_call(struct rxrpc_call *call, enum rxrpc_call_poke_trace what) spin_lock_bh(&local->lock); busy = !list_empty(&call->attend_link); trace_rxrpc_poke_call(call, busy, what); + if (!busy && !rxrpc_try_get_call(call, rxrpc_call_get_poke)) + busy = true; if (!busy) { - rxrpc_get_call(call, rxrpc_call_get_poke); list_add_tail(&call->attend_link, &local->call_attend_q); } spin_unlock_bh(&local->lock); - rxrpc_wake_up_io_thread(local); + if (!busy) + rxrpc_wake_up_io_thread(local); } } -- cgit v1.2.3 From f789bff2deb3ddae08950f8e4a1e6f41b916c520 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 31 Jan 2023 15:31:49 +0000 Subject: rxrpc: Trace ack.rwind Log ack.rwind in the rxrpc_tx_ack tracepoint. This value is useful to see as it represents flow-control information to the peer. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- include/trace/events/rxrpc.h | 11 +++++++---- net/rxrpc/conn_event.c | 2 +- net/rxrpc/output.c | 10 +++++++--- 3 files changed, 15 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index d7bb4acf4580..c3c0b0aa8381 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -1118,9 +1118,9 @@ TRACE_EVENT(rxrpc_tx_data, TRACE_EVENT(rxrpc_tx_ack, TP_PROTO(unsigned int call, rxrpc_serial_t serial, rxrpc_seq_t ack_first, rxrpc_serial_t ack_serial, - u8 reason, u8 n_acks), + u8 reason, u8 n_acks, u16 rwind), - TP_ARGS(call, serial, ack_first, ack_serial, reason, n_acks), + TP_ARGS(call, serial, ack_first, ack_serial, reason, n_acks, rwind), TP_STRUCT__entry( __field(unsigned int, call) @@ -1129,6 +1129,7 @@ TRACE_EVENT(rxrpc_tx_ack, __field(rxrpc_serial_t, ack_serial) __field(u8, reason) __field(u8, n_acks) + __field(u16, rwind) ), TP_fast_assign( @@ -1138,15 +1139,17 @@ TRACE_EVENT(rxrpc_tx_ack, __entry->ack_serial = ack_serial; __entry->reason = reason; __entry->n_acks = n_acks; + __entry->rwind = rwind; ), - TP_printk(" c=%08x ACK %08x %s f=%08x r=%08x n=%u", + TP_printk(" c=%08x ACK %08x %s f=%08x r=%08x n=%u rw=%u", __entry->call, __entry->serial, __print_symbolic(__entry->reason, rxrpc_ack_names), __entry->ack_first, __entry->ack_serial, - __entry->n_acks) + __entry->n_acks, + __entry->rwind) ); TRACE_EVENT(rxrpc_receive, diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 44414e724415..95f4bc206b3d 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -163,7 +163,7 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, trace_rxrpc_tx_ack(chan->call_debug_id, serial, ntohl(pkt.ack.firstPacket), ntohl(pkt.ack.serial), - pkt.ack.reason, 0); + pkt.ack.reason, 0, rxrpc_rx_window_size); break; default: diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 6b2022240076..5e53429c6922 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -80,7 +80,8 @@ static void rxrpc_set_keepalive(struct rxrpc_call *call) */ static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn, struct rxrpc_call *call, - struct rxrpc_txbuf *txb) + struct rxrpc_txbuf *txb, + u16 *_rwind) { struct rxrpc_ackinfo ackinfo; unsigned int qsize, sack, wrap, to; @@ -124,6 +125,7 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn, jmax = rxrpc_rx_jumbo_max; qsize = (window - 1) - call->rx_consumed; rsize = max_t(int, call->rx_winsize - qsize, 0); + *_rwind = rsize; ackinfo.rxMTU = htonl(rxrpc_rx_mtu); ackinfo.maxMTU = htonl(mtu); ackinfo.rwind = htonl(rsize); @@ -190,6 +192,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) rxrpc_serial_t serial; size_t len, n; int ret, rtt_slot = -1; + u16 rwind; if (test_bit(RXRPC_CALL_DISCONNECTED, &call->flags)) return -ECONNRESET; @@ -205,7 +208,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) if (txb->ack.reason == RXRPC_ACK_PING) txb->wire.flags |= RXRPC_REQUEST_ACK; - n = rxrpc_fill_out_ack(conn, call, txb); + n = rxrpc_fill_out_ack(conn, call, txb, &rwind); if (n == 0) return 0; @@ -217,7 +220,8 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) txb->wire.serial = htonl(serial); trace_rxrpc_tx_ack(call->debug_id, serial, ntohl(txb->ack.firstPacket), - ntohl(txb->ack.serial), txb->ack.reason, txb->ack.nAcks); + ntohl(txb->ack.serial), txb->ack.reason, txb->ack.nAcks, + rwind); if (txb->ack.reason == RXRPC_ACK_PING) rtt_slot = rxrpc_begin_rtt_probe(call, serial, rxrpc_rtt_tx_ping); -- cgit v1.2.3 From 5a2c5a5b0829ef8bcb5d868145c1d8c1221c5637 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 31 Jan 2023 21:40:18 +0000 Subject: rxrpc: Reduce unnecessary ack transmission rxrpc_recvmsg_data() schedules an ACK to be transmitted every time at least two packets have been consumed and any time it runs out of data and would return -EAGAIN to the caller. Both events may occur within a single loop, however, and if the I/O thread is quick enough it may send duplicate ACKs. The ACKs are sent to inform the peer that more space has been made in the local Rx window, but the I/O thread is going to send an ACK every couple of DATA packets anyway, so we end up sending a lot more ACKs than we really need to. So reduce the rate at which recvmsg() schedules ACKs, such that if the I/O thread sends ACKs at its normal faster rate, recvmsg() won't actually schedule ACKs until the Rx flow stops (call->rx_consumed is cleared any time we transmit an ACK for that call, resetting the counter used by recvmsg). Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org --- net/rxrpc/recvmsg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index 50d263a6359d..76eb2b9cd936 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -137,7 +137,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call) /* Check to see if there's an ACK that needs sending. */ acked = atomic_add_return(call->rx_consumed - old_consumed, &call->ackr_nr_consumed); - if (acked > 2 && + if (acked > 8 && !test_and_set_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags)) rxrpc_poke_call(call, rxrpc_call_poke_idle); } -- cgit v1.2.3 From cb6b2e11a42decea2afc77df73ec7326db1ac25f Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Mon, 6 Feb 2023 17:56:16 +0200 Subject: devlink: Fix memleak in health diagnose callback The callback devlink_nl_cmd_health_reporter_diagnose_doit() miss devlink_fmsg_free(), which leads to memory leak. Fix it by adding devlink_fmsg_free(). Fixes: e994a75fb7f9 ("devlink: remove reporter reference counting") Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/1675698976-45993-1-git-send-email-moshe@nvidia.com Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 97d30ea98b00..9d6373603340 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -6626,18 +6626,22 @@ static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, err = devlink_fmsg_obj_nest_start(fmsg); if (err) - return err; + goto out; err = reporter->ops->diagnose(reporter, fmsg, info->extack); if (err) - return err; + goto out; err = devlink_fmsg_obj_nest_end(fmsg); if (err) - return err; + goto out; + + err = devlink_fmsg_snd(fmsg, info, + DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); - return devlink_fmsg_snd(fmsg, info, - DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); +out: + devlink_fmsg_free(fmsg); + return err; } static int -- cgit v1.2.3 From ecc0cc98632ac80ead7821997fdd5ad9cdede9de Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:26 +0200 Subject: net/sched: taprio: delete peek() implementation There isn't any code in the network stack which calls taprio_peek(). We only see qdisc->ops->peek() being called on child qdiscs of other classful qdiscs, never from the generic qdisc code. Whereas taprio is never a child qdisc, it is always root. This snippet of a comment from qdisc_peek_dequeued() seems to confirm: /* we can reuse ->gso_skb because peek isn't called for root qdiscs */ Since I've been known to be wrong many times though, I'm not completely removing it, but leaving a stub function in place which emits a warning. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 43 +------------------------------------------ 1 file changed, 1 insertion(+), 42 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 1c95785932b9..d9e26ddaa7f2 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -499,50 +499,9 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch, return taprio_enqueue_one(skb, sch, child, to_free); } -/* Will not be called in the full offload case, since the TX queues are - * attached to the Qdisc created using qdisc_create_dflt() - */ static struct sk_buff *taprio_peek(struct Qdisc *sch) { - struct taprio_sched *q = qdisc_priv(sch); - struct net_device *dev = qdisc_dev(sch); - struct sched_entry *entry; - struct sk_buff *skb; - u32 gate_mask; - int i; - - rcu_read_lock(); - entry = rcu_dereference(q->current_entry); - gate_mask = entry ? entry->gate_mask : TAPRIO_ALL_GATES_OPEN; - rcu_read_unlock(); - - if (!gate_mask) - return NULL; - - for (i = 0; i < dev->num_tx_queues; i++) { - struct Qdisc *child = q->qdiscs[i]; - int prio; - u8 tc; - - if (unlikely(!child)) - continue; - - skb = child->ops->peek(child); - if (!skb) - continue; - - if (TXTIME_ASSIST_IS_ENABLED(q->flags)) - return skb; - - prio = skb->priority; - tc = netdev_get_prio_tc_map(dev, prio); - - if (!(gate_mask & BIT(tc))) - continue; - - return skb; - } - + WARN_ONCE(1, "taprio only supports operating as root qdisc, peek() not implemented"); return NULL; } -- cgit v1.2.3 From 1638bbbe4ececa615b273497d347d59ad71060a2 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:27 +0200 Subject: net/sched: taprio: continue with other TXQs if one dequeue() failed This changes the handling of an unlikely condition to not stop dequeuing if taprio failed to dequeue the peeked skb in taprio_dequeue(). I've no idea when this can happen, but the only side effect seems to be that the atomic_sub_return() call right above will have consumed some budget. This isn't a big deal, since either that made us remain without any budget (and therefore, we'd exit on the next peeked skb anyway), or we could send some packets from other TXQs. I'm making this change because in a future patch I'll be refactoring the dequeue procedure to simplify it, and this corner case will have to go away. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index d9e26ddaa7f2..0fde303978a5 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -587,7 +587,7 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch) skb = child->ops->dequeue(child); if (unlikely(!skb)) - goto done; + continue; skb_found: qdisc_bstats_update(sch, skb); -- cgit v1.2.3 From 92f966674f6a257eddfa60a85f9b6741d6087ccb Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:28 +0200 Subject: net/sched: taprio: refactor one skb dequeue from TXQ to separate function Future changes will refactor the TXQ selection procedure, and a lot of stuff will become messy, the indentation of the bulk of the dequeue procedure would increase, etc. Break out the bulk of the function into a new one, which knows the TXQ (child qdisc) we should perform a dequeue from. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 121 +++++++++++++++++++++++++------------------------ 1 file changed, 63 insertions(+), 58 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 0fde303978a5..272a8b7c0f9f 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -512,6 +512,66 @@ static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry) atomic64_read(&q->picos_per_byte))); } +static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, + struct sched_entry *entry, + u32 gate_mask) +{ + struct taprio_sched *q = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + struct Qdisc *child = q->qdiscs[txq]; + struct sk_buff *skb; + ktime_t guard; + int prio; + int len; + u8 tc; + + if (unlikely(!child)) + return NULL; + + if (TXTIME_ASSIST_IS_ENABLED(q->flags)) { + skb = child->ops->dequeue(child); + if (!skb) + return NULL; + goto skb_found; + } + + skb = child->ops->peek(child); + if (!skb) + return NULL; + + prio = skb->priority; + tc = netdev_get_prio_tc_map(dev, prio); + + if (!(gate_mask & BIT(tc))) + return NULL; + + len = qdisc_pkt_len(skb); + guard = ktime_add_ns(taprio_get_time(q), length_to_duration(q, len)); + + /* In the case that there's no gate entry, there's no + * guard band ... + */ + if (gate_mask != TAPRIO_ALL_GATES_OPEN && + ktime_after(guard, entry->close_time)) + return NULL; + + /* ... and no budget. */ + if (gate_mask != TAPRIO_ALL_GATES_OPEN && + atomic_sub_return(len, &entry->budget) < 0) + return NULL; + + skb = child->ops->dequeue(child); + if (unlikely(!skb)) + return NULL; + +skb_found: + qdisc_bstats_update(sch, skb); + qdisc_qstats_backlog_dec(sch, skb); + sch->q.qlen--; + + return skb; +} + /* Will not be called in the full offload case, since the TX queues are * attached to the Qdisc created using qdisc_create_dflt() */ @@ -537,64 +597,9 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch) goto done; for (i = 0; i < dev->num_tx_queues; i++) { - struct Qdisc *child = q->qdiscs[i]; - ktime_t guard; - int prio; - int len; - u8 tc; - - if (unlikely(!child)) - continue; - - if (TXTIME_ASSIST_IS_ENABLED(q->flags)) { - skb = child->ops->dequeue(child); - if (!skb) - continue; - goto skb_found; - } - - skb = child->ops->peek(child); - if (!skb) - continue; - - prio = skb->priority; - tc = netdev_get_prio_tc_map(dev, prio); - - if (!(gate_mask & BIT(tc))) { - skb = NULL; - continue; - } - - len = qdisc_pkt_len(skb); - guard = ktime_add_ns(taprio_get_time(q), - length_to_duration(q, len)); - - /* In the case that there's no gate entry, there's no - * guard band ... - */ - if (gate_mask != TAPRIO_ALL_GATES_OPEN && - ktime_after(guard, entry->close_time)) { - skb = NULL; - continue; - } - - /* ... and no budget. */ - if (gate_mask != TAPRIO_ALL_GATES_OPEN && - atomic_sub_return(len, &entry->budget) < 0) { - skb = NULL; - continue; - } - - skb = child->ops->dequeue(child); - if (unlikely(!skb)) - continue; - -skb_found: - qdisc_bstats_update(sch, skb); - qdisc_qstats_backlog_dec(sch, skb); - sch->q.qlen--; - - goto done; + skb = taprio_dequeue_from_txq(sch, i, entry, gate_mask); + if (skb) + goto done; } done: -- cgit v1.2.3 From 4c22942734f0814d3c928c25a80f48df0a6ce45e Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:29 +0200 Subject: net/sched: taprio: avoid calling child->ops->dequeue(child) twice Simplify taprio_dequeue_from_txq() by noticing that we can goto one call earlier than the previous skb_found label. This is possible because we've unified the treatment of the child->ops->dequeue(child) return call, we always try other TXQs now, instead of abandoning the root dequeue completely if we failed in the peek() case. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 272a8b7c0f9f..a3770d599a84 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -528,12 +528,8 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, if (unlikely(!child)) return NULL; - if (TXTIME_ASSIST_IS_ENABLED(q->flags)) { - skb = child->ops->dequeue(child); - if (!skb) - return NULL; - goto skb_found; - } + if (TXTIME_ASSIST_IS_ENABLED(q->flags)) + goto skip_peek_checks; skb = child->ops->peek(child); if (!skb) @@ -560,11 +556,11 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, atomic_sub_return(len, &entry->budget) < 0) return NULL; +skip_peek_checks: skb = child->ops->dequeue(child); if (unlikely(!skb)) return NULL; -skb_found: qdisc_bstats_update(sch, skb); qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; -- cgit v1.2.3 From 2f530df76c8cb5551d7d9395c77eb02282c3dc68 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:30 +0200 Subject: net/sched: taprio: give higher priority to higher TCs in software dequeue mode Current taprio software implementation is haunted by the shadow of the igb/igc hardware model. It iterates over child qdiscs in increasing order of TXQ index, therefore giving higher xmit priority to TXQ 0 and lower to TXQ N. According to discussions with Vinicius, that is the default (perhaps even unchangeable) prioritization scheme used for the NICs that taprio was first written for (igb, igc), and we have a case of two bugs canceling out, resulting in a functional setup on igb/igc, but a less sane one on other NICs. To the best of my understanding, taprio should prioritize based on the traffic class, so it should really dequeue starting with the highest traffic class and going down from there. We get to the TXQ using the tc_to_txq[] netdev property. TXQs within the same TC have the same (strict) priority, so we should pick from them as fairly as we can. We can achieve that by implementing something very similar to q->curband from multiq_dequeue(). Since igb/igc really do have TXQ 0 of higher hardware priority than TXQ 1 etc, we need to preserve the behavior for them as well. We really have no choice, because in txtime-assist mode, taprio is essentially a software scheduler towards offloaded child tc-etf qdiscs, so the TXQ selection really does matter (not all igb TXQs support ETF/SO_TXTIME, says Kurt Kanzenbach). To preserve the behavior, we need a capability bit so that taprio can determine if it's running on igb/igc, or on something else. Because igb doesn't offload taprio at all, we can't piggyback on the qdisc_offload_query_caps() call from taprio_enable_offload(), but instead we need a separate call which is also made for software scheduling. Introduce two static keys to minimize the performance penalty on systems which only have igb/igc NICs, and on systems which only have other NICs. For mixed systems, taprio will have to dynamically check whether to dequeue using one prioritization algorithm or using the other. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- drivers/net/ethernet/intel/igb/igb_main.c | 18 +++++ drivers/net/ethernet/intel/igc/igc_main.c | 6 +- include/net/pkt_sched.h | 5 ++ net/sched/sch_taprio.c | 125 ++++++++++++++++++++++++++++-- 4 files changed, 143 insertions(+), 11 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index c56b991fa610..45fbd8346de7 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2810,6 +2810,22 @@ static int igb_offload_txtime(struct igb_adapter *adapter, return 0; } +static int igb_tc_query_caps(struct igb_adapter *adapter, + struct tc_query_caps_base *base) +{ + switch (base->type) { + case TC_SETUP_QDISC_TAPRIO: { + struct tc_taprio_caps *caps = base->caps; + + caps->broken_mqprio = true; + + return 0; + } + default: + return -EOPNOTSUPP; + } +} + static LIST_HEAD(igb_block_cb_list); static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type, @@ -2818,6 +2834,8 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type, struct igb_adapter *adapter = netdev_priv(dev); switch (type) { + case TC_QUERY_CAPS: + return igb_tc_query_caps(adapter, type_data); case TC_SETUP_QDISC_CBS: return igb_offload_cbs(adapter, type_data); case TC_SETUP_BLOCK: diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index cf7f6a5eea3d..4c626f756a8b 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -6214,10 +6214,10 @@ static int igc_tc_query_caps(struct igc_adapter *adapter, case TC_SETUP_QDISC_TAPRIO: { struct tc_taprio_caps *caps = base->caps; - if (hw->mac.type != igc_i225) - return -EOPNOTSUPP; + caps->broken_mqprio = true; - caps->gate_mask_per_txq = true; + if (hw->mac.type == igc_i225) + caps->gate_mask_per_txq = true; return 0; } diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fd889fc4912b..2016839991a4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -177,6 +177,11 @@ struct tc_mqprio_qopt_offload { struct tc_taprio_caps { bool supports_queue_max_sdu:1; bool gate_mask_per_txq:1; + /* Device expects lower TXQ numbers to have higher priority over higher + * TXQs, regardless of their TC mapping. DO NOT USE FOR NEW DRIVERS, + * INSTEAD ENFORCE A PROPER TC:TXQ MAPPING COMING FROM USER SPACE. + */ + bool broken_mqprio:1; }; struct tc_taprio_sched_entry { diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index a3770d599a84..5f57dcfafffd 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -29,6 +29,8 @@ #include "sch_mqprio_lib.h" static LIST_HEAD(taprio_list); +static struct static_key_false taprio_have_broken_mqprio; +static struct static_key_false taprio_have_working_mqprio; #define TAPRIO_ALL_GATES_OPEN -1 @@ -69,6 +71,8 @@ struct taprio_sched { enum tk_offsets tk_offset; int clockid; bool offloaded; + bool detected_mqprio; + bool broken_mqprio; atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+ * speeds it's sub-nanoseconds per byte */ @@ -80,6 +84,7 @@ struct taprio_sched { struct sched_gate_list __rcu *admin_sched; struct hrtimer advance_timer; struct list_head taprio_list; + int cur_txq[TC_MAX_QUEUE]; u32 max_frm_len[TC_MAX_QUEUE]; /* for the fast path */ u32 max_sdu[TC_MAX_QUEUE]; /* for dump and offloading */ u32 txtime_delay; @@ -568,17 +573,78 @@ skip_peek_checks: return skb; } +static void taprio_next_tc_txq(struct net_device *dev, int tc, int *txq) +{ + int offset = dev->tc_to_txq[tc].offset; + int count = dev->tc_to_txq[tc].count; + + (*txq)++; + if (*txq == offset + count) + *txq = offset; +} + +/* Prioritize higher traffic classes, and select among TXQs belonging to the + * same TC using round robin + */ +static struct sk_buff *taprio_dequeue_tc_priority(struct Qdisc *sch, + struct sched_entry *entry, + u32 gate_mask) +{ + struct taprio_sched *q = qdisc_priv(sch); + struct net_device *dev = qdisc_dev(sch); + int num_tc = netdev_get_num_tc(dev); + struct sk_buff *skb; + int tc; + + for (tc = num_tc - 1; tc >= 0; tc--) { + int first_txq = q->cur_txq[tc]; + + if (!(gate_mask & BIT(tc))) + continue; + + do { + skb = taprio_dequeue_from_txq(sch, q->cur_txq[tc], + entry, gate_mask); + + taprio_next_tc_txq(dev, tc, &q->cur_txq[tc]); + + if (skb) + return skb; + } while (q->cur_txq[tc] != first_txq); + } + + return NULL; +} + +/* Broken way of prioritizing smaller TXQ indices and ignoring the traffic + * class other than to determine whether the gate is open or not + */ +static struct sk_buff *taprio_dequeue_txq_priority(struct Qdisc *sch, + struct sched_entry *entry, + u32 gate_mask) +{ + struct net_device *dev = qdisc_dev(sch); + struct sk_buff *skb; + int i; + + for (i = 0; i < dev->num_tx_queues; i++) { + skb = taprio_dequeue_from_txq(sch, i, entry, gate_mask); + if (skb) + return skb; + } + + return NULL; +} + /* Will not be called in the full offload case, since the TX queues are * attached to the Qdisc created using qdisc_create_dflt() */ static struct sk_buff *taprio_dequeue(struct Qdisc *sch) { struct taprio_sched *q = qdisc_priv(sch); - struct net_device *dev = qdisc_dev(sch); struct sk_buff *skb = NULL; struct sched_entry *entry; u32 gate_mask; - int i; rcu_read_lock(); entry = rcu_dereference(q->current_entry); @@ -588,14 +654,23 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch) * "AdminGateStates" */ gate_mask = entry ? entry->gate_mask : TAPRIO_ALL_GATES_OPEN; - if (!gate_mask) goto done; - for (i = 0; i < dev->num_tx_queues; i++) { - skb = taprio_dequeue_from_txq(sch, i, entry, gate_mask); - if (skb) - goto done; + if (static_branch_unlikely(&taprio_have_broken_mqprio) && + !static_branch_likely(&taprio_have_working_mqprio)) { + /* Single NIC kind which is broken */ + skb = taprio_dequeue_txq_priority(sch, entry, gate_mask); + } else if (static_branch_likely(&taprio_have_working_mqprio) && + !static_branch_unlikely(&taprio_have_broken_mqprio)) { + /* Single NIC kind which prioritizes properly */ + skb = taprio_dequeue_tc_priority(sch, entry, gate_mask); + } else { + /* Mixed NIC kinds present in system, need dynamic testing */ + if (q->broken_mqprio) + skb = taprio_dequeue_txq_priority(sch, entry, gate_mask); + else + skb = taprio_dequeue_tc_priority(sch, entry, gate_mask); } done: @@ -1157,6 +1232,34 @@ static void taprio_sched_to_offload(struct net_device *dev, offload->num_entries = i; } +static void taprio_detect_broken_mqprio(struct taprio_sched *q) +{ + struct net_device *dev = qdisc_dev(q->root); + struct tc_taprio_caps caps; + + qdisc_offload_query_caps(dev, TC_SETUP_QDISC_TAPRIO, + &caps, sizeof(caps)); + + q->broken_mqprio = caps.broken_mqprio; + if (q->broken_mqprio) + static_branch_inc(&taprio_have_broken_mqprio); + else + static_branch_inc(&taprio_have_working_mqprio); + + q->detected_mqprio = true; +} + +static void taprio_cleanup_broken_mqprio(struct taprio_sched *q) +{ + if (!q->detected_mqprio) + return; + + if (q->broken_mqprio) + static_branch_dec(&taprio_have_broken_mqprio); + else + static_branch_dec(&taprio_have_working_mqprio); +} + static int taprio_enable_offload(struct net_device *dev, struct taprio_sched *q, struct sched_gate_list *sched, @@ -1538,10 +1641,12 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, err = netdev_set_num_tc(dev, mqprio->num_tc); if (err) goto free_sched; - for (i = 0; i < mqprio->num_tc; i++) + for (i = 0; i < mqprio->num_tc; i++) { netdev_set_tc_queue(dev, i, mqprio->count[i], mqprio->offset[i]); + q->cur_txq[i] = mqprio->offset[i]; + } /* Always use supplied priority mappings */ for (i = 0; i <= TC_BITMASK; i++) @@ -1676,6 +1781,8 @@ static void taprio_destroy(struct Qdisc *sch) if (admin) call_rcu(&admin->rcu, taprio_free_sched_cb); + + taprio_cleanup_broken_mqprio(q); } static int taprio_init(struct Qdisc *sch, struct nlattr *opt, @@ -1740,6 +1847,8 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt, q->qdiscs[i] = qdisc; } + taprio_detect_broken_mqprio(q); + return taprio_change(sch, opt, extack); } -- cgit v1.2.3 From a306a90c8ffe1c4a29f8e8a1221d1c000e58a410 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:31 +0200 Subject: net/sched: taprio: calculate tc gate durations Current taprio code operates on a very simplistic (and incorrect) assumption: that egress scheduling for a traffic class can only take place for the duration of the current interval, or i.o.w., it assumes that at the end of each schedule entry, there is a "gate close" event for all traffic classes. As an example, traffic sent with the schedule below will be jumpy, even though all 8 TC gates are open, so there is absolutely no "gate close" event (effectively a transition from BIT(tc)==1 to BIT(tc)==0 in consecutive schedule entries): tc qdisc replace dev veth0 parent root taprio \ num_tc 2 \ map 0 1 \ queues 1@0 1@1 \ base-time 0 \ sched-entry S 0xff 4000000000 \ clockid CLOCK_TAI \ flags 0x0 This qdisc simply does not have what it takes in terms of logic to *actually* compute the durations of traffic classes. Also, it does not recognize the need to use this information on a per-traffic-class basis: it always looks at entry->interval and entry->close_time. This change proposes that each schedule entry has an array called tc_gate_duration[tc]. This holds the information: "for how long will this traffic class gate remain open, starting from *this* schedule entry". If the traffic class gate is always open, that value is equal to the cycle time of the schedule. We'll also need to keep track, for the purpose of queueMaxSDU[tc] calculation, what is the maximum time duration for a traffic class having an open gate. This gives us directly what is the maximum sized packet that this traffic class will have to accept. For everything else it has to qdisc_drop() it in qdisc_enqueue(). Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 5f57dcfafffd..7d897bbd48ca 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -39,6 +39,10 @@ static struct static_key_false taprio_have_working_mqprio; #define TAPRIO_FLAGS_INVALID U32_MAX struct sched_entry { + /* Durations between this GCL entry and the GCL entry where the + * respective traffic class gate closes + */ + u64 gate_duration[TC_MAX_QUEUE]; struct list_head list; /* The instant that this entry "closes" and the next one @@ -55,6 +59,10 @@ struct sched_entry { }; struct sched_gate_list { + /* Longest non-zero contiguous gate durations per traffic class, + * or 0 if a traffic class gate never opens during the schedule. + */ + u64 max_open_gate_duration[TC_MAX_QUEUE]; struct rcu_head rcu; struct list_head entries; size_t num_entries; @@ -95,6 +103,51 @@ struct __tc_taprio_qopt_offload { struct tc_taprio_qopt_offload offload; }; +static void taprio_calculate_gate_durations(struct taprio_sched *q, + struct sched_gate_list *sched) +{ + struct net_device *dev = qdisc_dev(q->root); + int num_tc = netdev_get_num_tc(dev); + struct sched_entry *entry, *cur; + int tc; + + list_for_each_entry(entry, &sched->entries, list) { + u32 gates_still_open = entry->gate_mask; + + /* For each traffic class, calculate each open gate duration, + * starting at this schedule entry and ending at the schedule + * entry containing a gate close event for that TC. + */ + cur = entry; + + do { + if (!gates_still_open) + break; + + for (tc = 0; tc < num_tc; tc++) { + if (!(gates_still_open & BIT(tc))) + continue; + + if (cur->gate_mask & BIT(tc)) + entry->gate_duration[tc] += cur->interval; + else + gates_still_open &= ~BIT(tc); + } + + cur = list_next_entry_circular(cur, &sched->entries, list); + } while (cur != entry); + + /* Keep track of the maximum gate duration for each traffic + * class, taking care to not confuse a traffic class which is + * temporarily closed with one that is always closed. + */ + for (tc = 0; tc < num_tc; tc++) + if (entry->gate_duration[tc] && + sched->max_open_gate_duration[tc] < entry->gate_duration[tc]) + sched->max_open_gate_duration[tc] = entry->gate_duration[tc]; + } +} + static ktime_t sched_base_time(const struct sched_gate_list *sched) { if (!sched) @@ -953,6 +1006,8 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb, new->cycle_time = cycle; } + taprio_calculate_gate_durations(q, new); + return 0; } -- cgit v1.2.3 From e5517551112ff2395611e552443932152f83672d Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:32 +0200 Subject: net/sched: taprio: rename close_time to end_time There is a confusion in terms in taprio which makes what is called "close_time" to be actually used for 2 things: 1. determining when an entry "closes" such that transmitted skbs are never allowed to overrun that time (?!) 2. an aid for determining when to advance and/or restart the schedule using the hrtimer It makes more sense to call this so-called "close_time" "end_time", because it's not clear at all to me what "closes". Future patches will hopefully make better use of the term "to close". This is an absolutely mechanical change. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 52 +++++++++++++++++++++++++------------------------- 1 file changed, 26 insertions(+), 26 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 7d897bbd48ca..3e798c8406ae 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -45,11 +45,11 @@ struct sched_entry { u64 gate_duration[TC_MAX_QUEUE]; struct list_head list; - /* The instant that this entry "closes" and the next one + /* The instant that this entry ends and the next one * should open, the qdisc will make some effort so that no * packet leaves after this time. */ - ktime_t close_time; + ktime_t end_time; ktime_t next_txtime; atomic_t budget; int index; @@ -66,7 +66,7 @@ struct sched_gate_list { struct rcu_head rcu; struct list_head entries; size_t num_entries; - ktime_t cycle_close_time; + ktime_t cycle_end_time; s64 cycle_time; s64 cycle_time_extension; s64 base_time; @@ -606,7 +606,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, * guard band ... */ if (gate_mask != TAPRIO_ALL_GATES_OPEN && - ktime_after(guard, entry->close_time)) + ktime_after(guard, entry->end_time)) return NULL; /* ... and no budget. */ @@ -738,7 +738,7 @@ static bool should_restart_cycle(const struct sched_gate_list *oper, if (list_is_last(&entry->list, &oper->entries)) return true; - if (ktime_compare(entry->close_time, oper->cycle_close_time) == 0) + if (ktime_compare(entry->end_time, oper->cycle_end_time) == 0) return true; return false; @@ -746,7 +746,7 @@ static bool should_restart_cycle(const struct sched_gate_list *oper, static bool should_change_schedules(const struct sched_gate_list *admin, const struct sched_gate_list *oper, - ktime_t close_time) + ktime_t end_time) { ktime_t next_base_time, extension_time; @@ -755,18 +755,18 @@ static bool should_change_schedules(const struct sched_gate_list *admin, next_base_time = sched_base_time(admin); - /* This is the simple case, the close_time would fall after + /* This is the simple case, the end_time would fall after * the next schedule base_time. */ - if (ktime_compare(next_base_time, close_time) <= 0) + if (ktime_compare(next_base_time, end_time) <= 0) return true; - /* This is the cycle_time_extension case, if the close_time + /* This is the cycle_time_extension case, if the end_time * plus the amount that can be extended would fall after the * next schedule base_time, we can extend the current schedule * for that amount. */ - extension_time = ktime_add_ns(close_time, oper->cycle_time_extension); + extension_time = ktime_add_ns(end_time, oper->cycle_time_extension); /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about * how precisely the extension should be made. So after @@ -785,7 +785,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) struct sched_gate_list *oper, *admin; struct sched_entry *entry, *next; struct Qdisc *sch = q->root; - ktime_t close_time; + ktime_t end_time; spin_lock(&q->current_entry_lock); entry = rcu_dereference_protected(q->current_entry, @@ -804,41 +804,41 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) * entry of all schedules are pre-calculated during the * schedule initialization. */ - if (unlikely(!entry || entry->close_time == oper->base_time)) { + if (unlikely(!entry || entry->end_time == oper->base_time)) { next = list_first_entry(&oper->entries, struct sched_entry, list); - close_time = next->close_time; + end_time = next->end_time; goto first_run; } if (should_restart_cycle(oper, entry)) { next = list_first_entry(&oper->entries, struct sched_entry, list); - oper->cycle_close_time = ktime_add_ns(oper->cycle_close_time, - oper->cycle_time); + oper->cycle_end_time = ktime_add_ns(oper->cycle_end_time, + oper->cycle_time); } else { next = list_next_entry(entry, list); } - close_time = ktime_add_ns(entry->close_time, next->interval); - close_time = min_t(ktime_t, close_time, oper->cycle_close_time); + end_time = ktime_add_ns(entry->end_time, next->interval); + end_time = min_t(ktime_t, end_time, oper->cycle_end_time); - if (should_change_schedules(admin, oper, close_time)) { + if (should_change_schedules(admin, oper, end_time)) { /* Set things so the next time this runs, the new * schedule runs. */ - close_time = sched_base_time(admin); + end_time = sched_base_time(admin); switch_schedules(q, &admin, &oper); } - next->close_time = close_time; + next->end_time = end_time; taprio_set_budget(q, next); first_run: rcu_assign_pointer(q->current_entry, next); spin_unlock(&q->current_entry_lock); - hrtimer_set_expires(&q->advance_timer, close_time); + hrtimer_set_expires(&q->advance_timer, end_time); rcu_read_lock(); __netif_schedule(sch); @@ -1076,8 +1076,8 @@ static int taprio_get_start_time(struct Qdisc *sch, return 0; } -static void setup_first_close_time(struct taprio_sched *q, - struct sched_gate_list *sched, ktime_t base) +static void setup_first_end_time(struct taprio_sched *q, + struct sched_gate_list *sched, ktime_t base) { struct sched_entry *first; ktime_t cycle; @@ -1088,9 +1088,9 @@ static void setup_first_close_time(struct taprio_sched *q, cycle = sched->cycle_time; /* FIXME: find a better place to do this */ - sched->cycle_close_time = ktime_add_ns(base, cycle); + sched->cycle_end_time = ktime_add_ns(base, cycle); - first->close_time = ktime_add_ns(base, first->interval); + first->end_time = ktime_add_ns(base, first->interval); taprio_set_budget(q, first); rcu_assign_pointer(q->current_entry, NULL); } @@ -1756,7 +1756,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, if (admin) call_rcu(&admin->rcu, taprio_free_sched_cb); } else { - setup_first_close_time(q, new_admin, start); + setup_first_end_time(q, new_admin, start); /* Protects against advance_sched() */ spin_lock_irqsave(&q->current_entry_lock, flags); -- cgit v1.2.3 From d2ad689dec10d4f61647f6963e2c94113049ed6c Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:33 +0200 Subject: net/sched: taprio: calculate budgets per traffic class Currently taprio assumes that the budget for a traffic class expires at the end of the current interval as if the next interval contains a "gate close" event for this traffic class. This is, however, an unfounded assumption. Allow schedule entry intervals to be fused together for a particular traffic class by calculating the budget until the gate *actually* closes. This means we need to keep budgets per traffic class, and we also need to update the budget consumption procedure. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 54 ++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 46 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 3e798c8406ae..08099c1747cc 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -43,6 +43,7 @@ struct sched_entry { * respective traffic class gate closes */ u64 gate_duration[TC_MAX_QUEUE]; + atomic_t budget[TC_MAX_QUEUE]; struct list_head list; /* The instant that this entry ends and the next one @@ -51,7 +52,6 @@ struct sched_entry { */ ktime_t end_time; ktime_t next_txtime; - atomic_t budget; int index; u32 gate_mask; u32 interval; @@ -563,11 +563,48 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch) return NULL; } -static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry) +static void taprio_set_budgets(struct taprio_sched *q, + struct sched_gate_list *sched, + struct sched_entry *entry) { - atomic_set(&entry->budget, - div64_u64((u64)entry->interval * PSEC_PER_NSEC, - atomic64_read(&q->picos_per_byte))); + struct net_device *dev = qdisc_dev(q->root); + int num_tc = netdev_get_num_tc(dev); + int tc, budget; + + for (tc = 0; tc < num_tc; tc++) { + /* Traffic classes which never close have infinite budget */ + if (entry->gate_duration[tc] == sched->cycle_time) + budget = INT_MAX; + else + budget = div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC, + atomic64_read(&q->picos_per_byte)); + + atomic_set(&entry->budget[tc], budget); + } +} + +/* When an skb is sent, it consumes from the budget of all traffic classes */ +static int taprio_update_budgets(struct sched_entry *entry, size_t len, + int tc_consumed, int num_tc) +{ + int tc, budget, new_budget = 0; + + for (tc = 0; tc < num_tc; tc++) { + budget = atomic_read(&entry->budget[tc]); + /* Don't consume from infinite budget */ + if (budget == INT_MAX) { + if (tc == tc_consumed) + new_budget = budget; + continue; + } + + if (tc == tc_consumed) + new_budget = atomic_sub_return(len, &entry->budget[tc]); + else + atomic_sub(len, &entry->budget[tc]); + } + + return new_budget; } static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, @@ -577,6 +614,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, struct taprio_sched *q = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); struct Qdisc *child = q->qdiscs[txq]; + int num_tc = netdev_get_num_tc(dev); struct sk_buff *skb; ktime_t guard; int prio; @@ -611,7 +649,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, /* ... and no budget. */ if (gate_mask != TAPRIO_ALL_GATES_OPEN && - atomic_sub_return(len, &entry->budget) < 0) + taprio_update_budgets(entry, len, tc, num_tc) < 0) return NULL; skip_peek_checks: @@ -832,7 +870,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) } next->end_time = end_time; - taprio_set_budget(q, next); + taprio_set_budgets(q, oper, next); first_run: rcu_assign_pointer(q->current_entry, next); @@ -1091,7 +1129,7 @@ static void setup_first_end_time(struct taprio_sched *q, sched->cycle_end_time = ktime_add_ns(base, cycle); first->end_time = ktime_add_ns(base, first->interval); - taprio_set_budget(q, first); + taprio_set_budgets(q, sched, first); rcu_assign_pointer(q->current_entry, NULL); } -- cgit v1.2.3 From a1e6ad30fa193962b5aa61ea4d12ee83a7ce9020 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:34 +0200 Subject: net/sched: taprio: calculate guard band against actual TC gate close time taprio_dequeue_from_txq() looks at the entry->end_time to determine whether the skb will overrun its traffic class gate, as if at the end of the schedule entry there surely is a "gate close" event for it. Hint: maybe there isn't. For each schedule entry, introduce an array of kernel times which actually tracks when in the future will there be an *actual* gate close event for that traffic class, and use that in the guard band overrun calculation. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 40 ++++++++++++++++++++++++++++++++++------ 1 file changed, 34 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 08099c1747cc..e625f8f8704f 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -44,12 +44,12 @@ struct sched_entry { */ u64 gate_duration[TC_MAX_QUEUE]; atomic_t budget[TC_MAX_QUEUE]; - struct list_head list; - - /* The instant that this entry ends and the next one - * should open, the qdisc will make some effort so that no - * packet leaves after this time. + /* The qdisc makes some effort so that no packet leaves + * after this time */ + ktime_t gate_close_time[TC_MAX_QUEUE]; + struct list_head list; + /* Used to calculate when to advance the schedule */ ktime_t end_time; ktime_t next_txtime; int index; @@ -148,6 +148,12 @@ static void taprio_calculate_gate_durations(struct taprio_sched *q, } } +static bool taprio_entry_allows_tx(ktime_t skb_end_time, + struct sched_entry *entry, int tc) +{ + return ktime_before(skb_end_time, entry->gate_close_time[tc]); +} + static ktime_t sched_base_time(const struct sched_gate_list *sched) { if (!sched) @@ -644,7 +650,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq, * guard band ... */ if (gate_mask != TAPRIO_ALL_GATES_OPEN && - ktime_after(guard, entry->end_time)) + !taprio_entry_allows_tx(guard, entry, tc)) return NULL; /* ... and no budget. */ @@ -820,10 +826,13 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) { struct taprio_sched *q = container_of(timer, struct taprio_sched, advance_timer); + struct net_device *dev = qdisc_dev(q->root); struct sched_gate_list *oper, *admin; + int num_tc = netdev_get_num_tc(dev); struct sched_entry *entry, *next; struct Qdisc *sch = q->root; ktime_t end_time; + int tc; spin_lock(&q->current_entry_lock); entry = rcu_dereference_protected(q->current_entry, @@ -861,6 +870,14 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) end_time = ktime_add_ns(entry->end_time, next->interval); end_time = min_t(ktime_t, end_time, oper->cycle_end_time); + for (tc = 0; tc < num_tc; tc++) { + if (next->gate_duration[tc] == oper->cycle_time) + next->gate_close_time[tc] = KTIME_MAX; + else + next->gate_close_time[tc] = ktime_add_ns(entry->end_time, + next->gate_duration[tc]); + } + if (should_change_schedules(admin, oper, end_time)) { /* Set things so the next time this runs, the new * schedule runs. @@ -1117,8 +1134,11 @@ static int taprio_get_start_time(struct Qdisc *sch, static void setup_first_end_time(struct taprio_sched *q, struct sched_gate_list *sched, ktime_t base) { + struct net_device *dev = qdisc_dev(q->root); + int num_tc = netdev_get_num_tc(dev); struct sched_entry *first; ktime_t cycle; + int tc; first = list_first_entry(&sched->entries, struct sched_entry, list); @@ -1130,6 +1150,14 @@ static void setup_first_end_time(struct taprio_sched *q, first->end_time = ktime_add_ns(base, first->interval); taprio_set_budgets(q, sched, first); + + for (tc = 0; tc < num_tc; tc++) { + if (first->gate_duration[tc] == sched->cycle_time) + first->gate_close_time[tc] = KTIME_MAX; + else + first->gate_close_time[tc] = ktime_add_ns(base, first->gate_duration[tc]); + } + rcu_assign_pointer(q->current_entry, NULL); } -- cgit v1.2.3 From 1f62879e36324d710dfd5ab92833fb1c51d8334a Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:35 +0200 Subject: net/sched: make stab available before ops->init() call Some qdiscs like taprio turn out to be actually pretty reliant on a well configured stab, to not underestimate the skb transmission time (by properly accounting for L1 overhead). In a future change, taprio will need the stab, if configured by the user, to be available at ops->init() time. It will become even more important in upcoming work, when the overhead will be used for the queueMaxSDU calculation that is passed to an offloading driver. However, rcu_assign_pointer(sch->stab, stab) is called right after ops->init(), making it unavailable, and I don't really see a good reason for that. Move it earlier, which nicely seems to simplify the error handling path as well. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_api.c | 29 +++++++++++------------------ 1 file changed, 11 insertions(+), 18 deletions(-) (limited to 'net') diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index c14018a8052c..e9780631b5b5 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1282,12 +1282,6 @@ static struct Qdisc *qdisc_create(struct net_device *dev, if (err) goto err_out3; - if (ops->init) { - err = ops->init(sch, tca[TCA_OPTIONS], extack); - if (err != 0) - goto err_out5; - } - if (tca[TCA_STAB]) { stab = qdisc_get_stab(tca[TCA_STAB], extack); if (IS_ERR(stab)) { @@ -1296,11 +1290,18 @@ static struct Qdisc *qdisc_create(struct net_device *dev, } rcu_assign_pointer(sch->stab, stab); } + + if (ops->init) { + err = ops->init(sch, tca[TCA_OPTIONS], extack); + if (err != 0) + goto err_out5; + } + if (tca[TCA_RATE]) { err = -EOPNOTSUPP; if (sch->flags & TCQ_F_MQROOT) { NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc"); - goto err_out4; + goto err_out5; } err = gen_new_estimator(&sch->bstats, @@ -1311,7 +1312,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, tca[TCA_RATE]); if (err) { NL_SET_ERR_MSG(extack, "Failed to generate new estimator"); - goto err_out4; + goto err_out5; } } @@ -1321,6 +1322,8 @@ static struct Qdisc *qdisc_create(struct net_device *dev, return sch; err_out5: + qdisc_put_stab(rtnl_dereference(sch->stab)); +err_out4: /* ops->init() failed, we call ->destroy() like qdisc_create_dflt() */ if (ops->destroy) ops->destroy(sch); @@ -1332,16 +1335,6 @@ err_out2: err_out: *errp = err; return NULL; - -err_out4: - /* - * Any broken qdiscs that would require a ops->reset() here? - * The qdisc was never in action so it shouldn't be necessary. - */ - qdisc_put_stab(rtnl_dereference(sch->stab)); - if (ops->destroy) - ops->destroy(sch); - goto err_out3; } static int qdisc_change(struct Qdisc *sch, struct nlattr **tca, -- cgit v1.2.3 From a3d91b2c6f6b8ef88785bf8d2fba74916af6c7c5 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:36 +0200 Subject: net/sched: taprio: warn about missing size table Vinicius intended taprio to take the L1 overhead into account when estimating packet transmission time through user input, specifically through the qdisc size table (man tc-stab). Something like this: tc qdisc replace dev $eth root stab overhead 24 taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 \ sched-entry S 0x7e 9000000 \ sched-entry S 0x82 1000000 \ max-sdu 0 0 0 0 0 0 0 200 \ flags 0x0 clockid CLOCK_TAI Without the overhead being specified, transmission times will be underestimated and will cause late transmissions. For an offloading driver, it might even cause TX hangs if there is no open gate large enough to send the maximum sized packets for that TC (including L1 overhead). Properly knowing the L1 overhead will ensure that we are able to auto-calculate the queueMaxSDU per traffic class just right, and avoid these hangs due to head-of-line blocking. We can't make the stab mandatory due to existing setups, but we can warn the user that it's important with a warning netlink extack. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220505160357.298794-1-vladimir.oltean@nxp.com/ Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index e625f8f8704f..7553bc82cf6f 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1690,6 +1690,7 @@ static int taprio_new_flags(const struct nlattr *attr, u32 old, static int taprio_change(struct Qdisc *sch, struct nlattr *opt, struct netlink_ext_ack *extack) { + struct qdisc_size_table *stab = rtnl_dereference(sch->stab); struct nlattr *tb[TCA_TAPRIO_ATTR_MAX + 1] = { }; struct sched_gate_list *oper, *admin, *new_admin; struct taprio_sched *q = qdisc_priv(sch); @@ -1842,6 +1843,10 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, new_admin = NULL; err = 0; + if (!stab) + NL_SET_ERR_MSG_MOD(extack, + "Size table not specified, frame length estimations may be inaccurate"); + unlock: spin_unlock_bh(qdisc_lock(sch)); -- cgit v1.2.3 From a878fd46fe43ec97f3f8664173fe1d23984c3453 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:37 +0200 Subject: net/sched: keep the max_frm_len information inside struct sched_gate_list I have one practical reason for doing this and one concerning correctness. The practical reason has to do with a follow-up patch, which aims to mix 2 sources of max_sdu (one coming from the user and the other automatically calculated based on TC gate durations @current link speed). Among those 2 sources of input, we must always select the smaller max_sdu value, but this can change at various link speeds. So the max_sdu coming from the user must be kept separated from the value that is operationally used (the minimum of the 2), because otherwise we overwrite it and forget what the user asked us to do. To solve that, this patch proposes that struct sched_gate_list contains the operationally active max_frm_len, and q->max_sdu contains just what was requested by the user. The reason having to do with correctness is based on the following observation: the admin sched_gate_list becomes operational at a given base_time in the future. Until then, it is inactive and applies no shaping, all gates are open, etc. So the queueMaxSDU dropping shouldn't apply either (this is a mechanism to ensure that packets smaller than the largest gate duration for that TC don't hang the port; clearly it makes little sense if the gates are always open). Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 53 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 40 insertions(+), 13 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 7553bc82cf6f..d3e3be543fae 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -63,6 +63,7 @@ struct sched_gate_list { * or 0 if a traffic class gate never opens during the schedule. */ u64 max_open_gate_duration[TC_MAX_QUEUE]; + u32 max_frm_len[TC_MAX_QUEUE]; /* for the fast path */ struct rcu_head rcu; struct list_head entries; size_t num_entries; @@ -93,7 +94,6 @@ struct taprio_sched { struct hrtimer advance_timer; struct list_head taprio_list; int cur_txq[TC_MAX_QUEUE]; - u32 max_frm_len[TC_MAX_QUEUE]; /* for the fast path */ u32 max_sdu[TC_MAX_QUEUE]; /* for dump and offloading */ u32 txtime_delay; }; @@ -246,6 +246,21 @@ static int length_to_duration(struct taprio_sched *q, int len) return div_u64(len * atomic64_read(&q->picos_per_byte), PSEC_PER_NSEC); } +static void taprio_update_queue_max_sdu(struct taprio_sched *q, + struct sched_gate_list *sched) +{ + struct net_device *dev = qdisc_dev(q->root); + int num_tc = netdev_get_num_tc(dev); + int tc; + + for (tc = 0; tc < num_tc; tc++) { + if (q->max_sdu[tc]) + sched->max_frm_len[tc] = q->max_sdu[tc] + dev->hard_header_len; + else + sched->max_frm_len[tc] = U32_MAX; /* never oversized */ + } +} + /* Returns the entry corresponding to next available interval. If * validate_interval is set, it only validates whether the timestamp occurs * when the gate corresponding to the skb's traffic class is open. @@ -479,14 +494,33 @@ done: return txtime; } -static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, - struct Qdisc *child, struct sk_buff **to_free) +/* Devices with full offload are expected to honor this in hardware */ +static bool taprio_skb_exceeds_queue_max_sdu(struct Qdisc *sch, + struct sk_buff *skb) { struct taprio_sched *q = qdisc_priv(sch); struct net_device *dev = qdisc_dev(sch); + struct sched_gate_list *sched; int prio = skb->priority; + bool exceeds = false; u8 tc; + tc = netdev_get_prio_tc_map(dev, prio); + + rcu_read_lock(); + sched = rcu_dereference(q->oper_sched); + if (sched && skb->len > sched->max_frm_len[tc]) + exceeds = true; + rcu_read_unlock(); + + return exceeds; +} + +static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, + struct Qdisc *child, struct sk_buff **to_free) +{ + struct taprio_sched *q = qdisc_priv(sch); + /* sk_flags are only safe to use on full sockets. */ if (skb->sk && sk_fullsock(skb->sk) && sock_flag(skb->sk, SOCK_TXTIME)) { if (!is_valid_interval(skb, sch)) @@ -497,9 +531,7 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, return qdisc_drop(skb, sch, to_free); } - /* Devices with full offload are expected to honor this in hardware */ - tc = netdev_get_prio_tc_map(dev, prio); - if (skb->len > q->max_frm_len[tc]) + if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) return qdisc_drop(skb, sch, to_free); qdisc_qstats_backlog_inc(sch, skb); @@ -1609,7 +1641,6 @@ static int taprio_parse_tc_entries(struct Qdisc *sch, struct netlink_ext_ack *extack) { struct taprio_sched *q = qdisc_priv(sch); - struct net_device *dev = qdisc_dev(sch); u32 max_sdu[TC_QOPT_MAX_QUEUE]; unsigned long seen_tcs = 0; struct nlattr *n; @@ -1628,13 +1659,8 @@ static int taprio_parse_tc_entries(struct Qdisc *sch, goto out; } - for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) { + for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) q->max_sdu[tc] = max_sdu[tc]; - if (max_sdu[tc]) - q->max_frm_len[tc] = max_sdu[tc] + dev->hard_header_len; - else - q->max_frm_len[tc] = U32_MAX; /* never oversized */ - } out: return err; @@ -1758,6 +1784,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, goto free_sched; taprio_set_picos_per_byte(dev, q); + taprio_update_queue_max_sdu(q, new_admin); if (mqprio) { err = netdev_set_num_tc(dev, mqprio->num_tc); -- cgit v1.2.3 From fed87cc6718ad5f80aa739fee3c5979a8b09d3a6 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:38 +0200 Subject: net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations taprio today has a huge problem with small TC gate durations, because it might accept packets in taprio_enqueue() which will never be sent by taprio_dequeue(). Since not much infrastructure was available, a kludge was added in commit 497cc00224cf ("taprio: Handle short intervals and large packets"), which segmented large TCP segments, but the fact of the matter is that the issue isn't specific to large TCP segments (and even worse, the performance penalty in segmenting those is absolutely huge). In commit a54fc09e4cba ("net/sched: taprio: allow user input of per-tc max SDU"), taprio gained support for queueMaxSDU, which is precisely the mechanism through which packets should be dropped at qdisc_enqueue() if they cannot be sent. After that patch, it was necessary for the user to manually limit the maximum MTU per TC. This change adds the necessary logic for taprio to further limit the values specified (or not specified) by the user to some minimum values which never allow oversized packets to be sent. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 70 ++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 60 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index d3e3be543fae..e7163d6fab77 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -64,6 +64,7 @@ struct sched_gate_list { */ u64 max_open_gate_duration[TC_MAX_QUEUE]; u32 max_frm_len[TC_MAX_QUEUE]; /* for the fast path */ + u32 max_sdu[TC_MAX_QUEUE]; /* for dump */ struct rcu_head rcu; struct list_head entries; size_t num_entries; @@ -94,7 +95,7 @@ struct taprio_sched { struct hrtimer advance_timer; struct list_head taprio_list; int cur_txq[TC_MAX_QUEUE]; - u32 max_sdu[TC_MAX_QUEUE]; /* for dump and offloading */ + u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */ u32 txtime_delay; }; @@ -246,18 +247,52 @@ static int length_to_duration(struct taprio_sched *q, int len) return div_u64(len * atomic64_read(&q->picos_per_byte), PSEC_PER_NSEC); } +static int duration_to_length(struct taprio_sched *q, u64 duration) +{ + return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte)); +} + +/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the + * q->max_sdu[] requested by the user and the max_sdu dynamically determined by + * the maximum open gate durations at the given link speed. + */ static void taprio_update_queue_max_sdu(struct taprio_sched *q, - struct sched_gate_list *sched) + struct sched_gate_list *sched, + struct qdisc_size_table *stab) { struct net_device *dev = qdisc_dev(q->root); int num_tc = netdev_get_num_tc(dev); + u32 max_sdu_from_user; + u32 max_sdu_dynamic; + u32 max_sdu; int tc; for (tc = 0; tc < num_tc; tc++) { - if (q->max_sdu[tc]) - sched->max_frm_len[tc] = q->max_sdu[tc] + dev->hard_header_len; - else + max_sdu_from_user = q->max_sdu[tc] ?: U32_MAX; + + /* TC gate never closes => keep the queueMaxSDU + * selected by the user + */ + if (sched->max_open_gate_duration[tc] == sched->cycle_time) { + max_sdu_dynamic = U32_MAX; + } else { + u32 max_frm_len; + + max_frm_len = duration_to_length(q, sched->max_open_gate_duration[tc]); + if (stab) + max_frm_len -= stab->szopts.overhead; + max_sdu_dynamic = max_frm_len - dev->hard_header_len; + } + + max_sdu = min(max_sdu_dynamic, max_sdu_from_user); + + if (max_sdu != U32_MAX) { + sched->max_frm_len[tc] = max_sdu + dev->hard_header_len; + sched->max_sdu[tc] = max_sdu; + } else { sched->max_frm_len[tc] = U32_MAX; /* never oversized */ + sched->max_sdu[tc] = 0; + } } } @@ -1243,6 +1278,8 @@ static int taprio_dev_notifier(struct notifier_block *nb, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct sched_gate_list *oper, *admin; + struct qdisc_size_table *stab; struct taprio_sched *q; ASSERT_RTNL(); @@ -1255,6 +1292,17 @@ static int taprio_dev_notifier(struct notifier_block *nb, unsigned long event, continue; taprio_set_picos_per_byte(dev, q); + + stab = rtnl_dereference(q->root->stab); + + oper = rtnl_dereference(q->oper_sched); + if (oper) + taprio_update_queue_max_sdu(q, oper, stab); + + admin = rtnl_dereference(q->admin_sched); + if (admin) + taprio_update_queue_max_sdu(q, admin, stab); + break; } @@ -1654,7 +1702,8 @@ static int taprio_parse_tc_entries(struct Qdisc *sch, if (nla_type(n) != TCA_TAPRIO_ATTR_TC_ENTRY) continue; - err = taprio_parse_tc_entry(sch, n, max_sdu, &seen_tcs, extack); + err = taprio_parse_tc_entry(sch, n, max_sdu, &seen_tcs, + extack); if (err) goto out; } @@ -1784,7 +1833,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, goto free_sched; taprio_set_picos_per_byte(dev, q); - taprio_update_queue_max_sdu(q, new_admin); + taprio_update_queue_max_sdu(q, new_admin, stab); if (mqprio) { err = netdev_set_num_tc(dev, mqprio->num_tc); @@ -2142,7 +2191,8 @@ error_nest: return -1; } -static int taprio_dump_tc_entries(struct taprio_sched *q, struct sk_buff *skb) +static int taprio_dump_tc_entries(struct sk_buff *skb, + struct sched_gate_list *sched) { struct nlattr *n; int tc; @@ -2156,7 +2206,7 @@ static int taprio_dump_tc_entries(struct taprio_sched *q, struct sk_buff *skb) goto nla_put_failure; if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_MAX_SDU, - q->max_sdu[tc])) + sched->max_sdu[tc])) goto nla_put_failure; nla_nest_end(skb, n); @@ -2200,7 +2250,7 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb) nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay)) goto options_error; - if (taprio_dump_tc_entries(q, skb)) + if (oper && taprio_dump_tc_entries(skb, oper)) goto options_error; if (oper && dump_schedule(skb, oper)) -- cgit v1.2.3 From 2d5e8071c47a03218f3f658ed13b8a9ff703b396 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:39 +0200 Subject: net/sched: taprio: split segmentation logic from qdisc_enqueue() The majority of the taprio_enqueue()'s function is spent doing TCP segmentation, which doesn't look right to me. Compilers shouldn't have a problem in inlining code no matter how we write it, so move the segmentation logic to a separate function. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 66 +++++++++++++++++++++++++++----------------------- 1 file changed, 36 insertions(+), 30 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index e7163d6fab77..839beb599f55 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -575,6 +575,40 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, return qdisc_enqueue(skb, child, to_free); } +static int taprio_enqueue_segmented(struct sk_buff *skb, struct Qdisc *sch, + struct Qdisc *child, + struct sk_buff **to_free) +{ + unsigned int slen = 0, numsegs = 0, len = qdisc_pkt_len(skb); + netdev_features_t features = netif_skb_features(skb); + struct sk_buff *segs, *nskb; + int ret; + + segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); + if (IS_ERR_OR_NULL(segs)) + return qdisc_drop(skb, sch, to_free); + + skb_list_walk_safe(segs, segs, nskb) { + skb_mark_not_on_list(segs); + qdisc_skb_cb(segs)->pkt_len = segs->len; + slen += segs->len; + + ret = taprio_enqueue_one(segs, sch, child, to_free); + if (ret != NET_XMIT_SUCCESS) { + if (net_xmit_drop_count(ret)) + qdisc_qstats_drop(sch); + } else { + numsegs++; + } + } + + if (numsegs > 1) + qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen); + consume_skb(skb); + + return numsegs > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP; +} + /* Will not be called in the full offload case, since the TX queues are * attached to the Qdisc created using qdisc_create_dflt() */ @@ -596,36 +630,8 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch, * smaller chunks. Drivers with full offload are expected to handle * this in hardware. */ - if (skb_is_gso(skb)) { - unsigned int slen = 0, numsegs = 0, len = qdisc_pkt_len(skb); - netdev_features_t features = netif_skb_features(skb); - struct sk_buff *segs, *nskb; - int ret; - - segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); - if (IS_ERR_OR_NULL(segs)) - return qdisc_drop(skb, sch, to_free); - - skb_list_walk_safe(segs, segs, nskb) { - skb_mark_not_on_list(segs); - qdisc_skb_cb(segs)->pkt_len = segs->len; - slen += segs->len; - - ret = taprio_enqueue_one(segs, sch, child, to_free); - if (ret != NET_XMIT_SUCCESS) { - if (net_xmit_drop_count(ret)) - qdisc_qstats_drop(sch); - } else { - numsegs++; - } - } - - if (numsegs > 1) - qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen); - consume_skb(skb); - - return numsegs > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP; - } + if (skb_is_gso(skb)) + return taprio_enqueue_segmented(skb, sch, child, to_free); return taprio_enqueue_one(skb, sch, child, to_free); } -- cgit v1.2.3 From 39b02d6d104a285836d98be2ad00c7f484d43a16 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 7 Feb 2023 15:54:40 +0200 Subject: net/sched: taprio: don't segment unnecessarily Improve commit 497cc00224cf ("taprio: Handle short intervals and large packets") to only perform segmentation when skb->len exceeds what taprio_dequeue() expects. In practice, this will make the biggest difference when a traffic class gate is always open in the schedule. This is because the max_frm_len will be U32_MAX, and such large skb->len values as Kurt reported will be sent just fine unsegmented. What I don't seem to know how to handle is how to make sure that the segmented skbs themselves are smaller than the maximum frame size given by the current queueMaxSDU[tc]. Nonetheless, we still need to drop those, otherwise the Qdisc will hang. Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- net/sched/sch_taprio.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 839beb599f55..9781b47962bb 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -566,9 +566,6 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch, return qdisc_drop(skb, sch, to_free); } - if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) - return qdisc_drop(skb, sch, to_free); - qdisc_qstats_backlog_inc(sch, skb); sch->q.qlen++; @@ -593,7 +590,14 @@ static int taprio_enqueue_segmented(struct sk_buff *skb, struct Qdisc *sch, qdisc_skb_cb(segs)->pkt_len = segs->len; slen += segs->len; - ret = taprio_enqueue_one(segs, sch, child, to_free); + /* FIXME: we should be segmenting to a smaller size + * rather than dropping these + */ + if (taprio_skb_exceeds_queue_max_sdu(sch, segs)) + ret = qdisc_drop(segs, sch, to_free); + else + ret = taprio_enqueue_one(segs, sch, child, to_free); + if (ret != NET_XMIT_SUCCESS) { if (net_xmit_drop_count(ret)) qdisc_qstats_drop(sch); @@ -625,13 +629,18 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (unlikely(!child)) return qdisc_drop(skb, sch, to_free); - /* Large packets might not be transmitted when the transmission duration - * exceeds any configured interval. Therefore, segment the skb into - * smaller chunks. Drivers with full offload are expected to handle - * this in hardware. - */ - if (skb_is_gso(skb)) - return taprio_enqueue_segmented(skb, sch, child, to_free); + if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) { + /* Large packets might not be transmitted when the transmission + * duration exceeds any configured interval. Therefore, segment + * the skb into smaller chunks. Drivers with full offload are + * expected to handle this in hardware. + */ + if (skb_is_gso(skb)) + return taprio_enqueue_segmented(skb, sch, child, + to_free); + + return qdisc_drop(skb, sch, to_free); + } return taprio_enqueue_one(skb, sch, child, to_free); } -- cgit v1.2.3 From f2f527d595963aa86464ca3e05ec27dd5153d56e Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Fri, 3 Feb 2023 10:08:07 +0100 Subject: can: raw: use temp variable instead of rolling back config Introduce a temporary variable to check for an invalid configuration attempt from user space. Before this patch the value was copied to the real config variable and rolled back in the case of an error. Suggested-by: Marc Kleine-Budde Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20230203090807.97100-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- net/can/raw.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/can/raw.c b/net/can/raw.c index ba86782ba8bb..f64469b98260 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -523,6 +523,7 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, struct can_filter sfilter; /* single filter */ struct net_device *dev = NULL; can_err_mask_t err_mask = 0; + int fd_frames; int count = 0; int err = 0; @@ -664,17 +665,17 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, break; case CAN_RAW_FD_FRAMES: - if (optlen != sizeof(ro->fd_frames)) + if (optlen != sizeof(fd_frames)) return -EINVAL; - if (copy_from_sockptr(&ro->fd_frames, optval, optlen)) + if (copy_from_sockptr(&fd_frames, optval, optlen)) return -EFAULT; /* Enabling CAN XL includes CAN FD */ - if (ro->xl_frames && !ro->fd_frames) { - ro->fd_frames = ro->xl_frames; + if (ro->xl_frames && !fd_frames) return -EINVAL; - } + + ro->fd_frames = fd_frames; break; case CAN_RAW_XL_FRAMES: -- cgit v1.2.3 From 0b34d68049b09821499b93d50b5a9d7d2ca449f6 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 8 Feb 2023 14:25:08 +0000 Subject: net: enable usercopy for skb_small_head_cache syzbot and other bots reported that we have to enable user copy to/from skb->head. [1] We can prevent access to skb_shared_info, which is a nice improvement over standard kmem_cache. Layout of these kmem_cache objects is: < SKB_SMALL_HEAD_HEADROOM >< struct skb_shared_info > usercopy: Kernel memory overwrite attempt detected to SLUB object 'skbuff_small_head' (offset 32, size 20)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102 ! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6-syzkaller-01425-gcb6b2e11a42d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102 Code: e8 ee ad ba f7 49 89 d9 4d 89 e8 4c 89 e1 41 56 48 89 ee 48 c7 c7 20 2b 5b 8a ff 74 24 08 41 57 48 8b 54 24 20 e8 7a 17 fe ff <0f> 0b e8 c2 ad ba f7 e8 7d fb 08 f8 48 8b 0c 24 49 89 d8 44 89 ea RSP: 0000:ffffc90000067a48 EFLAGS: 00010286 RAX: 000000000000006b RBX: ffffffff8b5b6ea0 RCX: 0000000000000000 RDX: ffff8881401c0000 RSI: ffffffff8166195c RDI: fffff5200000cf3b RBP: ffffffff8a5b2a60 R08: 000000000000006b R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8bf2a925 R13: ffffffff8a5b29a0 R14: 0000000000000014 R15: ffffffff8a5b2960 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000c48e000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __check_heap_object+0xdd/0x110 mm/slub.c:4761 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:251 [inline] __check_object_size+0x1da/0x5a0 mm/usercopy.c:213 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_from_iter include/linux/uio.h:186 [inline] copy_from_iter_full include/linux/uio.h:194 [inline] memcpy_from_msg include/linux/skbuff.h:3977 [inline] qrtr_sendmsg+0x65f/0x970 net/qrtr/af_qrtr.c:965 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 say_hello+0xf6/0x170 net/qrtr/ns.c:325 qrtr_ns_init+0x220/0x2b0 net/qrtr/ns.c:804 qrtr_proto_init+0x59/0x95 net/qrtr/af_qrtr.c:1296 do_one_initcall+0x141/0x790 init/main.c:1306 do_initcall_level init/main.c:1379 [inline] do_initcalls init/main.c:1395 [inline] do_basic_setup init/main.c:1414 [inline] kernel_init_freeable+0x6f9/0x782 init/main.c:1634 kernel_init+0x1e/0x1d0 init/main.c:1522 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 Fixes: bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head") Reported-by: syzbot Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Tested-by: Ido Schimmel Reported-by: Linux Kernel Functional Testing Tested-by: Linux Kernel Functional Testing Link: https://lore.kernel.org/linux-next/CA+G9fYs-i-c2KTSA7Ai4ES_ZESY1ZnM=Zuo8P1jN00oed6KHMA@mail.gmail.com Link: https://lore.kernel.org/r/20230208142508.3278406-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index bdb1e015e32b..70a6088e8326 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4690,10 +4690,16 @@ void __init skb_init(void) SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); #ifdef HAVE_SKB_SMALL_HEAD_CACHE - skb_small_head_cache = kmem_cache_create("skbuff_small_head", + /* usercopy should only access first SKB_SMALL_HEAD_HEADROOM bytes. + * struct skb_shared_info is located at the end of skb->head, + * and should not be copied to/from user. + */ + skb_small_head_cache = kmem_cache_create_usercopy("skbuff_small_head", SKB_SMALL_HEAD_CACHE_SIZE, 0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, + 0, + SKB_SMALL_HEAD_HEADROOM, NULL); #endif skb_extensions_init(); -- cgit v1.2.3 From a00a29b0eeea6caaaf9edc3dd284f81b072ee343 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 27 Jan 2023 16:51:54 -0800 Subject: Bluetooth: hci_conn: Refactor hci_bind_bis() since it always succeeds The compiler thinks "conn" might be NULL after a call to hci_bind_bis(), which cannot happen. Avoid any confusion by just making it not return a value since it cannot fail. Fixes the warnings seen with GCC 13: In function 'arch_atomic_dec_and_test', inlined from 'atomic_dec_and_test' at ../include/linux/atomic/atomic-instrumented.h:576:9, inlined from 'hci_conn_drop' at ../include/net/bluetooth/hci_core.h:1391:6, inlined from 'hci_connect_bis' at ../net/bluetooth/hci_conn.c:2124:3: ../arch/x86/include/asm/rmwcc.h:37:9: warning: array subscript 0 is outside array bounds of 'atomic_t[0]' [-Warray-bounds=] 37 | asm volatile (fullop CC_SET(cc) \ | ^~~ ... In function 'hci_connect_bis': cc1: note: source object is likely at address zero Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Cc: Luiz Augusto von Dentz Cc: Marcel Holtmann Cc: Johan Hedberg Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Reviewed-by: Simon Horman Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_conn.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index acf563fbdfd9..61a34801e61e 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1981,16 +1981,14 @@ static void hci_iso_qos_setup(struct hci_dev *hdev, struct hci_conn *conn, qos->latency = conn->le_conn_latency; } -static struct hci_conn *hci_bind_bis(struct hci_conn *conn, - struct bt_iso_qos *qos) +static void hci_bind_bis(struct hci_conn *conn, + struct bt_iso_qos *qos) { /* Update LINK PHYs according to QoS preference */ conn->le_tx_phy = qos->out.phy; conn->le_tx_phy = qos->out.phy; conn->iso_qos = *qos; conn->state = BT_BOUND; - - return conn; } static int create_big_sync(struct hci_dev *hdev, void *data) @@ -2119,11 +2117,7 @@ struct hci_conn *hci_connect_bis(struct hci_dev *hdev, bdaddr_t *dst, if (IS_ERR(conn)) return conn; - conn = hci_bind_bis(conn, qos); - if (!conn) { - hci_conn_drop(conn); - return ERR_PTR(-ENOMEM); - } + hci_bind_bis(conn, qos); /* Add Basic Announcement into Peridic Adv Data if BASE is set */ if (base_len && base) { -- cgit v1.2.3 From 2394186a2cefb9a45a029281a55749804dd8c556 Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Mon, 30 Jan 2023 20:37:01 +0200 Subject: Bluetooth: MGMT: add CIS feature bits to controller information Userspace needs to know whether the adapter has feature support for Connected Isochronous Stream - Central/Peripheral, so it can set up LE Audio features accordingly. Expose these feature bits as settings in MGMT controller info. Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/mgmt.h | 2 ++ net/bluetooth/mgmt.c | 12 ++++++++++++ 2 files changed, 14 insertions(+) (limited to 'net') diff --git a/include/net/bluetooth/mgmt.h b/include/net/bluetooth/mgmt.h index 743f6f59dff8..e18a927669c0 100644 --- a/include/net/bluetooth/mgmt.h +++ b/include/net/bluetooth/mgmt.h @@ -109,6 +109,8 @@ struct mgmt_rp_read_index_list { #define MGMT_SETTING_STATIC_ADDRESS 0x00008000 #define MGMT_SETTING_PHY_CONFIGURATION 0x00010000 #define MGMT_SETTING_WIDEBAND_SPEECH 0x00020000 +#define MGMT_SETTING_CIS_CENTRAL 0x00040000 +#define MGMT_SETTING_CIS_PERIPHERAL 0x00080000 #define MGMT_OP_READ_INFO 0x0004 #define MGMT_READ_INFO_SIZE 0 diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index d2ea8e19aa1b..7add66f30e4d 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -859,6 +859,12 @@ static u32 get_supported_settings(struct hci_dev *hdev) hdev->set_bdaddr) settings |= MGMT_SETTING_CONFIGURATION; + if (cis_central_capable(hdev)) + settings |= MGMT_SETTING_CIS_CENTRAL; + + if (cis_peripheral_capable(hdev)) + settings |= MGMT_SETTING_CIS_PERIPHERAL; + settings |= MGMT_SETTING_PHY_CONFIGURATION; return settings; @@ -932,6 +938,12 @@ static u32 get_current_settings(struct hci_dev *hdev) if (hci_dev_test_flag(hdev, HCI_WIDEBAND_SPEECH_ENABLED)) settings |= MGMT_SETTING_WIDEBAND_SPEECH; + if (cis_central_capable(hdev)) + settings |= MGMT_SETTING_CIS_CENTRAL; + + if (cis_peripheral_capable(hdev)) + settings |= MGMT_SETTING_CIS_PERIPHERAL; + return settings; } -- cgit v1.2.3 From df5703348813235874d851934e957c3723d71644 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Wed, 1 Feb 2023 14:01:11 -0800 Subject: Bluetooth: L2CAP: Fix potential user-after-free This fixes all instances of which requires to allocate a buffer calling alloc_skb which may release the chan lock and reacquire later which makes it possible that the chan is disconnected in the meantime. Fixes: a6a5568c03c4 ("Bluetooth: Lock the L2CAP channel when sending") Reported-by: Alexander Coffin Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/l2cap_core.c | 24 ------------------------ net/bluetooth/l2cap_sock.c | 8 ++++++++ 2 files changed, 8 insertions(+), 24 deletions(-) (limited to 'net') diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index a3e0dc6a6e73..adfc3ea06d08 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -2683,14 +2683,6 @@ int l2cap_chan_send(struct l2cap_chan *chan, struct msghdr *msg, size_t len) if (IS_ERR(skb)) return PTR_ERR(skb); - /* Channel lock is released before requesting new skb and then - * reacquired thus we need to recheck channel state. - */ - if (chan->state != BT_CONNECTED) { - kfree_skb(skb); - return -ENOTCONN; - } - l2cap_do_send(chan, skb); return len; } @@ -2735,14 +2727,6 @@ int l2cap_chan_send(struct l2cap_chan *chan, struct msghdr *msg, size_t len) if (IS_ERR(skb)) return PTR_ERR(skb); - /* Channel lock is released before requesting new skb and then - * reacquired thus we need to recheck channel state. - */ - if (chan->state != BT_CONNECTED) { - kfree_skb(skb); - return -ENOTCONN; - } - l2cap_do_send(chan, skb); err = len; break; @@ -2763,14 +2747,6 @@ int l2cap_chan_send(struct l2cap_chan *chan, struct msghdr *msg, size_t len) */ err = l2cap_segment_sdu(chan, &seg_queue, msg, len); - /* The channel could have been closed while segmenting, - * check that it is still connected. - */ - if (chan->state != BT_CONNECTED) { - __skb_queue_purge(&seg_queue); - err = -ENOTCONN; - } - if (err) break; diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index ca8f07f3542b..eebe256104bc 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -1624,6 +1624,14 @@ static struct sk_buff *l2cap_sock_alloc_skb_cb(struct l2cap_chan *chan, if (!skb) return ERR_PTR(err); + /* Channel lock is released before requesting new skb and then + * reacquired thus we need to recheck channel state. + */ + if (chan->state != BT_CONNECTED) { + kfree_skb(skb); + return ERR_PTR(-ENOTCONN); + } + skb->priority = sk->sk_priority; bt_cb(skb)->l2cap.chan = chan; -- cgit v1.2.3 From 0f00cd322d22d4441de51aa80bcce5bb6a8cbb44 Mon Sep 17 00:00:00 2001 From: Archie Pusaka Date: Fri, 3 Feb 2023 17:30:55 +0800 Subject: Bluetooth: Free potentially unfreed SCO connection It is possible to initiate a SCO connection while deleting the corresponding ACL connection, e.g. in below scenario: (1) < hci setup sync connect command (2) > hci disconn complete event (for the acl connection) (3) > hci command complete event (for(1), failure) When it happens, hci_cs_setup_sync_conn won't be able to obtain the reference to the SCO connection, so it will be stuck and potentially hinder subsequent connections to the same device. This patch prevents that by also deleting the SCO connection if it is still not established when the corresponding ACL connection is deleted. Signed-off-by: Archie Pusaka Reviewed-by: Ying Hsu Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_conn.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index 61a34801e61e..838f51c272a6 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1061,8 +1061,15 @@ int hci_conn_del(struct hci_conn *conn) if (conn->type == ACL_LINK) { struct hci_conn *sco = conn->link; - if (sco) + if (sco) { sco->link = NULL; + /* Due to race, SCO connection might be not established + * yet at this point. Delete it now, otherwise it is + * possible for it to be stuck and can't be deleted. + */ + if (sco->handle == HCI_CONN_HANDLE_UNSET) + hci_conn_del(sco); + } /* Unacked frames */ hdev->acl_cnt += conn->sent; -- cgit v1.2.3 From 5cd39700de9b699ef2f333ba3c405e51afbf0eb0 Mon Sep 17 00:00:00 2001 From: Archie Pusaka Date: Fri, 3 Feb 2023 17:39:36 +0800 Subject: Bluetooth: Make sure LE create conn cancel is sent when timeout When sending LE create conn command, we set a timer with a duration of HCI_LE_CONN_TIMEOUT before timing out and calling create_le_conn_complete. Additionally, when receiving the command complete, we also set a timer with the same duration to call le_conn_timeout. Usually the latter will be triggered first, which then sends a LE create conn cancel command. However, due to the nature of racing, it is possible for the former to be called first, thereby calling the chain hci_conn_failed -> hci_conn_del -> cancel_delayed_work, thereby preventing LE create conn cancel to be sent. In this situation, the controller will be stuck in trying the LE connection. This patch flushes le_conn_timeout on create_le_conn_complete to make sure we always send LE create connection cancel, if necessary. Signed-off-by: Archie Pusaka Reviewed-by: Ying Hsu Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_conn.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index 838f51c272a6..17b946f9ba31 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1250,6 +1250,8 @@ static void create_le_conn_complete(struct hci_dev *hdev, void *data, int err) if (conn != hci_lookup_le_connect(hdev)) goto done; + /* Flush to make sure we send create conn cancel command if needed */ + flush_delayed_work(&conn->le_conn_timeout); hci_conn_failed(conn, bt_status(err)); done: -- cgit v1.2.3 From 135746c61fa6d7f66dc079027304eaa4d35fe942 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 7 Feb 2023 19:44:55 +0100 Subject: net-sysctl: factor out cpumask parsing helper Will be used by the following patch to avoid code duplication. No functional changes intended. The only difference is that now flow_limit_cpu_sysctl() will always compute the flow limit mask on each read operation, even when read() will not return any byte to user-space. Note that the new helper is placed under a new #ifdef at the file start to better fit the usage in the later patch Signed-off-by: Paolo Abeni Reviewed-by: Simon Horman Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/core/sysctl_net_core.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-) (limited to 'net') diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index e7b98162c632..6935ecdc84b0 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -45,6 +45,33 @@ EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net); int sysctl_devconf_inherit_init_net __read_mostly; EXPORT_SYMBOL(sysctl_devconf_inherit_init_net); +#if IS_ENABLED(CONFIG_NET_FLOW_LIMIT) +static void dump_cpumask(void *buffer, size_t *lenp, loff_t *ppos, + struct cpumask *mask) +{ + char kbuf[128]; + int len; + + if (*ppos || !*lenp) { + *lenp = 0; + return; + } + + len = min(sizeof(kbuf) - 1, *lenp); + len = scnprintf(kbuf, len, "%*pb", cpumask_pr_args(mask)); + if (!len) { + *lenp = 0; + return; + } + + if (len < *lenp) + kbuf[len++] = '\n'; + memcpy(buffer, kbuf, len); + *lenp = len; + *ppos += len; +} +#endif + #ifdef CONFIG_RPS static int rps_sock_flow_sysctl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) @@ -155,13 +182,6 @@ static int flow_limit_cpu_sysctl(struct ctl_table *table, int write, write_unlock: mutex_unlock(&flow_limit_update_mutex); } else { - char kbuf[128]; - - if (*ppos || !*lenp) { - *lenp = 0; - goto done; - } - cpumask_clear(mask); rcu_read_lock(); for_each_possible_cpu(i) { @@ -171,17 +191,7 @@ write_unlock: } rcu_read_unlock(); - len = min(sizeof(kbuf) - 1, *lenp); - len = scnprintf(kbuf, len, "%*pb", cpumask_pr_args(mask)); - if (!len) { - *lenp = 0; - goto done; - } - if (len < *lenp) - kbuf[len++] = '\n'; - memcpy(buffer, kbuf, len); - *lenp = len; - *ppos += len; + dump_cpumask(buffer, lenp, ppos, mask); } done: -- cgit v1.2.3 From 370ca718fd5e1fd45ccfdf7a9d76d010f561e607 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 7 Feb 2023 19:44:56 +0100 Subject: net-sysctl: factor-out rpm mask manipulation helpers Will simplify the following patch. No functional change intended. Signed-off-by: Paolo Abeni Reviewed-by: Simon Horman Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- net/core/dev.h | 2 ++ net/core/net-sysfs.c | 72 ++++++++++++++++++++++++++++++---------------------- 2 files changed, 44 insertions(+), 30 deletions(-) (limited to 'net') diff --git a/net/core/dev.h b/net/core/dev.h index a065b7571441..e075e198092c 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -9,6 +9,7 @@ struct net_device; struct netdev_bpf; struct netdev_phys_item_id; struct netlink_ext_ack; +struct cpumask; /* Random bits of netdevice that don't need to be exposed */ #define FLOW_LIMIT_HISTORY (1 << 7) /* must be ^2 and !overflow buckets */ @@ -134,4 +135,5 @@ static inline void netif_set_gro_ipv4_max_size(struct net_device *dev, WRITE_ONCE(dev->gro_ipv4_max_size, size); } +int rps_cpumask_housekeeping(struct cpumask *mask); #endif diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index ca55dd747d6c..2126970a4bfd 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -831,42 +831,18 @@ static ssize_t show_rps_map(struct netdev_rx_queue *queue, char *buf) return len < PAGE_SIZE ? len : -EINVAL; } -static ssize_t store_rps_map(struct netdev_rx_queue *queue, - const char *buf, size_t len) +static int netdev_rx_queue_set_rps_mask(struct netdev_rx_queue *queue, + cpumask_var_t mask) { - struct rps_map *old_map, *map; - cpumask_var_t mask; - int err, cpu, i; static DEFINE_MUTEX(rps_map_mutex); - - if (!capable(CAP_NET_ADMIN)) - return -EPERM; - - if (!alloc_cpumask_var(&mask, GFP_KERNEL)) - return -ENOMEM; - - err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits); - if (err) { - free_cpumask_var(mask); - return err; - } - - if (!cpumask_empty(mask)) { - cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); - cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ)); - if (cpumask_empty(mask)) { - free_cpumask_var(mask); - return -EINVAL; - } - } + struct rps_map *old_map, *map; + int cpu, i; map = kzalloc(max_t(unsigned int, RPS_MAP_SIZE(cpumask_weight(mask)), L1_CACHE_BYTES), GFP_KERNEL); - if (!map) { - free_cpumask_var(mask); + if (!map) return -ENOMEM; - } i = 0; for_each_cpu_and(cpu, mask, cpu_online_mask) @@ -893,9 +869,45 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue, if (old_map) kfree_rcu(old_map, rcu); + return 0; +} +int rps_cpumask_housekeeping(struct cpumask *mask) +{ + if (!cpumask_empty(mask)) { + cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_DOMAIN)); + cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_WQ)); + if (cpumask_empty(mask)) + return -EINVAL; + } + return 0; +} + +static ssize_t store_rps_map(struct netdev_rx_queue *queue, + const char *buf, size_t len) +{ + cpumask_var_t mask; + int err; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + if (!alloc_cpumask_var(&mask, GFP_KERNEL)) + return -ENOMEM; + + err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits); + if (err) + goto out; + + err = rps_cpumask_housekeeping(mask); + if (err) + goto out; + + err = netdev_rx_queue_set_rps_mask(queue, mask); + +out: free_cpumask_var(mask); - return len; + return err ? : len; } static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue, -- cgit v1.2.3 From 605cfa1b1090b5d9e227d8a8f7d08fdd04f07724 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 7 Feb 2023 19:44:57 +0100 Subject: net: introduce default_rps_mask netns attribute If RPS is enabled, this allows configuring a default rps mask, which is effective since receive queue creation time. A default RPS mask allows the system admin to ensure proper isolation, avoiding races at network namespace or device creation time. The default RPS mask is initially empty, and can be modified via a newly added sysctl entry. Signed-off-by: Paolo Abeni Reviewed-by: Simon Horman Reviewed-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- Documentation/admin-guide/sysctl/net.rst | 6 ++++++ include/linux/netdevice.h | 1 + net/core/net-sysfs.c | 7 ++++++ net/core/sysctl_net_core.c | 37 +++++++++++++++++++++++++++++++- 4 files changed, 50 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 6394f5dc2303..466c560b0c30 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -215,6 +215,12 @@ rmem_max The maximum receive socket buffer size in bytes. +rps_default_mask +---------------- + +The default RPS CPU mask used on newly created network devices. An empty +mask means RPS disabled by default. + tstamp_allow_data ----------------- Allow processes to receive tx timestamps looped together with the original diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d5ef4c1fedd2..38ab96ae0d68 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -223,6 +223,7 @@ struct net_device_core_stats { #include extern struct static_key_false rps_needed; extern struct static_key_false rfs_needed; +extern struct cpumask rps_default_mask; #endif struct neighbour; diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 2126970a4bfd..4b361ac6a252 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1083,6 +1083,13 @@ static int rx_queue_add_kobject(struct net_device *dev, int index) goto err; } +#if IS_ENABLED(CONFIG_RPS) && IS_ENABLED(CONFIG_SYSCTL) + if (!cpumask_empty(&rps_default_mask)) { + error = netdev_rx_queue_set_rps_mask(queue, &rps_default_mask); + if (error) + goto err; + } +#endif kobject_uevent(kobj, KOBJ_ADD); return error; diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 6935ecdc84b0..7130e6d9e263 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -45,7 +46,7 @@ EXPORT_SYMBOL(sysctl_fb_tunnels_only_for_init_net); int sysctl_devconf_inherit_init_net __read_mostly; EXPORT_SYMBOL(sysctl_devconf_inherit_init_net); -#if IS_ENABLED(CONFIG_NET_FLOW_LIMIT) +#if IS_ENABLED(CONFIG_NET_FLOW_LIMIT) || IS_ENABLED(CONFIG_RPS) static void dump_cpumask(void *buffer, size_t *lenp, loff_t *ppos, struct cpumask *mask) { @@ -73,6 +74,31 @@ static void dump_cpumask(void *buffer, size_t *lenp, loff_t *ppos, #endif #ifdef CONFIG_RPS +struct cpumask rps_default_mask; + +static int rps_default_mask_sysctl(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + int err = 0; + + rtnl_lock(); + if (write) { + err = cpumask_parse(buffer, &rps_default_mask); + if (err) + goto done; + + err = rps_cpumask_housekeeping(&rps_default_mask); + if (err) + goto done; + } else { + dump_cpumask(buffer, lenp, ppos, &rps_default_mask); + } + +done: + rtnl_unlock(); + return err; +} + static int rps_sock_flow_sysctl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -482,6 +508,11 @@ static struct ctl_table net_core_table[] = { .mode = 0644, .proc_handler = rps_sock_flow_sysctl }, + { + .procname = "rps_default_mask", + .mode = 0644, + .proc_handler = rps_default_mask_sysctl + }, #endif #ifdef CONFIG_NET_FLOW_LIMIT { @@ -685,6 +716,10 @@ static __net_initdata struct pernet_operations sysctl_core_ops = { static __init int sysctl_core_init(void) { +#if IS_ENABLED(CONFIG_RPS) + cpumask_copy(&rps_default_mask, cpu_none_mask); +#endif + register_net_sysctl(&init_net, "net/core", net_core_table); return register_pernet_subsys(&sysctl_core_ops); } -- cgit v1.2.3 From f1db99c07b4f0a9edc45f3498d4d13d13a41836a Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 8 Feb 2023 15:31:51 +0200 Subject: string_helpers: Move string_is_valid() to the header Move string_is_valid() to the header for wider use. While at it, rename to string_is_terminated() to be precise about its semantics. Signed-off-by: Andy Shevchenko Reviewed-by: Simon Horman Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230208133153.22528-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski --- include/linux/string_helpers.h | 5 +++++ net/tipc/netlink_compat.c | 16 ++++++---------- 2 files changed, 11 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 8530c7328269..fae6beaaa217 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -11,6 +11,11 @@ struct device; struct file; struct task_struct; +static inline bool string_is_terminated(const char *s, int len) +{ + return memchr(s, '\0', len) ? true : false; +} + /* Descriptions of the types of units to * print in */ enum string_size_units { diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c index dfea27a906f2..9b47c8409231 100644 --- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -39,6 +39,7 @@ #include "node.h" #include "net.h" #include +#include #include /* The legacy API had an artificial message length limit called @@ -173,11 +174,6 @@ static struct sk_buff *tipc_get_err_tlv(char *str) return buf; } -static inline bool string_is_valid(char *s, int len) -{ - return memchr(s, '\0', len) ? true : false; -} - static int __tipc_nl_compat_dumpit(struct tipc_nl_compat_cmd_dump *cmd, struct tipc_nl_compat_msg *msg, struct sk_buff *arg) @@ -445,7 +441,7 @@ static int tipc_nl_compat_bearer_enable(struct tipc_nl_compat_cmd_doit *cmd, return -EINVAL; len = min_t(int, len, TIPC_MAX_BEARER_NAME); - if (!string_is_valid(b->name, len)) + if (!string_is_terminated(b->name, len)) return -EINVAL; if (nla_put_string(skb, TIPC_NLA_BEARER_NAME, b->name)) @@ -486,7 +482,7 @@ static int tipc_nl_compat_bearer_disable(struct tipc_nl_compat_cmd_doit *cmd, return -EINVAL; len = min_t(int, len, TIPC_MAX_BEARER_NAME); - if (!string_is_valid(name, len)) + if (!string_is_terminated(name, len)) return -EINVAL; if (nla_put_string(skb, TIPC_NLA_BEARER_NAME, name)) @@ -584,7 +580,7 @@ static int tipc_nl_compat_link_stat_dump(struct tipc_nl_compat_msg *msg, return -EINVAL; len = min_t(int, len, TIPC_MAX_LINK_NAME); - if (!string_is_valid(name, len)) + if (!string_is_terminated(name, len)) return -EINVAL; if (strcmp(name, nla_data(link[TIPC_NLA_LINK_NAME])) != 0) @@ -819,7 +815,7 @@ static int tipc_nl_compat_link_set(struct tipc_nl_compat_cmd_doit *cmd, return -EINVAL; len = min_t(int, len, TIPC_MAX_LINK_NAME); - if (!string_is_valid(lc->name, len)) + if (!string_is_terminated(lc->name, len)) return -EINVAL; media = tipc_media_find(lc->name); @@ -856,7 +852,7 @@ static int tipc_nl_compat_link_reset_stats(struct tipc_nl_compat_cmd_doit *cmd, return -EINVAL; len = min_t(int, len, TIPC_MAX_LINK_NAME); - if (!string_is_valid(name, len)) + if (!string_is_terminated(name, len)) return -EINVAL; if (nla_put_string(skb, TIPC_NLA_LINK_NAME, name)) -- cgit v1.2.3 From d4545bf9c33baa482daea845ddf8d5fdb26daee3 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 8 Feb 2023 15:31:52 +0200 Subject: genetlink: Use string_is_terminated() helper Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko Reviewed-by: Simon Horman Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230208133153.22528-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski --- net/netlink/genetlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c index 600993c80050..04c4036bf406 100644 --- a/net/netlink/genetlink.c +++ b/net/netlink/genetlink.c @@ -13,7 +13,7 @@ #include #include #include -#include +#include #include #include #include @@ -457,7 +457,7 @@ static int genl_validate_assign_mc_groups(struct genl_family *family) if (WARN_ON(grp->name[0] == '\0')) return -EINVAL; - if (WARN_ON(memchr(grp->name, '\0', GENL_NAMSIZ) == NULL)) + if (WARN_ON(!string_is_terminated(grp->name, GENL_NAMSIZ))) return -EINVAL; } -- cgit v1.2.3 From 5c72b4c644eb68843b3538f452a0a89db7011015 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 8 Feb 2023 15:31:53 +0200 Subject: openvswitch: Use string_is_terminated() helper Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko Reviewed-by: Simon Horman Reviewed-by: Jiri Pirko Link: https://lore.kernel.org/r/20230208133153.22528-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski --- net/openvswitch/conntrack.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 2172930b1f17..f95272ebfa08 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -1383,7 +1384,7 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info, #endif case OVS_CT_ATTR_HELPER: *helper = nla_data(a); - if (!memchr(*helper, '\0', nla_len(a))) { + if (!string_is_terminated(*helper, nla_len(a))) { OVS_NLERR(log, "Invalid conntrack helper"); return -EINVAL; } @@ -1404,7 +1405,7 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info, #ifdef CONFIG_NF_CONNTRACK_TIMEOUT case OVS_CT_ATTR_TIMEOUT: memcpy(info->timeout, nla_data(a), nla_len(a)); - if (!memchr(info->timeout, '\0', nla_len(a))) { + if (!string_is_terminated(info->timeout, nla_len(a))) { OVS_NLERR(log, "Invalid conntrack timeout"); return -EINVAL; } -- cgit v1.2.3 From 025a785ff083729819dc82ac81baf190cb4aee5c Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 8 Feb 2023 22:06:42 -0800 Subject: net: skbuff: drop the word head from skb cache skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 +- kernel/bpf/cpumap.c | 2 +- net/bpf/test_run.c | 2 +- net/core/skbuff.c | 31 +++++++++++++++---------------- net/core/xdp.c | 5 ++--- 5 files changed, 20 insertions(+), 22 deletions(-) (limited to 'net') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index c3df3b55da97..47ab28a37f2f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1243,7 +1243,7 @@ static inline void consume_skb(struct sk_buff *skb) void __consume_stateless_skb(struct sk_buff *skb); void __kfree_skb(struct sk_buff *skb); -extern struct kmem_cache *skbuff_head_cache; +extern struct kmem_cache *skbuff_cache; void kfree_skb_partial(struct sk_buff *skb, bool head_stolen); bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index e0b2d016f0bf..d2110c1f6fa6 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -361,7 +361,7 @@ static int cpu_map_kthread_run(void *data) /* Support running another XDP prog on this CPU */ nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, &stats, &list); if (nframes) { - m = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, skbs); + m = kmem_cache_alloc_bulk(skbuff_cache, gfp, nframes, skbs); if (unlikely(m == 0)) { for (i = 0; i < nframes; i++) skbs[i] = NULL; /* effect: xdp_return_frame */ diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 8da0d73b368e..2b954326894f 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -234,7 +234,7 @@ static int xdp_recv_frames(struct xdp_frame **frames, int nframes, int i, n; LIST_HEAD(list); - n = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, (void **)skbs); + n = kmem_cache_alloc_bulk(skbuff_cache, gfp, nframes, (void **)skbs); if (unlikely(n == 0)) { for (i = 0; i < nframes; i++) xdp_return_frame(frames[i]); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 70a6088e8326..13ea10cf8544 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -84,7 +84,7 @@ #include "dev.h" #include "sock_destructor.h" -struct kmem_cache *skbuff_head_cache __ro_after_init; +struct kmem_cache *skbuff_cache __ro_after_init; static struct kmem_cache *skbuff_fclone_cache __ro_after_init; #ifdef CONFIG_SKB_EXTENSIONS static struct kmem_cache *skbuff_ext_cache __ro_after_init; @@ -285,7 +285,7 @@ static struct sk_buff *napi_skb_cache_get(void) struct sk_buff *skb; if (unlikely(!nc->skb_count)) { - nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache, + nc->skb_count = kmem_cache_alloc_bulk(skbuff_cache, GFP_ATOMIC, NAPI_SKB_CACHE_BULK, nc->skb_cache); @@ -294,7 +294,7 @@ static struct sk_buff *napi_skb_cache_get(void) } skb = nc->skb_cache[--nc->skb_count]; - kasan_unpoison_object_data(skbuff_head_cache, skb); + kasan_unpoison_object_data(skbuff_cache, skb); return skb; } @@ -352,7 +352,7 @@ struct sk_buff *slab_build_skb(void *data) struct sk_buff *skb; unsigned int size; - skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC); + skb = kmem_cache_alloc(skbuff_cache, GFP_ATOMIC); if (unlikely(!skb)) return NULL; @@ -403,7 +403,7 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size) { struct sk_buff *skb; - skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC); + skb = kmem_cache_alloc(skbuff_cache, GFP_ATOMIC); if (unlikely(!skb)) return NULL; @@ -585,7 +585,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, u8 *data; cache = (flags & SKB_ALLOC_FCLONE) - ? skbuff_fclone_cache : skbuff_head_cache; + ? skbuff_fclone_cache : skbuff_cache; if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX)) gfp_mask |= __GFP_MEMALLOC; @@ -921,7 +921,7 @@ static void kfree_skbmem(struct sk_buff *skb) switch (skb->fclone) { case SKB_FCLONE_UNAVAILABLE: - kmem_cache_free(skbuff_head_cache, skb); + kmem_cache_free(skbuff_cache, skb); return; case SKB_FCLONE_ORIG: @@ -1035,7 +1035,7 @@ static void kfree_skb_add_bulk(struct sk_buff *skb, sa->skb_array[sa->skb_count++] = skb; if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) { - kmem_cache_free_bulk(skbuff_head_cache, KFREE_SKB_BULK_SIZE, + kmem_cache_free_bulk(skbuff_cache, KFREE_SKB_BULK_SIZE, sa->skb_array); sa->skb_count = 0; } @@ -1060,8 +1060,7 @@ kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) } if (sa.skb_count) - kmem_cache_free_bulk(skbuff_head_cache, sa.skb_count, - sa.skb_array); + kmem_cache_free_bulk(skbuff_cache, sa.skb_count, sa.skb_array); } EXPORT_SYMBOL(kfree_skb_list_reason); @@ -1215,15 +1214,15 @@ static void napi_skb_cache_put(struct sk_buff *skb) struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); u32 i; - kasan_poison_object_data(skbuff_head_cache, skb); + kasan_poison_object_data(skbuff_cache, skb); nc->skb_cache[nc->skb_count++] = skb; if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) { for (i = NAPI_SKB_CACHE_HALF; i < NAPI_SKB_CACHE_SIZE; i++) - kasan_unpoison_object_data(skbuff_head_cache, + kasan_unpoison_object_data(skbuff_cache, nc->skb_cache[i]); - kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_HALF, + kmem_cache_free_bulk(skbuff_cache, NAPI_SKB_CACHE_HALF, nc->skb_cache + NAPI_SKB_CACHE_HALF); nc->skb_count = NAPI_SKB_CACHE_HALF; } @@ -1807,7 +1806,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask) if (skb_pfmemalloc(skb)) gfp_mask |= __GFP_MEMALLOC; - n = kmem_cache_alloc(skbuff_head_cache, gfp_mask); + n = kmem_cache_alloc(skbuff_cache, gfp_mask); if (!n) return NULL; @@ -4677,7 +4676,7 @@ static void skb_extensions_init(void) {} void __init skb_init(void) { - skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache", + skbuff_cache = kmem_cache_create_usercopy("skbuff_head_cache", sizeof(struct sk_buff), 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, @@ -5556,7 +5555,7 @@ void kfree_skb_partial(struct sk_buff *skb, bool head_stolen) { if (head_stolen) { skb_release_head_state(skb); - kmem_cache_free(skbuff_head_cache, skb); + kmem_cache_free(skbuff_cache, skb); } else { __kfree_skb(skb); } diff --git a/net/core/xdp.c b/net/core/xdp.c index a5a7ecf6391c..03938fe6d33a 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -603,8 +603,7 @@ EXPORT_SYMBOL_GPL(xdp_warn); int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp) { - n_skb = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, - n_skb, skbs); + n_skb = kmem_cache_alloc_bulk(skbuff_cache, gfp, n_skb, skbs); if (unlikely(!n_skb)) return -ENOMEM; @@ -673,7 +672,7 @@ struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf, { struct sk_buff *skb; - skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC); + skb = kmem_cache_alloc(skbuff_cache, GFP_ATOMIC); if (unlikely(!skb)) return NULL; -- cgit v1.2.3 From c0c3ab63de603b40f89a9c0d7217209a8840d053 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Tue, 7 Feb 2023 17:52:06 -0500 Subject: net: create nf_conntrack_ovs for ovs and tc use Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long Reviewed-by: Simon Horman Reviewed-by: Aaron Conole Acked-by: Florian Westphal Signed-off-by: Jakub Kicinski --- net/netfilter/Kconfig | 3 ++ net/netfilter/Makefile | 1 + net/netfilter/nf_conntrack_helper.c | 98 --------------------------------- net/netfilter/nf_conntrack_ovs.c | 104 ++++++++++++++++++++++++++++++++++++ net/openvswitch/Kconfig | 1 + net/sched/Kconfig | 1 + 6 files changed, 110 insertions(+), 98 deletions(-) create mode 100644 net/netfilter/nf_conntrack_ovs.c (limited to 'net') diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index f71b41c7ce2f..4d6737160857 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -189,6 +189,9 @@ config NF_CONNTRACK_LABELS to connection tracking entries. It can be used with xtables connlabel match and the nftables ct expression. +config NF_CONNTRACK_OVS + bool + config NF_CT_PROTO_DCCP bool 'DCCP protocol connection tracking support' depends on NETFILTER_ADVANCED diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index ba2a6b5e93d9..5ffef1cd6143 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -11,6 +11,7 @@ nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMEOUT) += nf_conntrack_timeout.o nf_conntrack-$(CONFIG_NF_CONNTRACK_TIMESTAMP) += nf_conntrack_timestamp.o nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += nf_conntrack_ecache.o nf_conntrack-$(CONFIG_NF_CONNTRACK_LABELS) += nf_conntrack_labels.o +nf_conntrack-$(CONFIG_NF_CONNTRACK_OVS) += nf_conntrack_ovs.o nf_conntrack-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index 48ea6d0264b5..0c4db2f2ac43 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -242,104 +242,6 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl, } EXPORT_SYMBOL_GPL(__nf_ct_try_assign_helper); -/* 'skb' should already be pulled to nh_ofs. */ -int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, - enum ip_conntrack_info ctinfo, u16 proto) -{ - const struct nf_conntrack_helper *helper; - const struct nf_conn_help *help; - unsigned int protoff; - int err; - - if (ctinfo == IP_CT_RELATED_REPLY) - return NF_ACCEPT; - - help = nfct_help(ct); - if (!help) - return NF_ACCEPT; - - helper = rcu_dereference(help->helper); - if (!helper) - return NF_ACCEPT; - - if (helper->tuple.src.l3num != NFPROTO_UNSPEC && - helper->tuple.src.l3num != proto) - return NF_ACCEPT; - - switch (proto) { - case NFPROTO_IPV4: - protoff = ip_hdrlen(skb); - proto = ip_hdr(skb)->protocol; - break; - case NFPROTO_IPV6: { - u8 nexthdr = ipv6_hdr(skb)->nexthdr; - __be16 frag_off; - int ofs; - - ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, - &frag_off); - if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { - pr_debug("proto header not found\n"); - return NF_ACCEPT; - } - protoff = ofs; - proto = nexthdr; - break; - } - default: - WARN_ONCE(1, "helper invoked on non-IP family!"); - return NF_DROP; - } - - if (helper->tuple.dst.protonum != proto) - return NF_ACCEPT; - - err = helper->help(skb, protoff, ct, ctinfo); - if (err != NF_ACCEPT) - return err; - - /* Adjust seqs after helper. This is needed due to some helpers (e.g., - * FTP with NAT) adusting the TCP payload size when mangling IP - * addresses and/or port numbers in the text-based control connection. - */ - if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) && - !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) - return NF_DROP; - return NF_ACCEPT; -} -EXPORT_SYMBOL_GPL(nf_ct_helper); - -int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, - u8 proto, bool nat, struct nf_conntrack_helper **hp) -{ - struct nf_conntrack_helper *helper; - struct nf_conn_help *help; - int ret = 0; - - helper = nf_conntrack_helper_try_module_get(name, family, proto); - if (!helper) - return -EINVAL; - - help = nf_ct_helper_ext_add(ct, GFP_KERNEL); - if (!help) { - nf_conntrack_helper_put(helper); - return -ENOMEM; - } -#if IS_ENABLED(CONFIG_NF_NAT) - if (nat) { - ret = nf_nat_helper_try_module_get(name, family, proto); - if (ret) { - nf_conntrack_helper_put(helper); - return ret; - } - } -#endif - rcu_assign_pointer(help->helper, helper); - *hp = helper; - return ret; -} -EXPORT_SYMBOL_GPL(nf_ct_add_helper); - /* appropriate ct lock protecting must be taken by caller */ static int unhelp(struct nf_conn *ct, void *me) { diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c new file mode 100644 index 000000000000..eff4d53f8b8c --- /dev/null +++ b/net/netfilter/nf_conntrack_ovs.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Support ct functions for openvswitch and used by OVS and TC conntrack. */ + +#include +#include +#include + +/* 'skb' should already be pulled to nh_ofs. */ +int nf_ct_helper(struct sk_buff *skb, struct nf_conn *ct, + enum ip_conntrack_info ctinfo, u16 proto) +{ + const struct nf_conntrack_helper *helper; + const struct nf_conn_help *help; + unsigned int protoff; + int err; + + if (ctinfo == IP_CT_RELATED_REPLY) + return NF_ACCEPT; + + help = nfct_help(ct); + if (!help) + return NF_ACCEPT; + + helper = rcu_dereference(help->helper); + if (!helper) + return NF_ACCEPT; + + if (helper->tuple.src.l3num != NFPROTO_UNSPEC && + helper->tuple.src.l3num != proto) + return NF_ACCEPT; + + switch (proto) { + case NFPROTO_IPV4: + protoff = ip_hdrlen(skb); + proto = ip_hdr(skb)->protocol; + break; + case NFPROTO_IPV6: { + u8 nexthdr = ipv6_hdr(skb)->nexthdr; + __be16 frag_off; + int ofs; + + ofs = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, + &frag_off); + if (ofs < 0 || (frag_off & htons(~0x7)) != 0) { + pr_debug("proto header not found\n"); + return NF_ACCEPT; + } + protoff = ofs; + proto = nexthdr; + break; + } + default: + WARN_ONCE(1, "helper invoked on non-IP family!"); + return NF_DROP; + } + + if (helper->tuple.dst.protonum != proto) + return NF_ACCEPT; + + err = helper->help(skb, protoff, ct, ctinfo); + if (err != NF_ACCEPT) + return err; + + /* Adjust seqs after helper. This is needed due to some helpers (e.g., + * FTP with NAT) adusting the TCP payload size when mangling IP + * addresses and/or port numbers in the text-based control connection. + */ + if (test_bit(IPS_SEQ_ADJUST_BIT, &ct->status) && + !nf_ct_seq_adjust(skb, ct, ctinfo, protoff)) + return NF_DROP; + return NF_ACCEPT; +} +EXPORT_SYMBOL_GPL(nf_ct_helper); + +int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, + u8 proto, bool nat, struct nf_conntrack_helper **hp) +{ + struct nf_conntrack_helper *helper; + struct nf_conn_help *help; + int ret = 0; + + helper = nf_conntrack_helper_try_module_get(name, family, proto); + if (!helper) + return -EINVAL; + + help = nf_ct_helper_ext_add(ct, GFP_KERNEL); + if (!help) { + nf_conntrack_helper_put(helper); + return -ENOMEM; + } +#if IS_ENABLED(CONFIG_NF_NAT) + if (nat) { + ret = nf_nat_helper_try_module_get(name, family, proto); + if (ret) { + nf_conntrack_helper_put(helper); + return ret; + } + } +#endif + rcu_assign_pointer(help->helper, helper); + *hp = helper; + return ret; +} +EXPORT_SYMBOL_GPL(nf_ct_add_helper); diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig index 747d537a3f06..29a7081858cd 100644 --- a/net/openvswitch/Kconfig +++ b/net/openvswitch/Kconfig @@ -15,6 +15,7 @@ config OPENVSWITCH select NET_MPLS_GSO select DST_CACHE select NET_NSH + select NF_CONNTRACK_OVS if NF_CONNTRACK select NF_NAT_OVS if NF_NAT help Open vSwitch is a multilayer Ethernet switch targeted at virtualized diff --git a/net/sched/Kconfig b/net/sched/Kconfig index f5acb535413d..4f7b52f5a11c 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -984,6 +984,7 @@ config NET_ACT_TUNNEL_KEY config NET_ACT_CT tristate "connection tracking tc action" depends on NET_CLS_ACT && NF_CONNTRACK && (!NF_NAT || NF_NAT) && NF_FLOW_TABLE + select NF_CONNTRACK_OVS select NF_NAT_OVS if NF_NAT help Say Y here to allow sending the packets to conntrack module. -- cgit v1.2.3 From 67fc5d7ffbd4f9cf52adf166f5bc9a35fef37f24 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Tue, 7 Feb 2023 17:52:07 -0500 Subject: net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long Reviewed-by: Simon Horman Reviewed-by: Aaron Conole Acked-by: Florian Westphal Signed-off-by: Jakub Kicinski --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_ovs.c | 26 ++++++++++++++++++++++++++ net/openvswitch/conntrack.c | 36 ++++-------------------------------- net/sched/act_ct.c | 27 +-------------------------- 4 files changed, 33 insertions(+), 58 deletions(-) (limited to 'net') diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 6a2019aaa464..a6e89d7212f8 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -362,6 +362,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net) return net_generic(net, nf_conntrack_net_id); } +int nf_ct_skb_network_trim(struct sk_buff *skb, int family); + #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_ADD_ATOMIC(net, count, v) this_cpu_add((net)->ct.stat->count, (v)) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index eff4d53f8b8c..c60ef71d1aea 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -102,3 +102,29 @@ int nf_ct_add_helper(struct nf_conn *ct, const char *name, u8 family, return ret; } EXPORT_SYMBOL_GPL(nf_ct_add_helper); + +/* Trim the skb to the length specified by the IP/IPv6 header, + * removing any trailing lower-layer padding. This prepares the skb + * for higher-layer processing that assumes skb->len excludes padding + * (such as nf_ip_checksum). The caller needs to pull the skb to the + * network header, and ensure ip_hdr/ipv6_hdr points to valid data. + */ +int nf_ct_skb_network_trim(struct sk_buff *skb, int family) +{ + unsigned int len; + + switch (family) { + case NFPROTO_IPV4: + len = skb_ip_totlen(skb); + break; + case NFPROTO_IPV6: + len = sizeof(struct ipv6hdr) + + ntohs(ipv6_hdr(skb)->payload_len); + break; + default: + len = skb->len; + } + + return pskb_trim_rcsum(skb, len); +} +EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index f95272ebfa08..4be877acc7f0 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -1091,36 +1091,6 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key, return 0; } -/* Trim the skb to the length specified by the IP/IPv6 header, - * removing any trailing lower-layer padding. This prepares the skb - * for higher-layer processing that assumes skb->len excludes padding - * (such as nf_ip_checksum). The caller needs to pull the skb to the - * network header, and ensure ip_hdr/ipv6_hdr points to valid data. - */ -static int ovs_skb_network_trim(struct sk_buff *skb) -{ - unsigned int len; - int err; - - switch (skb->protocol) { - case htons(ETH_P_IP): - len = skb_ip_totlen(skb); - break; - case htons(ETH_P_IPV6): - len = sizeof(struct ipv6hdr) - + ntohs(ipv6_hdr(skb)->payload_len); - break; - default: - len = skb->len; - } - - err = pskb_trim_rcsum(skb, len); - if (err) - kfree_skb(skb); - - return err; -} - /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ @@ -1135,9 +1105,11 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, nh_ofs = skb_network_offset(skb); skb_pull_rcsum(skb, nh_ofs); - err = ovs_skb_network_trim(skb); - if (err) + err = nf_ct_skb_network_trim(skb, info->family); + if (err) { + kfree_skb(skb); return err; + } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { err = handle_fragments(net, key, info->zone.id, skb); diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b126f03c1bb6..0a1ecc972a8b 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -726,31 +726,6 @@ drop_ct: return false; } -/* Trim the skb to the length specified by the IP/IPv6 header, - * removing any trailing lower-layer padding. This prepares the skb - * for higher-layer processing that assumes skb->len excludes padding - * (such as nf_ip_checksum). The caller needs to pull the skb to the - * network header, and ensure ip_hdr/ipv6_hdr points to valid data. - */ -static int tcf_ct_skb_network_trim(struct sk_buff *skb, int family) -{ - unsigned int len; - - switch (family) { - case NFPROTO_IPV4: - len = skb_ip_totlen(skb); - break; - case NFPROTO_IPV6: - len = sizeof(struct ipv6hdr) - + ntohs(ipv6_hdr(skb)->payload_len); - break; - default: - len = skb->len; - } - - return pskb_trim_rcsum(skb, len); -} - static u8 tcf_ct_skb_nf_family(struct sk_buff *skb) { u8 family = NFPROTO_UNSPEC; @@ -1011,7 +986,7 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, if (err) goto drop; - err = tcf_ct_skb_network_trim(skb, family); + err = nf_ct_skb_network_trim(skb, family); if (err) goto drop; -- cgit v1.2.3 From 1b83bf4489cbc47d88976291cc967a17adb8e118 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Tue, 7 Feb 2023 17:52:08 -0500 Subject: openvswitch: move key and ovs_cb update out of handle_fragments This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long Reviewed-by: Simon Horman Reviewed-by: Aaron Conole Acked-by: Florian Westphal Signed-off-by: Jakub Kicinski --- net/openvswitch/conntrack.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 4be877acc7f0..211e05828fe5 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -438,13 +438,12 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, /* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero * value if 'skb' is freed. */ -static int handle_fragments(struct net *net, struct sw_flow_key *key, - u16 zone, struct sk_buff *skb) +static int handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru) { - struct ovs_skb_cb ovs_cb = *OVS_CB(skb); int err; - if (key->eth.type == htons(ETH_P_IP)) { + if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); @@ -452,9 +451,9 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, if (err) return err; - ovs_cb.mru = IPCB(skb)->frag_max_size; + *mru = IPCB(skb)->frag_max_size; #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - } else if (key->eth.type == htons(ETH_P_IPV6)) { + } else if (family == NFPROTO_IPV6) { enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); @@ -465,22 +464,35 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key, return err; } - key->ip.proto = ipv6_hdr(skb)->nexthdr; - ovs_cb.mru = IP6CB(skb)->frag_max_size; + *proto = ipv6_hdr(skb)->nexthdr; + *mru = IP6CB(skb)->frag_max_size; #endif } else { kfree_skb(skb); return -EPFNOSUPPORT; } + skb_clear_hash(skb); + skb->ignore_df = 1; + + return 0; +} + +static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, + u16 zone, int family, struct sk_buff *skb) +{ + struct ovs_skb_cb ovs_cb = *OVS_CB(skb); + int err; + + err = handle_fragments(net, skb, zone, family, &key->ip.proto, &ovs_cb.mru); + if (err) + return err; + /* The key extracted from the fragment that completed this datagram * likely didn't have an L4 header, so regenerate it. */ ovs_flow_key_update_l3l4(skb, key); - key->ip.frag = OVS_FRAG_TYPE_NONE; - skb_clear_hash(skb); - skb->ignore_df = 1; *OVS_CB(skb) = ovs_cb; return 0; @@ -1112,7 +1124,8 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb, } if (key->ip.frag != OVS_FRAG_TYPE_NONE) { - err = handle_fragments(net, key, info->zone.id, skb); + err = ovs_ct_handle_fragments(net, key, info->zone.id, + info->family, skb); if (err) return err; } -- cgit v1.2.3 From 558d95e7e11cd9844806e804f4c1243f2c744b00 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Tue, 7 Feb 2023 17:52:09 -0500 Subject: net: sched: move frag check and tc_skb_cb update out of handle_fragments This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long Reviewed-by: Simon Horman Acked-by: Florian Westphal Signed-off-by: Jakub Kicinski --- net/sched/act_ct.c | 71 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 39 insertions(+), 32 deletions(-) (limited to 'net') diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 0a1ecc972a8b..9f133ed93815 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -778,29 +778,10 @@ static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, bool *frag) return 0; } -static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, - u8 family, u16 zone, bool *defrag) +static int handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u16 *mru) { - enum ip_conntrack_info ctinfo; - struct nf_conn *ct; - int err = 0; - bool frag; - u16 mru; - - /* Previously seen (loopback)? Ignore. */ - ct = nf_ct_get(skb, &ctinfo); - if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED) - return 0; - - if (family == NFPROTO_IPV4) - err = tcf_ct_ipv4_is_fragment(skb, &frag); - else - err = tcf_ct_ipv6_is_fragment(skb, &frag); - if (err || !frag) - return err; - - skb_get(skb); - mru = tc_skb_cb(skb)->mru; + int err; if (family == NFPROTO_IPV4) { enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; @@ -812,10 +793,8 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) return err; - if (!err) { - *defrag = true; - mru = IPCB(skb)->frag_max_size; - } + if (!err) + *mru = IPCB(skb)->frag_max_size; } else { /* NFPROTO_IPV6 */ #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; @@ -825,18 +804,14 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, if (err && err != -EINPROGRESS) goto out_free; - if (!err) { - *defrag = true; - mru = IP6CB(skb)->frag_max_size; - } + if (!err) + *mru = IP6CB(skb)->frag_max_size; #else err = -EOPNOTSUPP; goto out_free; #endif } - if (err != -EINPROGRESS) - tc_skb_cb(skb)->mru = mru; skb_clear_hash(skb); skb->ignore_df = 1; return err; @@ -846,6 +821,38 @@ out_free: return err; } +static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u8 family, u16 zone, bool *defrag) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + int err = 0; + bool frag; + u16 mru; + + /* Previously seen (loopback)? Ignore. */ + ct = nf_ct_get(skb, &ctinfo); + if ((ct && !nf_ct_is_template(ct)) || ctinfo == IP_CT_UNTRACKED) + return 0; + + if (family == NFPROTO_IPV4) + err = tcf_ct_ipv4_is_fragment(skb, &frag); + else + err = tcf_ct_ipv6_is_fragment(skb, &frag); + if (err || !frag) + return err; + + skb_get(skb); + err = handle_fragments(net, skb, zone, family, &mru); + if (err) + return err; + + *defrag = true; + tc_skb_cb(skb)->mru = mru; + + return 0; +} + static void tcf_ct_params_free(struct tcf_ct_params *params) { if (params->helper) { -- cgit v1.2.3 From 0785407e78d4bce56e04d92a6c961900b3d513dd Mon Sep 17 00:00:00 2001 From: Xin Long Date: Tue, 7 Feb 2023 17:52:10 -0500 Subject: net: extract nf_ct_handle_fragments to nf_conntrack_ovs Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long Reviewed-by: Simon Horman Reviewed-by: Aaron Conole Acked-by: Florian Westphal Signed-off-by: Jakub Kicinski --- include/net/netfilter/nf_conntrack.h | 2 ++ net/netfilter/nf_conntrack_ovs.c | 48 ++++++++++++++++++++++++++++++++++++ net/openvswitch/conntrack.c | 45 +-------------------------------- net/sched/act_ct.c | 46 ++-------------------------------- 4 files changed, 53 insertions(+), 88 deletions(-) (limited to 'net') diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index a6e89d7212f8..7bbab8f2b73d 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -363,6 +363,8 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net) } int nf_ct_skb_network_trim(struct sk_buff *skb, int family); +int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru); #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count) #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count) diff --git a/net/netfilter/nf_conntrack_ovs.c b/net/netfilter/nf_conntrack_ovs.c index c60ef71d1aea..52b776bdf526 100644 --- a/net/netfilter/nf_conntrack_ovs.c +++ b/net/netfilter/nf_conntrack_ovs.c @@ -3,6 +3,8 @@ #include #include +#include +#include #include /* 'skb' should already be pulled to nh_ofs. */ @@ -128,3 +130,49 @@ int nf_ct_skb_network_trim(struct sk_buff *skb, int family) return pskb_trim_rcsum(skb, len); } EXPORT_SYMBOL_GPL(nf_ct_skb_network_trim); + +/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero + * value if 'skb' is freed. + */ +int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb, + u16 zone, u8 family, u8 *proto, u16 *mru) +{ + int err; + + if (family == NFPROTO_IPV4) { + enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; + + memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); + local_bh_disable(); + err = ip_defrag(net, skb, user); + local_bh_enable(); + if (err) + return err; + + *mru = IPCB(skb)->frag_max_size; +#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) + } else if (family == NFPROTO_IPV6) { + enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; + + memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); + err = nf_ct_frag6_gather(net, skb, user); + if (err) { + if (err != -EINPROGRESS) + kfree_skb(skb); + return err; + } + + *proto = ipv6_hdr(skb)->nexthdr; + *mru = IP6CB(skb)->frag_max_size; +#endif + } else { + kfree_skb(skb); + return -EPFNOSUPPORT; + } + + skb_clear_hash(skb); + skb->ignore_df = 1; + + return 0; +} +EXPORT_SYMBOL_GPL(nf_ct_handle_fragments); diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 211e05828fe5..331730fd3580 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -435,56 +435,13 @@ static int ovs_ct_set_labels(struct nf_conn *ct, struct sw_flow_key *key, return 0; } -/* Returns 0 on success, -EINPROGRESS if 'skb' is stolen, or other nonzero - * value if 'skb' is freed. - */ -static int handle_fragments(struct net *net, struct sk_buff *skb, - u16 zone, u8 family, u8 *proto, u16 *mru) -{ - int err; - - if (family == NFPROTO_IPV4) { - enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; - - memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); - err = ip_defrag(net, skb, user); - if (err) - return err; - - *mru = IPCB(skb)->frag_max_size; -#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - } else if (family == NFPROTO_IPV6) { - enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; - - memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); - err = nf_ct_frag6_gather(net, skb, user); - if (err) { - if (err != -EINPROGRESS) - kfree_skb(skb); - return err; - } - - *proto = ipv6_hdr(skb)->nexthdr; - *mru = IP6CB(skb)->frag_max_size; -#endif - } else { - kfree_skb(skb); - return -EPFNOSUPPORT; - } - - skb_clear_hash(skb); - skb->ignore_df = 1; - - return 0; -} - static int ovs_ct_handle_fragments(struct net *net, struct sw_flow_key *key, u16 zone, int family, struct sk_buff *skb) { struct ovs_skb_cb ovs_cb = *OVS_CB(skb); int err; - err = handle_fragments(net, skb, zone, family, &key->ip.proto, &ovs_cb.mru); + err = nf_ct_handle_fragments(net, skb, zone, family, &key->ip.proto, &ovs_cb.mru); if (err) return err; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 9f133ed93815..9cc0bc7c71ed 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -778,49 +778,6 @@ static int tcf_ct_ipv6_is_fragment(struct sk_buff *skb, bool *frag) return 0; } -static int handle_fragments(struct net *net, struct sk_buff *skb, - u16 zone, u8 family, u16 *mru) -{ - int err; - - if (family == NFPROTO_IPV4) { - enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone; - - memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); - local_bh_disable(); - err = ip_defrag(net, skb, user); - local_bh_enable(); - if (err && err != -EINPROGRESS) - return err; - - if (!err) - *mru = IPCB(skb)->frag_max_size; - } else { /* NFPROTO_IPV6 */ -#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) - enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone; - - memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm)); - err = nf_ct_frag6_gather(net, skb, user); - if (err && err != -EINPROGRESS) - goto out_free; - - if (!err) - *mru = IP6CB(skb)->frag_max_size; -#else - err = -EOPNOTSUPP; - goto out_free; -#endif - } - - skb_clear_hash(skb); - skb->ignore_df = 1; - return err; - -out_free: - kfree_skb(skb); - return err; -} - static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, u8 family, u16 zone, bool *defrag) { @@ -828,6 +785,7 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, struct nf_conn *ct; int err = 0; bool frag; + u8 proto; u16 mru; /* Previously seen (loopback)? Ignore. */ @@ -843,7 +801,7 @@ static int tcf_ct_handle_fragments(struct net *net, struct sk_buff *skb, return err; skb_get(skb); - err = handle_fragments(net, skb, zone, family, &mru); + err = nf_ct_handle_fragments(net, skb, zone, family, &proto, &mru); if (err) return err; -- cgit v1.2.3 From ccd7f25b5b04869ed0786323940b8d1642459cc0 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Thu, 9 Feb 2023 09:18:49 +0200 Subject: bridge: mcast: Use correct define in MDB dump 'MDB_PG_FLAGS_PERMANENT' and 'MDB_PERMANENT' happen to have the same value, but the latter is uAPI and cannot change, so use it when dumping an MDB entry. Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- net/bridge/br_mdb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 9f22ebfdc518..13076206e497 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -259,7 +259,7 @@ static int __mdb_fill_info(struct sk_buff *skb, #endif } else { ether_addr_copy(e.addr.u.mac_addr, mp->addr.dst.mac_addr); - e.state = MDB_PG_FLAGS_PERMANENT; + e.state = MDB_PERMANENT; } e.addr.proto = mp->addr.proto; nest_ent = nla_nest_start_noflag(skb, -- cgit v1.2.3 From 7ea829664d3ce1977c310d532d5494ce3ec8592a Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Thu, 9 Feb 2023 09:18:50 +0200 Subject: bridge: mcast: Remove pointless sequence generation counter assignment The purpose of the sequence generation counter in the netlink callback is to identify if a multipart dump is consistent or not by calling nl_dump_check_consistent() whenever a message is generated. The function is not invoked by the MDB code, rendering the sequence generation counter assignment pointless. Remove it. Note that even if the function was invoked, we still could not accurately determine if the dump is consistent or not, as there is no sequence generation counter for MDB entries, unlike nexthop objects, for example. Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- net/bridge/br_mdb.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'net') diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 13076206e497..96f36febfb30 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -421,8 +421,6 @@ static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb) rcu_read_lock(); - cb->seq = net->dev_base_seq; - for_each_netdev_rcu(net, dev) { if (netif_is_bridge_master(dev)) { struct net_bridge *br = netdev_priv(dev); -- cgit v1.2.3 From 170afa71e3a2bd4ddaa3bac44512ce0b828a026f Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Thu, 9 Feb 2023 09:18:51 +0200 Subject: bridge: mcast: Move validation to a policy Future patches are going to move parts of the bridge MDB code to the common rtnetlink code in preparation for VXLAN MDB support. To facilitate code sharing between both drivers, move the validation of the top level attributes in RTM_{NEW,DEL}MDB messages to a policy that will eventually be moved to the rtnetlink code. Use 'NLA_NESTED' for 'MDBA_SET_ENTRY_ATTRS' instead of NLA_POLICY_NESTED() as this attribute is going to be validated using different policies in the underlying drivers. Signed-off-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- net/bridge/br_mdb.c | 45 +++++++++++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 18 deletions(-) (limited to 'net') diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index 96f36febfb30..25c48d81a597 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -683,51 +683,58 @@ static const struct nla_policy br_mdbe_attrs_pol[MDBE_ATTR_MAX + 1] = { [MDBE_ATTR_RTPROT] = NLA_POLICY_MIN(NLA_U8, RTPROT_STATIC), }; -static bool is_valid_mdb_entry(struct br_mdb_entry *entry, - struct netlink_ext_ack *extack) +static int validate_mdb_entry(const struct nlattr *attr, + struct netlink_ext_ack *extack) { + struct br_mdb_entry *entry = nla_data(attr); + + if (nla_len(attr) != sizeof(struct br_mdb_entry)) { + NL_SET_ERR_MSG_MOD(extack, "Invalid MDBA_SET_ENTRY attribute length"); + return -EINVAL; + } + if (entry->ifindex == 0) { NL_SET_ERR_MSG_MOD(extack, "Zero entry ifindex is not allowed"); - return false; + return -EINVAL; } if (entry->addr.proto == htons(ETH_P_IP)) { if (!ipv4_is_multicast(entry->addr.u.ip4)) { NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is not multicast"); - return false; + return -EINVAL; } if (ipv4_is_local_multicast(entry->addr.u.ip4)) { NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is local multicast"); - return false; + return -EINVAL; } #if IS_ENABLED(CONFIG_IPV6) } else if (entry->addr.proto == htons(ETH_P_IPV6)) { if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6)) { NL_SET_ERR_MSG_MOD(extack, "IPv6 entry group address is link-local all nodes"); - return false; + return -EINVAL; } #endif } else if (entry->addr.proto == 0) { /* L2 mdb */ if (!is_multicast_ether_addr(entry->addr.u.mac_addr)) { NL_SET_ERR_MSG_MOD(extack, "L2 entry group is not multicast"); - return false; + return -EINVAL; } } else { NL_SET_ERR_MSG_MOD(extack, "Unknown entry protocol"); - return false; + return -EINVAL; } if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY) { NL_SET_ERR_MSG_MOD(extack, "Unknown entry state"); - return false; + return -EINVAL; } if (entry->vid >= VLAN_VID_MASK) { NL_SET_ERR_MSG_MOD(extack, "Invalid entry VLAN id"); - return false; + return -EINVAL; } - return true; + return 0; } static bool is_valid_mdb_source(struct nlattr *attr, __be16 proto, @@ -1292,6 +1299,14 @@ static int br_mdb_config_attrs_init(struct nlattr *set_attrs, return 0; } +static const struct nla_policy mdba_policy[MDBA_SET_ENTRY_MAX + 1] = { + [MDBA_SET_ENTRY_UNSPEC] = { .strict_start_type = MDBA_SET_ENTRY_ATTRS + 1 }, + [MDBA_SET_ENTRY] = NLA_POLICY_VALIDATE_FN(NLA_BINARY, + validate_mdb_entry, + sizeof(struct br_mdb_entry)), + [MDBA_SET_ENTRY_ATTRS] = { .type = NLA_NESTED }, +}; + static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, struct br_mdb_config *cfg, struct netlink_ext_ack *extack) @@ -1302,7 +1317,7 @@ static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, int err; err = nlmsg_parse_deprecated(nlh, sizeof(*bpm), tb, - MDBA_SET_ENTRY_MAX, NULL, extack); + MDBA_SET_ENTRY_MAX, mdba_policy, extack); if (err) return err; @@ -1344,14 +1359,8 @@ static int br_mdb_config_init(struct net *net, const struct nlmsghdr *nlh, NL_SET_ERR_MSG_MOD(extack, "Missing MDBA_SET_ENTRY attribute"); return -EINVAL; } - if (nla_len(tb[MDBA_SET_ENTRY]) != sizeof(struct br_mdb_entry)) { - NL_SET_ERR_MSG_MOD(extack, "Invalid MDBA_SET_ENTRY attribute length"); - return -EINVAL; - } cfg->entry = nla_data(tb[MDBA_SET_ENTRY]); - if (!is_valid_mdb_entry(cfg->entry, extack)) - return -EINVAL; if (cfg->entry->ifindex != cfg->br->dev->ifindex) { struct net_device *pdev; -- cgit v1.2.3 From 68762148d1b011d47bc2ceed7321739b5aea1e63 Mon Sep 17 00:00:00 2001 From: Pietro Borrello Date: Thu, 9 Feb 2023 12:26:23 +0000 Subject: rds: rds_rm_zerocopy_callback() correct order for list_add_tail() rds_rm_zerocopy_callback() uses list_add_tail() with swapped arguments. This links the list head with the new entry, losing the references to the remaining part of the list. Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Suggested-by: Paolo Abeni Signed-off-by: Pietro Borrello Signed-off-by: David S. Miller --- net/rds/message.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/rds/message.c b/net/rds/message.c index c19c93561227..7af59d2443e5 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -118,7 +118,7 @@ static void rds_rm_zerocopy_callback(struct rds_sock *rs, ck = &info->zcookies; memset(ck, 0, sizeof(*ck)); WARN_ON(!rds_zcookie_add(info, cookie)); - list_add_tail(&q->zcookie_head, &info->rs_zcookie_next); + list_add_tail(&info->rs_zcookie_next, &q->zcookie_head); spin_unlock_irqrestore(&q->lock, flags); /* caller invokes rds_wake_sk_sleep() */ -- cgit v1.2.3 From 6d86bb0a5cb82bd459b017f180bdc47bec561b2d Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Thu, 9 Feb 2023 14:20:45 -0800 Subject: devlink: stop using NL_SET_ERR_MSG_MOD NL_SET_ERR_MSG_MOD inserts the KBUILD_MODNAME and a ':' before the actual extended error message. The devlink feature hasn't been able to be compiled as a module since commit f4b6bcc7002f ("net: devlink: turn devlink into a built-in"). Stop using NL_SET_ERR_MSG_MOD, and just use the base NL_SET_ERR_MSG. This aligns the extended error messages better with the NL_SET_ERR_MSG_ATTR messages as well. Signed-off-by: Jacob Keller Acked-by: Jakub Kicinski Signed-off-by: David S. Miller --- net/devlink/dev.c | 20 ++++----- net/devlink/leftover.c | 115 ++++++++++++++++++++++++------------------------- 2 files changed, 65 insertions(+), 70 deletions(-) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 78d824eda5ec..7cf2de44e02f 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -305,7 +305,7 @@ static struct net *devlink_netns_get(struct sk_buff *skb, struct net *net; if (!!netns_pid_attr + !!netns_fd_attr + !!netns_id_attr > 1) { - NL_SET_ERR_MSG_MOD(info->extack, "multiple netns identifying attributes specified"); + NL_SET_ERR_MSG(info->extack, "multiple netns identifying attributes specified"); return ERR_PTR(-EINVAL); } @@ -323,7 +323,7 @@ static struct net *devlink_netns_get(struct sk_buff *skb, net = ERR_PTR(-EINVAL); } if (IS_ERR(net)) { - NL_SET_ERR_MSG_MOD(info->extack, "Unknown network namespace"); + NL_SET_ERR_MSG(info->extack, "Unknown network namespace"); return ERR_PTR(-EINVAL); } if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) { @@ -425,7 +425,7 @@ int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) err = devlink_resources_validate(devlink, NULL, info); if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed"); + NL_SET_ERR_MSG(info->extack, "resources size validation failed"); return err; } @@ -435,8 +435,7 @@ int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) action = DEVLINK_RELOAD_ACTION_DRIVER_REINIT; if (!devlink_reload_action_is_supported(devlink, action)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested reload action is not supported by the driver"); + NL_SET_ERR_MSG(info->extack, "Requested reload action is not supported by the driver"); return -EOPNOTSUPP; } @@ -448,7 +447,7 @@ int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) limits = nla_get_bitfield32(info->attrs[DEVLINK_ATTR_RELOAD_LIMITS]); limits_selected = limits.value & limits.selector; if (!limits_selected) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid limit selected"); + NL_SET_ERR_MSG(info->extack, "Invalid limit selected"); return -EINVAL; } for (limit = 0 ; limit <= DEVLINK_RELOAD_LIMIT_MAX ; limit++) @@ -456,18 +455,15 @@ int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) break; /* UAPI enables multiselection, but currently it is not used */ if (limits_selected != BIT(limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Multiselection of limit is not supported"); + NL_SET_ERR_MSG(info->extack, "Multiselection of limit is not supported"); return -EOPNOTSUPP; } if (!devlink_reload_limit_is_supported(devlink, limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested limit is not supported by the driver"); + NL_SET_ERR_MSG(info->extack, "Requested limit is not supported by the driver"); return -EOPNOTSUPP; } if (devlink_reload_combination_is_invalid(action, limit)) { - NL_SET_ERR_MSG_MOD(info->extack, - "Requested limit is invalid for this action"); + NL_SET_ERR_MSG(info->extack, "Requested limit is invalid for this action"); return -EINVAL; } } diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index f05ab093d231..61b51ed287b3 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -810,13 +810,12 @@ static int devlink_port_fn_state_fill(const struct devlink_ops *ops, } if (!devlink_port_fn_state_valid(state)) { WARN_ON_ONCE(1); - NL_SET_ERR_MSG_MOD(extack, "Invalid state read from driver"); + NL_SET_ERR_MSG(extack, "Invalid state read from driver"); return -EINVAL; } if (!devlink_port_fn_opstate_valid(opstate)) { WARN_ON_ONCE(1); - NL_SET_ERR_MSG_MOD(extack, - "Invalid operational state read from driver"); + NL_SET_ERR_MSG(extack, "Invalid operational state read from driver"); return -EINVAL; } if (nla_put_u8(msg, DEVLINK_PORT_FN_ATTR_STATE, state) || @@ -1171,16 +1170,16 @@ static int devlink_port_function_hw_addr_set(struct devlink_port *port, hw_addr = nla_data(attr); hw_addr_len = nla_len(attr); if (hw_addr_len > MAX_ADDR_LEN) { - NL_SET_ERR_MSG_MOD(extack, "Port function hardware address too long"); + NL_SET_ERR_MSG(extack, "Port function hardware address too long"); return -EINVAL; } if (port->type == DEVLINK_PORT_TYPE_ETH) { if (hw_addr_len != ETH_ALEN) { - NL_SET_ERR_MSG_MOD(extack, "Address must be 6 bytes for Ethernet device"); + NL_SET_ERR_MSG(extack, "Address must be 6 bytes for Ethernet device"); return -EINVAL; } if (!is_unicast_ether_addr(hw_addr)) { - NL_SET_ERR_MSG_MOD(extack, "Non-unicast hardware address unsupported"); + NL_SET_ERR_MSG(extack, "Non-unicast hardware address unsupported"); return -EINVAL; } } @@ -1256,7 +1255,7 @@ static int devlink_port_function_set(struct devlink_port *port, err = nla_parse_nested(tb, DEVLINK_PORT_FUNCTION_ATTR_MAX, attr, devlink_function_nl_policy, extack); if (err < 0) { - NL_SET_ERR_MSG_MOD(extack, "Fail to parse port function attributes"); + NL_SET_ERR_MSG(extack, "Fail to parse port function attributes"); return err; } @@ -1335,14 +1334,14 @@ static int devlink_nl_cmd_port_split_doit(struct sk_buff *skb, if (!devlink_port->attrs.splittable) { /* Split ports cannot be split. */ if (devlink_port->attrs.split) - NL_SET_ERR_MSG_MOD(info->extack, "Port cannot be split further"); + NL_SET_ERR_MSG(info->extack, "Port cannot be split further"); else - NL_SET_ERR_MSG_MOD(info->extack, "Port cannot be split"); + NL_SET_ERR_MSG(info->extack, "Port cannot be split"); return -EINVAL; } if (count < 2 || !is_power_of_2(count) || count > devlink_port->attrs.lanes) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid split count"); + NL_SET_ERR_MSG(info->extack, "Invalid split count"); return -EINVAL; } @@ -1406,7 +1405,7 @@ static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb, if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] || !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) { - NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified"); + NL_SET_ERR_MSG(extack, "Port flavour or PCI PF are not specified"); return -EINVAL; } new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]); @@ -1454,7 +1453,7 @@ static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb, return -EOPNOTSUPP; if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_PORT_INDEX)) { - NL_SET_ERR_MSG_MOD(extack, "Port index is not specified"); + NL_SET_ERR_MSG(extack, "Port index is not specified"); return -EINVAL; } port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); @@ -1496,13 +1495,13 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate, return -ENODEV; if (parent == devlink_rate) { - NL_SET_ERR_MSG_MOD(info->extack, "Parent to self is not allowed"); + NL_SET_ERR_MSG(info->extack, "Parent to self is not allowed"); return -EINVAL; } if (devlink_rate_is_node(devlink_rate) && devlink_rate_is_parent_node(devlink_rate, parent->parent)) { - NL_SET_ERR_MSG_MOD(info->extack, "Node is already a parent of parent node."); + NL_SET_ERR_MSG(info->extack, "Node is already a parent of parent node."); return -EEXIST; } @@ -1611,16 +1610,16 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops, if (type == DEVLINK_RATE_TYPE_LEAF) { if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_leaf_tx_share_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the leafs"); + NL_SET_ERR_MSG(info->extack, "TX share set isn't supported for the leafs"); return false; } if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_leaf_tx_max_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the leafs"); + NL_SET_ERR_MSG(info->extack, "TX max set isn't supported for the leafs"); return false; } if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] && !ops->rate_leaf_parent_set) { - NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs"); + NL_SET_ERR_MSG(info->extack, "Parent set isn't supported for the leafs"); return false; } if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) { @@ -1637,16 +1636,16 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops, } } else if (type == DEVLINK_RATE_TYPE_NODE) { if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes"); + NL_SET_ERR_MSG(info->extack, "TX share set isn't supported for the nodes"); return false; } if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_node_tx_max_set) { - NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the nodes"); + NL_SET_ERR_MSG(info->extack, "TX max set isn't supported for the nodes"); return false; } if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] && !ops->rate_node_parent_set) { - NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes"); + NL_SET_ERR_MSG(info->extack, "Parent set isn't supported for the nodes"); return false; } if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) { @@ -1697,7 +1696,7 @@ static int devlink_nl_cmd_rate_new_doit(struct sk_buff *skb, ops = devlink->ops; if (!ops || !ops->rate_node_new || !ops->rate_node_del) { - NL_SET_ERR_MSG_MOD(info->extack, "Rate nodes aren't supported"); + NL_SET_ERR_MSG(info->extack, "Rate nodes aren't supported"); return -EOPNOTSUPP; } @@ -1753,7 +1752,7 @@ static int devlink_nl_cmd_rate_del_doit(struct sk_buff *skb, int err; if (refcount_read(&rate_node->refcnt) > 1) { - NL_SET_ERR_MSG_MOD(info->extack, "Node has children. Cannot delete node."); + NL_SET_ERR_MSG(info->extack, "Node has children. Cannot delete node."); return -EBUSY; } @@ -1941,26 +1940,26 @@ static int devlink_linecard_type_set(struct devlink_linecard *linecard, mutex_lock(&linecard->state_lock); if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being provisioned"); + NL_SET_ERR_MSG(extack, "Line card is currently being provisioned"); err = -EBUSY; goto out; } if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being unprovisioned"); + NL_SET_ERR_MSG(extack, "Line card is currently being unprovisioned"); err = -EBUSY; goto out; } linecard_type = devlink_linecard_type_lookup(linecard, type); if (!linecard_type) { - NL_SET_ERR_MSG_MOD(extack, "Unsupported line card type provided"); + NL_SET_ERR_MSG(extack, "Unsupported line card type provided"); err = -EINVAL; goto out; } if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED && linecard->state != DEVLINK_LINECARD_STATE_PROVISIONING_FAILED) { - NL_SET_ERR_MSG_MOD(extack, "Line card already provisioned"); + NL_SET_ERR_MSG(extack, "Line card already provisioned"); err = -EBUSY; /* Check if the line card is provisioned in the same * way the user asks. In case it is, make the operation @@ -2004,12 +2003,12 @@ static int devlink_linecard_type_unset(struct devlink_linecard *linecard, mutex_lock(&linecard->state_lock); if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being provisioned"); + NL_SET_ERR_MSG(extack, "Line card is currently being provisioned"); err = -EBUSY; goto out; } if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) { - NL_SET_ERR_MSG_MOD(extack, "Line card is currently being unprovisioned"); + NL_SET_ERR_MSG(extack, "Line card is currently being unprovisioned"); err = -EBUSY; goto out; } @@ -2022,7 +2021,7 @@ static int devlink_linecard_type_unset(struct devlink_linecard *linecard, } if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) { - NL_SET_ERR_MSG_MOD(extack, "Line card is not provisioned"); + NL_SET_ERR_MSG(extack, "Line card is not provisioned"); err = 0; goto out; } @@ -2846,7 +2845,7 @@ int devlink_rate_nodes_check(struct devlink *devlink, u16 mode, list_for_each_entry(devlink_rate, &devlink->rate_list, list) if (devlink_rate_is_node(devlink_rate)) { - NL_SET_ERR_MSG_MOD(extack, "Rate node(s) exists."); + NL_SET_ERR_MSG(extack, "Rate node(s) exists."); return -EBUSY; } return 0; @@ -3612,18 +3611,18 @@ devlink_resource_validate_size(struct devlink_resource *resource, u64 size, int err = 0; if (size > resource->size_params.size_max) { - NL_SET_ERR_MSG_MOD(extack, "Size larger than maximum"); + NL_SET_ERR_MSG(extack, "Size larger than maximum"); err = -EINVAL; } if (size < resource->size_params.size_min) { - NL_SET_ERR_MSG_MOD(extack, "Size smaller than minimum"); + NL_SET_ERR_MSG(extack, "Size smaller than minimum"); err = -EINVAL; } div64_u64_rem(size, resource->size_params.size_granularity, &reminder); if (reminder) { - NL_SET_ERR_MSG_MOD(extack, "Wrong granularity"); + NL_SET_ERR_MSG(extack, "Wrong granularity"); err = -EINVAL; } @@ -4419,21 +4418,21 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, static int devlink_nl_cmd_port_param_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb) { - NL_SET_ERR_MSG_MOD(cb->extack, "Port params are not supported"); + NL_SET_ERR_MSG(cb->extack, "Port params are not supported"); return msg->len; } static int devlink_nl_cmd_port_param_get_doit(struct sk_buff *skb, struct genl_info *info) { - NL_SET_ERR_MSG_MOD(info->extack, "Port params are not supported"); + NL_SET_ERR_MSG(info->extack, "Port params are not supported"); return -EINVAL; } static int devlink_nl_cmd_port_param_set_doit(struct sk_buff *skb, struct genl_info *info) { - NL_SET_ERR_MSG_MOD(info->extack, "Port params are not supported"); + NL_SET_ERR_MSG(info->extack, "Port params are not supported"); return -EINVAL; } @@ -5002,7 +5001,7 @@ devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info) int err; if (GENL_REQ_ATTR_CHECK(info, DEVLINK_ATTR_REGION_NAME)) { - NL_SET_ERR_MSG_MOD(info->extack, "No region name provided"); + NL_SET_ERR_MSG(info->extack, "No region name provided"); return -EINVAL; } @@ -5022,19 +5021,19 @@ devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info) region = devlink_region_get_by_name(devlink, region_name); if (!region) { - NL_SET_ERR_MSG_MOD(info->extack, "The requested region does not exist"); + NL_SET_ERR_MSG(info->extack, "The requested region does not exist"); return -EINVAL; } if (!region->ops->snapshot) { - NL_SET_ERR_MSG_MOD(info->extack, "The requested region does not support taking an immediate snapshot"); + NL_SET_ERR_MSG(info->extack, "The requested region does not support taking an immediate snapshot"); return -EOPNOTSUPP; } mutex_lock(®ion->snapshot_lock); if (region->cur_snapshots == region->max_snapshots) { - NL_SET_ERR_MSG_MOD(info->extack, "The region has reached the maximum number of stored snapshots"); + NL_SET_ERR_MSG(info->extack, "The region has reached the maximum number of stored snapshots"); err = -ENOSPC; goto unlock; } @@ -5044,7 +5043,7 @@ devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info) snapshot_id = nla_get_u32(snapshot_id_attr); if (devlink_region_snapshot_get_by_id(region, snapshot_id)) { - NL_SET_ERR_MSG_MOD(info->extack, "The requested snapshot id is already in use"); + NL_SET_ERR_MSG(info->extack, "The requested snapshot id is already in use"); err = -EEXIST; goto unlock; } @@ -5055,7 +5054,7 @@ devlink_nl_cmd_region_new(struct sk_buff *skb, struct genl_info *info) } else { err = __devlink_region_snapshot_id_get(devlink, &snapshot_id); if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "Failed to allocate a new snapshot id"); + NL_SET_ERR_MSG(info->extack, "Failed to allocate a new snapshot id"); goto unlock; } } @@ -6667,7 +6666,7 @@ devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, state->dump_ts = reporter->dump_ts; } if (!reporter->dump_fmsg || state->dump_ts != reporter->dump_ts) { - NL_SET_ERR_MSG_MOD(cb->extack, "Dump trampled, please retry"); + NL_SET_ERR_MSG(cb->extack, "Dump trampled, please retry"); err = -EAGAIN; goto unlock; } @@ -7025,7 +7024,7 @@ static int devlink_nl_cmd_trap_get_doit(struct sk_buff *skb, trap_item = devlink_trap_item_get_from_info(devlink, info); if (!trap_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap"); + NL_SET_ERR_MSG(extack, "Device did not register this trap"); return -ENOENT; } @@ -7088,7 +7087,7 @@ static int __devlink_trap_action_set(struct devlink *devlink, if (trap_item->action != trap_action && trap_item->trap->type != DEVLINK_TRAP_TYPE_DROP) { - NL_SET_ERR_MSG_MOD(extack, "Cannot change action of non-drop traps. Skipping"); + NL_SET_ERR_MSG(extack, "Cannot change action of non-drop traps. Skipping"); return 0; } @@ -7114,7 +7113,7 @@ static int devlink_trap_action_set(struct devlink *devlink, err = devlink_trap_action_get_from_info(info, &trap_action); if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action"); + NL_SET_ERR_MSG(info->extack, "Invalid trap action"); return -EINVAL; } @@ -7134,7 +7133,7 @@ static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb, trap_item = devlink_trap_item_get_from_info(devlink, info); if (!trap_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap"); + NL_SET_ERR_MSG(extack, "Device did not register this trap"); return -ENOENT; } @@ -7236,7 +7235,7 @@ static int devlink_nl_cmd_trap_group_get_doit(struct sk_buff *skb, group_item = devlink_trap_group_item_get_from_info(devlink, info); if (!group_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group"); + NL_SET_ERR_MSG(extack, "Device did not register this trap group"); return -ENOENT; } @@ -7345,7 +7344,7 @@ devlink_trap_group_action_set(struct devlink *devlink, err = devlink_trap_action_get_from_info(info, &trap_action); if (err) { - NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action"); + NL_SET_ERR_MSG(info->extack, "Invalid trap action"); return -EINVAL; } @@ -7379,7 +7378,7 @@ static int devlink_trap_group_set(struct devlink *devlink, policer_id = nla_get_u32(attrs[DEVLINK_ATTR_TRAP_POLICER_ID]); policer_item = devlink_trap_policer_item_lookup(devlink, policer_id); if (policer_id && !policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + NL_SET_ERR_MSG(extack, "Device did not register this trap policer"); return -ENOENT; } policer = policer_item ? policer_item->policer : NULL; @@ -7408,7 +7407,7 @@ static int devlink_nl_cmd_trap_group_set_doit(struct sk_buff *skb, group_item = devlink_trap_group_item_get_from_info(devlink, info); if (!group_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group"); + NL_SET_ERR_MSG(extack, "Device did not register this trap group"); return -ENOENT; } @@ -7425,7 +7424,7 @@ static int devlink_nl_cmd_trap_group_set_doit(struct sk_buff *skb, err_trap_group_set: if (modified) - NL_SET_ERR_MSG_MOD(extack, "Trap group set failed, but some changes were committed already"); + NL_SET_ERR_MSG(extack, "Trap group set failed, but some changes were committed already"); return err; } @@ -7530,7 +7529,7 @@ static int devlink_nl_cmd_trap_policer_get_doit(struct sk_buff *skb, policer_item = devlink_trap_policer_item_get_from_info(devlink, info); if (!policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + NL_SET_ERR_MSG(extack, "Device did not register this trap policer"); return -ENOENT; } @@ -7605,22 +7604,22 @@ devlink_trap_policer_set(struct devlink *devlink, burst = nla_get_u64(attrs[DEVLINK_ATTR_TRAP_POLICER_BURST]); if (rate < policer_item->policer->min_rate) { - NL_SET_ERR_MSG_MOD(extack, "Policer rate lower than limit"); + NL_SET_ERR_MSG(extack, "Policer rate lower than limit"); return -EINVAL; } if (rate > policer_item->policer->max_rate) { - NL_SET_ERR_MSG_MOD(extack, "Policer rate higher than limit"); + NL_SET_ERR_MSG(extack, "Policer rate higher than limit"); return -EINVAL; } if (burst < policer_item->policer->min_burst) { - NL_SET_ERR_MSG_MOD(extack, "Policer burst size lower than limit"); + NL_SET_ERR_MSG(extack, "Policer burst size lower than limit"); return -EINVAL; } if (burst > policer_item->policer->max_burst) { - NL_SET_ERR_MSG_MOD(extack, "Policer burst size higher than limit"); + NL_SET_ERR_MSG(extack, "Policer burst size higher than limit"); return -EINVAL; } @@ -7650,7 +7649,7 @@ static int devlink_nl_cmd_trap_policer_set_doit(struct sk_buff *skb, policer_item = devlink_trap_policer_item_get_from_info(devlink, info); if (!policer_item) { - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap policer"); + NL_SET_ERR_MSG(extack, "Device did not register this trap policer"); return -ENOENT; } -- cgit v1.2.3 From fa2f921f3bf1ed98956ad341b920a55aab69bc35 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:25 +0100 Subject: devlink: don't use strcpy() to copy param value No need to treat string params any different comparing to other types. Rely on the struct assign to copy the whole struct, including the string. Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Acked-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 61b51ed287b3..2f9e309727c7 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -4387,10 +4387,7 @@ static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink, return -EOPNOTSUPP; if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) { - if (param->type == DEVLINK_PARAM_TYPE_STRING) - strcpy(param_item->driverinit_value.vstr, value.vstr); - else - param_item->driverinit_value = value; + param_item->driverinit_value = value; param_item->driverinit_value_valid = true; } else { if (!param->set) @@ -9655,10 +9652,7 @@ int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT))) return -EOPNOTSUPP; - if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) - strcpy(init_val->vstr, param_item->driverinit_value.vstr); - else - *init_val = param_item->driverinit_value; + *init_val = param_item->driverinit_value; return 0; } @@ -9689,10 +9683,7 @@ void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT))) return; - if (param_item->param->type == DEVLINK_PARAM_TYPE_STRING) - strcpy(param_item->driverinit_value.vstr, init_val.vstr); - else - param_item->driverinit_value = init_val; + param_item->driverinit_value = init_val; param_item->driverinit_value_valid = true; devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); -- cgit v1.2.3 From afd888c3e19ceb5247158fe2fabbf7234937a515 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:26 +0100 Subject: devlink: make sure driver does not read updated driverinit param before reload The driverinit param purpose is to serve the driver during init/reload time to provide a value, either default or set by user. Make sure that driver does not read value updated by user before the reload is performed. Hold the new value in a separate struct and switch it during reload. Note that this is required to be eventually possible to call devl_param_driverinit_value_get() without holding instance lock. Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/devlink.h | 4 ++++ net/devlink/dev.c | 3 +++ net/devlink/devl_internal.h | 3 +++ net/devlink/leftover.c | 26 ++++++++++++++++++++++---- 4 files changed, 32 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/include/net/devlink.h b/include/net/devlink.h index 2e85a5970a32..8ed960345f37 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -489,6 +489,10 @@ struct devlink_param_item { const struct devlink_param *param; union devlink_param_value driverinit_value; bool driverinit_value_valid; + union devlink_param_value driverinit_value_new; /* Not reachable + * until reload. + */ + bool driverinit_value_new_valid; }; enum devlink_param_generic_id { diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 7cf2de44e02f..ab4e0f3c4e3d 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -369,6 +369,9 @@ int devlink_reload(struct devlink *devlink, struct net *dest_net, if (dest_net && !net_eq(dest_net, curr_net)) devlink_reload_netns_change(devlink, curr_net, dest_net); + if (action == DEVLINK_RELOAD_ACTION_DRIVER_REINIT) + devlink_params_driverinit_load_new(devlink); + err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack); devlink_reload_failed_set(devlink, !!err); if (err) diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 941174e157d4..5c117e8d4377 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -189,6 +189,9 @@ static inline bool devlink_reload_supported(const struct devlink_ops *ops) return ops->reload_down && ops->reload_up; } +/* Params */ +void devlink_params_driverinit_load_new(struct devlink *devlink); + /* Resources */ struct devlink_resource; int devlink_resources_validate(struct devlink *devlink, diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 2f9e309727c7..f74aca8ba3ab 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -4097,9 +4097,12 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink, if (!devlink_param_cmode_is_supported(param, i)) continue; if (i == DEVLINK_PARAM_CMODE_DRIVERINIT) { - if (!param_item->driverinit_value_valid) + if (param_item->driverinit_value_new_valid) + param_value[i] = param_item->driverinit_value_new; + else if (param_item->driverinit_value_valid) + param_value[i] = param_item->driverinit_value; + else return -EOPNOTSUPP; - param_value[i] = param_item->driverinit_value; } else { ctx.cmode = i; err = devlink_param_get(devlink, param, &ctx); @@ -4387,8 +4390,8 @@ static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink, return -EOPNOTSUPP; if (cmode == DEVLINK_PARAM_CMODE_DRIVERINIT) { - param_item->driverinit_value = value; - param_item->driverinit_value_valid = true; + param_item->driverinit_value_new = value; + param_item->driverinit_value_new_valid = true; } else { if (!param->set) return -EOPNOTSUPP; @@ -9690,6 +9693,21 @@ void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, } EXPORT_SYMBOL_GPL(devl_param_driverinit_value_set); +void devlink_params_driverinit_load_new(struct devlink *devlink) +{ + struct devlink_param_item *param_item; + + list_for_each_entry(param_item, &devlink->param_list, list) { + if (!devlink_param_cmode_is_supported(param_item->param, + DEVLINK_PARAM_CMODE_DRIVERINIT) || + !param_item->driverinit_value_new_valid) + continue; + param_item->driverinit_value = param_item->driverinit_value_new; + param_item->driverinit_value_valid = true; + param_item->driverinit_value_new_valid = false; + } +} + /** * devl_param_value_changed - notify devlink on a parameter's value * change. Should be called by the driver -- cgit v1.2.3 From 94ba1c316b9c0f9b017f7cd7eac84adae693e80f Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:27 +0100 Subject: devlink: fix the name of value arg of devl_param_driverinit_value_get() Probably due to copy-paste error, the name of the arg is "init_val" which is misleading, as the pointer is used to point to struct where to store the current value. Rename it to "val" and change the arg comment a bit on the way. Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Acked-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- include/net/devlink.h | 2 +- net/devlink/leftover.c | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/include/net/devlink.h b/include/net/devlink.h index 8ed960345f37..6a942e70e451 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1784,7 +1784,7 @@ void devlink_params_unregister(struct devlink *devlink, const struct devlink_param *params, size_t params_count); int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, - union devlink_param_value *init_val); + union devlink_param_value *val); void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, union devlink_param_value init_val); void devl_param_value_changed(struct devlink *devlink, u32 param_id); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index f74aca8ba3ab..c3c82d5d563e 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -9629,13 +9629,14 @@ EXPORT_SYMBOL_GPL(devlink_params_unregister); * * @devlink: devlink * @param_id: parameter ID - * @init_val: value of parameter in driverinit configuration mode + * @val: pointer to store the value of parameter in driverinit + * configuration mode * * This function should be used by the driver to get driverinit * configuration for initialization after reload command. */ int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, - union devlink_param_value *init_val) + union devlink_param_value *val) { struct devlink_param_item *param_item; @@ -9655,7 +9656,7 @@ int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, DEVLINK_PARAM_CMODE_DRIVERINIT))) return -EOPNOTSUPP; - *init_val = param_item->driverinit_value; + *val = param_item->driverinit_value; return 0; } -- cgit v1.2.3 From fbcf938150ecd686c6a6195573957db25b43a182 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:28 +0100 Subject: devlink: use xa_for_each_start() helper in devlink_nl_cmd_port_get_dump_one() As xarray has an iterator helper that allows to start from specified index, use this directly and avoid repeated iteration from 0. Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Acked-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index c3c82d5d563e..1f3332a6f1ac 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -1110,24 +1110,18 @@ devlink_nl_cmd_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink, struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_port *devlink_port; unsigned long port_index; - int idx = 0; int err = 0; - xa_for_each(&devlink->ports, port_index, devlink_port) { - if (idx < state->idx) { - idx++; - continue; - } + xa_for_each_start(&devlink->ports, port_index, devlink_port, state->idx) { err = devlink_nl_port_fill(msg, devlink_port, DEVLINK_CMD_NEW, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->extack); if (err) { - state->idx = idx; + state->idx = port_index; break; } - idx++; } return err; -- cgit v1.2.3 From a72e17b4523223015d3b3fd79bac2b065d6d09a9 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:29 +0100 Subject: devlink: convert param list to xarray Loose the linked list for params and use xarray instead. Note that this is required to be eventually possible to call devl_param_driverinit_value_get() without holding instance lock. Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Acked-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/core.c | 4 +-- net/devlink/devl_internal.h | 2 +- net/devlink/leftover.c | 72 ++++++++++++++++++++++----------------------- 3 files changed, 39 insertions(+), 39 deletions(-) (limited to 'net') diff --git a/net/devlink/core.c b/net/devlink/core.c index a4f47dafb864..777b091ef74d 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -212,6 +212,7 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, devlink->dev = dev; devlink->ops = ops; xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC); + xa_init_flags(&devlink->params, XA_FLAGS_ALLOC); xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC); write_pnet(&devlink->_net, net); INIT_LIST_HEAD(&devlink->rate_list); @@ -219,7 +220,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops, INIT_LIST_HEAD(&devlink->sb_list); INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list); INIT_LIST_HEAD(&devlink->resource_list); - INIT_LIST_HEAD(&devlink->param_list); INIT_LIST_HEAD(&devlink->region_list); INIT_LIST_HEAD(&devlink->reporter_list); INIT_LIST_HEAD(&devlink->trap_list); @@ -255,7 +255,6 @@ void devlink_free(struct devlink *devlink) WARN_ON(!list_empty(&devlink->trap_list)); WARN_ON(!list_empty(&devlink->reporter_list)); WARN_ON(!list_empty(&devlink->region_list)); - WARN_ON(!list_empty(&devlink->param_list)); WARN_ON(!list_empty(&devlink->resource_list)); WARN_ON(!list_empty(&devlink->dpipe_table_list)); WARN_ON(!list_empty(&devlink->sb_list)); @@ -264,6 +263,7 @@ void devlink_free(struct devlink *devlink) WARN_ON(!xa_empty(&devlink->ports)); xa_destroy(&devlink->snapshot_ids); + xa_destroy(&devlink->params); xa_destroy(&devlink->ports); WARN_ON_ONCE(unregister_netdevice_notifier(&devlink->netdevice_nb)); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 5c117e8d4377..2f4820e40d27 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -29,7 +29,7 @@ struct devlink { struct list_head sb_list; struct list_head dpipe_table_list; struct list_head resource_list; - struct list_head param_list; + struct xarray params; struct list_head region_list; struct list_head reporter_list; struct devlink_dpipe_headers *dpipe_headers; diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 1f3332a6f1ac..c6f0eccc700a 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -3953,26 +3953,22 @@ static int devlink_param_driver_verify(const struct devlink_param *param) } static struct devlink_param_item * -devlink_param_find_by_name(struct list_head *param_list, - const char *param_name) +devlink_param_find_by_name(struct xarray *params, const char *param_name) { struct devlink_param_item *param_item; + unsigned long param_id; - list_for_each_entry(param_item, param_list, list) + xa_for_each(params, param_id, param_item) { if (!strcmp(param_item->param->name, param_name)) return param_item; + } return NULL; } static struct devlink_param_item * -devlink_param_find_by_id(struct list_head *param_list, u32 param_id) +devlink_param_find_by_id(struct xarray *params, u32 param_id) { - struct devlink_param_item *param_item; - - list_for_each_entry(param_item, param_list, list) - if (param_item->param->id == param_id) - return param_item; - return NULL; + return xa_load(params, param_id); } static bool @@ -4201,14 +4197,10 @@ devlink_nl_cmd_param_get_dump_one(struct sk_buff *msg, struct devlink *devlink, { struct devlink_nl_dump_state *state = devlink_dump_state(cb); struct devlink_param_item *param_item; - int idx = 0; + unsigned long param_id; int err = 0; - list_for_each_entry(param_item, &devlink->param_list, list) { - if (idx < state->idx) { - idx++; - continue; - } + xa_for_each_start(&devlink->params, param_id, param_item, state->idx) { err = devlink_nl_param_fill(msg, devlink, 0, param_item, DEVLINK_CMD_PARAM_GET, NETLINK_CB(cb->skb).portid, @@ -4217,10 +4209,9 @@ devlink_nl_cmd_param_get_dump_one(struct sk_buff *msg, struct devlink *devlink, if (err == -EOPNOTSUPP) { err = 0; } else if (err) { - state->idx = idx; + state->idx = param_id; break; } - idx++; } return err; @@ -4306,8 +4297,7 @@ devlink_param_value_get_from_info(const struct devlink_param *param, } static struct devlink_param_item * -devlink_param_get_from_info(struct list_head *param_list, - struct genl_info *info) +devlink_param_get_from_info(struct xarray *params, struct genl_info *info) { char *param_name; @@ -4315,7 +4305,7 @@ devlink_param_get_from_info(struct list_head *param_list, return NULL; param_name = nla_data(info->attrs[DEVLINK_ATTR_PARAM_NAME]); - return devlink_param_find_by_name(param_list, param_name); + return devlink_param_find_by_name(params, param_name); } static int devlink_nl_cmd_param_get_doit(struct sk_buff *skb, @@ -4326,7 +4316,7 @@ static int devlink_nl_cmd_param_get_doit(struct sk_buff *skb, struct sk_buff *msg; int err; - param_item = devlink_param_get_from_info(&devlink->param_list, info); + param_item = devlink_param_get_from_info(&devlink->params, info); if (!param_item) return -EINVAL; @@ -4347,7 +4337,7 @@ static int devlink_nl_cmd_param_get_doit(struct sk_buff *skb, static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink, unsigned int port_index, - struct list_head *param_list, + struct xarray *params, struct genl_info *info, enum devlink_command cmd) { @@ -4359,7 +4349,7 @@ static int __devlink_nl_cmd_param_set_doit(struct devlink *devlink, union devlink_param_value value; int err = 0; - param_item = devlink_param_get_from_info(param_list, info); + param_item = devlink_param_get_from_info(params, info); if (!param_item) return -EINVAL; param = param_item->param; @@ -4405,7 +4395,7 @@ static int devlink_nl_cmd_param_set_doit(struct sk_buff *skb, { struct devlink *devlink = info->user_ptr[0]; - return __devlink_nl_cmd_param_set_doit(devlink, 0, &devlink->param_list, + return __devlink_nl_cmd_param_set_doit(devlink, 0, &devlink->params, info, DEVLINK_CMD_PARAM_NEW); } @@ -8037,6 +8027,7 @@ void devlink_notify_register(struct devlink *devlink) struct devlink_rate *rate_node; struct devlink_region *region; unsigned long port_index; + unsigned long param_id; devlink_notify(devlink, DEVLINK_CMD_NEW); list_for_each_entry(linecard, &devlink->linecard_list, list) @@ -8062,7 +8053,7 @@ void devlink_notify_register(struct devlink *devlink) list_for_each_entry(region, &devlink->region_list, list) devlink_nl_region_notify(region, NULL, DEVLINK_CMD_REGION_NEW); - list_for_each_entry(param_item, &devlink->param_list, list) + xa_for_each(&devlink->params, param_id, param_item) devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); } @@ -8077,8 +8068,9 @@ void devlink_notify_unregister(struct devlink *devlink) struct devlink_rate *rate_node; struct devlink_region *region; unsigned long port_index; + unsigned long param_id; - list_for_each_entry_reverse(param_item, &devlink->param_list, list) + xa_for_each(&devlink->params, param_id, param_item) devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_DEL); @@ -9505,9 +9497,10 @@ static int devlink_param_register(struct devlink *devlink, const struct devlink_param *param) { struct devlink_param_item *param_item; + int err; WARN_ON(devlink_param_verify(param)); - WARN_ON(devlink_param_find_by_name(&devlink->param_list, param->name)); + WARN_ON(devlink_param_find_by_name(&devlink->params, param->name)); if (param->supported_cmodes == BIT(DEVLINK_PARAM_CMODE_DRIVERINIT)) WARN_ON(param->get || param->set); @@ -9520,9 +9513,16 @@ static int devlink_param_register(struct devlink *devlink, param_item->param = param; - list_add_tail(¶m_item->list, &devlink->param_list); + err = xa_insert(&devlink->params, param->id, param_item, GFP_KERNEL); + if (err) + goto err_xa_insert; + devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); return 0; + +err_xa_insert: + kfree(param_item); + return err; } static void devlink_param_unregister(struct devlink *devlink, @@ -9530,12 +9530,11 @@ static void devlink_param_unregister(struct devlink *devlink, { struct devlink_param_item *param_item; - param_item = - devlink_param_find_by_name(&devlink->param_list, param->name); + param_item = devlink_param_find_by_id(&devlink->params, param->id); if (WARN_ON(!param_item)) return; devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_DEL); - list_del(¶m_item->list); + xa_erase(&devlink->params, param->id); kfree(param_item); } @@ -9639,7 +9638,7 @@ int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, if (WARN_ON(!devlink_reload_supported(devlink->ops))) return -EOPNOTSUPP; - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); + param_item = devlink_param_find_by_id(&devlink->params, param_id); if (!param_item) return -EINVAL; @@ -9673,7 +9672,7 @@ void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, { struct devlink_param_item *param_item; - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); + param_item = devlink_param_find_by_id(&devlink->params, param_id); if (WARN_ON(!param_item)) return; @@ -9691,8 +9690,9 @@ EXPORT_SYMBOL_GPL(devl_param_driverinit_value_set); void devlink_params_driverinit_load_new(struct devlink *devlink) { struct devlink_param_item *param_item; + unsigned long param_id; - list_for_each_entry(param_item, &devlink->param_list, list) { + xa_for_each(&devlink->params, param_id, param_item) { if (!devlink_param_cmode_is_supported(param_item->param, DEVLINK_PARAM_CMODE_DRIVERINIT) || !param_item->driverinit_value_new_valid) @@ -9719,7 +9719,7 @@ void devl_param_value_changed(struct devlink *devlink, u32 param_id) { struct devlink_param_item *param_item; - param_item = devlink_param_find_by_id(&devlink->param_list, param_id); + param_item = devlink_param_find_by_id(&devlink->params, param_id); WARN_ON(!param_item); devlink_param_notify(devlink, 0, param_item, DEVLINK_CMD_PARAM_NEW); -- cgit v1.2.3 From 280f7b2adca09c8d5f34b99f49e5c570aa81daad Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:30 +0100 Subject: devlink: allow to call devl_param_driverinit_value_get() without holding instance lock If the driver maintains following basic sane behavior, the devl_param_driverinit_value_get() function could be called without holding instance lock: 1) Driver ensures a call to devl_param_driverinit_value_get() cannot race with registering/unregistering the parameter with the same parameter ID. 2) Driver ensures a call to devl_param_driverinit_value_get() cannot race with devl_param_driverinit_value_set() call with the same parameter ID. 3) Driver ensures a call to devl_param_driverinit_value_get() cannot race with reload operation. By the nature of params usage, these requirements should be trivially achievable. If the driver for some off reason is not able to comply, it has to take the devlink->lock while calling devl_param_driverinit_value_get(). Remove the lock assertion and add comment describing the locking requirements. This fixes a splat in mlx5 driver introduced by the commit referenced in the "Fixes" tag. Lore: https://lore.kernel.org/netdev/719de4f0-76ac-e8b9-38a9-167ae239efc7@amd.com/ Reported-by: Kim Phillips Fixes: 075935f0ae0f ("devlink: protect devlink param list by instance lock") Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Acked-by: Jakub Kicinski Reviewed-by: Jacob Keller Tested-by: Kim Phillips Signed-off-by: David S. Miller --- net/devlink/leftover.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index c6f0eccc700a..d4c896f89905 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -9627,14 +9627,23 @@ EXPORT_SYMBOL_GPL(devlink_params_unregister); * * This function should be used by the driver to get driverinit * configuration for initialization after reload command. + * + * Note that lockless call of this function relies on the + * driver to maintain following basic sane behavior: + * 1) Driver ensures a call to this function cannot race with + * registering/unregistering the parameter with the same parameter ID. + * 2) Driver ensures a call to this function cannot race with + * devl_param_driverinit_value_set() call with the same parameter ID. + * 3) Driver ensures a call to this function cannot race with + * reload operation. + * If the driver is not able to comply, it has to take the devlink->lock + * while calling this. */ int devl_param_driverinit_value_get(struct devlink *devlink, u32 param_id, union devlink_param_value *val) { struct devlink_param_item *param_item; - lockdep_assert_held(&devlink->lock); - if (WARN_ON(!devlink_reload_supported(devlink->ops))) return -EOPNOTSUPP; -- cgit v1.2.3 From 6b4bfa43ce29165fb0a2a8ef770d94c1d93e5ad8 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 10 Feb 2023 11:01:31 +0100 Subject: devlink: add forgotten devlink instance lock assertion to devl_param_driverinit_value_set() Driver calling devl_param_driverinit_value_set() has to hold devlink instance lock while doing that. Put an assertion there. Signed-off-by: Jiri Pirko Reviewed-by: Simon Horman Acked-by: Jakub Kicinski Reviewed-by: Jacob Keller Signed-off-by: David S. Miller --- net/devlink/leftover.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index d4c896f89905..3569706c49e1 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -9681,6 +9681,8 @@ void devl_param_driverinit_value_set(struct devlink *devlink, u32 param_id, { struct devlink_param_item *param_item; + devl_assert_locked(devlink); + param_item = devlink_param_find_by_id(&devlink->params, param_id); if (WARN_ON(!param_item)) return; -- cgit v1.2.3 From 4fab64126891d413f207dacd5762a839b3470315 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 10 Feb 2023 15:26:05 +0000 Subject: net/sched: fix error recovery in qdisc_create() If TCA_STAB attribute is malformed, qdisc_get_stab() returns an error, and we end up calling ops->destroy() while ops->init() has not been called yet. While we are at it, call qdisc_put_stab() after ops->destroy(). Fixes: 1f62879e3632 ("net/sched: make stab available before ops->init() call") Reported-by: syzbot+d44d88f1d11e6ca8576b@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet Cc: Vladimir Oltean Cc: Kurt Kanzenbach Reviewed-by: Vladimir Oltean Signed-off-by: David S. Miller --- net/sched/sch_api.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index e9780631b5b5..aba789c30a2e 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1286,7 +1286,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, stab = qdisc_get_stab(tca[TCA_STAB], extack); if (IS_ERR(stab)) { err = PTR_ERR(stab); - goto err_out4; + goto err_out3; } rcu_assign_pointer(sch->stab, stab); } @@ -1294,14 +1294,14 @@ static struct Qdisc *qdisc_create(struct net_device *dev, if (ops->init) { err = ops->init(sch, tca[TCA_OPTIONS], extack); if (err != 0) - goto err_out5; + goto err_out4; } if (tca[TCA_RATE]) { err = -EOPNOTSUPP; if (sch->flags & TCQ_F_MQROOT) { NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc"); - goto err_out5; + goto err_out4; } err = gen_new_estimator(&sch->bstats, @@ -1312,7 +1312,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, tca[TCA_RATE]); if (err) { NL_SET_ERR_MSG(extack, "Failed to generate new estimator"); - goto err_out5; + goto err_out4; } } @@ -1321,12 +1321,13 @@ static struct Qdisc *qdisc_create(struct net_device *dev, return sch; -err_out5: - qdisc_put_stab(rtnl_dereference(sch->stab)); err_out4: - /* ops->init() failed, we call ->destroy() like qdisc_create_dflt() */ + /* Even if ops->init() failed, we call ops->destroy() + * like qdisc_create_dflt(). + */ if (ops->destroy) ops->destroy(sch); + qdisc_put_stab(rtnl_dereference(sch->stab)); err_out3: netdev_put(dev, &sch->dev_tracker); qdisc_free(sch); -- cgit v1.2.3 From 5b4e9a7a71ab912d150cb2276cb23af51c863150 Mon Sep 17 00:00:00 2001 From: Shannon Nelson Date: Fri, 10 Feb 2023 16:50:16 -0800 Subject: net: ethtool: extend ringparam set/get APIs for rx_push Similar to what was done for TX_PUSH, add an RX_PUSH concept to the ethtool interfaces. Signed-off-by: Shannon Nelson Signed-off-by: David S. Miller --- Documentation/networking/ethtool-netlink.rst | 6 ++++-- include/linux/ethtool.h | 4 ++++ include/uapi/linux/ethtool_netlink.h | 1 + net/ethtool/netlink.h | 2 +- net/ethtool/rings.c | 17 +++++++++++++++-- 5 files changed, 25 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index d144e890961d..e1bc6186d7ea 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -874,6 +874,7 @@ Kernel response contents: ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode + ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode ==================================== ====== =========================== ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` indicates whether the device is usable with @@ -883,8 +884,8 @@ separate buffers. The device configuration must make it possible to receive full memory pages of data, for example because MTU is high enough or through HW-GRO. -``ETHTOOL_A_RINGS_TX_PUSH`` flag is used to enable descriptor fast -path to send packets. In ordinary path, driver fills descriptors in DRAM and +``ETHTOOL_A_RINGS_[RX|TX]_PUSH`` flag is used to enable descriptor fast +path to send or receive packets. In ordinary path, driver fills descriptors in DRAM and notifies NIC hardware. In fast path, driver pushes descriptors to the device through MMIO writes, thus reducing the latency. However, enabling this feature may increase the CPU cost. Drivers may enforce additional per-packet @@ -906,6 +907,7 @@ Request contents: ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode + ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode ==================================== ====== =========================== Kernel checks that requested ring sizes do not exceed limits reported by diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 515c78d8eb7c..2792185dda22 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -73,12 +73,14 @@ enum { * @rx_buf_len: Current length of buffers on the rx ring. * @tcp_data_split: Scatter packet headers and data to separate buffers * @tx_push: The flag of tx push mode + * @rx_push: The flag of rx push mode * @cqe_size: Size of TX/RX completion queue event */ struct kernel_ethtool_ringparam { u32 rx_buf_len; u8 tcp_data_split; u8 tx_push; + u8 rx_push; u32 cqe_size; }; @@ -87,11 +89,13 @@ struct kernel_ethtool_ringparam { * @ETHTOOL_RING_USE_RX_BUF_LEN: capture for setting rx_buf_len * @ETHTOOL_RING_USE_CQE_SIZE: capture for setting cqe_size * @ETHTOOL_RING_USE_TX_PUSH: capture for setting tx_push + * @ETHTOOL_RING_USE_RX_PUSH: capture for setting rx_push */ enum ethtool_supported_ring_param { ETHTOOL_RING_USE_RX_BUF_LEN = BIT(0), ETHTOOL_RING_USE_CQE_SIZE = BIT(1), ETHTOOL_RING_USE_TX_PUSH = BIT(2), + ETHTOOL_RING_USE_RX_PUSH = BIT(3), }; #define __ETH_RSS_HASH_BIT(bit) ((u32)1 << (bit)) diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index ffb073c0dbb4..d39ce21381c5 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -356,6 +356,7 @@ enum { ETHTOOL_A_RINGS_TCP_DATA_SPLIT, /* u8 */ ETHTOOL_A_RINGS_CQE_SIZE, /* u32 */ ETHTOOL_A_RINGS_TX_PUSH, /* u8 */ + ETHTOOL_A_RINGS_RX_PUSH, /* u8 */ /* add new constants above here */ __ETHTOOL_A_RINGS_CNT, diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index ae0732460e88..f7b189ed96b2 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -413,7 +413,7 @@ extern const struct nla_policy ethnl_features_set_policy[ETHTOOL_A_FEATURES_WANT extern const struct nla_policy ethnl_privflags_get_policy[ETHTOOL_A_PRIVFLAGS_HEADER + 1]; extern const struct nla_policy ethnl_privflags_set_policy[ETHTOOL_A_PRIVFLAGS_FLAGS + 1]; extern const struct nla_policy ethnl_rings_get_policy[ETHTOOL_A_RINGS_HEADER + 1]; -extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_TX_PUSH + 1]; +extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_RX_PUSH + 1]; extern const struct nla_policy ethnl_channels_get_policy[ETHTOOL_A_CHANNELS_HEADER + 1]; extern const struct nla_policy ethnl_channels_set_policy[ETHTOOL_A_CHANNELS_COMBINED_COUNT + 1]; extern const struct nla_policy ethnl_coalesce_get_policy[ETHTOOL_A_COALESCE_HEADER + 1]; diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c index 2a2d3539630c..f358cd57d094 100644 --- a/net/ethtool/rings.c +++ b/net/ethtool/rings.c @@ -56,7 +56,8 @@ static int rings_reply_size(const struct ethnl_req_info *req_base, nla_total_size(sizeof(u32)) + /* _RINGS_RX_BUF_LEN */ nla_total_size(sizeof(u8)) + /* _RINGS_TCP_DATA_SPLIT */ nla_total_size(sizeof(u32) + /* _RINGS_CQE_SIZE */ - nla_total_size(sizeof(u8))); /* _RINGS_TX_PUSH */ + nla_total_size(sizeof(u8)) + /* _RINGS_TX_PUSH */ + nla_total_size(sizeof(u8))); /* _RINGS_RX_PUSH */ } static int rings_fill_reply(struct sk_buff *skb, @@ -96,7 +97,8 @@ static int rings_fill_reply(struct sk_buff *skb, kr->tcp_data_split))) || (kr->cqe_size && (nla_put_u32(skb, ETHTOOL_A_RINGS_CQE_SIZE, kr->cqe_size))) || - nla_put_u8(skb, ETHTOOL_A_RINGS_TX_PUSH, !!kr->tx_push)) + nla_put_u8(skb, ETHTOOL_A_RINGS_TX_PUSH, !!kr->tx_push) || + nla_put_u8(skb, ETHTOOL_A_RINGS_RX_PUSH, !!kr->rx_push)) return -EMSGSIZE; return 0; @@ -114,6 +116,7 @@ const struct nla_policy ethnl_rings_set_policy[] = { [ETHTOOL_A_RINGS_RX_BUF_LEN] = NLA_POLICY_MIN(NLA_U32, 1), [ETHTOOL_A_RINGS_CQE_SIZE] = NLA_POLICY_MIN(NLA_U32, 1), [ETHTOOL_A_RINGS_TX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), + [ETHTOOL_A_RINGS_RX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), }; static int @@ -147,6 +150,14 @@ ethnl_set_rings_validate(struct ethnl_req_info *req_info, return -EOPNOTSUPP; } + if (tb[ETHTOOL_A_RINGS_RX_PUSH] && + !(ops->supported_ring_params & ETHTOOL_RING_USE_RX_PUSH)) { + NL_SET_ERR_MSG_ATTR(info->extack, + tb[ETHTOOL_A_RINGS_RX_PUSH], + "setting rx push not supported"); + return -EOPNOTSUPP; + } + return ops->get_ringparam && ops->set_ringparam ? 1 : -EOPNOTSUPP; } @@ -176,6 +187,8 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) tb[ETHTOOL_A_RINGS_CQE_SIZE], &mod); ethnl_update_u8(&kernel_ringparam.tx_push, tb[ETHTOOL_A_RINGS_TX_PUSH], &mod); + ethnl_update_u8(&kernel_ringparam.rx_push, + tb[ETHTOOL_A_RINGS_RX_PUSH], &mod); if (!mod) return 0; -- cgit v1.2.3 From 30c89bad3ea2ef7a2d4686f9c3cc08420fe627bc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 10 Feb 2023 18:47:07 +0000 Subject: ipv6: icmp6: add drop reason support to icmpv6_notify() Accurately reports what happened in icmpv6_notify() when handling a packet. This makes use of the new IPV6_BAD_EXTHDR drop reason. Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- include/net/ipv6.h | 3 ++- net/ipv6/icmp.c | 25 +++++++++++++++++-------- 2 files changed, 19 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 03f3af02a9a6..7332296eca44 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -436,7 +436,8 @@ static inline void fl6_sock_release(struct ip6_flowlabel *fl) atomic_dec(&fl->users); } -void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info); +enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type, + u8 code, __be32 info); void icmpv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6, struct icmp6hdr *thdr, int len); diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index c9346515e24d..40bb5dedac09 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -813,16 +813,19 @@ out_bh_enable: local_bh_enable(); } -void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info) +enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type, + u8 code, __be32 info) { struct inet6_skb_parm *opt = IP6CB(skb); + struct net *net = dev_net(skb->dev); const struct inet6_protocol *ipprot; + enum skb_drop_reason reason; int inner_offset; __be16 frag_off; u8 nexthdr; - struct net *net = dev_net(skb->dev); - if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) + reason = pskb_may_pull_reason(skb, sizeof(struct ipv6hdr)); + if (reason != SKB_NOT_DROPPED_YET) goto out; seg6_icmp_srh(skb, opt); @@ -832,14 +835,17 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info) /* now skip over extension headers */ inner_offset = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off); - if (inner_offset < 0) + if (inner_offset < 0) { + SKB_DR_SET(reason, IPV6_BAD_EXTHDR); goto out; + } } else { inner_offset = sizeof(struct ipv6hdr); } /* Checkin header including 8 bytes of inner protocol header. */ - if (!pskb_may_pull(skb, inner_offset+8)) + reason = pskb_may_pull_reason(skb, inner_offset + 8); + if (reason != SKB_NOT_DROPPED_YET) goto out; /* BUGGG_FUTURE: we should try to parse exthdrs in this packet. @@ -854,10 +860,11 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info) ipprot->err_handler(skb, opt, type, code, inner_offset, info); raw6_icmp_error(skb, nexthdr, type, code, inner_offset, info); - return; + return SKB_CONSUMED; out: __ICMP6_INC_STATS(net, __in6_dev_get(skb->dev), ICMP6_MIB_INERRORS); + return reason; } /* @@ -953,7 +960,8 @@ static int icmpv6_rcv(struct sk_buff *skb) case ICMPV6_DEST_UNREACH: case ICMPV6_TIME_EXCEED: case ICMPV6_PARAMPROB: - icmpv6_notify(skb, type, hdr->icmp6_code, hdr->icmp6_mtu); + reason = icmpv6_notify(skb, type, hdr->icmp6_code, + hdr->icmp6_mtu); break; case NDISC_ROUTER_SOLICITATION: @@ -995,7 +1003,8 @@ static int icmpv6_rcv(struct sk_buff *skb) * must pass to upper level */ - icmpv6_notify(skb, type, hdr->icmp6_code, hdr->icmp6_mtu); + reason = icmpv6_notify(skb, type, hdr->icmp6_code, + hdr->icmp6_mtu); } /* until the v6 path can be better sorted assume failure and -- cgit v1.2.3 From 545dbcd124b02c9dc93c8a5894c71d682effc3e6 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 10 Feb 2023 18:47:08 +0000 Subject: ipv6: icmp6: add drop reason support to ndisc_rcv() Creates three new drop reasons: SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc). SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit. SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code. Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- include/net/dropreason.h | 9 +++++++++ include/net/ndisc.h | 2 +- net/ipv6/icmp.c | 2 +- net/ipv6/ndisc.c | 13 +++++++------ 4 files changed, 18 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 6c41e535175c..ef3f65d135d3 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -73,6 +73,9 @@ FN(FRAG_TOO_FAR) \ FN(TCP_MINTTL) \ FN(IPV6_BAD_EXTHDR) \ + FN(IPV6_NDISC_FRAG) \ + FN(IPV6_NDISC_HOP_LIMIT) \ + FN(IPV6_NDISC_BAD_CODE) \ FNe(MAX) /** @@ -321,6 +324,12 @@ enum skb_drop_reason { SKB_DROP_REASON_TCP_MINTTL, /** @SKB_DROP_REASON_IPV6_BAD_EXTHDR: Bad IPv6 extension header. */ SKB_DROP_REASON_IPV6_BAD_EXTHDR, + /** @SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc). */ + SKB_DROP_REASON_IPV6_NDISC_FRAG, + /** @SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit. */ + SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT, + /** @SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code. */ + SKB_DROP_REASON_IPV6_NDISC_BAD_CODE, /** * @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be * used as a real 'reason' diff --git a/include/net/ndisc.h b/include/net/ndisc.h index da7eec8669ec..07e5168cdaf9 100644 --- a/include/net/ndisc.h +++ b/include/net/ndisc.h @@ -445,7 +445,7 @@ int ndisc_late_init(void); void ndisc_late_cleanup(void); void ndisc_cleanup(void); -int ndisc_rcv(struct sk_buff *skb); +enum skb_drop_reason ndisc_rcv(struct sk_buff *skb); struct sk_buff *ndisc_ns_create(struct net_device *dev, const struct in6_addr *solicit, const struct in6_addr *saddr, u64 nonce); diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 40bb5dedac09..f32bc98155bf 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -969,7 +969,7 @@ static int icmpv6_rcv(struct sk_buff *skb) case NDISC_NEIGHBOUR_SOLICITATION: case NDISC_NEIGHBOUR_ADVERTISEMENT: case NDISC_REDIRECT: - ndisc_rcv(skb); + reason = ndisc_rcv(skb); break; case ICMPV6_MGM_QUERY: diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 3a553494ff16..9548b5a44714 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1804,15 +1804,16 @@ static bool ndisc_suppress_frag_ndisc(struct sk_buff *skb) return false; } -int ndisc_rcv(struct sk_buff *skb) +enum skb_drop_reason ndisc_rcv(struct sk_buff *skb) { struct nd_msg *msg; + SKB_DR(reason); if (ndisc_suppress_frag_ndisc(skb)) - return 0; + return SKB_DROP_REASON_IPV6_NDISC_FRAG; if (skb_linearize(skb)) - return 0; + return SKB_DROP_REASON_NOMEM; msg = (struct nd_msg *)skb_transport_header(skb); @@ -1821,13 +1822,13 @@ int ndisc_rcv(struct sk_buff *skb) if (ipv6_hdr(skb)->hop_limit != 255) { ND_PRINTK(2, warn, "NDISC: invalid hop-limit: %d\n", ipv6_hdr(skb)->hop_limit); - return 0; + return SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT; } if (msg->icmph.icmp6_code != 0) { ND_PRINTK(2, warn, "NDISC: invalid ICMPv6 code: %d\n", msg->icmph.icmp6_code); - return 0; + return SKB_DROP_REASON_IPV6_NDISC_BAD_CODE; } switch (msg->icmph.icmp6_type) { @@ -1853,7 +1854,7 @@ int ndisc_rcv(struct sk_buff *skb) break; } - return 0; + return reason; } static int ndisc_netdev_event(struct notifier_block *this, unsigned long event, void *ptr) -- cgit v1.2.3 From 3d9c361713f24f3f55b9622d18d32add1910e6ba Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 19 Jan 2023 14:45:47 +0100 Subject: wifi: cfg80211: trace: remove MAC_PR_{FMT,ARG} With %pM, this really is no longer needed, and actually longer to spell out. Remove it. Signed-off-by: Johannes Berg --- net/wireless/trace.h | 244 +++++++++++++++++++++++++-------------------------- 1 file changed, 120 insertions(+), 124 deletions(-) (limited to 'net') diff --git a/net/wireless/trace.h b/net/wireless/trace.h index a405c3edbc47..9660fd94cd92 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -19,8 +19,6 @@ else \ eth_zero_addr(__entry->entry_mac); \ } while (0) -#define MAC_PR_FMT "%pM" -#define MAC_PR_ARG(entry_mac) (__entry->entry_mac) #define MAXNAME 32 #define WIPHY_ENTRY __array(char, wiphy_name, 32) @@ -454,10 +452,10 @@ DECLARE_EVENT_CLASS(key_handle, __entry->pairwise = pairwise; ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " - "key_index: %u, pairwise: %s, mac addr: " MAC_PR_FMT, + "key_index: %u, pairwise: %s, mac addr: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index, BOOL_TO_STR(__entry->pairwise), - MAC_PR_ARG(mac_addr)) + __entry->mac_addr) ); DEFINE_EVENT(key_handle, rdev_get_key, @@ -496,10 +494,10 @@ TRACE_EVENT(rdev_add_key, ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, " "key_index: %u, mode: %u, pairwise: %s, " - "mac addr: " MAC_PR_FMT, + "mac addr: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, __entry->key_index, __entry->mode, - BOOL_TO_STR(__entry->pairwise), MAC_PR_ARG(mac_addr)) + BOOL_TO_STR(__entry->pairwise), __entry->mac_addr) ); TRACE_EVENT(rdev_set_default_key, @@ -813,11 +811,11 @@ DECLARE_EVENT_CLASS(station_add_change, __entry->opmode_notif_used = params->link_sta_params.opmode_notif_used; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", station flags mask: %u, station flags set: %u, " "station modify mask: %u, listen interval: %d, aid: %u, " "plink action: %u, plink state: %u, uapsd queues: %u, vlan:%s", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(sta_mac), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac, __entry->sta_flags_mask, __entry->sta_flags_set, __entry->sta_modify_mask, __entry->listen_interval, __entry->aid, __entry->plink_action, __entry->plink_state, @@ -849,8 +847,8 @@ DECLARE_EVENT_CLASS(wiphy_netdev_mac_evt, NETDEV_ASSIGN; MAC_ASSIGN(sta_mac, mac); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", mac: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(sta_mac)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", mac: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac) ); DECLARE_EVENT_CLASS(station_del, @@ -871,9 +869,9 @@ DECLARE_EVENT_CLASS(station_del, __entry->subtype = params->subtype; __entry->reason_code = params->reason_code; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", subtype: %u, reason_code: %u", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(sta_mac), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac, __entry->subtype, __entry->reason_code) ); @@ -909,8 +907,8 @@ TRACE_EVENT(rdev_dump_station, MAC_ASSIGN(sta_mac, mac); __entry->idx = _idx; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: " MAC_PR_FMT ", idx: %d", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(sta_mac), + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM, idx: %d", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->sta_mac, __entry->idx) ); @@ -947,9 +945,9 @@ DECLARE_EVENT_CLASS(mpath_evt, MAC_ASSIGN(dst, dst); MAC_ASSIGN(next_hop, next_hop); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", destination: " MAC_PR_FMT ", next hop: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(dst), - MAC_PR_ARG(next_hop)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", destination: %pM, next hop: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dst, + __entry->next_hop) ); DEFINE_EVENT(mpath_evt, rdev_add_mpath, @@ -988,10 +986,9 @@ TRACE_EVENT(rdev_dump_mpath, MAC_ASSIGN(next_hop, next_hop); __entry->idx = _idx; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d, destination: " - MAC_PR_FMT ", next hop: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx, MAC_PR_ARG(dst), - MAC_PR_ARG(next_hop)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d, destination: %pM, next hop: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx, __entry->dst, + __entry->next_hop) ); TRACE_EVENT(rdev_get_mpp, @@ -1010,9 +1007,9 @@ TRACE_EVENT(rdev_get_mpp, MAC_ASSIGN(dst, dst); MAC_ASSIGN(mpp, mpp); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", destination: " MAC_PR_FMT - ", mpp: " MAC_PR_FMT, WIPHY_PR_ARG, NETDEV_PR_ARG, - MAC_PR_ARG(dst), MAC_PR_ARG(mpp)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", destination: %pM" + ", mpp: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, + __entry->dst, __entry->mpp) ); TRACE_EVENT(rdev_dump_mpp, @@ -1033,10 +1030,9 @@ TRACE_EVENT(rdev_dump_mpp, MAC_ASSIGN(mpp, mpp); __entry->idx = _idx; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d, destination: " - MAC_PR_FMT ", mpp: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx, MAC_PR_ARG(dst), - MAC_PR_ARG(mpp)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", index: %d, destination: %pM, mpp: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->idx, __entry->dst, + __entry->mpp) ); TRACE_EVENT(rdev_return_int_mpath_info, @@ -1243,9 +1239,9 @@ TRACE_EVENT(rdev_auth, eth_zero_addr(__entry->bssid); __entry->auth_type = req->auth_type; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", auth type: %d, bssid: " MAC_PR_FMT, + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", auth type: %d, bssid: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->auth_type, - MAC_PR_ARG(bssid)) + __entry->bssid) ); TRACE_EVENT(rdev_assoc, @@ -1294,10 +1290,10 @@ TRACE_EVENT(rdev_assoc, memcpy(__get_dynamic_array(fils_nonces), req->fils_nonces, 2 * FILS_NONCE_LEN); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT - ", previous bssid: " MAC_PR_FMT ", use mfp: %s, flags: %u", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(bssid), - MAC_PR_ARG(prev_bssid), BOOL_TO_STR(__entry->use_mfp), + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" + ", previous bssid: %pM, use mfp: %s, flags: %u", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, + __entry->prev_bssid, BOOL_TO_STR(__entry->use_mfp), __entry->flags) ); @@ -1317,8 +1313,8 @@ TRACE_EVENT(rdev_deauth, MAC_ASSIGN(bssid, req->bssid); __entry->reason_code = req->reason_code; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT ", reason: %u", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(bssid), + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM, reason: %u", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->reason_code) ); @@ -1340,9 +1336,9 @@ TRACE_EVENT(rdev_disassoc, __entry->reason_code = req->reason_code; __entry->local_state_change = req->local_state_change; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", reason: %u, local state change: %s", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(bssid), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->reason_code, BOOL_TO_STR(__entry->local_state_change)) ); @@ -1413,12 +1409,12 @@ TRACE_EVENT(rdev_connect, __entry->flags = sme->flags; MAC_ASSIGN(prev_bssid, sme->prev_bssid); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", ssid: %s, auth type: %d, privacy: %s, wpa versions: %u, " - "flags: %u, previous bssid: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(bssid), __entry->ssid, + "flags: %u, previous bssid: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->ssid, __entry->auth_type, BOOL_TO_STR(__entry->privacy), - __entry->wpa_versions, __entry->flags, MAC_PR_ARG(prev_bssid)) + __entry->wpa_versions, __entry->flags, __entry->prev_bssid) ); TRACE_EVENT(rdev_update_connect_params, @@ -1542,8 +1538,8 @@ TRACE_EVENT(rdev_join_ibss, memset(__entry->ssid, 0, IEEE80211_MAX_SSID_LEN + 1); memcpy(__entry->ssid, params->ssid, params->ssid_len); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT ", ssid: %s", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(bssid), __entry->ssid) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM, ssid: %s", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->ssid) ); TRACE_EVENT(rdev_join_ocb, @@ -1664,9 +1660,9 @@ TRACE_EVENT(rdev_set_bitrate_mask, __entry->link_id = link_id; MAC_ASSIGN(peer, peer); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, peer: " MAC_PR_FMT, + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", link_id: %d, peer: %pM", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->link_id, - MAC_PR_ARG(peer)) + __entry->peer) ); TRACE_EVENT(rdev_update_mgmt_frame_registrations, @@ -1810,10 +1806,10 @@ TRACE_EVENT(rdev_tdls_mgmt, __entry->initiator = initiator; memcpy(__get_dynamic_array(buf), buf, len); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT ", action_code: %u, " + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, action_code: %u, " "dialog_token: %u, status_code: %u, peer_capability: %u " "initiator: %s buf: %#.2x ", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->action_code, __entry->dialog_token, __entry->status_code, __entry->peer_capability, BOOL_TO_STR(__entry->initiator), @@ -1893,8 +1889,8 @@ TRACE_EVENT(rdev_tdls_oper, MAC_ASSIGN(peer, peer); __entry->oper = oper; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT ", oper: %d", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), __entry->oper) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, oper: %d", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->oper) ); DECLARE_EVENT_CLASS(rdev_pmksa, @@ -1911,8 +1907,8 @@ DECLARE_EVENT_CLASS(rdev_pmksa, NETDEV_ASSIGN; MAC_ASSIGN(bssid, pmksa->bssid); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(bssid)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid) ); TRACE_EVENT(rdev_probe_client, @@ -1929,8 +1925,8 @@ TRACE_EVENT(rdev_probe_client, NETDEV_ASSIGN; MAC_ASSIGN(peer, peer); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer) ); DEFINE_EVENT(rdev_pmksa, rdev_set_pmksa, @@ -2051,9 +2047,9 @@ TRACE_EVENT(rdev_tx_control_port, __entry->unencrypted = unencrypted; __entry->link_id = link_id; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT "," + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM," " proto: 0x%x, unencrypted: %s, link: %d", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(dest), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dest, be16_to_cpu(__entry->proto), BOOL_TO_STR(__entry->unencrypted), __entry->link_id) @@ -2392,8 +2388,8 @@ TRACE_EVENT(rdev_add_tx_ts, __entry->user_prio = user_prio; __entry->admitted_time = admitted_time; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT ", TSID %d, UP %d, time %d", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, TSID %d, UP %d, time %d", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->tsid, __entry->user_prio, __entry->admitted_time) ); @@ -2413,8 +2409,8 @@ TRACE_EVENT(rdev_del_tx_ts, MAC_ASSIGN(peer, peer); __entry->tsid = tsid; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT ", TSID %d", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), __entry->tsid) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM, TSID %d", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->tsid) ); TRACE_EVENT(rdev_tdls_channel_switch, @@ -2435,9 +2431,9 @@ TRACE_EVENT(rdev_tdls_channel_switch, MAC_ASSIGN(addr, addr); CHAN_DEF_ASSIGN(chandef); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM" " oper class %d, " CHAN_DEF_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(addr), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->addr, __entry->oper_class, CHAN_DEF_PR_ARG) ); @@ -2455,8 +2451,8 @@ TRACE_EVENT(rdev_tdls_cancel_channel_switch, NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(addr)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->addr) ); TRACE_EVENT(rdev_set_pmk, @@ -2488,9 +2484,9 @@ TRACE_EVENT(rdev_set_pmk, pmk_conf->pmk_r0_name ? WLAN_PMK_NAME_LEN : 0); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM" "pmk_len=%u, pmk: %s pmk_r0_name: %s", WIPHY_PR_ARG, - NETDEV_PR_ARG, MAC_PR_ARG(aa), __entry->pmk_len, + NETDEV_PR_ARG, __entry->aa, __entry->pmk_len, __print_array(__get_dynamic_array(pmk), __get_dynamic_array_len(pmk), 1), __entry->pmk_r0_name_len ? @@ -2515,8 +2511,8 @@ TRACE_EVENT(rdev_del_pmk, MAC_ASSIGN(aa, aa); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(aa)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->aa) ); TRACE_EVENT(rdev_external_auth, @@ -2537,7 +2533,7 @@ TRACE_EVENT(rdev_external_auth, params->ssid.ssid_len); __entry->status = params->status; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" ", ssid: %s, status: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, __entry->ssid, __entry->status) ); @@ -2720,8 +2716,8 @@ TRACE_EVENT(rdev_update_owe_info, __entry->status = owe_info->status; memcpy(__get_dynamic_array(ie), owe_info->ie, owe_info->ie_len);), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: " MAC_PR_FMT - " status %d", WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM" + " status %d", WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->status) ); @@ -2739,8 +2735,8 @@ TRACE_EVENT(rdev_probe_mesh_link, NETDEV_ASSIGN; MAC_ASSIGN(dest, dest); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(dest)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->dest) ); TRACE_EVENT(rdev_set_tid_config, @@ -2757,8 +2753,8 @@ TRACE_EVENT(rdev_set_tid_config, NETDEV_ASSIGN; MAC_ASSIGN(peer, tid_conf->peer); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer) ); TRACE_EVENT(rdev_reset_tid_config, @@ -2777,8 +2773,8 @@ TRACE_EVENT(rdev_reset_tid_config, MAC_ASSIGN(peer, peer); __entry->tids = tids; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: " MAC_PR_FMT ", tids: 0x%x", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), __entry->tids) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM, tids: 0x%x", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->tids) ); TRACE_EVENT(rdev_set_sar_specs, @@ -2881,8 +2877,8 @@ DECLARE_EVENT_CLASS(cfg80211_netdev_mac_evt, NETDEV_ASSIGN; MAC_ASSIGN(macaddr, macaddr); ), - TP_printk(NETDEV_PR_FMT ", mac: " MAC_PR_FMT, - NETDEV_PR_ARG, MAC_PR_ARG(macaddr)) + TP_printk(NETDEV_PR_FMT ", mac: %pM", + NETDEV_PR_ARG, __entry->macaddr) ); DEFINE_EVENT(cfg80211_netdev_mac_evt, cfg80211_notify_new_peer_candidate, @@ -2920,8 +2916,8 @@ TRACE_EVENT(cfg80211_send_rx_assoc, MAC_ASSIGN(ap_addr, data->ap_mld_addr ?: data->links[0].bss->bssid); ), - TP_printk(NETDEV_PR_FMT ", " MAC_PR_FMT, - NETDEV_PR_ARG, MAC_PR_ARG(ap_addr)) + TP_printk(NETDEV_PR_FMT ", %pM", + NETDEV_PR_ARG, __entry->ap_addr) ); DECLARE_EVENT_CLASS(netdev_frame_event, @@ -2981,8 +2977,8 @@ DECLARE_EVENT_CLASS(netdev_mac_evt, NETDEV_ASSIGN; MAC_ASSIGN(mac, mac) ), - TP_printk(NETDEV_PR_FMT ", mac: " MAC_PR_FMT, - NETDEV_PR_ARG, MAC_PR_ARG(mac)) + TP_printk(NETDEV_PR_FMT ", mac: %pM", + NETDEV_PR_ARG, __entry->mac) ); DEFINE_EVENT(netdev_mac_evt, cfg80211_send_auth_timeout, @@ -3004,8 +3000,8 @@ TRACE_EVENT(cfg80211_send_assoc_failure, MAC_ASSIGN(ap_addr, data->ap_mld_addr ?: data->bss[0]->bssid); __entry->timeout = data->timeout; ), - TP_printk(NETDEV_PR_FMT ", mac: " MAC_PR_FMT ", timeout: %d", - NETDEV_PR_ARG, MAC_PR_ARG(ap_addr), __entry->timeout) + TP_printk(NETDEV_PR_FMT ", mac: %pM, timeout: %d", + NETDEV_PR_ARG, __entry->ap_addr, __entry->timeout) ); TRACE_EVENT(cfg80211_michael_mic_failure, @@ -3027,8 +3023,8 @@ TRACE_EVENT(cfg80211_michael_mic_failure, if (tsc) memcpy(__entry->tsc, tsc, 6); ), - TP_printk(NETDEV_PR_FMT ", " MAC_PR_FMT ", key type: %d, key id: %d, tsc: %pm", - NETDEV_PR_ARG, MAC_PR_ARG(addr), __entry->key_type, + TP_printk(NETDEV_PR_FMT ", %pM, key type: %d, key id: %d, tsc: %pm", + NETDEV_PR_ARG, __entry->addr, __entry->key_type, __entry->key_id, __entry->tsc) ); @@ -3104,8 +3100,8 @@ TRACE_EVENT(cfg80211_new_sta, MAC_ASSIGN(mac_addr, mac_addr); SINFO_ASSIGN; ), - TP_printk(NETDEV_PR_FMT ", " MAC_PR_FMT, - NETDEV_PR_ARG, MAC_PR_ARG(mac_addr)) + TP_printk(NETDEV_PR_FMT ", %pM", + NETDEV_PR_ARG, __entry->mac_addr) ); DEFINE_EVENT(cfg80211_netdev_mac_evt, cfg80211_del_sta, @@ -3182,8 +3178,8 @@ TRACE_EVENT(cfg80211_rx_control_port, __entry->proto = be16_to_cpu(skb->protocol); __entry->unencrypted = unencrypted; ), - TP_printk(NETDEV_PR_FMT ", len=%d, " MAC_PR_FMT ", proto: 0x%x, unencrypted: %s", - NETDEV_PR_ARG, __entry->len, MAC_PR_ARG(from), + TP_printk(NETDEV_PR_FMT ", len=%d, %pM, proto: 0x%x, unencrypted: %s", + NETDEV_PR_ARG, __entry->len, __entry->from, __entry->proto, BOOL_TO_STR(__entry->unencrypted)) ); @@ -3324,7 +3320,7 @@ DECLARE_EVENT_CLASS(cfg80211_rx_evt, NETDEV_ASSIGN; MAC_ASSIGN(addr, addr); ), - TP_printk(NETDEV_PR_FMT ", " MAC_PR_FMT, NETDEV_PR_ARG, MAC_PR_ARG(addr)) + TP_printk(NETDEV_PR_FMT ", %pM", NETDEV_PR_ARG, __entry->addr) ); DEFINE_EVENT(cfg80211_rx_evt, cfg80211_rx_spurious_frame, @@ -3351,8 +3347,8 @@ TRACE_EVENT(cfg80211_ibss_joined, MAC_ASSIGN(bssid, bssid); CHAN_ASSIGN(channel); ), - TP_printk(NETDEV_PR_FMT ", bssid: " MAC_PR_FMT ", " CHAN_PR_FMT, - NETDEV_PR_ARG, MAC_PR_ARG(bssid), CHAN_PR_ARG) + TP_printk(NETDEV_PR_FMT ", bssid: %pM, " CHAN_PR_FMT, + NETDEV_PR_ARG, __entry->bssid, CHAN_PR_ARG) ); TRACE_EVENT(cfg80211_probe_status, @@ -3371,8 +3367,8 @@ TRACE_EVENT(cfg80211_probe_status, __entry->cookie = cookie; __entry->acked = acked; ), - TP_printk(NETDEV_PR_FMT " addr:" MAC_PR_FMT ", cookie: %llu, acked: %s", - NETDEV_PR_ARG, MAC_PR_ARG(addr), __entry->cookie, + TP_printk(NETDEV_PR_FMT " addr:%pM, cookie: %llu, acked: %s", + NETDEV_PR_ARG, __entry->addr, __entry->cookie, BOOL_TO_STR(__entry->acked)) ); @@ -3389,8 +3385,8 @@ TRACE_EVENT(cfg80211_cqm_pktloss_notify, MAC_ASSIGN(peer, peer); __entry->num_packets = num_packets; ), - TP_printk(NETDEV_PR_FMT ", peer: " MAC_PR_FMT ", num of lost packets: %u", - NETDEV_PR_ARG, MAC_PR_ARG(peer), __entry->num_packets) + TP_printk(NETDEV_PR_FMT ", peer: %pM, num of lost packets: %u", + NETDEV_PR_ARG, __entry->peer, __entry->num_packets) ); DEFINE_EVENT(cfg80211_netdev_mac_evt, cfg80211_gtk_rekey_notify, @@ -3414,8 +3410,8 @@ TRACE_EVENT(cfg80211_pmksa_candidate_notify, MAC_ASSIGN(bssid, bssid); __entry->preauth = preauth; ), - TP_printk(NETDEV_PR_FMT ", index:%d, bssid: " MAC_PR_FMT ", pre auth: %s", - NETDEV_PR_ARG, __entry->index, MAC_PR_ARG(bssid), + TP_printk(NETDEV_PR_FMT ", index:%d, bssid: %pM, pre auth: %s", + NETDEV_PR_ARG, __entry->index, __entry->bssid, BOOL_TO_STR(__entry->preauth)) ); @@ -3455,8 +3451,8 @@ TRACE_EVENT(cfg80211_tdls_oper_request, __entry->oper = oper; __entry->reason_code = reason_code; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: " MAC_PR_FMT ", oper: %d, reason_code %u", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer), __entry->oper, + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM, oper: %d, reason_code %u", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, __entry->oper, __entry->reason_code) ); @@ -3494,10 +3490,10 @@ TRACE_EVENT(cfg80211_scan_done, MAC_ASSIGN(tsf_bssid, info->tsf_bssid); } ), - TP_printk("aborted: %s, scan start (TSF): %llu, tsf_bssid: " MAC_PR_FMT, + TP_printk("aborted: %s, scan start (TSF): %llu, tsf_bssid: %pM", BOOL_TO_STR(__entry->aborted), (unsigned long long)__entry->scan_start_tsf, - MAC_PR_ARG(tsf_bssid)) + __entry->tsf_bssid) ); DECLARE_EVENT_CLASS(wiphy_id_evt, @@ -3546,9 +3542,9 @@ TRACE_EVENT(cfg80211_get_bss, __entry->bss_type = bss_type; __entry->privacy = privacy; ), - TP_printk(WIPHY_PR_FMT ", " CHAN_PR_FMT ", " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " CHAN_PR_FMT ", %pM" ", buf: %#.2x, bss_type: %d, privacy: %d", - WIPHY_PR_ARG, CHAN_PR_ARG, MAC_PR_ARG(bssid), + WIPHY_PR_ARG, CHAN_PR_ARG, __entry->bssid, ((u8 *)__get_dynamic_array(ssid))[0], __entry->bss_type, __entry->privacy) ); @@ -3579,11 +3575,11 @@ TRACE_EVENT(cfg80211_inform_bss_frame, MAC_ASSIGN(parent_bssid, data->parent_bssid); ), TP_printk(WIPHY_PR_FMT ", " CHAN_PR_FMT - "(scan_width: %d) signal: %d, tsb:%llu, detect_tsf:%llu, tsf_bssid: " - MAC_PR_FMT, WIPHY_PR_ARG, CHAN_PR_ARG, __entry->scan_width, + "(scan_width: %d) signal: %d, tsb:%llu, detect_tsf:%llu, tsf_bssid: %pM", + WIPHY_PR_ARG, CHAN_PR_ARG, __entry->scan_width, __entry->signal, (unsigned long long)__entry->ts_boottime, (unsigned long long)__entry->parent_tsf, - MAC_PR_ARG(parent_bssid)) + __entry->parent_bssid) ); DECLARE_EVENT_CLASS(cfg80211_bss_evt, @@ -3597,7 +3593,7 @@ DECLARE_EVENT_CLASS(cfg80211_bss_evt, MAC_ASSIGN(bssid, pub->bssid); CHAN_ASSIGN(pub->channel); ), - TP_printk(MAC_PR_FMT ", " CHAN_PR_FMT, MAC_PR_ARG(bssid), CHAN_PR_ARG) + TP_printk("%pM, " CHAN_PR_FMT, __entry->bssid, CHAN_PR_ARG) ); DEFINE_EVENT(cfg80211_bss_evt, cfg80211_return_bss, @@ -3689,8 +3685,8 @@ TRACE_EVENT(cfg80211_ft_event, memcpy(__get_dynamic_array(ric_ies), ft_event->ric_ies, ft_event->ric_ies_len); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", target_ap: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(target_ap)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", target_ap: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->target_ap) ); TRACE_EVENT(cfg80211_stop_iface, @@ -3724,10 +3720,10 @@ TRACE_EVENT(cfg80211_pmsr_report, __entry->cookie = cookie; MAC_ASSIGN(addr, addr); ), - TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie:%lld, " MAC_PR_FMT, + TP_printk(WIPHY_PR_FMT ", " WDEV_PR_FMT ", cookie:%lld, %pM", WIPHY_PR_ARG, WDEV_PR_ARG, (unsigned long long)__entry->cookie, - MAC_PR_ARG(addr)) + __entry->addr) ); TRACE_EVENT(cfg80211_pmsr_complete, @@ -3761,8 +3757,8 @@ TRACE_EVENT(cfg80211_update_owe_info_event, MAC_ASSIGN(peer, owe_info->peer); memcpy(__get_dynamic_array(ie), owe_info->ie, owe_info->ie_len);), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: " MAC_PR_FMT, - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(peer)) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer) ); TRACE_EVENT(cfg80211_bss_color_notify, @@ -3800,8 +3796,8 @@ TRACE_EVENT(cfg80211_assoc_comeback, MAC_ASSIGN(ap_addr, ap_addr); __entry->timeout = timeout; ), - TP_printk(WDEV_PR_FMT ", " MAC_PR_FMT ", timeout: %u TUs", - WDEV_PR_ARG, MAC_PR_ARG(ap_addr), __entry->timeout) + TP_printk(WDEV_PR_FMT ", %pM, timeout: %u TUs", + WDEV_PR_ARG, __entry->ap_addr, __entry->timeout) ); DECLARE_EVENT_CLASS(link_station_add_mod, @@ -3859,10 +3855,10 @@ DECLARE_EVENT_CLASS(link_station_add_mod, memcpy(__get_dynamic_array(eht_capa), params->eht_capa, params->eht_capa_len); ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: " MAC_PR_FMT - ", link mac: " MAC_PR_FMT ", link id: %u", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(mld_mac), - MAC_PR_ARG(link_mac), __entry->link_id) + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" + ", link mac: %pM, link id: %u", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->mld_mac, + __entry->link_mac, __entry->link_id) ); DEFINE_EVENT(link_station_add_mod, rdev_add_link_station, @@ -3895,9 +3891,9 @@ TRACE_EVENT(rdev_del_link_station, memcpy(__entry->mld_mac, params->mld_mac, 6); __entry->link_id = params->link_id; ), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: " MAC_PR_FMT + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", station mac: %pM" ", link id: %u", - WIPHY_PR_ARG, NETDEV_PR_ARG, MAC_PR_ARG(mld_mac), + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->mld_mac, __entry->link_id) ); -- cgit v1.2.3 From 8f2ca70c07f4cee68ed6297c1876c28b73c9af21 Mon Sep 17 00:00:00 2001 From: Oz Shlomo Date: Sun, 12 Feb 2023 15:25:12 +0200 Subject: net/sched: optimize action stats api calls Currently the hw action stats update is called from tcf_exts_hw_stats_update, when a tc filter is dumped, and from tcf_action_copy_stats, when a hw action is dumped. However, the tcf_action_copy_stats is also called from tcf_action_dump. As such, the hw action stats update cb is called 3 times for every tc flower filter dump. Move the tc action hw stats update from tcf_action_copy_stats to tcf_dump_walker to update the hw action stats when tc action is dumped. Signed-off-by: Oz Shlomo Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- net/sched/act_api.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/sched/act_api.c b/net/sched/act_api.c index cd09ef49df22..f4fa6d7340f8 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -539,6 +539,8 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb, (unsigned long)p->tcfa_tm.lastuse)) continue; + tcf_action_update_hw_stats(p); + nest = nla_nest_start_noflag(skb, n_i); if (!nest) { index--; @@ -1539,9 +1541,6 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p, if (p == NULL) goto errout; - /* update hw stats for this action */ - tcf_action_update_hw_stats(p); - /* compat_mode being true specifies a call that is supposed * to add additional backward compatibility statistic TLVs. */ -- cgit v1.2.3 From 3320f36fd8ad45312c98857d36ecbef90f829497 Mon Sep 17 00:00:00 2001 From: Oz Shlomo Date: Sun, 12 Feb 2023 15:25:13 +0200 Subject: net/sched: act_pedit, setup offload action for action stats query A single tc pedit action may be translated to multiple flow_offload actions. Offload only actions that translate to a single pedit command value. Signed-off-by: Oz Shlomo Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- net/sched/act_pedit.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index c42fcc47dd6d..35ebe5d5c261 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -545,7 +545,28 @@ static int tcf_pedit_offload_act_setup(struct tc_action *act, void *entry_data, } *index_inc = k; } else { - return -EOPNOTSUPP; + struct flow_offload_action *fl_action = entry_data; + u32 cmd = tcf_pedit_cmd(act, 0); + int k; + + switch (cmd) { + case TCA_PEDIT_KEY_EX_CMD_SET: + fl_action->id = FLOW_ACTION_MANGLE; + break; + case TCA_PEDIT_KEY_EX_CMD_ADD: + fl_action->id = FLOW_ACTION_ADD; + break; + default: + NL_SET_ERR_MSG_MOD(extack, "Unsupported pedit command offload"); + return -EOPNOTSUPP; + } + + for (k = 1; k < tcf_pedit_nkeys(act); k++) { + if (cmd != tcf_pedit_cmd(act, k)) { + NL_SET_ERR_MSG_MOD(extack, "Unsupported pedit command offload"); + return -EOPNOTSUPP; + } + } } return 0; -- cgit v1.2.3 From ac7d27907d5445d0accaf998e1dc3ea570ed1ba6 Mon Sep 17 00:00:00 2001 From: Oz Shlomo Date: Sun, 12 Feb 2023 15:25:14 +0200 Subject: net/sched: pass flow_stats instead of multiple stats args Instead of passing 6 stats related args, pass the flow_stats. Signed-off-by: Oz Shlomo Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- include/net/pkt_cls.h | 11 +++++------ net/sched/cls_flower.c | 7 +------ net/sched/cls_matchall.c | 6 +----- 3 files changed, 7 insertions(+), 17 deletions(-) (limited to 'net') diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index cd410a87517b..bf50829d9255 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -294,8 +294,7 @@ static inline void tcf_exts_put_net(struct tcf_exts *exts) static inline void tcf_exts_hw_stats_update(const struct tcf_exts *exts, - u64 bytes, u64 packets, u64 drops, u64 lastuse, - u8 used_hw_stats, bool used_hw_stats_valid) + struct flow_stats *stats) { #ifdef CONFIG_NET_CLS_ACT int i; @@ -306,12 +305,12 @@ tcf_exts_hw_stats_update(const struct tcf_exts *exts, /* if stats from hw, just skip */ if (tcf_action_update_hw_stats(a)) { preempt_disable(); - tcf_action_stats_update(a, bytes, packets, drops, - lastuse, true); + tcf_action_stats_update(a, stats->bytes, stats->pkts, stats->drops, + stats->lastused, true); preempt_enable(); - a->used_hw_stats = used_hw_stats; - a->used_hw_stats_valid = used_hw_stats_valid; + a->used_hw_stats = stats->used_hw_stats; + a->used_hw_stats_valid = stats->used_hw_stats_valid; } } #endif diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 0b15698b3531..cb04739a13ce 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -502,12 +502,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f, tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, rtnl_held); - tcf_exts_hw_stats_update(&f->exts, cls_flower.stats.bytes, - cls_flower.stats.pkts, - cls_flower.stats.drops, - cls_flower.stats.lastused, - cls_flower.stats.used_hw_stats, - cls_flower.stats.used_hw_stats_valid); + tcf_exts_hw_stats_update(&f->exts, &cls_flower.stats); } static void __fl_put(struct cls_fl_filter *f) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index 705f63da2c21..b3883d3d4dbd 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -331,11 +331,7 @@ static void mall_stats_hw_filter(struct tcf_proto *tp, tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true); - tcf_exts_hw_stats_update(&head->exts, cls_mall.stats.bytes, - cls_mall.stats.pkts, cls_mall.stats.drops, - cls_mall.stats.lastused, - cls_mall.stats.used_hw_stats, - cls_mall.stats.used_hw_stats_valid); + tcf_exts_hw_stats_update(&head->exts, &cls_mall.stats); } static int mall_dump(struct net *net, struct tcf_proto *tp, void *fh, -- cgit v1.2.3 From d307b2c6f962ad5d83d7a7df71c2e9c9e4106d82 Mon Sep 17 00:00:00 2001 From: Oz Shlomo Date: Sun, 12 Feb 2023 15:25:15 +0200 Subject: net/sched: introduce flow_offload action cookie Currently a hardware action is uniquely identified by the tuple. However, the id is set by the flow_act_setup callback and tc core cannot enforce this, and it is possible that a future change could break this. In addition, are not unique across network namespaces. Uniquely identify the action by setting an action cookie by the tc core. Use the unique action cookie to query the action's hardware stats. Signed-off-by: Oz Shlomo Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- include/net/flow_offload.h | 2 ++ net/sched/act_api.c | 1 + net/sched/cls_api.c | 1 + 3 files changed, 4 insertions(+) (limited to 'net') diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 0400a0ac8a29..d177bf5f0e1a 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -228,6 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie); struct flow_action_entry { enum flow_action_id id; u32 hw_index; + unsigned long act_cookie; enum flow_action_hw_stats hw_stats; action_destr destructor; void *destructor_priv; @@ -610,6 +611,7 @@ struct flow_offload_action { enum offload_act_command command; enum flow_action_id id; u32 index; + unsigned long cookie; struct flow_stats stats; struct flow_action action; }; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index f4fa6d7340f8..917827199102 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -192,6 +192,7 @@ static int offload_action_init(struct flow_offload_action *fl_action, fl_action->extack = extack; fl_action->command = cmd; fl_action->index = act->tcfa_index; + fl_action->cookie = (unsigned long)act; if (act->ops->offload_act_setup) { spin_lock_bh(&act->tcfa_lock); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 5b4a95e8a1ee..bfabc9c95fa9 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -3577,6 +3577,7 @@ int tc_setup_action(struct flow_action *flow_action, for (k = 0; k < index ; k++) { entry[k].hw_stats = tc_act_hw_stats(act->hw_stats); entry[k].hw_index = act->tcfa_index; + entry[k].act_cookie = (unsigned long)act; } j += index; -- cgit v1.2.3 From 5246c896b805b043a87fa78af32a33cbce00de05 Mon Sep 17 00:00:00 2001 From: Oz Shlomo Date: Sun, 12 Feb 2023 15:25:16 +0200 Subject: net/sched: support per action hw stats There are currently two mechanisms for populating hardware stats: 1. Using flow_offload api to query the flow's statistics. The api assumes that the same stats values apply to all the flow's actions. This assumption breaks when action drops or jumps over following actions. 2. Using hw_action api to query specific action stats via a driver callback method. This api assures the correct action stats for the offloaded action, however, it does not apply to the rest of the actions in the flow's actions array. Extend the flow_offload stats callback to indicate that a per action stats update is required. Use the existing flow_offload_action api to query the action's hw stats. In addition, currently the tc action stats utility only updates hw actions. Reuse the existing action stats cb infrastructure to query any action stats. Signed-off-by: Oz Shlomo Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- include/net/flow_offload.h | 1 + include/net/pkt_cls.h | 29 +++++++++++++++++++---------- net/sched/act_api.c | 8 -------- net/sched/cls_flower.c | 2 +- net/sched/cls_matchall.c | 2 +- 5 files changed, 22 insertions(+), 20 deletions(-) (limited to 'net') diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index d177bf5f0e1a..8c05455b1e34 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -594,6 +594,7 @@ struct flow_cls_common_offload { struct flow_cls_offload { struct flow_cls_common_offload common; enum flow_cls_command command; + bool use_act_stats; unsigned long cookie; struct flow_rule *rule; struct flow_stats stats; diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index bf50829d9255..ace437c6754b 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -292,9 +292,15 @@ static inline void tcf_exts_put_net(struct tcf_exts *exts) #define tcf_act_for_each_action(i, a, actions) \ for (i = 0; i < TCA_ACT_MAX_PRIO && ((a) = actions[i]); i++) +static inline bool tc_act_in_hw(struct tc_action *act) +{ + return !!act->in_hw_count; +} + static inline void tcf_exts_hw_stats_update(const struct tcf_exts *exts, - struct flow_stats *stats) + struct flow_stats *stats, + bool use_act_stats) { #ifdef CONFIG_NET_CLS_ACT int i; @@ -302,16 +308,18 @@ tcf_exts_hw_stats_update(const struct tcf_exts *exts, for (i = 0; i < exts->nr_actions; i++) { struct tc_action *a = exts->actions[i]; - /* if stats from hw, just skip */ - if (tcf_action_update_hw_stats(a)) { - preempt_disable(); - tcf_action_stats_update(a, stats->bytes, stats->pkts, stats->drops, - stats->lastused, true); - preempt_enable(); - - a->used_hw_stats = stats->used_hw_stats; - a->used_hw_stats_valid = stats->used_hw_stats_valid; + if (use_act_stats || tc_act_in_hw(a)) { + if (!tcf_action_update_hw_stats(a)) + continue; } + + preempt_disable(); + tcf_action_stats_update(a, stats->bytes, stats->pkts, stats->drops, + stats->lastused, true); + preempt_enable(); + + a->used_hw_stats = stats->used_hw_stats; + a->used_hw_stats_valid = stats->used_hw_stats_valid; } #endif } @@ -769,6 +777,7 @@ struct tc_cls_matchall_offload { enum tc_matchall_command command; struct flow_rule *rule; struct flow_stats stats; + bool use_act_stats; unsigned long cookie; }; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 917827199102..eda58b78da13 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -169,11 +169,6 @@ static bool tc_act_skip_sw(u32 flags) return (flags & TCA_ACT_FLAGS_SKIP_SW) ? true : false; } -static bool tc_act_in_hw(struct tc_action *act) -{ - return !!act->in_hw_count; -} - /* SKIP_HW and SKIP_SW are mutually exclusive flags. */ static bool tc_act_flags_valid(u32 flags) { @@ -308,9 +303,6 @@ int tcf_action_update_hw_stats(struct tc_action *action) struct flow_offload_action fl_act = {}; int err; - if (!tc_act_in_hw(action)) - return -EOPNOTSUPP; - err = offload_action_init(&fl_act, action, FLOW_ACT_STATS, NULL); if (err) return err; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index cb04739a13ce..885c95191ccf 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -502,7 +502,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f, tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, rtnl_held); - tcf_exts_hw_stats_update(&f->exts, &cls_flower.stats); + tcf_exts_hw_stats_update(&f->exts, &cls_flower.stats, cls_flower.use_act_stats); } static void __fl_put(struct cls_fl_filter *f) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index b3883d3d4dbd..fa3bbd187eb9 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -331,7 +331,7 @@ static void mall_stats_hw_filter(struct tcf_proto *tp, tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true); - tcf_exts_hw_stats_update(&head->exts, &cls_mall.stats); + tcf_exts_hw_stats_update(&head->exts, &cls_mall.stats, cls_mall.use_act_stats); } static int mall_dump(struct net *net, struct tcf_proto *tp, void *fh, -- cgit v1.2.3 From 9a47c1ef5a95d1fd229ee5e375985f809a9d8177 Mon Sep 17 00:00:00 2001 From: Veerendranath Jakkam Date: Mon, 16 Jan 2023 18:20:58 +0530 Subject: wifi: cfg80211: Authentication offload to user space for MLO connection in STA mode Currently authentication request event interface doesn't have support to indicate the user space whether it should enable MLO or not during the authentication with the specified AP. But driver needs such capability since the connection is MLO or not decided by the driver in case of SME offload to the driver. Add support for driver to indicate MLD address of the AP in authentication offload request to inform user space to enable MLO during authentication process. Driver shall look at NL80211_ATTR_MLO_SUPPORT flag capability in NL80211_CMD_CONNECT to know whether the user space supports enabling MLO during the authentication offload. User space should enable MLO during the authentication only when it receives the AP MLD address in authentication offload request. User space shouldn't enable MLO if the authentication offload request doesn't indicate the AP MLD address even if the AP is MLO capable. When MLO is enabled, user space should use the MAC address of the interface (on which driver sent request) as self MLD address. User space and driver to use MLD addresses in RA, TA and BSSID fields of the frames between them, and driver translates the MLD addresses to/from link addresses based on the link chosen for the authentication. Signed-off-by: Veerendranath Jakkam Link: https://lore.kernel.org/r/20230116125058.1604843-1-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 12 ++++++++++++ include/uapi/linux/nl80211.h | 17 +++++++++++++++++ net/wireless/nl80211.c | 4 +++- net/wireless/trace.h | 7 +++++-- 4 files changed, 37 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 54a77d906b2d..bcde215475c6 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -3600,6 +3600,17 @@ struct cfg80211_pmk_conf { * the real status code for failures. Used only for the authentication * response command interface (user space to driver). * @pmkid: The identifier to refer a PMKSA. + * @mld_addr: MLD address of the peer. Used by the authentication request event + * interface. Driver indicates this to enable MLO during the authentication + * offload to user space. Driver shall look at %NL80211_ATTR_MLO_SUPPORT + * flag capability in NL80211_CMD_CONNECT to know whether the user space + * supports enabling MLO during the authentication offload. + * User space should use the address of the interface (on which the + * authentication request event reported) as self MLD address. User space + * and driver should use MLD addresses in RA, TA and BSSID fields of + * authentication frames sent or received via cfg80211. The driver + * translates the MLD addresses to/from link addresses based on the link + * chosen for the authentication. */ struct cfg80211_external_auth_params { enum nl80211_external_auth_action action; @@ -3608,6 +3619,7 @@ struct cfg80211_external_auth_params { unsigned int key_mgmt_suite; u16 status; const u8 *pmkid; + u8 mld_addr[ETH_ALEN] __aligned(2); }; /** diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h index 8ecb0fbee721..a984c6c4cf02 100644 --- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -1167,6 +1167,23 @@ * %NL80211_ATTR_STATUS_CODE attribute in %NL80211_CMD_EXTERNAL_AUTH * command interface. * + * Host driver sends MLD address of the AP with %NL80211_ATTR_MLD_ADDR in + * %NL80211_CMD_EXTERNAL_AUTH event to indicate user space to enable MLO + * during the authentication offload in STA mode while connecting to MLD + * APs. Host driver should check %NL80211_ATTR_MLO_SUPPORT flag capability + * in %NL80211_CMD_CONNECT to know whether the user space supports enabling + * MLO during the authentication offload or not. + * User space should enable MLO during the authentication only when it + * receives the AP MLD address in authentication offload request. User + * space shouldn't enable MLO when the authentication offload request + * doesn't indicate the AP MLD address even if the AP is MLO capable. + * User space should use %NL80211_ATTR_MLD_ADDR as peer's MLD address and + * interface address identified by %NL80211_ATTR_IFINDEX as self MLD + * address. User space and host driver to use MLD addresses in RA, TA and + * BSSID fields of the frames between them, and host driver translates the + * MLD addresses to/from link addresses based on the link chosen for the + * authentication. + * * Host driver reports this status on an authentication failure to the * user space through the connect result as the user space would have * initiated the connection through the connect request. diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 64cf6110ce9d..a0858d747e45 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -19720,7 +19720,9 @@ int cfg80211_external_auth_request(struct net_device *dev, params->action) || nla_put(msg, NL80211_ATTR_BSSID, ETH_ALEN, params->bssid) || nla_put(msg, NL80211_ATTR_SSID, params->ssid.ssid_len, - params->ssid.ssid)) + params->ssid.ssid) || + (!is_zero_ether_addr(params->mld_addr) && + nla_put(msg, NL80211_ATTR_MLD_ADDR, ETH_ALEN, params->mld_addr))) goto nla_put_failure; genlmsg_end(msg, hdr); diff --git a/net/wireless/trace.h b/net/wireless/trace.h index 9660fd94cd92..044b3c5a88ef 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -2524,6 +2524,7 @@ TRACE_EVENT(rdev_external_auth, MAC_ENTRY(bssid) __array(u8, ssid, IEEE80211_MAX_SSID_LEN + 1) __field(u16, status) + MAC_ENTRY(mld_addr) ), TP_fast_assign(WIPHY_ASSIGN; NETDEV_ASSIGN; @@ -2532,10 +2533,12 @@ TRACE_EVENT(rdev_external_auth, memcpy(__entry->ssid, params->ssid.ssid, params->ssid.ssid_len); __entry->status = params->status; + MAC_ASSIGN(mld_addr, params->mld_addr); ), TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", bssid: %pM" - ", ssid: %s, status: %u", WIPHY_PR_ARG, NETDEV_PR_ARG, - __entry->bssid, __entry->ssid, __entry->status) + ", ssid: %s, status: %u, mld_addr: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->bssid, + __entry->ssid, __entry->status, __entry->mld_addr) ); TRACE_EVENT(rdev_start_radar_detection, -- cgit v1.2.3 From 015b8cc5e7c4d7bb671f1984d7b7338c310b185b Mon Sep 17 00:00:00 2001 From: Alexander Wetzel Date: Tue, 24 Jan 2023 15:18:56 +0100 Subject: wifi: cfg80211: Fix use after free for wext Key information in wext.connect is not reset on (re)connect and can hold data from a previous connection. Reset key data to avoid that drivers or mac80211 incorrectly detect a WEP connection request and access the freed or already reused memory. Additionally optimize cfg80211_sme_connect() and avoid an useless schedule of conn_work. Fixes: fffd0934b939 ("cfg80211: rework key operation") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230124141856.356646-1-alexander@wetzel-home.de Signed-off-by: Alexander Wetzel Signed-off-by: Johannes Berg --- net/wireless/sme.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/wireless/sme.c b/net/wireless/sme.c index 123248b2c0be..0cc841c0c59b 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -285,6 +285,15 @@ void cfg80211_conn_work(struct work_struct *work) wiphy_unlock(&rdev->wiphy); } +static void cfg80211_step_auth_next(struct cfg80211_conn *conn, + struct cfg80211_bss *bss) +{ + memcpy(conn->bssid, bss->bssid, ETH_ALEN); + conn->params.bssid = conn->bssid; + conn->params.channel = bss->channel; + conn->state = CFG80211_CONN_AUTHENTICATE_NEXT; +} + /* Returned bss is reference counted and must be cleaned up appropriately. */ static struct cfg80211_bss *cfg80211_get_conn_bss(struct wireless_dev *wdev) { @@ -302,10 +311,7 @@ static struct cfg80211_bss *cfg80211_get_conn_bss(struct wireless_dev *wdev) if (!bss) return NULL; - memcpy(wdev->conn->bssid, bss->bssid, ETH_ALEN); - wdev->conn->params.bssid = wdev->conn->bssid; - wdev->conn->params.channel = bss->channel; - wdev->conn->state = CFG80211_CONN_AUTHENTICATE_NEXT; + cfg80211_step_auth_next(wdev->conn, bss); schedule_work(&rdev->conn_work); return bss; @@ -597,7 +603,12 @@ static int cfg80211_sme_connect(struct wireless_dev *wdev, wdev->conn->params.ssid_len = wdev->u.client.ssid_len; /* see if we have the bss already */ - bss = cfg80211_get_conn_bss(wdev); + bss = cfg80211_get_bss(wdev->wiphy, wdev->conn->params.channel, + wdev->conn->params.bssid, + wdev->conn->params.ssid, + wdev->conn->params.ssid_len, + wdev->conn_bss_type, + IEEE80211_PRIVACY(wdev->conn->params.privacy)); if (prev_bssid) { memcpy(wdev->conn->prev_bssid, prev_bssid, ETH_ALEN); @@ -608,6 +619,7 @@ static int cfg80211_sme_connect(struct wireless_dev *wdev, if (bss) { enum nl80211_timeout_reason treason; + cfg80211_step_auth_next(wdev->conn, bss); err = cfg80211_conn_do_work(wdev, &treason); cfg80211_put_bss(wdev->wiphy, bss); } else { @@ -1464,6 +1476,15 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev, } else { if (WARN_ON(connkeys)) return -EINVAL; + + /* connect can point to wdev->wext.connect which + * can hold key data from a previous connection + */ + connect->key = NULL; + connect->key_len = 0; + connect->key_idx = 0; + connect->crypto.cipher_group = 0; + connect->crypto.n_ciphers_pairwise = 0; } wdev->connect_keys = connkeys; -- cgit v1.2.3 From 9288188438d85e22c23cfd6657ee8a801babc83c Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 25 Jan 2023 12:01:02 +0100 Subject: wifi: mac80211: move color collision detection report in a delayed work Move color collision report in a dedicated delayed work and do not run it in interrupt context in order to rate-limit the number of events reported to userspace. Moreover grab wdev mutex in ieee80211_color_collision_detection_work routine since it is required by cfg80211_obss_color_collision_notify(). Tested-by: Nicolas Cavallari Signed-off-by: Lorenzo Bianconi Fixes: 5f9404abdf2a ("mac80211: add support for BSS color change") Link: https://lore.kernel.org/r/3f6cf60c892ad40c1cca4a55d62b1224ef1c6ce9.1674644379.git.lorenzo@kernel.org Signed-off-by: Johannes Berg --- net/mac80211/cfg.c | 26 +++++++++++++++++++++++++- net/mac80211/ieee80211_i.h | 3 +++ net/mac80211/link.c | 3 +++ 3 files changed, 31 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index f5d43f42f6d8..c3e4e48e9ce9 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -4652,6 +4652,20 @@ unlock: sdata_unlock(sdata); } +void ieee80211_color_collision_detection_work(struct work_struct *work) +{ + struct delayed_work *delayed_work = to_delayed_work(work); + struct ieee80211_link_data *link = + container_of(delayed_work, struct ieee80211_link_data, + color_collision_detect_work); + struct ieee80211_sub_if_data *sdata = link->sdata; + + sdata_lock(sdata); + cfg80211_obss_color_collision_notify(sdata->dev, link->color_bitmap, + GFP_KERNEL); + sdata_unlock(sdata); +} + void ieee80211_color_change_finish(struct ieee80211_vif *vif) { struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); @@ -4666,11 +4680,21 @@ ieee80211_obss_color_collision_notify(struct ieee80211_vif *vif, u64 color_bitmap, gfp_t gfp) { struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); + struct ieee80211_link_data *link = &sdata->deflink; if (sdata->vif.bss_conf.color_change_active || sdata->vif.bss_conf.csa_active) return; - cfg80211_obss_color_collision_notify(sdata->dev, color_bitmap, gfp); + if (delayed_work_pending(&link->color_collision_detect_work)) + return; + + link->color_bitmap = color_bitmap; + /* queue the color collision detection event every 500 ms in order to + * avoid sending too much netlink messages to userspace. + */ + ieee80211_queue_delayed_work(&sdata->local->hw, + &link->color_collision_detect_work, + msecs_to_jiffies(500)); } EXPORT_SYMBOL_GPL(ieee80211_obss_color_collision_notify); diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index d16606e84e22..7ca9bde3c6d2 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -974,6 +974,8 @@ struct ieee80211_link_data { struct cfg80211_chan_def csa_chandef; struct work_struct color_change_finalize_work; + struct delayed_work color_collision_detect_work; + u64 color_bitmap; /* context reservation -- protected with chanctx_mtx */ struct ieee80211_chanctx *reserved_chanctx; @@ -1929,6 +1931,7 @@ int ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, /* color change handling */ void ieee80211_color_change_finalize_work(struct work_struct *work); +void ieee80211_color_collision_detection_work(struct work_struct *work); /* interface handling */ #define MAC80211_SUPPORTED_FEATURES_TX (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | \ diff --git a/net/mac80211/link.c b/net/mac80211/link.c index d1f5a9f7c647..8c8869cc1fb4 100644 --- a/net/mac80211/link.c +++ b/net/mac80211/link.c @@ -39,6 +39,8 @@ void ieee80211_link_init(struct ieee80211_sub_if_data *sdata, ieee80211_csa_finalize_work); INIT_WORK(&link->color_change_finalize_work, ieee80211_color_change_finalize_work); + INIT_DELAYED_WORK(&link->color_collision_detect_work, + ieee80211_color_collision_detection_work); INIT_LIST_HEAD(&link->assigned_chanctx_list); INIT_LIST_HEAD(&link->reserved_chanctx_list); INIT_DELAYED_WORK(&link->dfs_cac_timer_work, @@ -66,6 +68,7 @@ void ieee80211_link_stop(struct ieee80211_link_data *link) if (link->sdata->vif.type == NL80211_IFTYPE_STATION) ieee80211_mgd_stop_link(link); + cancel_delayed_work_sync(&link->color_collision_detect_work); ieee80211_link_release_channel(link); } -- cgit v1.2.3 From a42e59eb9689e54279227e2af5ed75128d92a82b Mon Sep 17 00:00:00 2001 From: Veerendranath Jakkam Date: Thu, 26 Jan 2023 20:02:55 +0530 Subject: wifi: cfg80211: Extend cfg80211_new_sta() for MLD AP Add support for drivers to indicate STA connection(MLO/non-MLO) when user space SME (e.g., hostapd) is not used for MLD AP. Add new parameters in struct station_info to provide below information in cfg80211_new_sta() call: - MLO link ID of the AP, with which station completed (re)association. This is applicable for both MLO and non-MLO station connections when the AP affiliated with an MLD. - Station's MLD address if the connection is MLO capable. - (Re)Association Response IEs sent to the station. User space needs this to determine rejected and accepted affiliated links information of the connected station if the connection is MLO capable. Signed-off-by: Veerendranath Jakkam Link: https://lore.kernel.org/r/20230126143256.960563-2-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 24 ++++++++++++++++++++++++ net/wireless/nl80211.c | 16 ++++++++++++++++ 2 files changed, 40 insertions(+) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index bcde215475c6..28529194eb89 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1876,6 +1876,24 @@ struct cfg80211_tid_stats { * received packet with an FCS error matches the peer MAC address. * @airtime_link_metric: mesh airtime link metric. * @connected_to_as: true if mesh STA has a path to authentication server + * @mlo_params_valid: Indicates @assoc_link_id and @mld_addr fields are filled + * by driver. Drivers use this only in cfg80211_new_sta() calls when AP + * MLD's MLME/SME is offload to driver. Drivers won't fill this + * information in cfg80211_del_sta_sinfo(), get_station() and + * dump_station() callbacks. + * @assoc_link_id: Indicates MLO link ID of the AP, with which the station + * completed (re)association. This information filled for both MLO + * and non-MLO STA connections when the AP affiliated with an MLD. + * @mld_addr: For MLO STA connection, filled with MLD address of the station. + * For non-MLO STA connection, filled with all zeros. + * @assoc_resp_ies: IEs from (Re)Association Response. + * This is used only when in AP mode with drivers that do not use user + * space MLME/SME implementation. The information is provided only for the + * cfg80211_new_sta() calls to notify user space of the IEs. Drivers won't + * fill this information in cfg80211_del_sta_sinfo(), get_station() and + * dump_station() callbacks. User space needs this information to determine + * the accepted and rejected affiliated links of the connected station. + * @assoc_resp_ies_len: Length of @assoc_resp_ies buffer in octets. */ struct station_info { u64 filled; @@ -1935,6 +1953,12 @@ struct station_info { u32 airtime_link_metric; u8 connected_to_as; + + bool mlo_params_valid; + u8 assoc_link_id; + u8 mld_addr[ETH_ALEN] __aligned(2); + const u8 *assoc_resp_ies; + size_t assoc_resp_ies_len; }; /** diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index a0858d747e45..caf716c3afaf 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -6527,6 +6527,22 @@ static int nl80211_send_station(struct sk_buff *msg, u32 cmd, u32 portid, sinfo->assoc_req_ies)) goto nla_put_failure; + if (sinfo->assoc_resp_ies_len && + nla_put(msg, NL80211_ATTR_RESP_IE, sinfo->assoc_resp_ies_len, + sinfo->assoc_resp_ies)) + goto nla_put_failure; + + if (sinfo->mlo_params_valid) { + if (nla_put_u8(msg, NL80211_ATTR_MLO_LINK_ID, + sinfo->assoc_link_id)) + goto nla_put_failure; + + if (!is_zero_ether_addr(sinfo->mld_addr) && + nla_put(msg, NL80211_ATTR_MLD_ADDR, ETH_ALEN, + sinfo->mld_addr)) + goto nla_put_failure; + } + cfg80211_sinfo_release_content(sinfo); genlmsg_end(msg, hdr); return 0; -- cgit v1.2.3 From 8bb588d975019748ebdab9448e9a274b7463c13b Mon Sep 17 00:00:00 2001 From: Veerendranath Jakkam Date: Thu, 26 Jan 2023 20:02:56 +0530 Subject: wifi: cfg80211: Extend cfg80211_update_owe_info_event() for MLD AP Add support to offload OWE processing to user space for MLD AP when driver's SME in use. Add new parameters in struct cfg80211_update_owe_info to provide below information in cfg80211_update_owe_info_event() call: - MLO link ID of the AP, with which station requested (re)association. This is applicable for both MLO and non-MLO station connections when the AP affiliated with an MLD. - Station's MLD address if the connection is MLO capable. Signed-off-by: Veerendranath Jakkam Link: https://lore.kernel.org/r/20230126143256.960563-3-quic_vjakkam@quicinc.com [reformat the trace event macro] Signed-off-by: Johannes Berg --- drivers/net/wireless/quantenna/qtnfmac/event.c | 1 + include/net/cfg80211.h | 10 +++++++ net/wireless/nl80211.c | 11 ++++++++ net/wireless/trace.h | 38 ++++++++++++++++---------- 4 files changed, 46 insertions(+), 14 deletions(-) (limited to 'net') diff --git a/drivers/net/wireless/quantenna/qtnfmac/event.c b/drivers/net/wireless/quantenna/qtnfmac/event.c index 4fafe370101a..f0158737aed6 100644 --- a/drivers/net/wireless/quantenna/qtnfmac/event.c +++ b/drivers/net/wireless/quantenna/qtnfmac/event.c @@ -662,6 +662,7 @@ qtnf_event_handle_update_owe(struct qtnf_vif *vif, memcpy(ie, owe_ev->ies, ie_len); owe_info.ie_len = ie_len; owe_info.ie = ie; + owe_info.assoc_link_id = -1; pr_info("%s: external OWE processing: peer=%pM\n", vif->netdev->name, owe_ev->peer); diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 28529194eb89..70f01ea5ba5c 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -3898,12 +3898,22 @@ struct cfg80211_pmsr_request { * the IEs of the remote peer in the event from the host driver and * the constructed IEs by the user space in the request interface. * @ie_len: Length of IEs in octets. + * @assoc_link_id: MLO link ID of the AP, with which (re)association requested + * by peer. This will be filled by driver for both MLO and non-MLO station + * connections when the AP affiliated with an MLD. For non-MLD AP mode, it + * will be -1. Used only with OWE update event (driver to user space). + * @peer_mld_addr: For MLO connection, MLD address of the peer. For non-MLO + * connection, it will be all zeros. This is applicable only when + * @assoc_link_id is not -1, i.e., the AP affiliated with an MLD. Used only + * with OWE update event (driver to user space). */ struct cfg80211_update_owe_info { u8 peer[ETH_ALEN] __aligned(2); u16 status; const u8 *ie; size_t ie_len; + int assoc_link_id; + u8 peer_mld_addr[ETH_ALEN] __aligned(2); }; /** diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index caf716c3afaf..f9e6e65bc029 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -19780,6 +19780,17 @@ void cfg80211_update_owe_info_event(struct net_device *netdev, nla_put(msg, NL80211_ATTR_IE, owe_info->ie_len, owe_info->ie)) goto nla_put_failure; + if (owe_info->assoc_link_id != -1) { + if (nla_put_u8(msg, NL80211_ATTR_MLO_LINK_ID, + owe_info->assoc_link_id)) + goto nla_put_failure; + + if (!is_zero_ether_addr(owe_info->peer_mld_addr) && + nla_put(msg, NL80211_ATTR_MLD_ADDR, ETH_ALEN, + owe_info->peer_mld_addr)) + goto nla_put_failure; + } + genlmsg_end(msg, hdr); genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0, diff --git a/net/wireless/trace.h b/net/wireless/trace.h index 044b3c5a88ef..0e688696fc60 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -3748,20 +3748,30 @@ TRACE_EVENT(cfg80211_pmsr_complete, ); TRACE_EVENT(cfg80211_update_owe_info_event, - TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, - struct cfg80211_update_owe_info *owe_info), - TP_ARGS(wiphy, netdev, owe_info), - TP_STRUCT__entry(WIPHY_ENTRY - NETDEV_ENTRY - MAC_ENTRY(peer) - __dynamic_array(u8, ie, owe_info->ie_len)), - TP_fast_assign(WIPHY_ASSIGN; - NETDEV_ASSIGN; - MAC_ASSIGN(peer, owe_info->peer); - memcpy(__get_dynamic_array(ie), owe_info->ie, - owe_info->ie_len);), - TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM", - WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer) + TP_PROTO(struct wiphy *wiphy, struct net_device *netdev, + struct cfg80211_update_owe_info *owe_info), + TP_ARGS(wiphy, netdev, owe_info), + TP_STRUCT__entry( + WIPHY_ENTRY + NETDEV_ENTRY + MAC_ENTRY(peer) + __dynamic_array(u8, ie, owe_info->ie_len) + __field(int, assoc_link_id) + MAC_ENTRY(peer_mld_addr) + ), + TP_fast_assign( + WIPHY_ASSIGN; + NETDEV_ASSIGN; + MAC_ASSIGN(peer, owe_info->peer); + memcpy(__get_dynamic_array(ie), owe_info->ie, + owe_info->ie_len); + __entry->assoc_link_id = owe_info->assoc_link_id; + MAC_ASSIGN(peer_mld_addr, owe_info->peer_mld_addr); + ), + TP_printk(WIPHY_PR_FMT ", " NETDEV_PR_FMT ", peer: %pM," + " assoc_link_id: %d, peer_mld_addr: %pM", + WIPHY_PR_ARG, NETDEV_PR_ARG, __entry->peer, + __entry->assoc_link_id, __entry->peer_mld_addr) ); TRACE_EVENT(cfg80211_bss_color_notify, -- cgit v1.2.3 From aa87cd8b35736a5183745ab0ec4b82419024dfd7 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 27 Jan 2023 12:39:31 +0100 Subject: wifi: mac80211: mlme: handle EHT channel puncturing Handle the Puncturing info received from the AP in the EHT Operation element in beacons. If the info is invalid: - during association: disable EHT connection for the AP - after association: disconnect This commit includes many (internal) bugfixes and spec updates various people. Co-developed-by: Miri Korenblit Signed-off-by: Miri Korenblit Link: https://lore.kernel.org/r/20230127123930.4fbc74582331.I3547481d49f958389f59dfeba3fcc75e72b0aa6e@changeid Signed-off-by: Johannes Berg --- include/net/mac80211.h | 5 +- net/mac80211/cfg.c | 2 +- net/mac80211/chan.c | 2 +- net/mac80211/ieee80211_i.h | 2 +- net/mac80211/mlme.c | 224 ++++++++++++++++++++++++++++++++++++++++++++- 5 files changed, 228 insertions(+), 7 deletions(-) (limited to 'net') diff --git a/include/net/mac80211.h b/include/net/mac80211.h index 2635e6de8101..54ffc0cc2918 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -340,7 +340,7 @@ struct ieee80211_vif_chanctx_switch { * @BSS_CHANGED_FILS_DISCOVERY: FILS discovery status changed. * @BSS_CHANGED_UNSOL_BCAST_PROBE_RESP: Unsolicited broadcast probe response * status changed. - * + * @BSS_CHANGED_EHT_PUNCTURING: The channel puncturing bitmap changed. */ enum ieee80211_bss_change { BSS_CHANGED_ASSOC = 1<<0, @@ -375,6 +375,7 @@ enum ieee80211_bss_change { BSS_CHANGED_HE_BSS_COLOR = 1<<29, BSS_CHANGED_FILS_DISCOVERY = 1<<30, BSS_CHANGED_UNSOL_BCAST_PROBE_RESP = 1<<31, + BSS_CHANGED_EHT_PUNCTURING = BIT_ULL(32), /* when adding here, make sure to change ieee80211_reconfig */ }; @@ -640,6 +641,7 @@ struct ieee80211_fils_discovery { * @tx_pwr_env_num: number of @tx_pwr_env. * @pwr_reduction: power constraint of BSS. * @eht_support: does this BSS support EHT + * @eht_puncturing: bitmap to indicate which channels are punctured in this BSS * @csa_active: marks whether a channel switch is going on. Internally it is * write-protected by sdata_lock and local->mtx so holding either is fine * for read access. @@ -736,6 +738,7 @@ struct ieee80211_bss_conf { u8 tx_pwr_env_num; u8 pwr_reduction; bool eht_support; + u16 eht_puncturing; bool csa_active; bool mu_mimo_owner; diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index c3e4e48e9ce9..686309648cff 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -4171,7 +4171,7 @@ static int ieee80211_set_ap_chanwidth(struct wiphy *wiphy, struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev); struct ieee80211_link_data *link; int ret; - u32 changed = 0; + u64 changed = 0; link = sdata_dereference(sdata->link[link_id], sdata); diff --git a/net/mac80211/chan.c b/net/mac80211/chan.c index e72cf0749d49..dbc34fbe7c8f 100644 --- a/net/mac80211/chan.c +++ b/net/mac80211/chan.c @@ -1916,7 +1916,7 @@ int ieee80211_link_use_reserved_context(struct ieee80211_link_data *link) int ieee80211_link_change_bandwidth(struct ieee80211_link_data *link, const struct cfg80211_chan_def *chandef, - u32 *changed) + u64 *changed) { struct ieee80211_sub_if_data *sdata = link->sdata; struct ieee80211_bss_conf *link_conf = link->conf; diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 7ca9bde3c6d2..517b1840a8e3 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -2481,7 +2481,7 @@ int ieee80211_link_unreserve_chanctx(struct ieee80211_link_data *link); int __must_check ieee80211_link_change_bandwidth(struct ieee80211_link_data *link, const struct cfg80211_chan_def *chandef, - u32 *changed); + u64 *changed); void ieee80211_link_release_channel(struct ieee80211_link_data *link); void ieee80211_link_vlan_copy_chanctx(struct ieee80211_link_data *link); void ieee80211_link_copy_chanctx_to_vlans(struct ieee80211_link_data *link, diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 0aee2392dd29..a14a5ea2bffd 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -8,7 +8,7 @@ * Copyright 2007, Michael Wu * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2015 - 2017 Intel Deutschland GmbH - * Copyright (C) 2018 - 2022 Intel Corporation + * Copyright (C) 2018 - 2023 Intel Corporation */ #include @@ -88,6 +88,141 @@ MODULE_PARM_DESC(probe_wait_ms, */ #define IEEE80211_SIGNAL_AVE_MIN_COUNT 4 +struct ieee80211_per_bw_puncturing_values { + u8 len; + const u16 *valid_values; +}; + +static const u16 puncturing_values_80mhz[] = { + 0x8, 0x4, 0x2, 0x1 +}; + +static const u16 puncturing_values_160mhz[] = { + 0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1, 0xc0, 0x30, 0xc, 0x3 +}; + +static const u16 puncturing_values_320mhz[] = { + 0xc000, 0x3000, 0xc00, 0x300, 0xc0, 0x30, 0xc, 0x3, 0xf000, 0xf00, + 0xf0, 0xf, 0xfc00, 0xf300, 0xf0c0, 0xf030, 0xf00c, 0xf003, 0xc00f, + 0x300f, 0xc0f, 0x30f, 0xcf, 0x3f +}; + +#define IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(_bw) \ + { \ + .len = ARRAY_SIZE(puncturing_values_ ## _bw ## mhz), \ + .valid_values = puncturing_values_ ## _bw ## mhz \ + } + +static const struct ieee80211_per_bw_puncturing_values per_bw_puncturing[] = { + IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(80), + IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(160), + IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(320) +}; + +static bool ieee80211_valid_disable_subchannel_bitmap(u16 *bitmap, + enum nl80211_chan_width bw) +{ + u32 idx, i; + + switch (bw) { + case NL80211_CHAN_WIDTH_80: + idx = 0; + break; + case NL80211_CHAN_WIDTH_160: + idx = 1; + break; + case NL80211_CHAN_WIDTH_320: + idx = 2; + break; + default: + *bitmap = 0; + break; + } + + if (!*bitmap) + return true; + + for (i = 0; i < per_bw_puncturing[idx].len; i++) + if (per_bw_puncturing[idx].valid_values[i] == *bitmap) + return true; + + return false; +} + +/* + * Extract from the given disabled subchannel bitmap (raw format + * from the EHT Operation Element) the bits for the subchannel + * we're using right now. + */ +static u16 +ieee80211_extract_dis_subch_bmap(const struct ieee80211_eht_operation *eht_oper, + struct cfg80211_chan_def *chandef, u16 bitmap) +{ + struct ieee80211_eht_operation_info *info = (void *)eht_oper->optional; + struct cfg80211_chan_def ap_chandef = *chandef; + u32 ap_center_freq, local_center_freq; + u32 ap_bw, local_bw; + int ap_start_freq, local_start_freq; + u16 shift, mask; + + if (!(eht_oper->params & IEEE80211_EHT_OPER_INFO_PRESENT) || + !(eht_oper->params & + IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT)) + return 0; + + /* set 160/320 supported to get the full AP definition */ + ieee80211_chandef_eht_oper(eht_oper, true, true, &ap_chandef); + ap_center_freq = ap_chandef.center_freq1; + ap_bw = 20 * BIT(u8_get_bits(info->control, + IEEE80211_EHT_OPER_CHAN_WIDTH)); + ap_start_freq = ap_center_freq - ap_bw / 2; + local_center_freq = chandef->center_freq1; + local_bw = 20 * BIT(ieee80211_chan_width_to_rx_bw(chandef->width)); + local_start_freq = local_center_freq - local_bw / 2; + shift = (local_start_freq - ap_start_freq) / 20; + mask = BIT(local_bw / 20) - 1; + + return (bitmap >> shift) & mask; +} + +/* + * Handle the puncturing bitmap, possibly downgrading bandwidth to get a + * valid bitmap. + */ +static void +ieee80211_handle_puncturing_bitmap(struct ieee80211_link_data *link, + const struct ieee80211_eht_operation *eht_oper, + u16 bitmap, u64 *changed) +{ + struct cfg80211_chan_def *chandef = &link->conf->chandef; + u16 extracted; + u64 _changed = 0; + + if (!changed) + changed = &_changed; + + while (chandef->width > NL80211_CHAN_WIDTH_40) { + extracted = + ieee80211_extract_dis_subch_bmap(eht_oper, chandef, + bitmap); + + if (ieee80211_valid_disable_subchannel_bitmap(&bitmap, + chandef->width)) + break; + link->u.mgd.conn_flags |= + ieee80211_chandef_downgrade(chandef); + *changed |= BSS_CHANGED_BANDWIDTH; + } + + if (chandef->width <= NL80211_CHAN_WIDTH_40) + extracted = 0; + + if (link->conf->eht_puncturing != extracted) { + link->conf->eht_puncturing = extracted; + *changed |= BSS_CHANGED_EHT_PUNCTURING; + } +} + /* * We can have multiple work items (and connection probing) * scheduling this timer, but we need to take care to only @@ -413,7 +548,7 @@ static int ieee80211_config_bw(struct ieee80211_link_data *link, const struct ieee80211_he_operation *he_oper, const struct ieee80211_eht_operation *eht_oper, const struct ieee80211_s1g_oper_ie *s1g_oper, - const u8 *bssid, u32 *changed) + const u8 *bssid, u64 *changed) { struct ieee80211_sub_if_data *sdata = link->sdata; struct ieee80211_local *local = sdata->local; @@ -4145,6 +4280,7 @@ static bool ieee80211_assoc_config_link(struct ieee80211_link_data *link, link_sta); bss_conf->eht_support = link_sta->pub->eht_cap.has_eht; + *changed |= BSS_CHANGED_EHT_PUNCTURING; } else { bss_conf->eht_support = false; } @@ -5477,6 +5613,45 @@ static bool ieee80211_rx_our_beacon(const u8 *tx_bssid, return ether_addr_equal(tx_bssid, bss->transmitted_bss->bssid); } +static bool ieee80211_config_puncturing(struct ieee80211_link_data *link, + const struct ieee80211_eht_operation *eht_oper, + u64 *changed) +{ + u16 bitmap = 0, extracted; + + if ((eht_oper->params & IEEE80211_EHT_OPER_INFO_PRESENT) && + (eht_oper->params & + IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT)) { + const struct ieee80211_eht_operation_info *info = + (void *)eht_oper->optional; + const u8 *disable_subchannel_bitmap = info->optional; + + bitmap = get_unaligned_le16(disable_subchannel_bitmap); + } + + extracted = ieee80211_extract_dis_subch_bmap(eht_oper, + &link->conf->chandef, + bitmap); + + /* accept if there are no changes */ + if (!(*changed & BSS_CHANGED_BANDWIDTH) && + extracted == link->conf->eht_puncturing) + return true; + + if (!ieee80211_valid_disable_subchannel_bitmap(&bitmap, + link->conf->chandef.width)) { + link_info(link, + "Got an invalid disable subchannel bitmap from AP %pM: bitmap = 0x%x, bw = 0x%x. disconnect\n", + link->u.mgd.bssid, + bitmap, + link->conf->chandef.width); + return false; + } + + ieee80211_handle_puncturing_bitmap(link, eht_oper, bitmap, changed); + return true; +} + static void ieee80211_rx_mgmt_beacon(struct ieee80211_link_data *link, struct ieee80211_hdr *hdr, size_t len, struct ieee80211_rx_status *rx_status) @@ -5494,7 +5669,7 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_link_data *link, struct ieee80211_channel *chan; struct link_sta_info *link_sta; struct sta_info *sta; - u32 changed = 0; + u64 changed = 0; bool erp_valid; u8 erp_value = 0; u32 ncrc = 0; @@ -5791,6 +5966,21 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_link_data *link, elems->pwr_constr_elem, elems->cisco_dtpc_elem); + if (elems->eht_operation && + !(link->u.mgd.conn_flags & IEEE80211_CONN_DISABLE_EHT)) { + if (!ieee80211_config_puncturing(link, elems->eht_operation, + &changed)) { + ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH, + WLAN_REASON_DEAUTH_LEAVING, + true, deauth_buf); + ieee80211_report_disconnect(sdata, deauth_buf, + sizeof(deauth_buf), true, + WLAN_REASON_DEAUTH_LEAVING, + false); + goto free; + } + } + ieee80211_link_info_change_notify(sdata, link, changed); free: kfree(elems); @@ -6892,9 +7082,12 @@ ieee80211_setup_assoc_link(struct ieee80211_sub_if_data *sdata, ieee80211_apply_htcap_overrides(sdata, &sta_ht_cap); } + link->conf->eht_puncturing = 0; + rcu_read_lock(); beacon_ies = rcu_dereference(cbss->beacon_ies); if (beacon_ies) { + const struct ieee80211_eht_operation *eht_oper; const struct element *elem; u8 dtim_count = 0; @@ -6923,6 +7116,31 @@ ieee80211_setup_assoc_link(struct ieee80211_sub_if_data *sdata, link->conf->ema_ap = true; else link->conf->ema_ap = false; + + elem = cfg80211_find_ext_elem(WLAN_EID_EXT_EHT_OPERATION, + beacon_ies->data, beacon_ies->len); + eht_oper = (const void *)(elem->data + 1); + + if (elem && + ieee80211_eht_oper_size_ok((const void *)(elem->data + 1), + elem->datalen - 1) && + (eht_oper->params & IEEE80211_EHT_OPER_INFO_PRESENT) && + (eht_oper->params & IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT)) { + const struct ieee80211_eht_operation_info *info = + (void *)eht_oper->optional; + const u8 *disable_subchannel_bitmap = info->optional; + u16 bitmap; + + bitmap = get_unaligned_le16(disable_subchannel_bitmap); + if (ieee80211_valid_disable_subchannel_bitmap(&bitmap, + link->conf->chandef.width)) + ieee80211_handle_puncturing_bitmap(link, + eht_oper, + bitmap, + NULL); + else + conn_flags |= IEEE80211_CONN_DISABLE_EHT; + } } rcu_read_unlock(); -- cgit v1.2.3 From 77669c151f1de6de2eaa28d7efdf387d39208309 Mon Sep 17 00:00:00 2001 From: Alvin Šipraga Date: Sat, 28 Jan 2023 13:58:43 +0100 Subject: wifi: nl80211: emit CMD_START_AP on multicast group when an AP is started MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Userspace processes such as network daemons may wish to be informed when any AP interface is brought up on the system, for example to initiate a (re)configuration of IP settings or to start a DHCP server. Currently nl80211 does not broadcast any such event on its multicast groups, leaving userspace only two options: 1. the process must be the one that actually issued the NL80211_CMD_START_AP request, so that it can react on the response to that request; 2. the process must react to RTM_NEWLINK events indicating a change in carrier state, and may query for further information about the AP and react accordingly. Option (1) is robust, but it does not cover all scenarios. It is easy to imagine a situation where this is not the case (e.g. hostapd + systemd-networkd). Option (2) is not robust, because RTM_NEWLINK events may be silently discarded by the linkwatch logic (cf. linkwatch_fire_event()). Concretely, consider a scenario in which the carrier state flip-flops in the following way: ^ carrier state (high/low = carrier/no carrier) | | _______ _______ ... | | | | | ______| "foo" |____| "bar" (SSID in "quotes") | +-------A-------B----C---------> time If the time interval between (A) and (C) is less than 1 second, then linkwatch may emit only a single RTM_NEWLINK event indicating carrier gain. This is problematic because it is possible that the network configuration that should be applied is a function of the AP's properties such as SSID (cf. SSID= in systemd.network(5)). As illustrated in the above diagram, it may be that the AP with SSID "bar" ends up being configured as though it had SSID "foo". Address the above issue by having nl80211 emit an NL80211_CMD_START_AP message on the MLME nl80211 multicast group. This allows for arbitrary processes to be reliably informed. Signed-off-by: Alvin Šipraga Link: https://lore.kernel.org/r/20230128125844.2407135-1-alvin@pqrs.dk Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) (limited to 'net') diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index f9e6e65bc029..f3a4b40c6726 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -5770,6 +5770,42 @@ static bool nl80211_valid_auth_type(struct cfg80211_registered_device *rdev, } } +static void nl80211_send_ap_started(struct wireless_dev *wdev, + unsigned int link_id) +{ + struct wiphy *wiphy = wdev->wiphy; + struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy); + struct sk_buff *msg; + void *hdr; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + hdr = nl80211hdr_put(msg, 0, 0, 0, NL80211_CMD_START_AP); + if (!hdr) + goto out; + + if (nla_put_u32(msg, NL80211_ATTR_WIPHY, rdev->wiphy_idx) || + nla_put_u32(msg, NL80211_ATTR_IFINDEX, wdev->netdev->ifindex) || + nla_put_u64_64bit(msg, NL80211_ATTR_WDEV, wdev_id(wdev), + NL80211_ATTR_PAD) || + (wdev->u.ap.ssid_len && + nla_put(msg, NL80211_ATTR_SSID, wdev->u.ap.ssid_len, + wdev->u.ap.ssid)) || + (wdev->valid_links && + nla_put_u8(msg, NL80211_ATTR_MLO_LINK_ID, link_id))) + goto out; + + genlmsg_end(msg, hdr); + + genlmsg_multicast_netns(&nl80211_fam, wiphy_net(wiphy), msg, 0, + NL80211_MCGRP_MLME, GFP_KERNEL); + return; +out: + nlmsg_free(msg); +} + static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info) { struct cfg80211_registered_device *rdev = info->user_ptr[0]; @@ -6050,6 +6086,8 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info) if (info->attrs[NL80211_ATTR_SOCKET_OWNER]) wdev->conn_owner_nlportid = info->snd_portid; + + nl80211_send_ap_started(wdev, link_id); } out_unlock: wdev_unlock(wdev); -- cgit v1.2.3 From cba7217a9269e0c43cb858bdca33b291d6442068 Mon Sep 17 00:00:00 2001 From: Alvin Šipraga Date: Sat, 28 Jan 2023 13:58:44 +0100 Subject: wifi: nl80211: add MLO_LINK_ID to CMD_STOP_AP event MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit nl80211_send_ap_stopped() can be called multiple times on the same netdev for each link when using Multi-Link Operation. Add the MLO_LINK_ID attribute to the event to allow userspace to distinguish which link the event is for. Signed-off-by: Alvin Šipraga Link: https://lore.kernel.org/r/20230128125844.2407135-2-alvin@pqrs.dk Signed-off-by: Johannes Berg --- net/wireless/ap.c | 2 +- net/wireless/nl80211.c | 6 ++++-- net/wireless/nl80211.h | 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) (limited to 'net') diff --git a/net/wireless/ap.c b/net/wireless/ap.c index e68923200018..0962770303b2 100644 --- a/net/wireless/ap.c +++ b/net/wireless/ap.c @@ -39,7 +39,7 @@ static int ___cfg80211_stop_ap(struct cfg80211_registered_device *rdev, wdev->u.ap.ssid_len = 0; rdev_set_qos_map(rdev, dev, NULL); if (notify) - nl80211_send_ap_stopped(wdev); + nl80211_send_ap_stopped(wdev, link_id); /* Should we apply the grace period during beaconing interface * shutdown also? diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index f3a4b40c6726..50515a5de64d 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -19717,7 +19717,7 @@ void cfg80211_crit_proto_stopped(struct wireless_dev *wdev, gfp_t gfp) } EXPORT_SYMBOL(cfg80211_crit_proto_stopped); -void nl80211_send_ap_stopped(struct wireless_dev *wdev) +void nl80211_send_ap_stopped(struct wireless_dev *wdev, unsigned int link_id) { struct wiphy *wiphy = wdev->wiphy; struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy); @@ -19735,7 +19735,9 @@ void nl80211_send_ap_stopped(struct wireless_dev *wdev) if (nla_put_u32(msg, NL80211_ATTR_WIPHY, rdev->wiphy_idx) || nla_put_u32(msg, NL80211_ATTR_IFINDEX, wdev->netdev->ifindex) || nla_put_u64_64bit(msg, NL80211_ATTR_WDEV, wdev_id(wdev), - NL80211_ATTR_PAD)) + NL80211_ATTR_PAD) || + (wdev->valid_links && + nla_put_u8(msg, NL80211_ATTR_MLO_LINK_ID, link_id))) goto out; genlmsg_end(msg, hdr); diff --git a/net/wireless/nl80211.h b/net/wireless/nl80211.h index ba9457e94c43..0278d817bb02 100644 --- a/net/wireless/nl80211.h +++ b/net/wireless/nl80211.h @@ -114,7 +114,7 @@ nl80211_radar_notify(struct cfg80211_registered_device *rdev, enum nl80211_radar_event event, struct net_device *netdev, gfp_t gfp); -void nl80211_send_ap_stopped(struct wireless_dev *wdev); +void nl80211_send_ap_stopped(struct wireless_dev *wdev, unsigned int link_id); void cfg80211_rdev_free_coalesce(struct cfg80211_registered_device *rdev); -- cgit v1.2.3 From 90b2c3cc4b718d7e5591e812313dd72677d9fad0 Mon Sep 17 00:00:00 2001 From: Jaewan Kim Date: Mon, 30 Jan 2023 07:45:14 +0000 Subject: wifi: nl80211: return error message for malformed chandef Add an error message to the missing frequency case to have all -EINVAL in nl80211_parse_chandef() return a better error. Signed-off-by: Jaewan Kim Link: https://lore.kernel.org/r/20230130074514.1560021-1-jaewan@google.com [rewrite commit message] Signed-off-by: Johannes Berg --- net/wireless/nl80211.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 50515a5de64d..c82c66c32faa 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3181,8 +3181,11 @@ int nl80211_parse_chandef(struct cfg80211_registered_device *rdev, struct nlattr **attrs = info->attrs; u32 control_freq; - if (!attrs[NL80211_ATTR_WIPHY_FREQ]) + if (!attrs[NL80211_ATTR_WIPHY_FREQ]) { + NL_SET_ERR_MSG_ATTR(extack, attrs[NL80211_ATTR_WIPHY_FREQ], + "Frequency is missing"); return -EINVAL; + } control_freq = MHZ_TO_KHZ( nla_get_u32(info->attrs[NL80211_ATTR_WIPHY_FREQ])); -- cgit v1.2.3 From b25413fed3d43e1ed3340df4d928971bb8639f66 Mon Sep 17 00:00:00 2001 From: Aloka Dixit Date: Mon, 30 Jan 2023 16:12:24 -0800 Subject: wifi: cfg80211: move puncturing bitmap validation from mac80211 - Move ieee80211_valid_disable_subchannel_bitmap() from mlme.c to chan.c, rename it as cfg80211_valid_disable_subchannel_bitmap() and export it. - Modify the prototype to include struct cfg80211_chan_def instead of only bandwidth to support a check which returns false if the primary channel is punctured. Signed-off-by: Aloka Dixit Link: https://lore.kernel.org/r/20230131001227.25014-2-quic_alokad@quicinc.com Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 12 +++++++++ net/mac80211/mlme.c | 73 +++++--------------------------------------------- net/wireless/chan.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 87 insertions(+), 67 deletions(-) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 70f01ea5ba5c..191603f32b37 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -8952,4 +8952,16 @@ static inline int cfg80211_color_change_notify(struct net_device *dev) 0, 0); } +/** + * cfg80211_valid_disable_subchannel_bitmap - validate puncturing bitmap + * @bitmap: bitmap to be validated + * @chandef: channel definition + * + * Validate the puncturing bitmap. + * + * Return: %true if the bitmap is valid. %false otherwise. + */ +bool cfg80211_valid_disable_subchannel_bitmap(u16 *bitmap, + const struct cfg80211_chan_def *chandef); + #endif /* __NET_CFG80211_H */ diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index a14a5ea2bffd..09703cae2fbb 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -88,67 +88,6 @@ MODULE_PARM_DESC(probe_wait_ms, */ #define IEEE80211_SIGNAL_AVE_MIN_COUNT 4 -struct ieee80211_per_bw_puncturing_values { - u8 len; - const u16 *valid_values; -}; - -static const u16 puncturing_values_80mhz[] = { - 0x8, 0x4, 0x2, 0x1 -}; - -static const u16 puncturing_values_160mhz[] = { - 0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1, 0xc0, 0x30, 0xc, 0x3 -}; - -static const u16 puncturing_values_320mhz[] = { - 0xc000, 0x3000, 0xc00, 0x300, 0xc0, 0x30, 0xc, 0x3, 0xf000, 0xf00, - 0xf0, 0xf, 0xfc00, 0xf300, 0xf0c0, 0xf030, 0xf00c, 0xf003, 0xc00f, - 0x300f, 0xc0f, 0x30f, 0xcf, 0x3f -}; - -#define IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(_bw) \ - { \ - .len = ARRAY_SIZE(puncturing_values_ ## _bw ## mhz), \ - .valid_values = puncturing_values_ ## _bw ## mhz \ - } - -static const struct ieee80211_per_bw_puncturing_values per_bw_puncturing[] = { - IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(80), - IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(160), - IEEE80211_PER_BW_VALID_PUNCTURING_VALUES(320) -}; - -static bool ieee80211_valid_disable_subchannel_bitmap(u16 *bitmap, - enum nl80211_chan_width bw) -{ - u32 idx, i; - - switch (bw) { - case NL80211_CHAN_WIDTH_80: - idx = 0; - break; - case NL80211_CHAN_WIDTH_160: - idx = 1; - break; - case NL80211_CHAN_WIDTH_320: - idx = 2; - break; - default: - *bitmap = 0; - break; - } - - if (!*bitmap) - return true; - - for (i = 0; i < per_bw_puncturing[idx].len; i++) - if (per_bw_puncturing[idx].valid_values[i] == *bitmap) - return true; - - return false; -} - /* * Extract from the given disabled subchannel bitmap (raw format * from the EHT Operation Element) the bits for the subchannel @@ -206,8 +145,8 @@ ieee80211_handle_puncturing_bitmap(struct ieee80211_link_data *link, ieee80211_extract_dis_subch_bmap(eht_oper, chandef, bitmap); - if (ieee80211_valid_disable_subchannel_bitmap(&bitmap, - chandef->width)) + if (cfg80211_valid_disable_subchannel_bitmap(&bitmap, + chandef)) break; link->u.mgd.conn_flags |= ieee80211_chandef_downgrade(chandef); @@ -5638,8 +5577,8 @@ static bool ieee80211_config_puncturing(struct ieee80211_link_data *link, extracted == link->conf->eht_puncturing) return true; - if (!ieee80211_valid_disable_subchannel_bitmap(&bitmap, - link->conf->chandef.width)) { + if (!cfg80211_valid_disable_subchannel_bitmap(&bitmap, + &link->conf->chandef)) { link_info(link, "Got an invalid disable subchannel bitmap from AP %pM: bitmap = 0x%x, bw = 0x%x. disconnect\n", link->u.mgd.bssid, @@ -7132,8 +7071,8 @@ ieee80211_setup_assoc_link(struct ieee80211_sub_if_data *sdata, u16 bitmap; bitmap = get_unaligned_le16(disable_subchannel_bitmap); - if (ieee80211_valid_disable_subchannel_bitmap(&bitmap, - link->conf->chandef.width)) + if (cfg80211_valid_disable_subchannel_bitmap(&bitmap, + &link->conf->chandef)) ieee80211_handle_puncturing_bitmap(link, eht_oper, bitmap, diff --git a/net/wireless/chan.c b/net/wireless/chan.c index 0e5835cd8c61..0b7e81db383d 100644 --- a/net/wireless/chan.c +++ b/net/wireless/chan.c @@ -1460,3 +1460,72 @@ struct cfg80211_chan_def *wdev_chandef(struct wireless_dev *wdev, } } EXPORT_SYMBOL(wdev_chandef); + +struct cfg80211_per_bw_puncturing_values { + u8 len; + const u16 *valid_values; +}; + +static const u16 puncturing_values_80mhz[] = { + 0x8, 0x4, 0x2, 0x1 +}; + +static const u16 puncturing_values_160mhz[] = { + 0x80, 0x40, 0x20, 0x10, 0x8, 0x4, 0x2, 0x1, 0xc0, 0x30, 0xc, 0x3 +}; + +static const u16 puncturing_values_320mhz[] = { + 0xc000, 0x3000, 0xc00, 0x300, 0xc0, 0x30, 0xc, 0x3, 0xf000, 0xf00, + 0xf0, 0xf, 0xfc00, 0xf300, 0xf0c0, 0xf030, 0xf00c, 0xf003, 0xc00f, + 0x300f, 0xc0f, 0x30f, 0xcf, 0x3f +}; + +#define CFG80211_PER_BW_VALID_PUNCTURING_VALUES(_bw) \ + { \ + .len = ARRAY_SIZE(puncturing_values_ ## _bw ## mhz), \ + .valid_values = puncturing_values_ ## _bw ## mhz \ + } + +static const struct cfg80211_per_bw_puncturing_values per_bw_puncturing[] = { + CFG80211_PER_BW_VALID_PUNCTURING_VALUES(80), + CFG80211_PER_BW_VALID_PUNCTURING_VALUES(160), + CFG80211_PER_BW_VALID_PUNCTURING_VALUES(320) +}; + +bool cfg80211_valid_disable_subchannel_bitmap(u16 *bitmap, + const struct cfg80211_chan_def *chandef) +{ + u32 idx, i, start_freq; + + switch (chandef->width) { + case NL80211_CHAN_WIDTH_80: + idx = 0; + start_freq = chandef->center_freq1 - 40; + break; + case NL80211_CHAN_WIDTH_160: + idx = 1; + start_freq = chandef->center_freq1 - 80; + break; + case NL80211_CHAN_WIDTH_320: + idx = 2; + start_freq = chandef->center_freq1 - 160; + break; + default: + *bitmap = 0; + break; + } + + if (!*bitmap) + return true; + + /* check if primary channel is punctured */ + if (*bitmap & (u16)BIT((chandef->chan->center_freq - start_freq) / 20)) + return false; + + for (i = 0; i < per_bw_puncturing[idx].len; i++) + if (per_bw_puncturing[idx].valid_values[i] == *bitmap) + return true; + + return false; +} +EXPORT_SYMBOL(cfg80211_valid_disable_subchannel_bitmap); -- cgit v1.2.3 From d7c1a9a0ed180d8884798ce97afe7283622a484f Mon Sep 17 00:00:00 2001 From: Aloka Dixit Date: Mon, 30 Jan 2023 16:12:25 -0800 Subject: wifi: nl80211: validate and configure puncturing bitmap - New feature flag, NL80211_EXT_FEATURE_PUNCT, to advertise driver support for preamble puncturing in AP mode. - New attribute, NL80211_ATTR_PUNCT_BITMAP, to receive a puncturing bitmap from the userspace during AP bring up (NL80211_CMD_START_AP) and channel switch (NL80211_CMD_CHANNEL_SWITCH) operations. Each bit corresponds to a 20 MHz channel in the operating bandwidth, lowest bit for the lowest channel. Bit set to 1 indicates that the channel is punctured. Higher 16 bits are reserved. - New members added to structures cfg80211_ap_settings and cfg80211_csa_settings to propagate the bitmap to the driver after validation. Signed-off-by: Aloka Dixit Signed-off-by: Muna Sinada Link: https://lore.kernel.org/r/20230131001227.25014-3-quic_alokad@quicinc.com [move validation against 0xffff into policy] Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 8 ++++++++ include/uapi/linux/nl80211.h | 11 +++++++++++ net/wireless/nl80211.c | 32 ++++++++++++++++++++++++++++++++ 3 files changed, 51 insertions(+) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 191603f32b37..fb8b79e6ac36 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1316,6 +1316,9 @@ struct cfg80211_unsol_bcast_probe_resp { * @fils_discovery: FILS discovery transmission parameters * @unsol_bcast_probe_resp: Unsolicited broadcast probe response parameters * @mbssid_config: AP settings for multiple bssid + * @punct_bitmap: Preamble puncturing bitmap. Each bit represents + * a 20 MHz channel, lowest bit corresponding to the lowest channel. + * Bit set to 1 indicates that the channel is punctured. */ struct cfg80211_ap_settings { struct cfg80211_chan_def chandef; @@ -1350,6 +1353,7 @@ struct cfg80211_ap_settings { struct cfg80211_fils_discovery fils_discovery; struct cfg80211_unsol_bcast_probe_resp unsol_bcast_probe_resp; struct cfg80211_mbssid_config mbssid_config; + u16 punct_bitmap; }; /** @@ -1367,6 +1371,9 @@ struct cfg80211_ap_settings { * @radar_required: whether radar detection is required on the new channel * @block_tx: whether transmissions should be blocked while changing * @count: number of beacons until switch + * @punct_bitmap: Preamble puncturing bitmap. Each bit represents + * a 20 MHz channel, lowest bit corresponding to the lowest channel. + * Bit set to 1 indicates that the channel is punctured. */ struct cfg80211_csa_settings { struct cfg80211_chan_def chandef; @@ -1379,6 +1386,7 @@ struct cfg80211_csa_settings { bool radar_required; bool block_tx; u8 count; + u16 punct_bitmap; }; /** diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h index a984c6c4cf02..831660956ab2 100644 --- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -2769,6 +2769,12 @@ enum nl80211_commands { * the incoming frame RX timestamp. * @NL80211_ATTR_TD_BITMAP: Transition Disable bitmap, for subsequent * (re)associations. + * + * @NL80211_ATTR_PUNCT_BITMAP: (u32) Preamble puncturing bitmap, lowest + * bit corresponds to the lowest 20 MHz channel. Each bit set to 1 + * indicates that the sub-channel is punctured. Higher 16 bits are + * reserved. + * * @NUM_NL80211_ATTR: total number of nl80211_attrs available * @NL80211_ATTR_MAX: highest attribute number currently defined * @__NL80211_ATTR_AFTER_LAST: internal use @@ -3298,6 +3304,8 @@ enum nl80211_attrs { NL80211_ATTR_RX_HW_TIMESTAMP, NL80211_ATTR_TD_BITMAP, + NL80211_ATTR_PUNCT_BITMAP, + /* add attributes here, update the policy in nl80211.c */ __NL80211_ATTR_AFTER_LAST, @@ -6313,6 +6321,8 @@ enum nl80211_feature_flags { * might apply, e.g. no scans in progress, no offchannel operations * in progress, and no active connections. * + * @NL80211_EXT_FEATURE_PUNCT: Driver supports preamble puncturing in AP mode. + * * @NUM_NL80211_EXT_FEATURES: number of extended features. * @MAX_NL80211_EXT_FEATURES: highest extended feature index. */ @@ -6381,6 +6391,7 @@ enum nl80211_ext_feature_index { NL80211_EXT_FEATURE_FILS_CRYPTO_OFFLOAD, NL80211_EXT_FEATURE_RADAR_BACKGROUND, NL80211_EXT_FEATURE_POWERED_ADDR_CHANGE, + NL80211_EXT_FEATURE_PUNCT, /* add new features before the definition below */ NUM_NL80211_EXT_FEATURES, diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index c82c66c32faa..683adeb3c9e3 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -805,6 +805,7 @@ static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_MLD_ADDR] = NLA_POLICY_EXACT_LEN(ETH_ALEN), [NL80211_ATTR_MLO_SUPPORT] = { .type = NLA_FLAG }, [NL80211_ATTR_MAX_NUM_AKM_SUITES] = { .type = NLA_REJECT }, + [NL80211_ATTR_PUNCT_BITMAP] = NLA_POLICY_RANGE(NLA_U8, 0, 0xffff), }; /* policy for the key attributes */ @@ -3173,6 +3174,21 @@ static bool nl80211_can_set_dev_channel(struct wireless_dev *wdev) wdev->iftype == NL80211_IFTYPE_P2P_GO; } +static int nl80211_parse_punct_bitmap(struct cfg80211_registered_device *rdev, + struct genl_info *info, + const struct cfg80211_chan_def *chandef, + u16 *punct_bitmap) +{ + if (!wiphy_ext_feature_isset(&rdev->wiphy, NL80211_EXT_FEATURE_PUNCT)) + return -EINVAL; + + *punct_bitmap = nla_get_u32(info->attrs[NL80211_ATTR_PUNCT_BITMAP]); + if (!cfg80211_valid_disable_subchannel_bitmap(punct_bitmap, chandef)) + return -EINVAL; + + return 0; +} + int nl80211_parse_chandef(struct cfg80211_registered_device *rdev, struct genl_info *info, struct cfg80211_chan_def *chandef) @@ -5957,6 +5973,14 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info) goto out; } + if (info->attrs[NL80211_ATTR_PUNCT_BITMAP]) { + err = nl80211_parse_punct_bitmap(rdev, info, + ¶ms->chandef, + ¶ms->punct_bitmap); + if (err) + goto out; + } + if (!cfg80211_reg_can_beacon_relax(&rdev->wiphy, ¶ms->chandef, wdev->iftype)) { err = -EINVAL; @@ -10114,6 +10138,14 @@ skip_beacons: if (info->attrs[NL80211_ATTR_CH_SWITCH_BLOCK_TX]) params.block_tx = true; + if (info->attrs[NL80211_ATTR_PUNCT_BITMAP]) { + err = nl80211_parse_punct_bitmap(rdev, info, + ¶ms.chandef, + ¶ms.punct_bitmap); + if (err) + goto free; + } + wdev_lock(wdev); err = rdev_channel_switch(rdev, dev, ¶ms); wdev_unlock(wdev); -- cgit v1.2.3 From b345f0637c0042f9e6b78378a32256d90f485774 Mon Sep 17 00:00:00 2001 From: Aloka Dixit Date: Mon, 30 Jan 2023 16:12:26 -0800 Subject: wifi: cfg80211: include puncturing bitmap in channel switch events Add puncturing bitmap in channel switch notifications and corresponding trace functions. Signed-off-by: Aloka Dixit Link: https://lore.kernel.org/r/20230131001227.25014-4-quic_alokad@quicinc.com [fix qtnfmac] Signed-off-by: Johannes Berg --- drivers/net/wireless/ath/ath6kl/cfg80211.c | 2 +- drivers/net/wireless/marvell/mwifiex/11h.c | 2 +- drivers/net/wireless/quantenna/qtnfmac/event.c | 2 +- include/net/cfg80211.h | 6 ++++-- net/mac80211/cfg.c | 5 +++-- net/mac80211/mlme.c | 4 ++-- net/wireless/nl80211.c | 20 +++++++++++++------- net/wireless/trace.h | 24 ++++++++++++++++-------- 8 files changed, 41 insertions(+), 24 deletions(-) (limited to 'net') diff --git a/drivers/net/wireless/ath/ath6kl/cfg80211.c b/drivers/net/wireless/ath/ath6kl/cfg80211.c index a20e0aeae284..0c2b8b1a10d5 100644 --- a/drivers/net/wireless/ath/ath6kl/cfg80211.c +++ b/drivers/net/wireless/ath/ath6kl/cfg80211.c @@ -1119,7 +1119,7 @@ void ath6kl_cfg80211_ch_switch_notify(struct ath6kl_vif *vif, int freq, NL80211_CHAN_HT20 : NL80211_CHAN_NO_HT); mutex_lock(&vif->wdev.mtx); - cfg80211_ch_switch_notify(vif->ndev, &chandef, 0); + cfg80211_ch_switch_notify(vif->ndev, &chandef, 0, 0); mutex_unlock(&vif->wdev.mtx); } diff --git a/drivers/net/wireless/marvell/mwifiex/11h.c b/drivers/net/wireless/marvell/mwifiex/11h.c index 6a9d7bc1f41e..b0c40a776a2e 100644 --- a/drivers/net/wireless/marvell/mwifiex/11h.c +++ b/drivers/net/wireless/marvell/mwifiex/11h.c @@ -292,6 +292,6 @@ void mwifiex_dfs_chan_sw_work_queue(struct work_struct *work) mwifiex_dbg(priv->adapter, MSG, "indicating channel switch completion to kernel\n"); mutex_lock(&priv->wdev.mtx); - cfg80211_ch_switch_notify(priv->netdev, &priv->dfs_chandef, 0); + cfg80211_ch_switch_notify(priv->netdev, &priv->dfs_chandef, 0, 0); mutex_unlock(&priv->wdev.mtx); } diff --git a/drivers/net/wireless/quantenna/qtnfmac/event.c b/drivers/net/wireless/quantenna/qtnfmac/event.c index f0158737aed6..31bc58e96ac0 100644 --- a/drivers/net/wireless/quantenna/qtnfmac/event.c +++ b/drivers/net/wireless/quantenna/qtnfmac/event.c @@ -478,7 +478,7 @@ qtnf_event_handle_freq_change(struct qtnf_wmac *mac, continue; mutex_lock(&vif->wdev.mtx); - cfg80211_ch_switch_notify(vif->netdev, &chandef, 0); + cfg80211_ch_switch_notify(vif->netdev, &chandef, 0, 0); mutex_unlock(&vif->wdev.mtx); } diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index fb8b79e6ac36..e570530191ec 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -8326,13 +8326,14 @@ bool cfg80211_reg_can_beacon_relax(struct wiphy *wiphy, * @dev: the device which switched channels * @chandef: the new channel definition * @link_id: the link ID for MLO, must be 0 for non-MLO + * @punct_bitmap: the new puncturing bitmap * * Caller must acquire wdev_lock, therefore must only be called from sleepable * driver context! */ void cfg80211_ch_switch_notify(struct net_device *dev, struct cfg80211_chan_def *chandef, - unsigned int link_id); + unsigned int link_id, u16 punct_bitmap); /* * cfg80211_ch_switch_started_notify - notify channel switch start @@ -8341,6 +8342,7 @@ void cfg80211_ch_switch_notify(struct net_device *dev, * @link_id: the link ID for MLO, must be 0 for non-MLO * @count: the number of TBTTs until the channel switch happens * @quiet: whether or not immediate quiet was requested by the AP + * @punct_bitmap: the future puncturing bitmap * * Inform the userspace about the channel switch that has just * started, so that it can take appropriate actions (eg. starting @@ -8349,7 +8351,7 @@ void cfg80211_ch_switch_notify(struct net_device *dev, void cfg80211_ch_switch_started_notify(struct net_device *dev, struct cfg80211_chan_def *chandef, unsigned int link_id, u8 count, - bool quiet); + bool quiet, u16 punct_bitmap); /** * ieee80211_operating_class_to_band - convert operating class to band diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 686309648cff..f723b2c2d20f 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -3587,7 +3587,8 @@ static int __ieee80211_csa_finalize(struct ieee80211_sub_if_data *sdata) if (err) return err; - cfg80211_ch_switch_notify(sdata->dev, &sdata->deflink.csa_chandef, 0); + cfg80211_ch_switch_notify(sdata->dev, &sdata->deflink.csa_chandef, 0, + 0); return 0; } @@ -3859,7 +3860,7 @@ __ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, cfg80211_ch_switch_started_notify(sdata->dev, &sdata->deflink.csa_chandef, 0, - params->count, params->block_tx); + params->count, params->block_tx, 0); if (changed) { ieee80211_link_info_change_notify(sdata, &sdata->deflink, diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 09703cae2fbb..60792dfabc9d 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -1778,7 +1778,7 @@ static void ieee80211_chswitch_post_beacon(struct ieee80211_link_data *link) return; } - cfg80211_ch_switch_notify(sdata->dev, &link->reserved_chandef, 0); + cfg80211_ch_switch_notify(sdata->dev, &link->reserved_chandef, 0, 0); } void ieee80211_chswitch_done(struct ieee80211_vif *vif, bool success) @@ -1988,7 +1988,7 @@ ieee80211_sta_process_chanswitch(struct ieee80211_link_data *link, mutex_unlock(&local->mtx); cfg80211_ch_switch_started_notify(sdata->dev, &csa_ie.chandef, 0, - csa_ie.count, csa_ie.mode); + csa_ie.count, csa_ie.mode, 0); if (local->ops->channel_switch) { /* use driver's channel switch callback */ diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 683adeb3c9e3..22e6bc9a44cb 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -19045,7 +19045,7 @@ static void nl80211_ch_switch_notify(struct cfg80211_registered_device *rdev, struct cfg80211_chan_def *chandef, gfp_t gfp, enum nl80211_commands notif, - u8 count, bool quiet) + u8 count, bool quiet, u16 punct_bitmap) { struct wireless_dev *wdev = netdev->ieee80211_ptr; struct sk_buff *msg; @@ -19079,6 +19079,9 @@ static void nl80211_ch_switch_notify(struct cfg80211_registered_device *rdev, goto nla_put_failure; } + if (nla_put_u32(msg, NL80211_ATTR_PUNCT_BITMAP, punct_bitmap)) + goto nla_put_failure; + genlmsg_end(msg, hdr); genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0, @@ -19091,7 +19094,7 @@ static void nl80211_ch_switch_notify(struct cfg80211_registered_device *rdev, void cfg80211_ch_switch_notify(struct net_device *dev, struct cfg80211_chan_def *chandef, - unsigned int link_id) + unsigned int link_id, u16 punct_bitmap) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct wiphy *wiphy = wdev->wiphy; @@ -19100,7 +19103,7 @@ void cfg80211_ch_switch_notify(struct net_device *dev, ASSERT_WDEV_LOCK(wdev); WARN_INVALID_LINK_ID(wdev, link_id); - trace_cfg80211_ch_switch_notify(dev, chandef, link_id); + trace_cfg80211_ch_switch_notify(dev, chandef, link_id, punct_bitmap); switch (wdev->iftype) { case NL80211_IFTYPE_STATION: @@ -19128,14 +19131,15 @@ void cfg80211_ch_switch_notify(struct net_device *dev, cfg80211_sched_dfs_chan_update(rdev); nl80211_ch_switch_notify(rdev, dev, link_id, chandef, GFP_KERNEL, - NL80211_CMD_CH_SWITCH_NOTIFY, 0, false); + NL80211_CMD_CH_SWITCH_NOTIFY, 0, false, + punct_bitmap); } EXPORT_SYMBOL(cfg80211_ch_switch_notify); void cfg80211_ch_switch_started_notify(struct net_device *dev, struct cfg80211_chan_def *chandef, unsigned int link_id, u8 count, - bool quiet) + bool quiet, u16 punct_bitmap) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct wiphy *wiphy = wdev->wiphy; @@ -19144,11 +19148,13 @@ void cfg80211_ch_switch_started_notify(struct net_device *dev, ASSERT_WDEV_LOCK(wdev); WARN_INVALID_LINK_ID(wdev, link_id); - trace_cfg80211_ch_switch_started_notify(dev, chandef, link_id); + trace_cfg80211_ch_switch_started_notify(dev, chandef, link_id, + punct_bitmap); + nl80211_ch_switch_notify(rdev, dev, link_id, chandef, GFP_KERNEL, NL80211_CMD_CH_SWITCH_STARTED_NOTIFY, - count, quiet); + count, quiet, punct_bitmap); } EXPORT_SYMBOL(cfg80211_ch_switch_started_notify); diff --git a/net/wireless/trace.h b/net/wireless/trace.h index 0e688696fc60..ca7474eec723 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -3244,39 +3244,47 @@ TRACE_EVENT(cfg80211_chandef_dfs_required, TRACE_EVENT(cfg80211_ch_switch_notify, TP_PROTO(struct net_device *netdev, struct cfg80211_chan_def *chandef, - unsigned int link_id), - TP_ARGS(netdev, chandef, link_id), + unsigned int link_id, + u16 punct_bitmap), + TP_ARGS(netdev, chandef, link_id, punct_bitmap), TP_STRUCT__entry( NETDEV_ENTRY CHAN_DEF_ENTRY __field(unsigned int, link_id) + __field(u16, punct_bitmap) ), TP_fast_assign( NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->link_id = link_id; + __entry->punct_bitmap = punct_bitmap; ), - TP_printk(NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d", - NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id) + TP_printk(NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d, punct_bitmap:%u", + NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id, + __entry->punct_bitmap) ); TRACE_EVENT(cfg80211_ch_switch_started_notify, TP_PROTO(struct net_device *netdev, struct cfg80211_chan_def *chandef, - unsigned int link_id), - TP_ARGS(netdev, chandef, link_id), + unsigned int link_id, + u16 punct_bitmap), + TP_ARGS(netdev, chandef, link_id, punct_bitmap), TP_STRUCT__entry( NETDEV_ENTRY CHAN_DEF_ENTRY __field(unsigned int, link_id) + __field(u16, punct_bitmap) ), TP_fast_assign( NETDEV_ASSIGN; CHAN_DEF_ASSIGN(chandef); __entry->link_id = link_id; + __entry->punct_bitmap = punct_bitmap; ), - TP_printk(NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d", - NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id) + TP_printk(NETDEV_PR_FMT ", " CHAN_DEF_PR_FMT ", link:%d, punct_bitmap:%u", + NETDEV_PR_ARG, CHAN_DEF_PR_ARG, __entry->link_id, + __entry->punct_bitmap) ); TRACE_EVENT(cfg80211_radar_event, -- cgit v1.2.3 From 2cc25e4b2a04cdd90dbb2916678745565cc4aeed Mon Sep 17 00:00:00 2001 From: Aloka Dixit Date: Mon, 30 Jan 2023 16:12:27 -0800 Subject: wifi: mac80211: configure puncturing bitmap - Configure the bitmap in link_conf and notify the driver. - Modify 'change' in ieee80211_start_ap() from u32 to u64 to support BSS_CHANGED_EHT_PUNCTURING. - Propagate the bitmap in channel switch events to userspace. Signed-off-by: Aloka Dixit Signed-off-by: Muna Sinada Link: https://lore.kernel.org/r/20230131001227.25014-5-quic_alokad@quicinc.com Signed-off-by: Johannes Berg --- include/net/mac80211.h | 3 +++ net/mac80211/cfg.c | 22 +++++++++++++++++++--- 2 files changed, 22 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/net/mac80211.h b/include/net/mac80211.h index 54ffc0cc2918..219fd15893b0 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -645,6 +645,7 @@ struct ieee80211_fils_discovery { * @csa_active: marks whether a channel switch is going on. Internally it is * write-protected by sdata_lock and local->mtx so holding either is fine * for read access. + * @csa_punct_bitmap: new puncturing bitmap for channel switch * @mu_mimo_owner: indicates interface owns MU-MIMO capability * @chanctx_conf: The channel context this interface is assigned to, or %NULL * when it is not assigned. This pointer is RCU-protected due to the TX @@ -741,6 +742,8 @@ struct ieee80211_bss_conf { u16 eht_puncturing; bool csa_active; + u16 csa_punct_bitmap; + bool mu_mimo_owner; struct ieee80211_chanctx_conf __rcu *chanctx_conf; diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index f723b2c2d20f..d2519a4db653 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1220,7 +1220,7 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, struct ieee80211_local *local = sdata->local; struct beacon_data *old; struct ieee80211_sub_if_data *vlan; - u32 changed = BSS_CHANGED_BEACON_INT | + u64 changed = BSS_CHANGED_BEACON_INT | BSS_CHANGED_BEACON_ENABLED | BSS_CHANGED_BEACON | BSS_CHANGED_P2P_PS | @@ -1296,6 +1296,11 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev, IEEE80211_HE_PHY_CAP2_UL_MU_FULL_MU_MIMO; } + if (params->eht_cap) { + link_conf->eht_puncturing = params->punct_bitmap; + changed |= BSS_CHANGED_EHT_PUNCTURING; + } + if (sdata->vif.type == NL80211_IFTYPE_AP && params->mbssid_config.tx_wdev) { err = ieee80211_set_ap_mbssid_options(sdata, @@ -3546,6 +3551,12 @@ static int __ieee80211_csa_finalize(struct ieee80211_sub_if_data *sdata) lockdep_assert_held(&local->mtx); lockdep_assert_held(&local->chanctx_mtx); + if (sdata->vif.bss_conf.eht_puncturing != sdata->vif.bss_conf.csa_punct_bitmap) { + sdata->vif.bss_conf.eht_puncturing = + sdata->vif.bss_conf.csa_punct_bitmap; + changed |= BSS_CHANGED_EHT_PUNCTURING; + } + /* * using reservation isn't immediate as it may be deferred until later * with multi-vif. once reservation is complete it will re-schedule the @@ -3588,7 +3599,7 @@ static int __ieee80211_csa_finalize(struct ieee80211_sub_if_data *sdata) return err; cfg80211_ch_switch_notify(sdata->dev, &sdata->deflink.csa_chandef, 0, - 0); + sdata->vif.bss_conf.eht_puncturing); return 0; } @@ -3850,9 +3861,13 @@ __ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, goto out; } + if (params->punct_bitmap && !sdata->vif.bss_conf.eht_support) + goto out; + sdata->deflink.csa_chandef = params->chandef; sdata->deflink.csa_block_tx = params->block_tx; sdata->vif.bss_conf.csa_active = true; + sdata->vif.bss_conf.csa_punct_bitmap = params->punct_bitmap; if (sdata->deflink.csa_block_tx) ieee80211_stop_vif_queues(local, sdata, @@ -3860,7 +3875,8 @@ __ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, cfg80211_ch_switch_started_notify(sdata->dev, &sdata->deflink.csa_chandef, 0, - params->count, params->block_tx, 0); + params->count, params->block_tx, + sdata->vif.bss_conf.csa_punct_bitmap); if (changed) { ieee80211_link_info_change_notify(sdata, &sdata->deflink, -- cgit v1.2.3 From 19085ef39fa3dd27fa76d1c86dd448403101dcf7 Mon Sep 17 00:00:00 2001 From: Rameshkumar Sundaram Date: Wed, 1 Feb 2023 11:46:02 +0530 Subject: wifi: cfg80211: Allow action frames to be transmitted with link BSS in MLD Currently action frames TX only with ML address as A3(BSSID) are allowed in an ML AP, but TX for a non-ML Station can happen in any link of an ML BSS with link BSS address as A3. In case of an MLD, if User-space has provided a valid link_id in action frame TX request, allow transmission of the frame in that link. Signed-off-by: Rameshkumar Sundaram Link: https://lore.kernel.org/r/20230201061602.3918-1-quic_ramess@quicinc.com Signed-off-by: Johannes Berg --- net/wireless/mlme.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c index 58e1fb18f85a..81d3f40d6235 100644 --- a/net/wireless/mlme.c +++ b/net/wireless/mlme.c @@ -742,7 +742,10 @@ int cfg80211_mlme_mgmt_tx(struct cfg80211_registered_device *rdev, case NL80211_IFTYPE_AP: case NL80211_IFTYPE_P2P_GO: case NL80211_IFTYPE_AP_VLAN: - if (!ether_addr_equal(mgmt->bssid, wdev_address(wdev))) + if (!ether_addr_equal(mgmt->bssid, wdev_address(wdev)) && + (params->link_id < 0 || + !ether_addr_equal(mgmt->bssid, + wdev->links[params->link_id].addr))) err = -EINVAL; break; case NL80211_IFTYPE_MESH_POINT: -- cgit v1.2.3 From 796703baead0c2862f7f2ebb9b177590af533035 Mon Sep 17 00:00:00 2001 From: Bo Liu Date: Mon, 6 Feb 2023 03:16:41 -0500 Subject: rfkill: Use sysfs_emit() to instead of sprintf() Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230206081641.3193-1-liubo03@inspur.com Signed-off-by: Johannes Berg --- net/rfkill/core.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'net') diff --git a/net/rfkill/core.c b/net/rfkill/core.c index b390ff245d5e..01fca7a10b4b 100644 --- a/net/rfkill/core.c +++ b/net/rfkill/core.c @@ -685,7 +685,7 @@ static ssize_t name_show(struct device *dev, struct device_attribute *attr, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%s\n", rfkill->name); + return sysfs_emit(buf, "%s\n", rfkill->name); } static DEVICE_ATTR_RO(name); @@ -694,7 +694,7 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%s\n", rfkill_types[rfkill->type]); + return sysfs_emit(buf, "%s\n", rfkill_types[rfkill->type]); } static DEVICE_ATTR_RO(type); @@ -703,7 +703,7 @@ static ssize_t index_show(struct device *dev, struct device_attribute *attr, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%d\n", rfkill->idx); + return sysfs_emit(buf, "%d\n", rfkill->idx); } static DEVICE_ATTR_RO(index); @@ -712,7 +712,7 @@ static ssize_t persistent_show(struct device *dev, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%d\n", rfkill->persistent); + return sysfs_emit(buf, "%d\n", rfkill->persistent); } static DEVICE_ATTR_RO(persistent); @@ -721,7 +721,7 @@ static ssize_t hard_show(struct device *dev, struct device_attribute *attr, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%d\n", (rfkill->state & RFKILL_BLOCK_HW) ? 1 : 0 ); + return sysfs_emit(buf, "%d\n", (rfkill->state & RFKILL_BLOCK_HW) ? 1 : 0); } static DEVICE_ATTR_RO(hard); @@ -730,7 +730,7 @@ static ssize_t soft_show(struct device *dev, struct device_attribute *attr, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%d\n", (rfkill->state & RFKILL_BLOCK_SW) ? 1 : 0 ); + return sysfs_emit(buf, "%d\n", (rfkill->state & RFKILL_BLOCK_SW) ? 1 : 0); } static ssize_t soft_store(struct device *dev, struct device_attribute *attr, @@ -764,7 +764,7 @@ static ssize_t hard_block_reasons_show(struct device *dev, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "0x%lx\n", rfkill->hard_block_reasons); + return sysfs_emit(buf, "0x%lx\n", rfkill->hard_block_reasons); } static DEVICE_ATTR_RO(hard_block_reasons); @@ -783,7 +783,7 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr, { struct rfkill *rfkill = to_rfkill(dev); - return sprintf(buf, "%d\n", user_state_from_blocked(rfkill->state)); + return sysfs_emit(buf, "%d\n", user_state_from_blocked(rfkill->state)); } static ssize_t state_store(struct device *dev, struct device_attribute *attr, -- cgit v1.2.3 From 59336e07b287d91dc4ec265e07724e8f7e3d0209 Mon Sep 17 00:00:00 2001 From: Shayne Chen Date: Thu, 9 Feb 2023 19:06:59 +0800 Subject: wifi: mac80211: make rate u32 in sta_set_rate_info_rx() The value of last_rate in ieee80211_sta_rx_stats is degraded from u32 to u16 after being assigned to rate variable, which causes information loss in STA_STATS_FIELD_TYPE and later bitfields. Signed-off-by: Shayne Chen Link: https://lore.kernel.org/r/20230209110659.25447-1-shayne.chen@mediatek.com Fixes: 41cbb0f5a295 ("mac80211: add support for HE") Signed-off-by: Johannes Berg --- net/mac80211/sta_info.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 27c737fe7fb8..bd532d3f925d 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -2418,7 +2418,7 @@ static void sta_stats_decode_rate(struct ieee80211_local *local, u32 rate, static int sta_set_rate_info_rx(struct sta_info *sta, struct rate_info *rinfo) { - u16 rate = READ_ONCE(sta_get_last_rx_stats(sta)->last_rate); + u32 rate = READ_ONCE(sta_get_last_rx_stats(sta)->last_rate); if (rate == STA_STATS_RATE_INVALID) return -EINVAL; -- cgit v1.2.3 From 0f690e6b4dcd7243e2805a76981b252c2d4bdce6 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 13 Feb 2023 11:08:51 +0100 Subject: wifi: cfg80211: move A-MSDU check in ieee80211_data_to_8023_exthdr When parsing the outer A-MSDU header, don't check for inner bridge tunnel or RFC1042 headers. This is handled by ieee80211_amsdu_to_8023s already. Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20230213100855.34315-1-nbd@nbd.name Signed-off-by: Johannes Berg --- net/wireless/util.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/wireless/util.c b/net/wireless/util.c index 38d3b434c18c..430962f97e74 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -631,8 +631,9 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, break; } - if (likely(skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && - ((!is_amsdu && ether_addr_equal(payload.hdr, rfc1042_header) && + if (likely(!is_amsdu && + skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && + ((ether_addr_equal(payload.hdr, rfc1042_header) && payload.proto != htons(ETH_P_AARP) && payload.proto != htons(ETH_P_IPX)) || ether_addr_equal(payload.hdr, bridge_tunnel_header)))) { -- cgit v1.2.3 From 9f718554e7eacea62d3f972cae24d969755bf3b6 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 13 Feb 2023 11:08:52 +0100 Subject: wifi: cfg80211: factor out bridge tunnel / RFC1042 header check The same check is done in multiple places, unify it. Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20230213100855.34315-2-nbd@nbd.name Signed-off-by: Johannes Berg --- net/wireless/util.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) (limited to 'net') diff --git a/net/wireless/util.c b/net/wireless/util.c index 430962f97e74..4498807b8b70 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -542,6 +542,21 @@ unsigned int ieee80211_get_mesh_hdrlen(struct ieee80211s_hdr *meshhdr) } EXPORT_SYMBOL(ieee80211_get_mesh_hdrlen); +static bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) +{ + const __be16 *hdr_proto = hdr + ETH_ALEN; + + if (!(ether_addr_equal(hdr, rfc1042_header) && + *hdr_proto != htons(ETH_P_AARP) && + *hdr_proto != htons(ETH_P_IPX)) && + !ether_addr_equal(hdr, bridge_tunnel_header)) + return false; + + *proto = *hdr_proto; + + return true; +} + int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, const u8 *addr, enum nl80211_iftype iftype, u8 data_offset, bool is_amsdu) @@ -633,14 +648,9 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, if (likely(!is_amsdu && skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && - ((ether_addr_equal(payload.hdr, rfc1042_header) && - payload.proto != htons(ETH_P_AARP) && - payload.proto != htons(ETH_P_IPX)) || - ether_addr_equal(payload.hdr, bridge_tunnel_header)))) { - /* remove RFC1042 or Bridge-Tunnel encapsulation and - * replace EtherType */ + ieee80211_get_8023_tunnel_proto(&payload, &tmp.h_proto))) { + /* remove RFC1042 or Bridge-Tunnel encapsulation */ hdrlen += ETH_ALEN + 2; - tmp.h_proto = payload.proto; skb_postpull_rcsum(skb, &payload, ETH_ALEN + 2); } else { tmp.h_proto = htons(skb->len - hdrlen); @@ -756,8 +766,6 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, { unsigned int hlen = ALIGN(extra_headroom, 4); struct sk_buff *frame = NULL; - u16 ethertype; - u8 *payload; int offset = 0, remaining; struct ethhdr eth; bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb); @@ -811,14 +819,8 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, frame->dev = skb->dev; frame->priority = skb->priority; - payload = frame->data; - ethertype = (payload[6] << 8) | payload[7]; - if (likely((ether_addr_equal(payload, rfc1042_header) && - ethertype != ETH_P_AARP && ethertype != ETH_P_IPX) || - ether_addr_equal(payload, bridge_tunnel_header))) { - eth.h_proto = htons(ethertype); + if (likely(ieee80211_get_8023_tunnel_proto(frame->data, ð.h_proto))) skb_pull(frame, ETH_ALEN + 2); - } memcpy(skb_push(frame, sizeof(eth)), ð, sizeof(eth)); __skb_queue_tail(list, frame); -- cgit v1.2.3 From 5c1e269aa5ebafeec69b68ff560522faa5bcb6c1 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 13 Feb 2023 11:08:53 +0100 Subject: wifi: mac80211: remove mesh forwarding congestion check Now that all drivers use iTXQ, it does not make sense to check to drop tx forwarding packets when the driver has stopped the queues. fq_codel will take care of dropping packets when the queues fill up Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20230213100855.34315-3-nbd@nbd.name Signed-off-by: Johannes Berg --- net/mac80211/debugfs_netdev.c | 3 --- net/mac80211/ieee80211_i.h | 1 - net/mac80211/rx.c | 5 ----- 3 files changed, 9 deletions(-) (limited to 'net') diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c index c87e1137e5da..0bac9af3ca96 100644 --- a/net/mac80211/debugfs_netdev.c +++ b/net/mac80211/debugfs_netdev.c @@ -603,8 +603,6 @@ IEEE80211_IF_FILE(fwded_mcast, u.mesh.mshstats.fwded_mcast, DEC); IEEE80211_IF_FILE(fwded_unicast, u.mesh.mshstats.fwded_unicast, DEC); IEEE80211_IF_FILE(fwded_frames, u.mesh.mshstats.fwded_frames, DEC); IEEE80211_IF_FILE(dropped_frames_ttl, u.mesh.mshstats.dropped_frames_ttl, DEC); -IEEE80211_IF_FILE(dropped_frames_congestion, - u.mesh.mshstats.dropped_frames_congestion, DEC); IEEE80211_IF_FILE(dropped_frames_no_route, u.mesh.mshstats.dropped_frames_no_route, DEC); @@ -740,7 +738,6 @@ static void add_mesh_stats(struct ieee80211_sub_if_data *sdata) MESHSTATS_ADD(fwded_frames); MESHSTATS_ADD(dropped_frames_ttl); MESHSTATS_ADD(dropped_frames_no_route); - MESHSTATS_ADD(dropped_frames_congestion); #undef MESHSTATS_ADD } diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 517b1840a8e3..ecc232eb1ee8 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -327,7 +327,6 @@ struct mesh_stats { __u32 fwded_frames; /* Mesh total forwarded frames */ __u32 dropped_frames_ttl; /* Not transmitted since mesh_ttl == 0*/ __u32 dropped_frames_no_route; /* Not transmitted, no route found */ - __u32 dropped_frames_congestion;/* Not forwarded due to congestion */ }; #define PREQ_Q_F_START 0x1 diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index e284897ba5e9..519bd9daea75 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2926,11 +2926,6 @@ ieee80211_rx_h_mesh_fwding(struct ieee80211_rx_data *rx) return RX_CONTINUE; ac = ieee802_1d_to_ac[skb->priority]; - q = sdata->vif.hw_queue[ac]; - if (ieee80211_queue_stopped(&local->hw, q)) { - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_congestion); - return RX_DROP_MONITOR; - } skb_set_queue_mapping(skb, ac); if (!--mesh_hdr->ttl) { -- cgit v1.2.3 From 986e43b19ae9176093da35e0a844e65c8bf9ede7 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 13 Feb 2023 11:08:54 +0100 Subject: wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces The current mac80211 mesh A-MSDU receive path fails to parse A-MSDU packets on mesh interfaces, because it assumes that the Mesh Control field is always directly after the 802.11 header. 802.11-2020 9.3.2.2.2 Figure 9-70 shows that the Mesh Control field is actually part of the A-MSDU subframe header. This makes more sense, since it allows packets for multiple different destinations to be included in the same A-MSDU, as long as RA and TID are still the same. Another issue is the fact that the A-MSDU subframe length field was apparently accidentally defined as little-endian in the standard. In order to fix this, the mesh forwarding path needs happen at a different point in the receive path. ieee80211_data_to_8023_exthdr is changed to ignore the mesh control field and leave it in after the ethernet header. This also affects the source/dest MAC address fields, which now in the case of mesh point to the mesh SA/DA. ieee80211_amsdu_to_8023s is changed to deal with the endian difference and to add the Mesh Control length to the subframe length, since it's not covered by the MSDU length field. With these changes, the mac80211 will get the same packet structure for converted regular data packets and unpacked A-MSDU subframes. The mesh forwarding checks are now only performed after the A-MSDU decap. For locally received packets, the Mesh Control header is stripped away. For forwarded packets, a new 802.11 header gets added. Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20230213100855.34315-4-nbd@nbd.name [fix fortify build error] Signed-off-by: Johannes Berg --- .../net/wireless/marvell/mwifiex/11n_rxreorder.c | 2 +- include/net/cfg80211.h | 27 +- net/mac80211/rx.c | 350 ++++++++++++--------- net/wireless/util.c | 120 ++++--- 4 files changed, 297 insertions(+), 202 deletions(-) (limited to 'net') diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c index a04b66284af4..391793a16adc 100644 --- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c +++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c @@ -33,7 +33,7 @@ static int mwifiex_11n_dispatch_amsdu_pkt(struct mwifiex_private *priv, skb_trim(skb, le16_to_cpu(local_rx_pd->rx_pkt_length)); ieee80211_amsdu_to_8023s(skb, &list, priv->curr_addr, - priv->wdev.iftype, 0, NULL, NULL); + priv->wdev.iftype, 0, NULL, NULL, false); while (!skb_queue_empty(&list)) { struct rx_packet_hdr *rx_hdr; diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index e570530191ec..c506dc128685 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -6251,11 +6251,36 @@ static inline int ieee80211_data_to_8023(struct sk_buff *skb, const u8 *addr, * @extra_headroom: The hardware extra headroom for SKBs in the @list. * @check_da: DA to check in the inner ethernet header, or NULL * @check_sa: SA to check in the inner ethernet header, or NULL + * @mesh_control: A-MSDU subframe header includes the mesh control field */ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom, - const u8 *check_da, const u8 *check_sa); + const u8 *check_da, const u8 *check_sa, + bool mesh_control); + +/** + * ieee80211_get_8023_tunnel_proto - get RFC1042 or bridge tunnel encap protocol + * + * Check for RFC1042 or bridge tunnel header and fetch the encapsulated + * protocol. + * + * @hdr: pointer to the MSDU payload + * @proto: destination pointer to store the protocol + * Return: true if encapsulation was found + */ +bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto); + +/** + * ieee80211_strip_8023_mesh_hdr - strip mesh header from converted 802.3 frames + * + * Strip the mesh header, which was left in by ieee80211_data_to_8023 as part + * of the MSDU data. Also move any source/destination addresses from the mesh + * header to the ethernet header (if present). + * + * @skb: The 802.3 frame with embedded mesh header + */ +int ieee80211_strip_8023_mesh_hdr(struct sk_buff *skb); /** * cfg80211_classify8021d - determine the 802.1p/1d tag for a data frame diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 519bd9daea75..b5ee049e3197 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2720,6 +2720,174 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx) } } +static ieee80211_rx_result +ieee80211_rx_mesh_data(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, + struct sk_buff *skb) +{ +#ifdef CONFIG_MAC80211_MESH + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct ieee80211_local *local = sdata->local; + uint16_t fc = IEEE80211_FTYPE_DATA | IEEE80211_STYPE_QOS_DATA; + struct ieee80211_hdr hdr = { + .frame_control = cpu_to_le16(fc) + }; + struct ieee80211_hdr *fwd_hdr; + struct ieee80211s_hdr *mesh_hdr; + struct ieee80211_tx_info *info; + struct sk_buff *fwd_skb; + struct ethhdr *eth; + bool multicast; + int tailroom = 0; + int hdrlen, mesh_hdrlen; + u8 *qos; + + if (!ieee80211_vif_is_mesh(&sdata->vif)) + return RX_CONTINUE; + + if (!pskb_may_pull(skb, sizeof(*eth) + 6)) + return RX_DROP_MONITOR; + + mesh_hdr = (struct ieee80211s_hdr *)(skb->data + sizeof(*eth)); + mesh_hdrlen = ieee80211_get_mesh_hdrlen(mesh_hdr); + + if (!pskb_may_pull(skb, sizeof(*eth) + mesh_hdrlen)) + return RX_DROP_MONITOR; + + eth = (struct ethhdr *)skb->data; + multicast = is_multicast_ether_addr(eth->h_dest); + + mesh_hdr = (struct ieee80211s_hdr *)(eth + 1); + if (!mesh_hdr->ttl) + return RX_DROP_MONITOR; + + /* frame is in RMC, don't forward */ + if (is_multicast_ether_addr(eth->h_dest) && + mesh_rmc_check(sdata, eth->h_source, mesh_hdr)) + return RX_DROP_MONITOR; + + /* Frame has reached destination. Don't forward */ + if (ether_addr_equal(sdata->vif.addr, eth->h_dest)) + goto rx_accept; + + if (!ifmsh->mshcfg.dot11MeshForwarding) { + if (is_multicast_ether_addr(eth->h_dest)) + goto rx_accept; + + return RX_DROP_MONITOR; + } + + /* forward packet */ + if (sdata->crypto_tx_tailroom_needed_cnt) + tailroom = IEEE80211_ENCRYPT_TAILROOM; + + if (!--mesh_hdr->ttl) { + if (multicast) + goto rx_accept; + + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_ttl); + return RX_DROP_MONITOR; + } + + if (mesh_hdr->flags & MESH_FLAGS_AE) { + struct mesh_path *mppath; + char *proxied_addr; + + if (multicast) + proxied_addr = mesh_hdr->eaddr1; + else if ((mesh_hdr->flags & MESH_FLAGS_AE) == MESH_FLAGS_AE_A5_A6) + /* has_a4 already checked in ieee80211_rx_mesh_check */ + proxied_addr = mesh_hdr->eaddr2; + else + return RX_DROP_MONITOR; + + rcu_read_lock(); + mppath = mpp_path_lookup(sdata, proxied_addr); + if (!mppath) { + mpp_path_add(sdata, proxied_addr, eth->h_source); + } else { + spin_lock_bh(&mppath->state_lock); + if (!ether_addr_equal(mppath->mpp, eth->h_source)) + memcpy(mppath->mpp, eth->h_source, ETH_ALEN); + mppath->exp_time = jiffies; + spin_unlock_bh(&mppath->state_lock); + } + rcu_read_unlock(); + } + + skb_set_queue_mapping(skb, ieee802_1d_to_ac[skb->priority]); + + ieee80211_fill_mesh_addresses(&hdr, &hdr.frame_control, + eth->h_dest, eth->h_source); + hdrlen = ieee80211_hdrlen(hdr.frame_control); + if (multicast) { + int extra_head = sizeof(struct ieee80211_hdr) - sizeof(*eth); + + fwd_skb = skb_copy_expand(skb, local->tx_headroom + extra_head + + IEEE80211_ENCRYPT_HEADROOM, + tailroom, GFP_ATOMIC); + if (!fwd_skb) + goto rx_accept; + } else { + fwd_skb = skb; + skb = NULL; + + if (skb_cow_head(fwd_skb, hdrlen - sizeof(struct ethhdr))) + return RX_DROP_UNUSABLE; + } + + fwd_hdr = skb_push(fwd_skb, hdrlen - sizeof(struct ethhdr)); + memcpy(fwd_hdr, &hdr, hdrlen - 2); + qos = ieee80211_get_qos_ctl(fwd_hdr); + qos[0] = qos[1] = 0; + + skb_reset_mac_header(fwd_skb); + hdrlen += mesh_hdrlen; + if (ieee80211_get_8023_tunnel_proto(fwd_skb->data + hdrlen, + &fwd_skb->protocol)) + hdrlen += ETH_ALEN; + else + fwd_skb->protocol = htons(fwd_skb->len - hdrlen); + skb_set_network_header(fwd_skb, hdrlen); + + info = IEEE80211_SKB_CB(fwd_skb); + memset(info, 0, sizeof(*info)); + info->control.flags |= IEEE80211_TX_INTCFL_NEED_TXPROCESSING; + info->control.vif = &sdata->vif; + info->control.jiffies = jiffies; + if (multicast) { + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_mcast); + memcpy(fwd_hdr->addr2, sdata->vif.addr, ETH_ALEN); + /* update power mode indication when forwarding */ + ieee80211_mps_set_frame_flags(sdata, NULL, fwd_hdr); + } else if (!mesh_nexthop_lookup(sdata, fwd_skb)) { + /* mesh power mode flags updated in mesh_nexthop_lookup */ + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_unicast); + } else { + /* unable to resolve next hop */ + if (sta) + mesh_path_error_tx(sdata, ifmsh->mshcfg.element_ttl, + hdr.addr3, 0, + WLAN_REASON_MESH_PATH_NOFORWARD, + sta->sta.addr); + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_no_route); + kfree_skb(fwd_skb); + goto rx_accept; + } + + IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_frames); + fwd_skb->dev = sdata->dev; + ieee80211_add_pending_skb(local, fwd_skb); + +rx_accept: + if (!skb) + return RX_QUEUED; + + ieee80211_strip_8023_mesh_hdr(skb); +#endif + + return RX_CONTINUE; +} + static ieee80211_rx_result debug_noinline __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) { @@ -2728,8 +2896,10 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; __le16 fc = hdr->frame_control; struct sk_buff_head frame_list; + static ieee80211_rx_result res; struct ethhdr ethhdr; const u8 *check_da = ethhdr.h_dest, *check_sa = ethhdr.h_source; + bool mesh = false; if (unlikely(ieee80211_has_a4(hdr->frame_control))) { check_da = NULL; @@ -2746,6 +2916,8 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) break; case NL80211_IFTYPE_MESH_POINT: check_sa = NULL; + check_da = NULL; + mesh = true; break; default: break; @@ -2763,17 +2935,29 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) ieee80211_amsdu_to_8023s(skb, &frame_list, dev->dev_addr, rx->sdata->vif.type, rx->local->hw.extra_tx_headroom, - check_da, check_sa); + check_da, check_sa, mesh); while (!skb_queue_empty(&frame_list)) { rx->skb = __skb_dequeue(&frame_list); - if (!ieee80211_frame_allowed(rx, fc)) { - dev_kfree_skb(rx->skb); + res = ieee80211_rx_mesh_data(rx->sdata, rx->sta, rx->skb); + switch (res) { + case RX_QUEUED: continue; + case RX_CONTINUE: + break; + default: + goto free; } + if (!ieee80211_frame_allowed(rx, fc)) + goto free; + ieee80211_deliver_skb(rx); + continue; + +free: + dev_kfree_skb(rx->skb); } return RX_QUEUED; @@ -2806,6 +2990,8 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx) if (!rx->sdata->u.mgd.use_4addr) return RX_DROP_UNUSABLE; break; + case NL80211_IFTYPE_MESH_POINT: + break; default: return RX_DROP_UNUSABLE; } @@ -2834,155 +3020,6 @@ ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx) return __ieee80211_rx_h_amsdu(rx, 0); } -#ifdef CONFIG_MAC80211_MESH -static ieee80211_rx_result -ieee80211_rx_h_mesh_fwding(struct ieee80211_rx_data *rx) -{ - struct ieee80211_hdr *fwd_hdr, *hdr; - struct ieee80211_tx_info *info; - struct ieee80211s_hdr *mesh_hdr; - struct sk_buff *skb = rx->skb, *fwd_skb; - struct ieee80211_local *local = rx->local; - struct ieee80211_sub_if_data *sdata = rx->sdata; - struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; - u16 ac, q, hdrlen; - int tailroom = 0; - - hdr = (struct ieee80211_hdr *) skb->data; - hdrlen = ieee80211_hdrlen(hdr->frame_control); - - /* make sure fixed part of mesh header is there, also checks skb len */ - if (!pskb_may_pull(rx->skb, hdrlen + 6)) - return RX_DROP_MONITOR; - - mesh_hdr = (struct ieee80211s_hdr *) (skb->data + hdrlen); - - /* make sure full mesh header is there, also checks skb len */ - if (!pskb_may_pull(rx->skb, - hdrlen + ieee80211_get_mesh_hdrlen(mesh_hdr))) - return RX_DROP_MONITOR; - - /* reload pointers */ - hdr = (struct ieee80211_hdr *) skb->data; - mesh_hdr = (struct ieee80211s_hdr *) (skb->data + hdrlen); - - if (ieee80211_drop_unencrypted(rx, hdr->frame_control)) { - int offset = hdrlen + ieee80211_get_mesh_hdrlen(mesh_hdr) + - sizeof(rfc1042_header); - __be16 ethertype; - - if (!ether_addr_equal(hdr->addr1, rx->sdata->vif.addr) || - skb_copy_bits(rx->skb, offset, ðertype, 2) != 0 || - ethertype != rx->sdata->control_port_protocol) - return RX_DROP_MONITOR; - } - - /* frame is in RMC, don't forward */ - if (ieee80211_is_data(hdr->frame_control) && - is_multicast_ether_addr(hdr->addr1) && - mesh_rmc_check(rx->sdata, hdr->addr3, mesh_hdr)) - return RX_DROP_MONITOR; - - if (!ieee80211_is_data(hdr->frame_control)) - return RX_CONTINUE; - - if (!mesh_hdr->ttl) - return RX_DROP_MONITOR; - - if (mesh_hdr->flags & MESH_FLAGS_AE) { - struct mesh_path *mppath; - char *proxied_addr; - char *mpp_addr; - - if (is_multicast_ether_addr(hdr->addr1)) { - mpp_addr = hdr->addr3; - proxied_addr = mesh_hdr->eaddr1; - } else if ((mesh_hdr->flags & MESH_FLAGS_AE) == - MESH_FLAGS_AE_A5_A6) { - /* has_a4 already checked in ieee80211_rx_mesh_check */ - mpp_addr = hdr->addr4; - proxied_addr = mesh_hdr->eaddr2; - } else { - return RX_DROP_MONITOR; - } - - rcu_read_lock(); - mppath = mpp_path_lookup(sdata, proxied_addr); - if (!mppath) { - mpp_path_add(sdata, proxied_addr, mpp_addr); - } else { - spin_lock_bh(&mppath->state_lock); - if (!ether_addr_equal(mppath->mpp, mpp_addr)) - memcpy(mppath->mpp, mpp_addr, ETH_ALEN); - mppath->exp_time = jiffies; - spin_unlock_bh(&mppath->state_lock); - } - rcu_read_unlock(); - } - - /* Frame has reached destination. Don't forward */ - if (!is_multicast_ether_addr(hdr->addr1) && - ether_addr_equal(sdata->vif.addr, hdr->addr3)) - return RX_CONTINUE; - - ac = ieee802_1d_to_ac[skb->priority]; - skb_set_queue_mapping(skb, ac); - - if (!--mesh_hdr->ttl) { - if (!is_multicast_ether_addr(hdr->addr1)) - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, - dropped_frames_ttl); - goto out; - } - - if (!ifmsh->mshcfg.dot11MeshForwarding) - goto out; - - if (sdata->crypto_tx_tailroom_needed_cnt) - tailroom = IEEE80211_ENCRYPT_TAILROOM; - - fwd_skb = skb_copy_expand(skb, local->tx_headroom + - IEEE80211_ENCRYPT_HEADROOM, - tailroom, GFP_ATOMIC); - if (!fwd_skb) - goto out; - - fwd_skb->dev = sdata->dev; - fwd_hdr = (struct ieee80211_hdr *) fwd_skb->data; - fwd_hdr->frame_control &= ~cpu_to_le16(IEEE80211_FCTL_RETRY); - info = IEEE80211_SKB_CB(fwd_skb); - memset(info, 0, sizeof(*info)); - info->control.flags |= IEEE80211_TX_INTCFL_NEED_TXPROCESSING; - info->control.vif = &rx->sdata->vif; - info->control.jiffies = jiffies; - if (is_multicast_ether_addr(fwd_hdr->addr1)) { - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_mcast); - memcpy(fwd_hdr->addr2, sdata->vif.addr, ETH_ALEN); - /* update power mode indication when forwarding */ - ieee80211_mps_set_frame_flags(sdata, NULL, fwd_hdr); - } else if (!mesh_nexthop_lookup(sdata, fwd_skb)) { - /* mesh power mode flags updated in mesh_nexthop_lookup */ - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_unicast); - } else { - /* unable to resolve next hop */ - mesh_path_error_tx(sdata, ifmsh->mshcfg.element_ttl, - fwd_hdr->addr3, 0, - WLAN_REASON_MESH_PATH_NOFORWARD, - fwd_hdr->addr2); - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, dropped_frames_no_route); - kfree_skb(fwd_skb); - return RX_DROP_MONITOR; - } - - IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_frames); - ieee80211_add_pending_skb(local, fwd_skb); - out: - if (is_multicast_ether_addr(hdr->addr1)) - return RX_CONTINUE; - return RX_DROP_MONITOR; -} -#endif - static ieee80211_rx_result debug_noinline ieee80211_rx_h_data(struct ieee80211_rx_data *rx) { @@ -2991,6 +3028,7 @@ ieee80211_rx_h_data(struct ieee80211_rx_data *rx) struct net_device *dev = sdata->dev; struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)rx->skb->data; __le16 fc = hdr->frame_control; + static ieee80211_rx_result res; bool port_control; int err; @@ -3017,6 +3055,10 @@ ieee80211_rx_h_data(struct ieee80211_rx_data *rx) if (unlikely(err)) return RX_DROP_UNUSABLE; + res = ieee80211_rx_mesh_data(rx->sdata, rx->sta, rx->skb); + if (res != RX_CONTINUE) + return res; + if (!ieee80211_frame_allowed(rx, fc)) return RX_DROP_MONITOR; @@ -3987,10 +4029,6 @@ static void ieee80211_rx_handlers(struct ieee80211_rx_data *rx, CALL_RXH(ieee80211_rx_h_defragment); CALL_RXH(ieee80211_rx_h_michael_mic_verify); /* must be after MMIC verify so header is counted in MPDU mic */ -#ifdef CONFIG_MAC80211_MESH - if (ieee80211_vif_is_mesh(&rx->sdata->vif)) - CALL_RXH(ieee80211_rx_h_mesh_fwding); -#endif CALL_RXH(ieee80211_rx_h_amsdu); CALL_RXH(ieee80211_rx_h_data); diff --git a/net/wireless/util.c b/net/wireless/util.c index 4498807b8b70..8ae117cfc22c 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -542,7 +542,7 @@ unsigned int ieee80211_get_mesh_hdrlen(struct ieee80211s_hdr *meshhdr) } EXPORT_SYMBOL(ieee80211_get_mesh_hdrlen); -static bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) +bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) { const __be16 *hdr_proto = hdr + ETH_ALEN; @@ -556,6 +556,49 @@ static bool ieee80211_get_8023_tunnel_proto(const void *hdr, __be16 *proto) return true; } +EXPORT_SYMBOL(ieee80211_get_8023_tunnel_proto); + +int ieee80211_strip_8023_mesh_hdr(struct sk_buff *skb) +{ + const void *mesh_addr; + struct { + struct ethhdr eth; + u8 flags; + } payload; + int hdrlen; + int ret; + + ret = skb_copy_bits(skb, 0, &payload, sizeof(payload)); + if (ret) + return ret; + + hdrlen = sizeof(payload.eth) + __ieee80211_get_mesh_hdrlen(payload.flags); + + if (likely(pskb_may_pull(skb, hdrlen + 8) && + ieee80211_get_8023_tunnel_proto(skb->data + hdrlen, + &payload.eth.h_proto))) + hdrlen += ETH_ALEN + 2; + else if (!pskb_may_pull(skb, hdrlen)) + return -EINVAL; + + mesh_addr = skb->data + sizeof(payload.eth) + ETH_ALEN; + switch (payload.flags & MESH_FLAGS_AE) { + case MESH_FLAGS_AE_A4: + memcpy(&payload.eth.h_source, mesh_addr, ETH_ALEN); + break; + case MESH_FLAGS_AE_A5_A6: + memcpy(&payload.eth, mesh_addr, 2 * ETH_ALEN); + break; + default: + break; + } + + pskb_pull(skb, hdrlen - sizeof(payload.eth)); + memcpy(skb->data, &payload.eth, sizeof(payload.eth)); + + return 0; +} +EXPORT_SYMBOL(ieee80211_strip_8023_mesh_hdr); int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, const u8 *addr, enum nl80211_iftype iftype, @@ -568,7 +611,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, } payload; struct ethhdr tmp; u16 hdrlen; - u8 mesh_flags = 0; if (unlikely(!ieee80211_is_data_present(hdr->frame_control))) return -1; @@ -589,12 +631,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, memcpy(tmp.h_dest, ieee80211_get_DA(hdr), ETH_ALEN); memcpy(tmp.h_source, ieee80211_get_SA(hdr), ETH_ALEN); - if (iftype == NL80211_IFTYPE_MESH_POINT && - skb_copy_bits(skb, hdrlen, &mesh_flags, 1) < 0) - return -1; - - mesh_flags &= MESH_FLAGS_AE; - switch (hdr->frame_control & cpu_to_le16(IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) { case cpu_to_le16(IEEE80211_FCTL_TODS): @@ -608,17 +644,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, iftype != NL80211_IFTYPE_AP_VLAN && iftype != NL80211_IFTYPE_STATION)) return -1; - if (iftype == NL80211_IFTYPE_MESH_POINT) { - if (mesh_flags == MESH_FLAGS_AE_A4) - return -1; - if (mesh_flags == MESH_FLAGS_AE_A5_A6 && - skb_copy_bits(skb, hdrlen + - offsetof(struct ieee80211s_hdr, eaddr1), - tmp.h_dest, 2 * ETH_ALEN) < 0) - return -1; - - hdrlen += __ieee80211_get_mesh_hdrlen(mesh_flags); - } break; case cpu_to_le16(IEEE80211_FCTL_FROMDS): if ((iftype != NL80211_IFTYPE_STATION && @@ -627,16 +652,6 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, (is_multicast_ether_addr(tmp.h_dest) && ether_addr_equal(tmp.h_source, addr))) return -1; - if (iftype == NL80211_IFTYPE_MESH_POINT) { - if (mesh_flags == MESH_FLAGS_AE_A5_A6) - return -1; - if (mesh_flags == MESH_FLAGS_AE_A4 && - skb_copy_bits(skb, hdrlen + - offsetof(struct ieee80211s_hdr, eaddr1), - tmp.h_source, ETH_ALEN) < 0) - return -1; - hdrlen += __ieee80211_get_mesh_hdrlen(mesh_flags); - } break; case cpu_to_le16(0): if (iftype != NL80211_IFTYPE_ADHOC && @@ -646,7 +661,7 @@ int ieee80211_data_to_8023_exthdr(struct sk_buff *skb, struct ethhdr *ehdr, break; } - if (likely(!is_amsdu && + if (likely(!is_amsdu && iftype != NL80211_IFTYPE_MESH_POINT && skb_copy_bits(skb, hdrlen, &payload, sizeof(payload)) == 0 && ieee80211_get_8023_tunnel_proto(&payload, &tmp.h_proto))) { /* remove RFC1042 or Bridge-Tunnel encapsulation */ @@ -722,7 +737,8 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame, static struct sk_buff * __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, - int offset, int len, bool reuse_frag) + int offset, int len, bool reuse_frag, + int min_len) { struct sk_buff *frame; int cur_len = len; @@ -736,7 +752,7 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, * in the stack later. */ if (reuse_frag) - cur_len = min_t(int, len, 32); + cur_len = min_t(int, len, min_len); /* * Allocate and reserve two bytes more for payload @@ -746,6 +762,7 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, if (!frame) return NULL; + frame->priority = skb->priority; skb_reserve(frame, hlen + sizeof(struct ethhdr) + 2); skb_copy_bits(skb, offset, skb_put(frame, cur_len), cur_len); @@ -762,23 +779,37 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom, - const u8 *check_da, const u8 *check_sa) + const u8 *check_da, const u8 *check_sa, + bool mesh_control) { unsigned int hlen = ALIGN(extra_headroom, 4); struct sk_buff *frame = NULL; int offset = 0, remaining; - struct ethhdr eth; + struct { + struct ethhdr eth; + uint8_t flags; + } hdr; bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb); bool reuse_skb = false; bool last = false; + int copy_len = sizeof(hdr.eth); + + if (iftype == NL80211_IFTYPE_MESH_POINT) + copy_len = sizeof(hdr); while (!last) { unsigned int subframe_len; - int len; + int len, mesh_len = 0; u8 padding; - skb_copy_bits(skb, offset, ð, sizeof(eth)); - len = ntohs(eth.h_proto); + skb_copy_bits(skb, offset, &hdr, copy_len); + if (iftype == NL80211_IFTYPE_MESH_POINT) + mesh_len = __ieee80211_get_mesh_hdrlen(hdr.flags); + if (mesh_control) + len = le16_to_cpu(*(__le16 *)&hdr.eth.h_proto) + mesh_len; + else + len = ntohs(hdr.eth.h_proto); + subframe_len = sizeof(struct ethhdr) + len; padding = (4 - subframe_len) & 0x3; @@ -787,16 +818,16 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, if (subframe_len > remaining) goto purge; /* mitigate A-MSDU aggregation injection attacks */ - if (ether_addr_equal(eth.h_dest, rfc1042_header)) + if (ether_addr_equal(hdr.eth.h_dest, rfc1042_header)) goto purge; offset += sizeof(struct ethhdr); last = remaining <= subframe_len + padding; /* FIXME: should we really accept multicast DA? */ - if ((check_da && !is_multicast_ether_addr(eth.h_dest) && - !ether_addr_equal(check_da, eth.h_dest)) || - (check_sa && !ether_addr_equal(check_sa, eth.h_source))) { + if ((check_da && !is_multicast_ether_addr(hdr.eth.h_dest) && + !ether_addr_equal(check_da, hdr.eth.h_dest)) || + (check_sa && !ether_addr_equal(check_sa, hdr.eth.h_source))) { offset += len + padding; continue; } @@ -808,7 +839,7 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, reuse_skb = true; } else { frame = __ieee80211_amsdu_copy(skb, hlen, offset, len, - reuse_frag); + reuse_frag, 32 + mesh_len); if (!frame) goto purge; @@ -819,10 +850,11 @@ void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, frame->dev = skb->dev; frame->priority = skb->priority; - if (likely(ieee80211_get_8023_tunnel_proto(frame->data, ð.h_proto))) + if (likely(iftype != NL80211_IFTYPE_MESH_POINT && + ieee80211_get_8023_tunnel_proto(frame->data, &hdr.eth.h_proto))) skb_pull(frame, ETH_ALEN + 2); - memcpy(skb_push(frame, sizeof(eth)), ð, sizeof(eth)); + memcpy(skb_push(frame, sizeof(hdr.eth)), &hdr.eth, sizeof(hdr.eth)); __skb_queue_tail(list, frame); } -- cgit v1.2.3 From 6e4c0d0460bd32ca9244dff3ba2d2da27235de11 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 13 Feb 2023 11:08:55 +0100 Subject: wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU At least ath10k and ath11k supported hardware (maybe more) does not implement mesh A-MSDU aggregation in a standard compliant way. 802.11-2020 9.3.2.2.2 declares that the Mesh Control field is part of the A-MSDU header (and little-endian). As such, its length must not be included in the subframe length field. Hardware affected by this bug treats the mesh control field as part of the MSDU data and sets the length accordingly. In order to avoid packet loss, keep track of which stations are affected by this and take it into account when converting A-MSDU to 802.3 + mesh control packets. Signed-off-by: Felix Fietkau Link: https://lore.kernel.org/r/20230213100855.34315-5-nbd@nbd.name Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 13 +++++++++++++ net/mac80211/rx.c | 15 ++++++++++++--- net/mac80211/sta_info.c | 3 +++ net/mac80211/sta_info.h | 1 + net/wireless/util.c | 32 ++++++++++++++++++++++++++++++++ 5 files changed, 61 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index c506dc128685..9b015bb877db 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -6236,6 +6236,19 @@ static inline int ieee80211_data_to_8023(struct sk_buff *skb, const u8 *addr, return ieee80211_data_to_8023_exthdr(skb, NULL, addr, iftype, 0, false); } +/** + * ieee80211_is_valid_amsdu - check if subframe lengths of an A-MSDU are valid + * + * This is used to detect non-standard A-MSDU frames, e.g. the ones generated + * by ath10k and ath11k, where the subframe length includes the length of the + * mesh control field. + * + * @skb: The input A-MSDU frame without any headers. + * @mesh_hdr: use standard compliant mesh A-MSDU subframe header + * Returns: true if subframe header lengths are valid for the @mesh_hdr mode + */ +bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr); + /** * ieee80211_amsdu_to_8023s - decode an IEEE 802.11n A-MSDU frame * diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index b5ee049e3197..759936aff2c3 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -2899,7 +2899,6 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) static ieee80211_rx_result res; struct ethhdr ethhdr; const u8 *check_da = ethhdr.h_dest, *check_sa = ethhdr.h_source; - bool mesh = false; if (unlikely(ieee80211_has_a4(hdr->frame_control))) { check_da = NULL; @@ -2917,7 +2916,6 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) case NL80211_IFTYPE_MESH_POINT: check_sa = NULL; check_da = NULL; - mesh = true; break; default: break; @@ -2932,10 +2930,21 @@ __ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx, u8 data_offset) data_offset, true)) return RX_DROP_UNUSABLE; + if (rx->sta && rx->sta->amsdu_mesh_control < 0) { + bool valid_std = ieee80211_is_valid_amsdu(skb, true); + bool valid_nonstd = ieee80211_is_valid_amsdu(skb, false); + + if (valid_std && !valid_nonstd) + rx->sta->amsdu_mesh_control = 1; + else if (valid_nonstd && !valid_std) + rx->sta->amsdu_mesh_control = 0; + } + ieee80211_amsdu_to_8023s(skb, &frame_list, dev->dev_addr, rx->sdata->vif.type, rx->local->hw.extra_tx_headroom, - check_da, check_sa, mesh); + check_da, check_sa, + rx->sta->amsdu_mesh_control); while (!skb_queue_empty(&frame_list)) { rx->skb = __skb_dequeue(&frame_list); diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index bd532d3f925d..7d68dbc872d7 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -594,6 +594,9 @@ __sta_info_alloc(struct ieee80211_sub_if_data *sdata, sta->sta_state = IEEE80211_STA_NONE; + if (sdata->vif.type == NL80211_IFTYPE_MESH_POINT) + sta->amsdu_mesh_control = -1; + /* Mark TID as unreserved */ sta->reserved_tid = IEEE80211_TID_UNRESERVED; diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index c30f02874fb1..5a1c541bb4ac 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -707,6 +707,7 @@ struct sta_info { struct codel_params cparams; u8 reserved_tid; + s8 amsdu_mesh_control; struct cfg80211_chan_def tdls_chandef; diff --git a/net/wireless/util.c b/net/wireless/util.c index 8ae117cfc22c..d1a89e82ead0 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -776,6 +776,38 @@ __ieee80211_amsdu_copy(struct sk_buff *skb, unsigned int hlen, return frame; } +bool ieee80211_is_valid_amsdu(struct sk_buff *skb, bool mesh_hdr) +{ + int offset = 0, remaining, subframe_len, padding; + + for (offset = 0; offset < skb->len; offset += subframe_len + padding) { + struct { + __be16 len; + u8 mesh_flags; + } hdr; + u16 len; + + if (skb_copy_bits(skb, offset + 2 * ETH_ALEN, &hdr, sizeof(hdr)) < 0) + return false; + + if (mesh_hdr) + len = le16_to_cpu(*(__le16 *)&hdr.len) + + __ieee80211_get_mesh_hdrlen(hdr.mesh_flags); + else + len = ntohs(hdr.len); + + subframe_len = sizeof(struct ethhdr) + len; + padding = (4 - subframe_len) & 0x3; + remaining = skb->len - offset; + + if (subframe_len > remaining) + return false; + } + + return true; +} +EXPORT_SYMBOL(ieee80211_is_valid_amsdu); + void ieee80211_amsdu_to_8023s(struct sk_buff *skb, struct sk_buff_head *list, const u8 *addr, enum nl80211_iftype iftype, const unsigned int extra_headroom, -- cgit v1.2.3 From 57b341e9ab13e5688491bfd54f8b5502416c8905 Mon Sep 17 00:00:00 2001 From: Rameshkumar Sundaram Date: Tue, 7 Feb 2023 17:11:46 +0530 Subject: wifi: mac80211: Allow NSS change only up to capability Stations can update bandwidth/NSS change in VHT action frame with action type Operating Mode Notification. (IEEE Std 802.11-2020 - 9.4.1.53 Operating Mode field) For Operating Mode Notification, an RX NSS change to a value greater than AP's maximum NSS should not be allowed. During fuzz testing, by forcefully sending VHT Op. mode notif. frames from STA with random rx_nss values, it is found that AP accepts rx_nss values greater that APs maximum NSS instead of discarding such NSS change. Hence allow NSS change only up to maximum NSS that is negotiated and capped to AP's capability during association. Signed-off-by: Rameshkumar Sundaram Link: https://lore.kernel.org/r/20230207114146.10567-1-quic_ramess@quicinc.com Signed-off-by: Johannes Berg --- net/mac80211/vht.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/mac80211/vht.c b/net/mac80211/vht.c index 803de5881485..c1250aa47808 100644 --- a/net/mac80211/vht.c +++ b/net/mac80211/vht.c @@ -625,7 +625,7 @@ u32 __ieee80211_vht_handle_opmode(struct ieee80211_sub_if_data *sdata, enum ieee80211_sta_rx_bandwidth new_bw; struct sta_opmode_info sta_opmode = {}; u32 changed = 0; - u8 nss; + u8 nss, cur_nss; /* ignore - no support for BF yet */ if (opmode & IEEE80211_OPMODE_NOTIF_RX_NSS_TYPE_BF) @@ -636,10 +636,25 @@ u32 __ieee80211_vht_handle_opmode(struct ieee80211_sub_if_data *sdata, nss += 1; if (link_sta->pub->rx_nss != nss) { - link_sta->pub->rx_nss = nss; - sta_opmode.rx_nss = nss; - changed |= IEEE80211_RC_NSS_CHANGED; - sta_opmode.changed |= STA_OPMODE_N_SS_CHANGED; + cur_nss = link_sta->pub->rx_nss; + /* Reset rx_nss and call ieee80211_sta_set_rx_nss() which + * will set the same to max nss value calculated based on capability. + */ + link_sta->pub->rx_nss = 0; + ieee80211_sta_set_rx_nss(link_sta); + /* Do not allow an nss change to rx_nss greater than max_nss + * negotiated and capped to APs capability during association. + */ + if (nss <= link_sta->pub->rx_nss) { + link_sta->pub->rx_nss = nss; + sta_opmode.rx_nss = nss; + changed |= IEEE80211_RC_NSS_CHANGED; + sta_opmode.changed |= STA_OPMODE_N_SS_CHANGED; + } else { + link_sta->pub->rx_nss = cur_nss; + pr_warn_ratelimited("Ignoring NSS change in VHT Operating Mode Notification from %pM with invalid nss %d", + link_sta->pub->addr, nss); + } } switch (opmode & IEEE80211_OPMODE_NOTIF_CHANWIDTH_MASK) { -- cgit v1.2.3 From aaacf1740f2f95e0c5449ff3bbcff252d69cf952 Mon Sep 17 00:00:00 2001 From: Karthikeyan Periyasamy Date: Mon, 6 Feb 2023 21:33:30 +0530 Subject: wifi: mac80211: fix non-MLO station association Non-MLO station frames are dropped in Rx path due to the condition check in ieee80211_rx_is_valid_sta_link_id(). In multi-link AP scenario, non-MLO stations try to connect in any of the valid links in the ML AP, where the station valid_links and link_id params are valid in the ieee80211_sta object. But ieee80211_rx_is_valid_sta_link_id() always return false for the non-MLO stations by the assumption taken is valid_links and link_id are not valid in non-MLO stations object (ieee80211_sta), this assumption is wrong. Due to this assumption, non-MLO station frames are dropped which leads to failure in association. Fix it by removing the condition check and allow the link validation check for the non-MLO stations. Fixes: e66b7920aa5a ("wifi: mac80211: fix initialization of rx->link and rx->link_sta") Signed-off-by: Karthikeyan Periyasamy Link: https://lore.kernel.org/r/20230206160330.1613-1-quic_periyasa@quicinc.com Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'net') diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 759936aff2c3..dd571c715b8e 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -4094,9 +4094,6 @@ static void ieee80211_invoke_rx_handlers(struct ieee80211_rx_data *rx) static bool ieee80211_rx_is_valid_sta_link_id(struct ieee80211_sta *sta, u8 link_id) { - if (!sta->mlo) - return false; - return !!(sta->valid_links & BIT(link_id)); } -- cgit v1.2.3 From 9b89495e479c5fedbf3f2eca4f1c4e9dd481265e Mon Sep 17 00:00:00 2001 From: Vinay Gannevaram Date: Sat, 4 Feb 2023 19:29:39 +0530 Subject: wifi: nl80211: Allow authentication frames and set keys on NAN interface Wi-Fi Aware R4 specification defines NAN Pairing which uses PASN handshake to authenticate the peer and generate keys. Hence allow to register and transmit the PASN authentication frames on NAN interface and set the keys to driver or underlying modules on NAN interface. The driver needs to configure the feature flag NL80211_EXT_FEATURE_SECURE_NAN, which also helps userspace modules to know if the driver supports secure NAN. Signed-off-by: Vinay Gannevaram Link: https://lore.kernel.org/r/1675519179-24174-1-git-send-email-quic_vganneva@quicinc.com Signed-off-by: Johannes Berg --- include/uapi/linux/nl80211.h | 4 ++++ net/wireless/nl80211.c | 18 +++++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h index 831660956ab2..f14621a954e1 100644 --- a/include/uapi/linux/nl80211.h +++ b/include/uapi/linux/nl80211.h @@ -6323,6 +6323,9 @@ enum nl80211_feature_flags { * * @NL80211_EXT_FEATURE_PUNCT: Driver supports preamble puncturing in AP mode. * + * @NL80211_EXT_FEATURE_SECURE_NAN: Device supports NAN Pairing which enables + * authentication, data encryption and message integrity. + * * @NUM_NL80211_EXT_FEATURES: number of extended features. * @MAX_NL80211_EXT_FEATURES: highest extended feature index. */ @@ -6392,6 +6395,7 @@ enum nl80211_ext_feature_index { NL80211_EXT_FEATURE_RADAR_BACKGROUND, NL80211_EXT_FEATURE_POWERED_ADDR_CHANGE, NL80211_EXT_FEATURE_PUNCT, + NL80211_EXT_FEATURE_SECURE_NAN, /* add new features before the definition below */ NUM_NL80211_EXT_FEATURES, diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 22e6bc9a44cb..4f112d75ac9c 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -1549,10 +1549,14 @@ static int nl80211_key_allowed(struct wireless_dev *wdev) if (wdev->connected) return 0; return -ENOLINK; + case NL80211_IFTYPE_NAN: + if (wiphy_ext_feature_isset(wdev->wiphy, + NL80211_EXT_FEATURE_SECURE_NAN)) + return 0; + return -EINVAL; case NL80211_IFTYPE_UNSPECIFIED: case NL80211_IFTYPE_OCB: case NL80211_IFTYPE_MONITOR: - case NL80211_IFTYPE_NAN: case NL80211_IFTYPE_P2P_DEVICE: case NL80211_IFTYPE_WDS: case NUM_NL80211_IFTYPES: @@ -12342,6 +12346,10 @@ static int nl80211_register_mgmt(struct sk_buff *skb, struct genl_info *info) case NL80211_IFTYPE_P2P_DEVICE: break; case NL80211_IFTYPE_NAN: + if (!wiphy_ext_feature_isset(wdev->wiphy, + NL80211_EXT_FEATURE_SECURE_NAN)) + return -EOPNOTSUPP; + break; default: return -EOPNOTSUPP; } @@ -12399,6 +12407,10 @@ static int nl80211_tx_mgmt(struct sk_buff *skb, struct genl_info *info) case NL80211_IFTYPE_P2P_GO: break; case NL80211_IFTYPE_NAN: + if (!wiphy_ext_feature_isset(wdev->wiphy, + NL80211_EXT_FEATURE_SECURE_NAN)) + return -EOPNOTSUPP; + break; default: return -EOPNOTSUPP; } @@ -12536,6 +12548,10 @@ static int nl80211_tx_mgmt_cancel_wait(struct sk_buff *skb, struct genl_info *in case NL80211_IFTYPE_P2P_DEVICE: break; case NL80211_IFTYPE_NAN: + if (!wiphy_ext_feature_isset(wdev->wiphy, + NL80211_EXT_FEATURE_SECURE_NAN)) + return -EOPNOTSUPP; + break; default: return -EOPNOTSUPP; } -- cgit v1.2.3 From 935ef47b16cc5bc15fcd2b3dbc61abb0b7ea671a Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 1 Feb 2023 13:48:30 +0100 Subject: wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify Since cfg80211_bss_color_notify() is now always run in non-atomic context, get rid of gfp_t flags in the routine signature and always use GFP_KERNEL for netlink message allocation. Signed-off-by: Lorenzo Bianconi Link: https://lore.kernel.org/r/c687724e7b53556f7a2d9cbe3d11cdcf065cb687.1675255390.git.lorenzo@kernel.org Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 16 ++++++---------- net/mac80211/cfg.c | 3 +-- net/wireless/nl80211.c | 6 +++--- 3 files changed, 10 insertions(+), 15 deletions(-) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 9b015bb877db..c65f17d74191 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -8936,12 +8936,11 @@ void cfg80211_bss_flush(struct wiphy *wiphy); /** * cfg80211_bss_color_notify - notify about bss color event * @dev: network device - * @gfp: allocation flags * @cmd: the actual event we want to notify * @count: the number of TBTTs until the color change happens * @color_bitmap: representations of the colors that the local BSS is aware of */ -int cfg80211_bss_color_notify(struct net_device *dev, gfp_t gfp, +int cfg80211_bss_color_notify(struct net_device *dev, enum nl80211_commands cmd, u8 count, u64 color_bitmap); @@ -8952,10 +8951,9 @@ int cfg80211_bss_color_notify(struct net_device *dev, gfp_t gfp, * @gfp: allocation flags */ static inline int cfg80211_obss_color_collision_notify(struct net_device *dev, - u64 color_bitmap, gfp_t gfp) + u64 color_bitmap) { - return cfg80211_bss_color_notify(dev, gfp, - NL80211_CMD_OBSS_COLOR_COLLISION, + return cfg80211_bss_color_notify(dev, NL80211_CMD_OBSS_COLOR_COLLISION, 0, color_bitmap); } @@ -8969,8 +8967,7 @@ static inline int cfg80211_obss_color_collision_notify(struct net_device *dev, static inline int cfg80211_color_change_started_notify(struct net_device *dev, u8 count) { - return cfg80211_bss_color_notify(dev, GFP_KERNEL, - NL80211_CMD_COLOR_CHANGE_STARTED, + return cfg80211_bss_color_notify(dev, NL80211_CMD_COLOR_CHANGE_STARTED, count, 0); } @@ -8982,8 +8979,7 @@ static inline int cfg80211_color_change_started_notify(struct net_device *dev, */ static inline int cfg80211_color_change_aborted_notify(struct net_device *dev) { - return cfg80211_bss_color_notify(dev, GFP_KERNEL, - NL80211_CMD_COLOR_CHANGE_ABORTED, + return cfg80211_bss_color_notify(dev, NL80211_CMD_COLOR_CHANGE_ABORTED, 0, 0); } @@ -8995,7 +8991,7 @@ static inline int cfg80211_color_change_aborted_notify(struct net_device *dev) */ static inline int cfg80211_color_change_notify(struct net_device *dev) { - return cfg80211_bss_color_notify(dev, GFP_KERNEL, + return cfg80211_bss_color_notify(dev, NL80211_CMD_COLOR_CHANGE_COMPLETED, 0, 0); } diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index d2519a4db653..8eb342300868 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -4678,8 +4678,7 @@ void ieee80211_color_collision_detection_work(struct work_struct *work) struct ieee80211_sub_if_data *sdata = link->sdata; sdata_lock(sdata); - cfg80211_obss_color_collision_notify(sdata->dev, link->color_bitmap, - GFP_KERNEL); + cfg80211_obss_color_collision_notify(sdata->dev, link->color_bitmap); sdata_unlock(sdata); } diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 4f112d75ac9c..112b4bb009c8 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -19174,7 +19174,7 @@ void cfg80211_ch_switch_started_notify(struct net_device *dev, } EXPORT_SYMBOL(cfg80211_ch_switch_started_notify); -int cfg80211_bss_color_notify(struct net_device *dev, gfp_t gfp, +int cfg80211_bss_color_notify(struct net_device *dev, enum nl80211_commands cmd, u8 count, u64 color_bitmap) { @@ -19188,7 +19188,7 @@ int cfg80211_bss_color_notify(struct net_device *dev, gfp_t gfp, trace_cfg80211_bss_color_notify(dev, cmd, count, color_bitmap); - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, gfp); + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); if (!msg) return -ENOMEM; @@ -19211,7 +19211,7 @@ int cfg80211_bss_color_notify(struct net_device *dev, gfp_t gfp, genlmsg_end(msg, hdr); return genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), - msg, 0, NL80211_MCGRP_MLME, gfp); + msg, 0, NL80211_MCGRP_MLME, GFP_KERNEL); nla_put_failure: nlmsg_free(msg); -- cgit v1.2.3 From d99975c4953eb79e389d4630e848435c700e2dfc Mon Sep 17 00:00:00 2001 From: Wen Gong Date: Wed, 1 Feb 2023 01:53:13 -0500 Subject: wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint Currently the regulatory driver does not call the regulatory callback reg_notifier for self managed wiphys. Sometimes driver needs cfg80211 to calculate the info of ieee80211_channel such as flags and power, and driver needs to get the info of ieee80211_channel after hint of driver, but driver does not know when calculation of the info of ieee80211_channel become finished, so add notify to driver in reg_process_self_managed_hint() from cfg80211 is a good way, then driver could get the correct info in callback of reg_notifier. Signed-off-by: Wen Gong Link: https://lore.kernel.org/r/20230201065313.27203-1-quic_wgong@quicinc.com Signed-off-by: Johannes Berg --- include/net/cfg80211.h | 3 +++ net/wireless/reg.c | 3 +++ 2 files changed, 6 insertions(+) (limited to 'net') diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index c65f17d74191..15fb019ce28d 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -4737,6 +4737,8 @@ struct cfg80211_ops { * complete feature/interface combinations/etc. advertisement. No driver * should set this flag for now. * @WIPHY_FLAG_SUPPORTS_EXT_KCK_32: The device supports 32-byte KCK keys. + * @WIPHY_FLAG_NOTIFY_REGDOM_BY_DRIVER: The device could handle reg notify for + * NL80211_REGDOM_SET_BY_DRIVER. */ enum wiphy_flags { WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK = BIT(0), @@ -4762,6 +4764,7 @@ enum wiphy_flags { WIPHY_FLAG_HAS_REMAIN_ON_CHANNEL = BIT(21), WIPHY_FLAG_SUPPORTS_5_10_MHZ = BIT(22), WIPHY_FLAG_HAS_CHANNEL_SWITCH = BIT(23), + WIPHY_FLAG_NOTIFY_REGDOM_BY_DRIVER = BIT(24), }; /** diff --git a/net/wireless/reg.c b/net/wireless/reg.c index af65196c916e..0d40d6af7e10 100644 --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -3160,6 +3160,9 @@ static void reg_process_self_managed_hint(struct wiphy *wiphy) request.alpha2[1] = regd->alpha2[1]; request.initiator = NL80211_REGDOM_SET_BY_DRIVER; + if (wiphy->flags & WIPHY_FLAG_NOTIFY_REGDOM_BY_DRIVER) + reg_call_notifier(wiphy, &request); + nl80211_send_wiphy_reg_change_event(&request); } -- cgit v1.2.3 From daf8fb4295dccc032515cdc1bd3873370063542b Mon Sep 17 00:00:00 2001 From: Andrei Otcheretianski Date: Tue, 14 Feb 2023 12:10:48 +0200 Subject: wifi: mac80211: Don't translate MLD addresses for multicast MLD address translation should be done only for individually addressed frames. Otherwise, AAD calculation would be wrong and the decryption would fail. Fixes: e66b7920aa5ac ("wifi: mac80211: fix initialization of rx->link and rx->link_sta") Signed-off-by: Andrei Otcheretianski Link: https://lore.kernel.org/r/20230214101048.792414-1-andrei.otcheretianski@intel.com Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index dd571c715b8e..8965508928f3 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -4882,7 +4882,8 @@ static bool ieee80211_prepare_and_rx_handle(struct ieee80211_rx_data *rx, hdr = (struct ieee80211_hdr *)rx->skb->data; } - if (unlikely(rx->sta && rx->sta->sta.mlo)) { + if (unlikely(rx->sta && rx->sta->sta.mlo) && + is_unicast_ether_addr(hdr->addr1)) { /* translate to MLD addresses */ if (ether_addr_equal(link->conf->addr, hdr->addr1)) ether_addr_copy(hdr->addr1, rx->sdata->vif.addr); -- cgit v1.2.3 From 2edd92570441dd33246210042dc167319a5cf7e3 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Mon, 13 Feb 2023 12:58:36 +0100 Subject: devlink: don't allow to change net namespace for FW_ACTIVATE reload action The change on network namespace only makes sense during re-init reload action. For FW activation it is not applicable. So check if user passed an ATTR indicating network namespace change request and forbid it. Signed-off-by: Jiri Pirko Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230213115836.3404039-1-jiri@resnulli.us Signed-off-by: Paolo Abeni --- net/devlink/dev.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'net') diff --git a/net/devlink/dev.c b/net/devlink/dev.c index ab4e0f3c4e3d..b40153fa2680 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -476,6 +476,12 @@ int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info) dest_net = devlink_netns_get(skb, info); if (IS_ERR(dest_net)) return PTR_ERR(dest_net); + if (!net_eq(dest_net, devlink_net(devlink)) && + action != DEVLINK_RELOAD_ACTION_DRIVER_REINIT) { + NL_SET_ERR_MSG_MOD(info->extack, + "Changing namespace is only supported for reinit action"); + return -EOPNOTSUPP; + } } err = devlink_reload(devlink, dest_net, action, limit, &actions_performed, info->extack); -- cgit v1.2.3 From 1d8d4af4347420d657be448f8be4c39c558f3b5d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 14 Feb 2023 14:20:21 +0100 Subject: wifi: mac80211: avoid u32_encode_bits() warning gcc-9 triggers a false-postive warning in ieee80211_mlo_multicast_tx() for u32_encode_bits(ffs(links) - 1, ...), since ffs() can return zero on an empty bitmask, and the negative argument to u32_encode_bits() is then out of range: In file included from include/linux/ieee80211.h:21, from include/net/cfg80211.h:23, from net/mac80211/tx.c:23: In function 'u32_encode_bits', inlined from 'ieee80211_mlo_multicast_tx' at net/mac80211/tx.c:4437:17, inlined from 'ieee80211_subif_start_xmit' at net/mac80211/tx.c:4485:3: include/linux/bitfield.h:177:3: error: call to '__field_overflow' declared with attribute error: value doesn't fit into mask 177 | __field_overflow(); \ | ^~~~~~~~~~~~~~~~~~ include/linux/bitfield.h:197:2: note: in expansion of macro '____MAKE_OP' 197 | ____MAKE_OP(u##size,u##size,,) | ^~~~~~~~~~~ include/linux/bitfield.h:200:1: note: in expansion of macro '__MAKE_OP' 200 | __MAKE_OP(32) | ^~~~~~~~~ Newer compiler versions do not cause problems with the zero argument because they do not consider this a __builtin_constant_p(). It's also harmless since the hweight16() check already guarantees that this cannot be 0. Replace the ffs() with an equivalent find_first_bit() check that matches the later for_each_set_bit() style and avoids the warning. Fixes: 963d0e8d08d9 ("wifi: mac80211: optionally implement MLO multicast TX") Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20230214132025.1532147-1-arnd@kernel.org Signed-off-by: Johannes Berg --- net/mac80211/tx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index defe97a31724..118648af979c 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -4434,7 +4434,7 @@ static void ieee80211_mlo_multicast_tx(struct net_device *dev, u32 ctrl_flags = IEEE80211_TX_CTRL_MCAST_MLO_FIRST_TX; if (hweight16(links) == 1) { - ctrl_flags |= u32_encode_bits(ffs(links) - 1, + ctrl_flags |= u32_encode_bits(find_first_bit(&links, 16) - 1, IEEE80211_TX_CTRL_MLO_LINK); __ieee80211_subif_start_xmit(skb, sdata->dev, 0, ctrl_flags, -- cgit v1.2.3 From e6f5dcb7ec9badd9500f64b087f7243c37300d63 Mon Sep 17 00:00:00 2001 From: Gilad Itzkovitch Date: Thu, 24 Nov 2022 13:53:36 +1300 Subject: wifi: mac80211: Fix for Rx fragmented action frames The ieee80211_accept_frame() function performs a number of early checks to decide whether or not further processing needs to be done on a frame. One of those checks is the ieee80211_is_robust_mgmt_frame() function. It requires to peek into the frame payload, but because defragmentation does not occur until later on in the receive path, this peek is invalid for any fragment other than the first one. Also, in this scenario there is no STA and so the fragmented frame will be dropped later on in the process and will not reach the upper stack. This can happen with large action frames at low rates, for example, we see issues with DPP on S1G. This change will only check if the frame is robust if it's the first fragment. Invalid fragmented packets will be discarded later after defragmentation is completed. Signed-off-by: Gilad Itzkovitch Link: https://lore.kernel.org/r/20221124005336.1618411-1-gilad.itzkovitch@morsemicro.com Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 8965508928f3..7bcd77384191 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -4284,7 +4284,8 @@ static bool ieee80211_accept_frame(struct ieee80211_rx_data *rx) case NL80211_IFTYPE_STATION: if (!bssid && !sdata->u.mgd.use_4addr) return false; - if (ieee80211_is_robust_mgmt_frame(skb) && !rx->sta) + if (ieee80211_is_first_frag(hdr->seq_ctrl) && + ieee80211_is_robust_mgmt_frame(skb) && !rx->sta) return false; if (multicast) return true; -- cgit v1.2.3 From cf08e29db760b144bde51e2444f3430c75763e26 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 14 Feb 2023 20:08:15 +0100 Subject: wifi: mac80211: fix off-by-one link setting The convention for find_first_bit() is 0-based, while ffs() is 1-based, so this is now off-by-one. I cannot reproduce the gcc-9 problem, but since the -1 is now removed, I'm hoping it will still avoid the original issue. Reported-by: Alexander Lobakin Fixes: 1d8d4af43474 ("wifi: mac80211: avoid u32_encode_bits() warning") Signed-off-by: Johannes Berg --- net/mac80211/tx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 118648af979c..7699fb410670 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -4434,7 +4434,7 @@ static void ieee80211_mlo_multicast_tx(struct net_device *dev, u32 ctrl_flags = IEEE80211_TX_CTRL_MCAST_MLO_FIRST_TX; if (hweight16(links) == 1) { - ctrl_flags |= u32_encode_bits(find_first_bit(&links, 16) - 1, + ctrl_flags |= u32_encode_bits(__ffs(links), IEEE80211_TX_CTRL_MLO_LINK); __ieee80211_subif_start_xmit(skb, sdata->dev, 0, ctrl_flags, -- cgit v1.2.3 From e8c6cbd7656e4cea9e2336a3a85bd609dd76eb3d Mon Sep 17 00:00:00 2001 From: Thomas Weißschuh Date: Tue, 14 Feb 2023 04:23:11 +0000 Subject: net: bridge: make kobj_type structure constant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- net/bridge/br_if.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index ad13b48e3e08..24f01ff113f0 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -269,7 +269,7 @@ static void brport_get_ownership(const struct kobject *kobj, kuid_t *uid, kgid_t net_ns_get_ownership(dev_net(p->dev), uid, gid); } -static struct kobj_type brport_ktype = { +static const struct kobj_type brport_ktype = { #ifdef CONFIG_SYSFS .sysfs_ops = &brport_sysfs_ops, #endif -- cgit v1.2.3 From b2793517052d9771fbc0bc3db09693aa850eb3de Mon Sep 17 00:00:00 2001 From: Thomas Weißschuh Date: Tue, 14 Feb 2023 04:23:12 +0000 Subject: net-sysfs: make kobj_type structures constant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh Signed-off-by: Jakub Kicinski --- net/core/net-sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 4b361ac6a252..e20784b6f873 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1052,7 +1052,7 @@ static void rx_queue_get_ownership(const struct kobject *kobj, net_ns_get_ownership(net, uid, gid); } -static struct kobj_type rx_queue_ktype __ro_after_init = { +static const struct kobj_type rx_queue_ktype = { .sysfs_ops = &rx_queue_sysfs_ops, .release = rx_queue_release, .default_groups = rx_queue_default_groups, @@ -1662,7 +1662,7 @@ static void netdev_queue_get_ownership(const struct kobject *kobj, net_ns_get_ownership(net, uid, gid); } -static struct kobj_type netdev_queue_ktype __ro_after_init = { +static const struct kobj_type netdev_queue_ktype = { .sysfs_ops = &netdev_queue_sysfs_ops, .release = netdev_queue_release, .default_groups = netdev_queue_default_groups, -- cgit v1.2.3 From fe33311c3e371855c4f4c0ab8a5fce5b9a9fdafd Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Tue, 14 Feb 2023 12:14:10 +0800 Subject: net: no longer support SOCK_REFCNT_DEBUG feature Commit e48c414ee61f ("[INET]: Generalise the TCP sock ID lookup routines") commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another commit 463c84b97f24 ("[NET]: Introduce inet_connection_sock") removed it. Since we could track all of them through bpf and kprobe related tools and the feature could print loads of information which might not be that helpful even under a little bit pressure, the whole feature which has been inactive for many years is no longer supported. Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/ Suggested-by: Kuniyuki Iwashima Signed-off-by: Jason Xing Reviewed-by: Kuniyuki Iwashima Acked-by: Wenjia Zhang Reviewed-by: Eric Dumazet Acked-by: Matthieu Baerts Signed-off-by: David S. Miller --- include/net/sock.h | 28 ---------------------------- net/core/sock.c | 13 ------------- net/ipv4/af_inet.c | 3 --- net/ipv4/inet_connection_sock.c | 2 -- net/ipv4/inet_timewait_sock.c | 3 --- net/ipv6/af_inet6.c | 10 ---------- net/ipv6/ipv6_sockglue.c | 12 ------------ net/mptcp/protocol.c | 1 - net/packet/af_packet.c | 4 ---- net/sctp/ipv6.c | 2 -- net/sctp/protocol.c | 2 -- net/smc/af_smc.c | 3 --- net/xdp/xsk.c | 4 ---- 13 files changed, 87 deletions(-) (limited to 'net') diff --git a/include/net/sock.h b/include/net/sock.h index 937e842dc930..2cb258fde072 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1349,9 +1349,6 @@ struct proto { char name[32]; struct list_head node; -#ifdef SOCK_REFCNT_DEBUG - atomic_t socks; -#endif int (*diag_destroy)(struct sock *sk, int err); } __randomize_layout; @@ -1359,31 +1356,6 @@ int proto_register(struct proto *prot, int alloc_slab); void proto_unregister(struct proto *prot); int sock_load_diag_module(int family, int protocol); -#ifdef SOCK_REFCNT_DEBUG -static inline void sk_refcnt_debug_inc(struct sock *sk) -{ - atomic_inc(&sk->sk_prot->socks); -} - -static inline void sk_refcnt_debug_dec(struct sock *sk) -{ - atomic_dec(&sk->sk_prot->socks); - printk(KERN_DEBUG "%s socket %p released, %d are still alive\n", - sk->sk_prot->name, sk, atomic_read(&sk->sk_prot->socks)); -} - -static inline void sk_refcnt_debug_release(const struct sock *sk) -{ - if (refcount_read(&sk->sk_refcnt) != 1) - printk(KERN_DEBUG "Destruction of the %s socket %p delayed, refcnt=%d\n", - sk->sk_prot->name, sk, refcount_read(&sk->sk_refcnt)); -} -#else /* SOCK_REFCNT_DEBUG */ -#define sk_refcnt_debug_inc(sk) do { } while (0) -#define sk_refcnt_debug_dec(sk) do { } while (0) -#define sk_refcnt_debug_release(sk) do { } while (0) -#endif /* SOCK_REFCNT_DEBUG */ - INDIRECT_CALLABLE_DECLARE(bool tcp_stream_memory_free(const struct sock *sk, int wake)); static inline int sk_forward_alloc_get(const struct sock *sk) diff --git a/net/core/sock.c b/net/core/sock.c index afbb02984d5f..341c565dbc26 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2340,17 +2340,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) smp_wmb(); refcount_set(&newsk->sk_refcnt, 2); - /* Increment the counter in the same struct proto as the master - * sock (sk_refcnt_debug_inc uses newsk->sk_prot->socks, that - * is the same as sk->sk_prot->socks, as this field was copied - * with memcpy). - * - * This _changes_ the previous behaviour, where - * tcp_create_openreq_child always was incrementing the - * equivalent to tcp_prot->socks (inet_sock_nr), so this have - * to be taken into account in all callers. -acme - */ - sk_refcnt_debug_inc(newsk); sk_set_socket(newsk, NULL); sk_tx_queue_clear(newsk); RCU_INIT_POINTER(newsk->sk_wq, NULL); @@ -3710,8 +3699,6 @@ void sk_common_release(struct sock *sk) xfrm_sk_free_policy(sk); - sk_refcnt_debug_release(sk); - sock_put(sk); } EXPORT_SYMBOL(sk_common_release); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 2c778b013cb0..8db6747f892f 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -156,7 +156,6 @@ void inet_sock_destruct(struct sock *sk) kfree(rcu_dereference_protected(inet->inet_opt, 1)); dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1)); dst_release(rcu_dereference_protected(sk->sk_rx_dst, 1)); - sk_refcnt_debug_dec(sk); } EXPORT_SYMBOL(inet_sock_destruct); @@ -357,8 +356,6 @@ lookup_protocol: inet->mc_list = NULL; inet->rcv_tos = 0; - sk_refcnt_debug_inc(sk); - if (inet->inet_num) { /* It assumes that any protocol which allows * the user to assign a number at socket diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 7d206a10ad14..eedcf4146d29 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1199,8 +1199,6 @@ void inet_csk_destroy_sock(struct sock *sk) xfrm_sk_free_policy(sk); - sk_refcnt_debug_release(sk); - this_cpu_dec(*sk->sk_prot->orphan_count); sock_put(sk); diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index beed32fff484..40052414c7c7 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -77,9 +77,6 @@ void inet_twsk_free(struct inet_timewait_sock *tw) { struct module *owner = tw->tw_prot->owner; twsk_destructor((struct sock *)tw); -#ifdef SOCK_REFCNT_DEBUG - pr_debug("%s timewait_sock %p released\n", tw->tw_prot->name, tw); -#endif kmem_cache_free(tw->tw_prot->twsk_prot->twsk_slab, tw); module_put(owner); } diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 847934763868..38689bedfce7 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -239,16 +239,6 @@ lookup_protocol: inet->pmtudisc = IP_PMTUDISC_DONT; else inet->pmtudisc = IP_PMTUDISC_WANT; - /* - * Increment only the relevant sk_prot->socks debug field, this changes - * the previous behaviour of incrementing both the equivalent to - * answer->prot->socks (inet6_sock_nr) and inet_sock_nr. - * - * This allows better debug granularity as we'll know exactly how many - * UDPv6, TCPv6, etc socks were allocated, not the sum of all IPv6 - * transport protocol socks. -acme - */ - sk_refcnt_debug_inc(sk); if (inet->inet_num) { /* It assumes that any protocol which allows diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 9ce51680290b..2917dd8d198c 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -464,13 +464,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname, __ipv6_sock_mc_close(sk); __ipv6_sock_ac_close(sk); - /* - * Sock is moving from IPv6 to IPv4 (sk_prot), so - * remove it from the refcnt debug socks count in the - * original family... - */ - sk_refcnt_debug_dec(sk); - if (sk->sk_protocol == IPPROTO_TCP) { struct inet_connection_sock *icsk = inet_csk(sk); @@ -507,11 +500,6 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname, inet6_cleanup_sock(sk); - /* - * ... and add it to the refcnt debug socks count - * in the new family. -acme - */ - sk_refcnt_debug_inc(sk); module_put(THIS_MODULE); retv = 0; break; diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index c9817aa0f413..3ad9c46202fc 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2875,7 +2875,6 @@ static void __mptcp_destroy_sock(struct sock *sk) sk_stream_kill_queues(sk); xfrm_sk_free_policy(sk); - sk_refcnt_debug_release(sk); sock_put(sk); } diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8ffb19c643ab..d4e76e2ae153 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1335,8 +1335,6 @@ static void packet_sock_destruct(struct sock *sk) pr_err("Attempt to release alive packet socket: %p\n", sk); return; } - - sk_refcnt_debug_dec(sk); } static bool fanout_flow_is_huge(struct packet_sock *po, struct sk_buff *skb) @@ -3174,7 +3172,6 @@ static int packet_release(struct socket *sock) skb_queue_purge(&sk->sk_receive_queue); packet_free_pending(po); - sk_refcnt_debug_release(sk); sock_put(sk); return 0; @@ -3364,7 +3361,6 @@ static int packet_create(struct net *net, struct socket *sock, int protocol, packet_cached_dev_reset(po); sk->sk_destruct = packet_sock_destruct; - sk_refcnt_debug_inc(sk); /* * Attach a protocol block diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 097bd60ce964..62b436a2c8fe 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -807,8 +807,6 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk, newsk->sk_v6_rcv_saddr = sk->sk_v6_rcv_saddr; - sk_refcnt_debug_inc(newsk); - if (newsk->sk_prot->init(newsk)) { sk_common_release(newsk); newsk = NULL; diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index 909a89a1cff4..c365df24ad33 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -601,8 +601,6 @@ static struct sock *sctp_v4_create_accept_sk(struct sock *sk, newinet->inet_daddr = asoc->peer.primary_addr.v4.sin_addr.s_addr; - sk_refcnt_debug_inc(newsk); - if (newsk->sk_prot->init(newsk)) { sk_common_release(newsk); newsk = NULL; diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index b163266e581a..d7a7420e81ec 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -360,8 +360,6 @@ static void smc_destruct(struct sock *sk) return; if (!sock_flag(sk, SOCK_DEAD)) return; - - sk_refcnt_debug_dec(sk); } static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, @@ -390,7 +388,6 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock, spin_lock_init(&smc->accept_q_lock); spin_lock_init(&smc->conn.send_lock); sk->sk_prot->hash(sk); - sk_refcnt_debug_inc(sk); mutex_init(&smc->clcsock_release_lock); smc_init_saved_callbacks(smc); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 9f0561b67c12..a245c1b4a21b 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -845,7 +845,6 @@ static int xsk_release(struct socket *sock) sock_orphan(sk); sock->sk = NULL; - sk_refcnt_debug_release(sk); sock_put(sk); return 0; @@ -1396,8 +1395,6 @@ static void xsk_destruct(struct sock *sk) if (!xp_put_pool(xs->pool)) xdp_put_umem(xs->umem, !xs->pool); - - sk_refcnt_debug_dec(sk); } static int xsk_create(struct net *net, struct socket *sock, int protocol, @@ -1427,7 +1424,6 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol, sk->sk_family = PF_XDP; sk->sk_destruct = xsk_destruct; - sk_refcnt_debug_inc(sk); sock_set_flag(sk, SOCK_RCU_FREE); -- cgit v1.2.3 From c38c701851011c94ce3be1ccb3593678d2933fd8 Mon Sep 17 00:00:00 2001 From: Marc Bornand Date: Wed, 15 Feb 2023 08:47:53 +0000 Subject: wifi: cfg80211: Set SSID if it is not already set When a connection was established without going through NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct. Now we set it in __cfg80211_connect_result() when it is not already set. When using a userspace configuration that does not call cfg80211_connect() (can be checked with breakpoints in the kernel), this patch should allow `networkctl status device_name` to output the SSID instead of null. Cc: stable@vger.kernel.org Reported-by: Yohan Prod'homme Fixes: 7b0a0e3c3a88 (wifi: cfg80211: do some rework towards MLO link APIs) Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711 Signed-off-by: Marc Bornand Signed-off-by: Johannes Berg --- net/wireless/sme.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'net') diff --git a/net/wireless/sme.c b/net/wireless/sme.c index 0cc841c0c59b..28ce13840a88 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -736,6 +736,7 @@ void __cfg80211_connect_result(struct net_device *dev, { struct wireless_dev *wdev = dev->ieee80211_ptr; const struct element *country_elem = NULL; + const struct element *ssid; const u8 *country_data; u8 country_datalen; #ifdef CONFIG_CFG80211_WEXT @@ -894,6 +895,22 @@ void __cfg80211_connect_result(struct net_device *dev, country_data, country_datalen); kfree(country_data); + if (!wdev->u.client.ssid_len) { + rcu_read_lock(); + for_each_valid_link(cr, link) { + ssid = ieee80211_bss_get_elem(cr->links[link].bss, + WLAN_EID_SSID); + + if (!ssid || !ssid->datalen) + continue; + + memcpy(wdev->u.client.ssid, ssid->data, ssid->datalen); + wdev->u.client.ssid_len = ssid->datalen; + break; + } + rcu_read_unlock(); + } + return; out: for_each_valid_link(cr, link) -- cgit v1.2.3 From 0d846bdc11101ac0ba4d89c2be359af08cb9379b Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 15 Feb 2023 10:07:05 +0100 Subject: wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta() There's at least one case in ieee80211_rx_for_interface() where we might pass &((struct sta_info *)NULL)->sta to it only to then do container_of(), and then checking the result for NULL, but checking the result of container_of() for NULL looks really odd. Fix this by just passing the struct sta_info * instead. Fixes: e66b7920aa5a ("wifi: mac80211: fix initialization of rx->link and rx->link_sta") Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) (limited to 'net') diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 7bcd77384191..69aa623163c6 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -4115,13 +4115,8 @@ static bool ieee80211_rx_data_set_link(struct ieee80211_rx_data *rx, } static bool ieee80211_rx_data_set_sta(struct ieee80211_rx_data *rx, - struct ieee80211_sta *pubsta, - int link_id) + struct sta_info *sta, int link_id) { - struct sta_info *sta; - - sta = container_of(pubsta, struct sta_info, sta); - rx->link_id = link_id; rx->sta = sta; @@ -4159,7 +4154,7 @@ void ieee80211_release_reorder_timeout(struct sta_info *sta, int tid) if (sta->sta.valid_links) link_id = ffs(sta->sta.valid_links) - 1; - if (!ieee80211_rx_data_set_sta(&rx, &sta->sta, link_id)) + if (!ieee80211_rx_data_set_sta(&rx, sta, link_id)) return; tid_agg_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]); @@ -4205,7 +4200,7 @@ void ieee80211_mark_rx_ba_filtered_frames(struct ieee80211_sta *pubsta, u8 tid, sta = container_of(pubsta, struct sta_info, sta); - if (!ieee80211_rx_data_set_sta(&rx, pubsta, -1)) + if (!ieee80211_rx_data_set_sta(&rx, sta, -1)) return; rcu_read_lock(); @@ -4914,6 +4909,7 @@ static void __ieee80211_rx_handle_8023(struct ieee80211_hw *hw, struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb); struct ieee80211_fast_rx *fast_rx; struct ieee80211_rx_data rx; + struct sta_info *sta; int link_id = -1; memset(&rx, 0, sizeof(rx)); @@ -4941,7 +4937,8 @@ static void __ieee80211_rx_handle_8023(struct ieee80211_hw *hw, * link_id is used only for stats purpose and updating the stats on * the deflink is fine? */ - if (!ieee80211_rx_data_set_sta(&rx, pubsta, link_id)) + sta = container_of(pubsta, struct sta_info, sta); + if (!ieee80211_rx_data_set_sta(&rx, sta, link_id)) goto drop; fast_rx = rcu_dereference(rx.sta->fast_rx); @@ -4981,7 +4978,7 @@ static bool ieee80211_rx_for_interface(struct ieee80211_rx_data *rx, link_id = status->link_id; } - if (!ieee80211_rx_data_set_sta(rx, &sta->sta, link_id)) + if (!ieee80211_rx_data_set_sta(rx, sta, link_id)) return false; return ieee80211_prepare_and_rx_handle(rx, skb, consume); @@ -5048,7 +5045,8 @@ static void __ieee80211_rx_handle_packet(struct ieee80211_hw *hw, link_id = status->link_id; if (pubsta) { - if (!ieee80211_rx_data_set_sta(&rx, pubsta, link_id)) + sta = container_of(pubsta, struct sta_info, sta); + if (!ieee80211_rx_data_set_sta(&rx, sta, link_id)) goto out; /* @@ -5085,8 +5083,7 @@ static void __ieee80211_rx_handle_packet(struct ieee80211_hw *hw, } rx.sdata = prev_sta->sdata; - if (!ieee80211_rx_data_set_sta(&rx, &prev_sta->sta, - link_id)) + if (!ieee80211_rx_data_set_sta(&rx, prev_sta, link_id)) goto out; if (!status->link_valid && prev_sta->sta.mlo) @@ -5099,8 +5096,7 @@ static void __ieee80211_rx_handle_packet(struct ieee80211_hw *hw, if (prev_sta) { rx.sdata = prev_sta->sdata; - if (!ieee80211_rx_data_set_sta(&rx, &prev_sta->sta, - link_id)) + if (!ieee80211_rx_data_set_sta(&rx, prev_sta, link_id)) goto out; if (!status->link_valid && prev_sta->sta.mlo) -- cgit v1.2.3 From ab5f171e36063b33aba1dd4c7c6d5031b00b42aa Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 15 Feb 2023 10:40:41 +0100 Subject: wifi: mac80211: always initialize link_sta with sta When we have multiple interfaces receiving the same frame, such as a multicast frame, one interface might have a sta and the other not. In this case, link_sta would be set but not cleared again. Always set link_sta, so we keep an invariant that link_sta and sta are either both set or both not set. Signed-off-by: Johannes Berg --- net/mac80211/rx.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 69aa623163c6..f7fdfe710951 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -4125,6 +4125,8 @@ static bool ieee80211_rx_data_set_sta(struct ieee80211_rx_data *rx, if (!rx->sdata) rx->sdata = sta->sdata; rx->link_sta = &sta->deflink; + } else { + rx->link_sta = NULL; } if (link_id < 0) -- cgit v1.2.3 From 3caf31e7b18a90b74a2709d761a0dfa423f2c2e4 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 15 Feb 2023 18:30:26 +0100 Subject: wifi: mac80211: add documentation for amsdu_mesh_control This documentation wasn't added in the original patch, add it now. Reported-by: Stephen Rothwell Fixes: 6e4c0d0460bd ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU") Signed-off-by: Johannes Berg --- net/mac80211/sta_info.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index 5a1c541bb4ac..e8e482a82d77 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -622,6 +622,8 @@ struct link_sta_info { * taken from HT/VHT capabilities or VHT operating mode notification * @cparams: CoDel parameters for this station. * @reserved_tid: reserved TID (if any, otherwise IEEE80211_TID_UNRESERVED) + * @amsdu_mesh_control: track the mesh A-MSDU format used by the peer + * (-1: not yet known, 0: non-standard [without mesh header], 1: standard) * @fast_tx: TX fastpath information * @fast_rx: RX fastpath information * @tdls_chandef: a TDLS peer can have a wider chandef that is compatible to -- cgit v1.2.3 From 6c20822fada1b8adb77fa450d03a0d449686a4a9 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Wed, 15 Feb 2023 19:54:40 +0100 Subject: bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit &xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed *before* the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen Signed-off-by: Alexander Lobakin Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com Signed-off-by: Martin KaFai Lau --- net/bpf/test_run.c | 29 +++++++++++++++++----- .../selftests/bpf/prog_tests/xdp_do_redirect.c | 7 +++--- 2 files changed, 27 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index b766a84c8536..1ab396a2b87f 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -97,8 +97,11 @@ reset: struct xdp_page_head { struct xdp_buff orig_ctx; struct xdp_buff ctx; - struct xdp_frame frm; - u8 data[]; + union { + /* ::data_hard_start starts here */ + DECLARE_FLEX_ARRAY(struct xdp_frame, frame); + DECLARE_FLEX_ARRAY(u8, data); + }; }; struct xdp_test_data { @@ -116,6 +119,20 @@ struct xdp_test_data { #define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)) #define TEST_XDP_MAX_BATCH 256 +#if BITS_PER_LONG == 64 && PAGE_SIZE == SZ_4K +/* tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c:%MAX_PKT_SIZE + * must be updated accordingly when any of these changes, otherwise BPF + * selftests will fail. + */ +#ifdef __s390x__ +#define TEST_MAX_PKT_SIZE 3216 +#else +#define TEST_MAX_PKT_SIZE 3408 +#endif +static_assert(SKB_WITH_OVERHEAD(TEST_XDP_FRAME_SIZE - XDP_PACKET_HEADROOM) == + TEST_MAX_PKT_SIZE); +#endif + static void xdp_test_run_init_page(struct page *page, void *arg) { struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); @@ -132,8 +149,8 @@ static void xdp_test_run_init_page(struct page *page, void *arg) headroom -= meta_len; new_ctx = &head->ctx; - frm = &head->frm; - data = &head->data; + frm = head->frame; + data = head->data; memcpy(data + headroom, orig_ctx->data_meta, frm_len); xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq); @@ -223,7 +240,7 @@ static void reset_ctx(struct xdp_page_head *head) head->ctx.data = head->orig_ctx.data; head->ctx.data_meta = head->orig_ctx.data_meta; head->ctx.data_end = head->orig_ctx.data_end; - xdp_update_frame_from_buff(&head->ctx, &head->frm); + xdp_update_frame_from_buff(&head->ctx, head->frame); } static int xdp_recv_frames(struct xdp_frame **frames, int nframes, @@ -285,7 +302,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, head = phys_to_virt(page_to_phys(page)); reset_ctx(head); ctx = &head->ctx; - frm = &head->frm; + frm = head->frame; xdp->frame_cnt++; act = bpf_prog_run_xdp(prog, ctx); diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c index 2666c84dbd01..7271a18ab3e2 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c @@ -65,12 +65,13 @@ static int attach_tc_prog(struct bpf_tc_hook *hook, int fd) } /* The maximum permissible size is: PAGE_SIZE - sizeof(struct xdp_page_head) - - * sizeof(struct skb_shared_info) - XDP_PACKET_HEADROOM = 3368 bytes + * SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) - XDP_PACKET_HEADROOM = + * 3408 bytes for 64-byte cacheline and 3216 for 256-byte one. */ #if defined(__s390x__) -#define MAX_PKT_SIZE 3176 +#define MAX_PKT_SIZE 3216 #else -#define MAX_PKT_SIZE 3368 +#define MAX_PKT_SIZE 3408 #endif static void test_max_pkt_size(int fd) { -- cgit v1.2.3 From b4740e3a8137faa5831c690d0bf0b46f41008baf Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:37:57 +0200 Subject: devlink: Split out health reporter create code Move devlink health reporter create/destroy and related dev code to new file health.c. This file shall include all callbacks and functionality that are related to devlink health. In addition, fix kdoc indentation and make reporter create/destroy kdoc more clear. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Signed-off-by: Jakub Kicinski --- net/devlink/Makefile | 2 +- net/devlink/devl_internal.h | 28 ++++++ net/devlink/health.c | 196 +++++++++++++++++++++++++++++++++++++++++ net/devlink/leftover.c | 209 +------------------------------------------- 4 files changed, 226 insertions(+), 209 deletions(-) create mode 100644 net/devlink/health.c (limited to 'net') diff --git a/net/devlink/Makefile b/net/devlink/Makefile index daad4521c61e..ef91a76646a3 100644 --- a/net/devlink/Makefile +++ b/net/devlink/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0 -obj-y := leftover.o core.o netlink.o dev.o +obj-y := leftover.o core.o netlink.o dev.o health.o diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 2f4820e40d27..49fe9e2dae34 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -198,6 +198,34 @@ int devlink_resources_validate(struct devlink *devlink, struct devlink_resource *resource, struct genl_info *info); +/* Health */ +struct devlink_health_reporter { + struct list_head list; + void *priv; + const struct devlink_health_reporter_ops *ops; + struct devlink *devlink; + struct devlink_port *devlink_port; + struct devlink_fmsg *dump_fmsg; + struct mutex dump_lock; /* lock parallel read/write from dump buffers */ + u64 graceful_period; + bool auto_recover; + bool auto_dump; + u8 health_state; + u64 dump_ts; + u64 dump_real_ts; + u64 error_count; + u64 recovery_count; + u64 last_recovery_ts; +}; + +struct devlink_health_reporter * +devlink_health_reporter_find_by_name(struct devlink *devlink, + const char *reporter_name); +struct devlink_health_reporter * +devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, + const char *reporter_name); +void devlink_fmsg_free(struct devlink_fmsg *fmsg); + /* Line cards */ struct devlink_linecard; diff --git a/net/devlink/health.c b/net/devlink/health.c new file mode 100644 index 000000000000..18d1f38380b3 --- /dev/null +++ b/net/devlink/health.c @@ -0,0 +1,196 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2016 Jiri Pirko + */ + +#include +#include "devl_internal.h" + +void * +devlink_health_reporter_priv(struct devlink_health_reporter *reporter) +{ + return reporter->priv; +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_priv); + +static struct devlink_health_reporter * +__devlink_health_reporter_find_by_name(struct list_head *reporter_list, + const char *reporter_name) +{ + struct devlink_health_reporter *reporter; + + list_for_each_entry(reporter, reporter_list, list) + if (!strcmp(reporter->ops->name, reporter_name)) + return reporter; + return NULL; +} + +struct devlink_health_reporter * +devlink_health_reporter_find_by_name(struct devlink *devlink, + const char *reporter_name) +{ + return __devlink_health_reporter_find_by_name(&devlink->reporter_list, + reporter_name); +} + +struct devlink_health_reporter * +devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, + const char *reporter_name) +{ + return __devlink_health_reporter_find_by_name(&devlink_port->reporter_list, + reporter_name); +} + +static struct devlink_health_reporter * +__devlink_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + if (WARN_ON(graceful_period && !ops->recover)) + return ERR_PTR(-EINVAL); + + reporter = kzalloc(sizeof(*reporter), GFP_KERNEL); + if (!reporter) + return ERR_PTR(-ENOMEM); + + reporter->priv = priv; + reporter->ops = ops; + reporter->devlink = devlink; + reporter->graceful_period = graceful_period; + reporter->auto_recover = !!ops->recover; + reporter->auto_dump = !!ops->dump; + mutex_init(&reporter->dump_lock); + return reporter; +} + +/** + * devl_port_health_reporter_create() - create devlink health reporter for + * specified port instance + * + * @port: devlink_port to which health reports will relate + * @ops: devlink health reporter ops + * @graceful_period: min time (in msec) between recovery attempts + * @priv: driver priv pointer + */ +struct devlink_health_reporter * +devl_port_health_reporter_create(struct devlink_port *port, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + devl_assert_locked(port->devlink); + + if (__devlink_health_reporter_find_by_name(&port->reporter_list, + ops->name)) + return ERR_PTR(-EEXIST); + + reporter = __devlink_health_reporter_create(port->devlink, ops, + graceful_period, priv); + if (IS_ERR(reporter)) + return reporter; + + reporter->devlink_port = port; + list_add_tail(&reporter->list, &port->reporter_list); + return reporter; +} +EXPORT_SYMBOL_GPL(devl_port_health_reporter_create); + +struct devlink_health_reporter * +devlink_port_health_reporter_create(struct devlink_port *port, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + struct devlink *devlink = port->devlink; + + devl_lock(devlink); + reporter = devl_port_health_reporter_create(port, ops, + graceful_period, priv); + devl_unlock(devlink); + return reporter; +} +EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create); + +/** + * devl_health_reporter_create - create devlink health reporter + * + * @devlink: devlink instance which the health reports will relate + * @ops: devlink health reporter ops + * @graceful_period: min time (in msec) between recovery attempts + * @priv: driver priv pointer + */ +struct devlink_health_reporter * +devl_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + devl_assert_locked(devlink); + + if (devlink_health_reporter_find_by_name(devlink, ops->name)) + return ERR_PTR(-EEXIST); + + reporter = __devlink_health_reporter_create(devlink, ops, + graceful_period, priv); + if (IS_ERR(reporter)) + return reporter; + + list_add_tail(&reporter->list, &devlink->reporter_list); + return reporter; +} +EXPORT_SYMBOL_GPL(devl_health_reporter_create); + +struct devlink_health_reporter * +devlink_health_reporter_create(struct devlink *devlink, + const struct devlink_health_reporter_ops *ops, + u64 graceful_period, void *priv) +{ + struct devlink_health_reporter *reporter; + + devl_lock(devlink); + reporter = devl_health_reporter_create(devlink, ops, + graceful_period, priv); + devl_unlock(devlink); + return reporter; +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_create); + +static void +devlink_health_reporter_free(struct devlink_health_reporter *reporter) +{ + mutex_destroy(&reporter->dump_lock); + if (reporter->dump_fmsg) + devlink_fmsg_free(reporter->dump_fmsg); + kfree(reporter); +} + +/** + * devl_health_reporter_destroy() - destroy devlink health reporter + * + * @reporter: devlink health reporter to destroy + */ +void +devl_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + devl_assert_locked(reporter->devlink); + + list_del(&reporter->list); + devlink_health_reporter_free(reporter); +} +EXPORT_SYMBOL_GPL(devl_health_reporter_destroy); + +void +devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) +{ + struct devlink *devlink = reporter->devlink; + + devl_lock(devlink); + devl_health_reporter_destroy(reporter); + devl_unlock(devlink); +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 3569706c49e1..cfd1b90a0fc1 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -5403,7 +5403,7 @@ static struct devlink_fmsg *devlink_fmsg_alloc(void) return fmsg; } -static void devlink_fmsg_free(struct devlink_fmsg *fmsg) +void devlink_fmsg_free(struct devlink_fmsg *fmsg) { struct devlink_fmsg_item *item, *tmp; @@ -5963,213 +5963,6 @@ nla_put_failure: return err; } -struct devlink_health_reporter { - struct list_head list; - void *priv; - const struct devlink_health_reporter_ops *ops; - struct devlink *devlink; - struct devlink_port *devlink_port; - struct devlink_fmsg *dump_fmsg; - struct mutex dump_lock; /* lock parallel read/write from dump buffers */ - u64 graceful_period; - bool auto_recover; - bool auto_dump; - u8 health_state; - u64 dump_ts; - u64 dump_real_ts; - u64 error_count; - u64 recovery_count; - u64 last_recovery_ts; -}; - -void * -devlink_health_reporter_priv(struct devlink_health_reporter *reporter) -{ - return reporter->priv; -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_priv); - -static struct devlink_health_reporter * -__devlink_health_reporter_find_by_name(struct list_head *reporter_list, - const char *reporter_name) -{ - struct devlink_health_reporter *reporter; - - list_for_each_entry(reporter, reporter_list, list) - if (!strcmp(reporter->ops->name, reporter_name)) - return reporter; - return NULL; -} - -static struct devlink_health_reporter * -devlink_health_reporter_find_by_name(struct devlink *devlink, - const char *reporter_name) -{ - return __devlink_health_reporter_find_by_name(&devlink->reporter_list, - reporter_name); -} - -static struct devlink_health_reporter * -devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, - const char *reporter_name) -{ - return __devlink_health_reporter_find_by_name(&devlink_port->reporter_list, - reporter_name); -} - -static struct devlink_health_reporter * -__devlink_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - if (WARN_ON(graceful_period && !ops->recover)) - return ERR_PTR(-EINVAL); - - reporter = kzalloc(sizeof(*reporter), GFP_KERNEL); - if (!reporter) - return ERR_PTR(-ENOMEM); - - reporter->priv = priv; - reporter->ops = ops; - reporter->devlink = devlink; - reporter->graceful_period = graceful_period; - reporter->auto_recover = !!ops->recover; - reporter->auto_dump = !!ops->dump; - mutex_init(&reporter->dump_lock); - return reporter; -} - -/** - * devl_port_health_reporter_create - create devlink health reporter for - * specified port instance - * - * @port: devlink_port which should contain the new reporter - * @ops: ops - * @graceful_period: to avoid recovery loops, in msecs - * @priv: priv - */ -struct devlink_health_reporter * -devl_port_health_reporter_create(struct devlink_port *port, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - devl_assert_locked(port->devlink); - - if (__devlink_health_reporter_find_by_name(&port->reporter_list, - ops->name)) - return ERR_PTR(-EEXIST); - - reporter = __devlink_health_reporter_create(port->devlink, ops, - graceful_period, priv); - if (IS_ERR(reporter)) - return reporter; - - reporter->devlink_port = port; - list_add_tail(&reporter->list, &port->reporter_list); - return reporter; -} -EXPORT_SYMBOL_GPL(devl_port_health_reporter_create); - -struct devlink_health_reporter * -devlink_port_health_reporter_create(struct devlink_port *port, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - struct devlink *devlink = port->devlink; - - devl_lock(devlink); - reporter = devl_port_health_reporter_create(port, ops, - graceful_period, priv); - devl_unlock(devlink); - return reporter; -} -EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create); - -/** - * devl_health_reporter_create - create devlink health reporter - * - * @devlink: devlink - * @ops: ops - * @graceful_period: to avoid recovery loops, in msecs - * @priv: priv - */ -struct devlink_health_reporter * -devl_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - devl_assert_locked(devlink); - - if (devlink_health_reporter_find_by_name(devlink, ops->name)) - return ERR_PTR(-EEXIST); - - reporter = __devlink_health_reporter_create(devlink, ops, - graceful_period, priv); - if (IS_ERR(reporter)) - return reporter; - - list_add_tail(&reporter->list, &devlink->reporter_list); - return reporter; -} -EXPORT_SYMBOL_GPL(devl_health_reporter_create); - -struct devlink_health_reporter * -devlink_health_reporter_create(struct devlink *devlink, - const struct devlink_health_reporter_ops *ops, - u64 graceful_period, void *priv) -{ - struct devlink_health_reporter *reporter; - - devl_lock(devlink); - reporter = devl_health_reporter_create(devlink, ops, - graceful_period, priv); - devl_unlock(devlink); - return reporter; -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_create); - -static void -devlink_health_reporter_free(struct devlink_health_reporter *reporter) -{ - mutex_destroy(&reporter->dump_lock); - if (reporter->dump_fmsg) - devlink_fmsg_free(reporter->dump_fmsg); - kfree(reporter); -} - -/** - * devl_health_reporter_destroy - destroy devlink health reporter - * - * @reporter: devlink health reporter to destroy - */ -void -devl_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - devl_assert_locked(reporter->devlink); - - list_del(&reporter->list); - devlink_health_reporter_free(reporter); -} -EXPORT_SYMBOL_GPL(devl_health_reporter_destroy); - -void -devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) -{ - struct devlink *devlink = reporter->devlink; - - devl_lock(devlink); - devl_health_reporter_destroy(reporter); - devl_unlock(devlink); -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); - static int devlink_nl_health_reporter_fill(struct sk_buff *msg, struct devlink_health_reporter *reporter, -- cgit v1.2.3 From bfd4e6a5dbbc12f77620602e764ac940ccb159de Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:37:58 +0200 Subject: devlink: health: Fix nla_nest_end in error flow devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix it to call nla_nest_cancel() instead. Note the bug is harmless as genlmsg_cancel() cancel the entire message, so no fixes tag added. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/leftover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index cfd1b90a0fc1..90f95f06de28 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -6028,7 +6028,7 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg, return 0; reporter_nest_cancel: - nla_nest_end(msg, reporter_attr); + nla_nest_cancel(msg, reporter_attr); genlmsg_cancel: genlmsg_cancel(msg, hdr); return -EMSGSIZE; -- cgit v1.2.3 From db6b5f3ec400479c0906be6884c9ecceaf0b8c46 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:37:59 +0200 Subject: devlink: Move devlink health get and set code to health file Move devlink health get and set callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 18 ++++ net/devlink/health.c | 214 +++++++++++++++++++++++++++++++++++++++++++ net/devlink/leftover.c | 219 +------------------------------------------- 3 files changed, 234 insertions(+), 217 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 49fe9e2dae34..085f80b5feb8 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -176,6 +176,8 @@ int devlink_port_netdevice_event(struct notifier_block *nb, struct devlink_port * devlink_port_get_from_info(struct devlink *devlink, struct genl_info *info); +struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs); /* Reload */ bool devlink_reload_actions_valid(const struct devlink_ops *ops); @@ -224,6 +226,18 @@ devlink_health_reporter_find_by_name(struct devlink *devlink, struct devlink_health_reporter * devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, const char *reporter_name); +struct devlink_health_reporter * +devlink_health_reporter_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs); +struct devlink_health_reporter * +devlink_health_reporter_get_from_info(struct devlink *devlink, + struct genl_info *info); +int +devlink_nl_health_reporter_fill(struct sk_buff *msg, + struct devlink_health_reporter *reporter, + enum devlink_command cmd, u32 portid, + u32 seq, int flags); + void devlink_fmsg_free(struct devlink_fmsg *fmsg); /* Line cards */ @@ -249,3 +263,7 @@ int devlink_nl_cmd_info_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_flash_update(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_selftests_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_selftests_run(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, + struct genl_info *info); +int devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, + struct genl_info *info); diff --git a/net/devlink/health.c b/net/devlink/health.c index 18d1f38380b3..1c92f369c918 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -194,3 +194,217 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) devl_unlock(devlink); } EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); + +int +devlink_nl_health_reporter_fill(struct sk_buff *msg, + struct devlink_health_reporter *reporter, + enum devlink_command cmd, u32 portid, + u32 seq, int flags) +{ + struct devlink *devlink = reporter->devlink; + struct nlattr *reporter_attr; + void *hdr; + + hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); + if (!hdr) + return -EMSGSIZE; + + if (devlink_nl_put_handle(msg, devlink)) + goto genlmsg_cancel; + + if (reporter->devlink_port) { + if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, reporter->devlink_port->index)) + goto genlmsg_cancel; + } + reporter_attr = nla_nest_start_noflag(msg, + DEVLINK_ATTR_HEALTH_REPORTER); + if (!reporter_attr) + goto genlmsg_cancel; + if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME, + reporter->ops->name)) + goto reporter_nest_cancel; + if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE, + reporter->health_state)) + goto reporter_nest_cancel; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR_COUNT, + reporter->error_count, DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT, + reporter->recovery_count, DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->ops->recover && + nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, + reporter->graceful_period, + DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->ops->recover && + nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, + reporter->auto_recover)) + goto reporter_nest_cancel; + if (reporter->dump_fmsg && + nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, + jiffies_to_msecs(reporter->dump_ts), + DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->dump_fmsg && + nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS, + reporter->dump_real_ts, DEVLINK_ATTR_PAD)) + goto reporter_nest_cancel; + if (reporter->ops->dump && + nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP, + reporter->auto_dump)) + goto reporter_nest_cancel; + + nla_nest_end(msg, reporter_attr); + genlmsg_end(msg, hdr); + return 0; + +reporter_nest_cancel: + nla_nest_cancel(msg, reporter_attr); +genlmsg_cancel: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +struct devlink_health_reporter * +devlink_health_reporter_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs) +{ + struct devlink_port *devlink_port; + char *reporter_name; + + if (!attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]) + return NULL; + + reporter_name = nla_data(attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]); + devlink_port = devlink_port_get_from_attrs(devlink, attrs); + if (IS_ERR(devlink_port)) + return devlink_health_reporter_find_by_name(devlink, + reporter_name); + else + return devlink_port_health_reporter_find_by_name(devlink_port, + reporter_name); +} + +struct devlink_health_reporter * +devlink_health_reporter_get_from_info(struct devlink *devlink, + struct genl_info *info) +{ + return devlink_health_reporter_get_from_attrs(devlink, info->attrs); +} + +int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + struct sk_buff *msg; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + err = devlink_nl_health_reporter_fill(msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + info->snd_portid, info->snd_seq, + 0); + if (err) { + nlmsg_free(msg); + return err; + } + + return genlmsg_reply(msg, info); +} + +static int +devlink_nl_cmd_health_reporter_get_dump_one(struct sk_buff *msg, + struct devlink *devlink, + struct netlink_callback *cb) +{ + struct devlink_nl_dump_state *state = devlink_dump_state(cb); + struct devlink_health_reporter *reporter; + struct devlink_port *port; + unsigned long port_index; + int idx = 0; + int err; + + list_for_each_entry(reporter, &devlink->reporter_list, list) { + if (idx < state->idx) { + idx++; + continue; + } + err = devlink_nl_health_reporter_fill(msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + return err; + } + idx++; + } + xa_for_each(&devlink->ports, port_index, port) { + list_for_each_entry(reporter, &port->reporter_list, list) { + if (idx < state->idx) { + idx++; + continue; + } + err = devlink_nl_health_reporter_fill(msg, reporter, + DEVLINK_CMD_HEALTH_REPORTER_GET, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NLM_F_MULTI); + if (err) { + state->idx = idx; + return err; + } + idx++; + } + } + + return 0; +} + +const struct devlink_cmd devl_cmd_health_reporter_get = { + .dump_one = devlink_nl_cmd_health_reporter_get_dump_one, +}; + +int devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->recover && + (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] || + info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])) + return -EOPNOTSUPP; + + if (!reporter->ops->dump && + info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) + return -EOPNOTSUPP; + + if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) + reporter->graceful_period = + nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]); + + if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]) + reporter->auto_recover = + nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]); + + if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) + reporter->auto_dump = + nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]); + + return 0; +} diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 90f95f06de28..0b1c5e0122f3 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -156,8 +156,8 @@ static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink, return xa_load(&devlink->ports, port_index); } -static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, - struct nlattr **attrs) +struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink, + struct nlattr **attrs) { if (attrs[DEVLINK_ATTR_PORT_INDEX]) { u32 port_index = nla_get_u32(attrs[DEVLINK_ATTR_PORT_INDEX]); @@ -5963,77 +5963,6 @@ nla_put_failure: return err; } -static int -devlink_nl_health_reporter_fill(struct sk_buff *msg, - struct devlink_health_reporter *reporter, - enum devlink_command cmd, u32 portid, - u32 seq, int flags) -{ - struct devlink *devlink = reporter->devlink; - struct nlattr *reporter_attr; - void *hdr; - - hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd); - if (!hdr) - return -EMSGSIZE; - - if (devlink_nl_put_handle(msg, devlink)) - goto genlmsg_cancel; - - if (reporter->devlink_port) { - if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, reporter->devlink_port->index)) - goto genlmsg_cancel; - } - reporter_attr = nla_nest_start_noflag(msg, - DEVLINK_ATTR_HEALTH_REPORTER); - if (!reporter_attr) - goto genlmsg_cancel; - if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME, - reporter->ops->name)) - goto reporter_nest_cancel; - if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE, - reporter->health_state)) - goto reporter_nest_cancel; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR_COUNT, - reporter->error_count, DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT, - reporter->recovery_count, DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->ops->recover && - nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, - reporter->graceful_period, - DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->ops->recover && - nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, - reporter->auto_recover)) - goto reporter_nest_cancel; - if (reporter->dump_fmsg && - nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, - jiffies_to_msecs(reporter->dump_ts), - DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->dump_fmsg && - nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS_NS, - reporter->dump_real_ts, DEVLINK_ATTR_PAD)) - goto reporter_nest_cancel; - if (reporter->ops->dump && - nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP, - reporter->auto_dump)) - goto reporter_nest_cancel; - - nla_nest_end(msg, reporter_attr); - genlmsg_end(msg, hdr); - return 0; - -reporter_nest_cancel: - nla_nest_cancel(msg, reporter_attr); -genlmsg_cancel: - genlmsg_cancel(msg, hdr); - return -EMSGSIZE; -} - static void devlink_recover_notify(struct devlink_health_reporter *reporter, enum devlink_command cmd) { @@ -6188,33 +6117,6 @@ int devlink_health_report(struct devlink_health_reporter *reporter, } EXPORT_SYMBOL_GPL(devlink_health_report); -static struct devlink_health_reporter * -devlink_health_reporter_get_from_attrs(struct devlink *devlink, - struct nlattr **attrs) -{ - struct devlink_port *devlink_port; - char *reporter_name; - - if (!attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]) - return NULL; - - reporter_name = nla_data(attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]); - devlink_port = devlink_port_get_from_attrs(devlink, attrs); - if (IS_ERR(devlink_port)) - return devlink_health_reporter_find_by_name(devlink, - reporter_name); - else - return devlink_port_health_reporter_find_by_name(devlink_port, - reporter_name); -} - -static struct devlink_health_reporter * -devlink_health_reporter_get_from_info(struct devlink *devlink, - struct genl_info *info) -{ - return devlink_health_reporter_get_from_attrs(devlink, info->attrs); -} - static struct devlink_health_reporter * devlink_health_reporter_get_from_cb(struct netlink_callback *cb) { @@ -6251,123 +6153,6 @@ devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, } EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update); -static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - struct sk_buff *msg; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return -ENOMEM; - - err = devlink_nl_health_reporter_fill(msg, reporter, - DEVLINK_CMD_HEALTH_REPORTER_GET, - info->snd_portid, info->snd_seq, - 0); - if (err) { - nlmsg_free(msg); - return err; - } - - return genlmsg_reply(msg, info); -} - -static int -devlink_nl_cmd_health_reporter_get_dump_one(struct sk_buff *msg, - struct devlink *devlink, - struct netlink_callback *cb) -{ - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink_health_reporter *reporter; - struct devlink_port *port; - unsigned long port_index; - int idx = 0; - int err; - - list_for_each_entry(reporter, &devlink->reporter_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_health_reporter_fill(msg, reporter, - DEVLINK_CMD_HEALTH_REPORTER_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - state->idx = idx; - return err; - } - idx++; - } - xa_for_each(&devlink->ports, port_index, port) { - list_for_each_entry(reporter, &port->reporter_list, list) { - if (idx < state->idx) { - idx++; - continue; - } - err = devlink_nl_health_reporter_fill(msg, reporter, - DEVLINK_CMD_HEALTH_REPORTER_GET, - NETLINK_CB(cb->skb).portid, - cb->nlh->nlmsg_seq, - NLM_F_MULTI); - if (err) { - state->idx = idx; - return err; - } - idx++; - } - } - - return 0; -} - -const struct devlink_cmd devl_cmd_health_reporter_get = { - .dump_one = devlink_nl_cmd_health_reporter_get_dump_one, -}; - -static int -devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->recover && - (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] || - info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])) - return -EOPNOTSUPP; - - if (!reporter->ops->dump && - info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) - return -EOPNOTSUPP; - - if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) - reporter->graceful_period = - nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]); - - if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]) - reporter->auto_recover = - nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]); - - if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]) - reporter->auto_dump = - nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP]); - - return 0; -} - static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, struct genl_info *info) { -- cgit v1.2.3 From 55b9b249685214cc700113e6b677b78d2c0b97f9 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:38:00 +0200 Subject: devlink: Move devlink health report and recover to health file Move devlink health report helper and recover callback and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 5 ++ net/devlink/health.c | 136 ++++++++++++++++++++++++++++++++++++++++++ net/devlink/leftover.c | 141 +------------------------------------------- 3 files changed, 144 insertions(+), 138 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 085f80b5feb8..d1a901cb5900 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -237,6 +237,9 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg, struct devlink_health_reporter *reporter, enum devlink_command cmd, u32 portid, u32 seq, int flags); +int devlink_health_do_dump(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack); void devlink_fmsg_free(struct devlink_fmsg *fmsg); @@ -267,3 +270,5 @@ int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, + struct genl_info *info); diff --git a/net/devlink/health.c b/net/devlink/health.c index 1c92f369c918..bfeb71f17ff0 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -5,6 +5,7 @@ */ #include +#include #include "devl_internal.h" void * @@ -408,3 +409,138 @@ int devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, return 0; } + +static void devlink_recover_notify(struct devlink_health_reporter *reporter, + enum devlink_command cmd) +{ + struct devlink *devlink = reporter->devlink; + struct sk_buff *msg; + int err; + + WARN_ON(cmd != DEVLINK_CMD_HEALTH_REPORTER_RECOVER); + WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return; + + err = devlink_nl_health_reporter_fill(msg, reporter, cmd, 0, 0, 0); + if (err) { + nlmsg_free(msg); + return; + } + + genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, + 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); +} + +void +devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter) +{ + reporter->recovery_count++; + reporter->last_recovery_ts = jiffies; +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done); + +static int +devlink_health_reporter_recover(struct devlink_health_reporter *reporter, + void *priv_ctx, struct netlink_ext_ack *extack) +{ + int err; + + if (reporter->health_state == DEVLINK_HEALTH_REPORTER_STATE_HEALTHY) + return 0; + + if (!reporter->ops->recover) + return -EOPNOTSUPP; + + err = reporter->ops->recover(reporter, priv_ctx, extack); + if (err) + return err; + + devlink_health_reporter_recovery_done(reporter); + reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; + devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); + + return 0; +} + +int devlink_health_report(struct devlink_health_reporter *reporter, + const char *msg, void *priv_ctx) +{ + enum devlink_health_reporter_state prev_health_state; + struct devlink *devlink = reporter->devlink; + unsigned long recover_ts_threshold; + int ret; + + /* write a log message of the current error */ + WARN_ON(!msg); + trace_devlink_health_report(devlink, reporter->ops->name, msg); + reporter->error_count++; + prev_health_state = reporter->health_state; + reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; + devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); + + /* abort if the previous error wasn't recovered */ + recover_ts_threshold = reporter->last_recovery_ts + + msecs_to_jiffies(reporter->graceful_period); + if (reporter->auto_recover && + (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY || + (reporter->last_recovery_ts && reporter->recovery_count && + time_is_after_jiffies(recover_ts_threshold)))) { + trace_devlink_health_recover_aborted(devlink, + reporter->ops->name, + reporter->health_state, + jiffies - + reporter->last_recovery_ts); + return -ECANCELED; + } + + if (reporter->auto_dump) { + mutex_lock(&reporter->dump_lock); + /* store current dump of current error, for later analysis */ + devlink_health_do_dump(reporter, priv_ctx, NULL); + mutex_unlock(&reporter->dump_lock); + } + + if (!reporter->auto_recover) + return 0; + + devl_lock(devlink); + ret = devlink_health_reporter_recover(reporter, priv_ctx, NULL); + devl_unlock(devlink); + + return ret; +} +EXPORT_SYMBOL_GPL(devlink_health_report); + +void +devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, + enum devlink_health_reporter_state state) +{ + if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY && + state != DEVLINK_HEALTH_REPORTER_STATE_ERROR)) + return; + + if (reporter->health_state == state) + return; + + reporter->health_state = state; + trace_devlink_health_reporter_state_update(reporter->devlink, + reporter->ops->name, state); + devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); +} +EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update); + +int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + return devlink_health_reporter_recover(reporter, NULL, info->extack); +} diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 0b1c5e0122f3..03184dd3d271 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -5963,61 +5963,6 @@ nla_put_failure: return err; } -static void devlink_recover_notify(struct devlink_health_reporter *reporter, - enum devlink_command cmd) -{ - struct devlink *devlink = reporter->devlink; - struct sk_buff *msg; - int err; - - WARN_ON(cmd != DEVLINK_CMD_HEALTH_REPORTER_RECOVER); - WARN_ON(!xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED)); - - msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!msg) - return; - - err = devlink_nl_health_reporter_fill(msg, reporter, cmd, 0, 0, 0); - if (err) { - nlmsg_free(msg); - return; - } - - genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink), msg, - 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL); -} - -void -devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter) -{ - reporter->recovery_count++; - reporter->last_recovery_ts = jiffies; -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done); - -static int -devlink_health_reporter_recover(struct devlink_health_reporter *reporter, - void *priv_ctx, struct netlink_ext_ack *extack) -{ - int err; - - if (reporter->health_state == DEVLINK_HEALTH_REPORTER_STATE_HEALTHY) - return 0; - - if (!reporter->ops->recover) - return -EOPNOTSUPP; - - err = reporter->ops->recover(reporter, priv_ctx, extack); - if (err) - return err; - - devlink_health_reporter_recovery_done(reporter); - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY; - devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); - - return 0; -} - static void devlink_health_dump_clear(struct devlink_health_reporter *reporter) { @@ -6027,9 +5972,9 @@ devlink_health_dump_clear(struct devlink_health_reporter *reporter) reporter->dump_fmsg = NULL; } -static int devlink_health_do_dump(struct devlink_health_reporter *reporter, - void *priv_ctx, - struct netlink_ext_ack *extack) +int devlink_health_do_dump(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack) { int err; @@ -6068,55 +6013,6 @@ dump_err: return err; } -int devlink_health_report(struct devlink_health_reporter *reporter, - const char *msg, void *priv_ctx) -{ - enum devlink_health_reporter_state prev_health_state; - struct devlink *devlink = reporter->devlink; - unsigned long recover_ts_threshold; - int ret; - - /* write a log message of the current error */ - WARN_ON(!msg); - trace_devlink_health_report(devlink, reporter->ops->name, msg); - reporter->error_count++; - prev_health_state = reporter->health_state; - reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR; - devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); - - /* abort if the previous error wasn't recovered */ - recover_ts_threshold = reporter->last_recovery_ts + - msecs_to_jiffies(reporter->graceful_period); - if (reporter->auto_recover && - (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY || - (reporter->last_recovery_ts && reporter->recovery_count && - time_is_after_jiffies(recover_ts_threshold)))) { - trace_devlink_health_recover_aborted(devlink, - reporter->ops->name, - reporter->health_state, - jiffies - - reporter->last_recovery_ts); - return -ECANCELED; - } - - if (reporter->auto_dump) { - mutex_lock(&reporter->dump_lock); - /* store current dump of current error, for later analysis */ - devlink_health_do_dump(reporter, priv_ctx, NULL); - mutex_unlock(&reporter->dump_lock); - } - - if (!reporter->auto_recover) - return 0; - - devl_lock(devlink); - ret = devlink_health_reporter_recover(reporter, priv_ctx, NULL); - devl_unlock(devlink); - - return ret; -} -EXPORT_SYMBOL_GPL(devlink_health_report); - static struct devlink_health_reporter * devlink_health_reporter_get_from_cb(struct netlink_callback *cb) { @@ -6135,37 +6031,6 @@ devlink_health_reporter_get_from_cb(struct netlink_callback *cb) return reporter; } -void -devlink_health_reporter_state_update(struct devlink_health_reporter *reporter, - enum devlink_health_reporter_state state) -{ - if (WARN_ON(state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY && - state != DEVLINK_HEALTH_REPORTER_STATE_ERROR)) - return; - - if (reporter->health_state == state) - return; - - reporter->health_state = state; - trace_devlink_health_reporter_state_update(reporter->devlink, - reporter->ops->name, state); - devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER); -} -EXPORT_SYMBOL_GPL(devlink_health_reporter_state_update); - -static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - return devlink_health_reporter_recover(reporter, NULL, info->extack); -} - static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, struct genl_info *info) { -- cgit v1.2.3 From a929df7fd9c6fbfb4ebbed7c9cf96657e9f622b6 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:38:01 +0200 Subject: devlink: Move devlink fmsg and health diagnose to health file Devlink fmsg (formatted message) is used by devlink health diagnose, dump and drivers which support these devlink health callbacks. Therefore, move devlink fmsg helpers and related code to file health.c. Move devlink health diagnose to file health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 6 + net/devlink/health.c | 630 ++++++++++++++++++++++++++++++++++++++++++++ net/devlink/leftover.c | 630 -------------------------------------------- 3 files changed, 636 insertions(+), 630 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index d1a901cb5900..a4b96f8a0ab4 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -240,7 +240,11 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg, int devlink_health_do_dump(struct devlink_health_reporter *reporter, void *priv_ctx, struct netlink_ext_ack *extack); +int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, + struct netlink_callback *cb, + enum devlink_command cmd); +struct devlink_fmsg *devlink_fmsg_alloc(void); void devlink_fmsg_free(struct devlink_fmsg *fmsg); /* Line cards */ @@ -272,3 +276,5 @@ int devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, + struct genl_info *info); diff --git a/net/devlink/health.c b/net/devlink/health.c index bfeb71f17ff0..fbeedc8df043 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -8,6 +8,48 @@ #include #include "devl_internal.h" +struct devlink_fmsg_item { + struct list_head list; + int attrtype; + u8 nla_type; + u16 len; + int value[]; +}; + +struct devlink_fmsg { + struct list_head item_list; + bool putting_binary; /* This flag forces enclosing of binary data + * in an array brackets. It forces using + * of designated API: + * devlink_fmsg_binary_pair_nest_start() + * devlink_fmsg_binary_pair_nest_end() + */ +}; + +struct devlink_fmsg *devlink_fmsg_alloc(void) +{ + struct devlink_fmsg *fmsg; + + fmsg = kzalloc(sizeof(*fmsg), GFP_KERNEL); + if (!fmsg) + return NULL; + + INIT_LIST_HEAD(&fmsg->item_list); + + return fmsg; +} + +void devlink_fmsg_free(struct devlink_fmsg *fmsg) +{ + struct devlink_fmsg_item *item, *tmp; + + list_for_each_entry_safe(item, tmp, &fmsg->item_list, list) { + list_del(&item->list); + kfree(item); + } + kfree(fmsg); +} + void * devlink_health_reporter_priv(struct devlink_health_reporter *reporter) { @@ -544,3 +586,591 @@ int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, return devlink_health_reporter_recover(reporter, NULL, info->extack); } + +static int devlink_fmsg_nest_common(struct devlink_fmsg *fmsg, + int attrtype) +{ + struct devlink_fmsg_item *item; + + item = kzalloc(sizeof(*item), GFP_KERNEL); + if (!item) + return -ENOMEM; + + item->attrtype = attrtype; + list_add_tail(&item->list, &fmsg->item_list); + + return 0; +} + +int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_OBJ_NEST_START); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_start); + +static int devlink_fmsg_nest_end(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_NEST_END); +} + +int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_end(fmsg); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_end); + +#define DEVLINK_FMSG_MAX_SIZE (GENLMSG_DEFAULT_SIZE - GENL_HDRLEN - NLA_HDRLEN) + +static int devlink_fmsg_put_name(struct devlink_fmsg *fmsg, const char *name) +{ + struct devlink_fmsg_item *item; + + if (fmsg->putting_binary) + return -EINVAL; + + if (strlen(name) + 1 > DEVLINK_FMSG_MAX_SIZE) + return -EMSGSIZE; + + item = kzalloc(sizeof(*item) + strlen(name) + 1, GFP_KERNEL); + if (!item) + return -ENOMEM; + + item->nla_type = NLA_NUL_STRING; + item->len = strlen(name) + 1; + item->attrtype = DEVLINK_ATTR_FMSG_OBJ_NAME; + memcpy(&item->value, name, item->len); + list_add_tail(&item->list, &fmsg->item_list); + + return 0; +} + +int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name) +{ + int err; + + if (fmsg->putting_binary) + return -EINVAL; + + err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_PAIR_NEST_START); + if (err) + return err; + + err = devlink_fmsg_put_name(fmsg, name); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_start); + +int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_nest_end(fmsg); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_end); + +int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg, + const char *name) +{ + int err; + + if (fmsg->putting_binary) + return -EINVAL; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_ARR_NEST_START); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_start); + +int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg) +{ + int err; + + if (fmsg->putting_binary) + return -EINVAL; + + err = devlink_fmsg_nest_end(fmsg); + if (err) + return err; + + err = devlink_fmsg_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_end); + +int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg *fmsg, + const char *name) +{ + int err; + + err = devlink_fmsg_arr_pair_nest_start(fmsg, name); + if (err) + return err; + + fmsg->putting_binary = true; + return err; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_start); + +int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg *fmsg) +{ + if (!fmsg->putting_binary) + return -EINVAL; + + fmsg->putting_binary = false; + return devlink_fmsg_arr_pair_nest_end(fmsg); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_end); + +static int devlink_fmsg_put_value(struct devlink_fmsg *fmsg, + const void *value, u16 value_len, + u8 value_nla_type) +{ + struct devlink_fmsg_item *item; + + if (value_len > DEVLINK_FMSG_MAX_SIZE) + return -EMSGSIZE; + + item = kzalloc(sizeof(*item) + value_len, GFP_KERNEL); + if (!item) + return -ENOMEM; + + item->nla_type = value_nla_type; + item->len = value_len; + item->attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; + memcpy(&item->value, value, item->len); + list_add_tail(&item->list, &fmsg->item_list); + + return 0; +} + +static int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_FLAG); +} + +static int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U8); +} + +int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U32); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u32_put); + +static int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U64); +} + +int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value) +{ + if (fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, value, strlen(value) + 1, + NLA_NUL_STRING); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_string_put); + +int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value, + u16 value_len) +{ + if (!fmsg->putting_binary) + return -EINVAL; + + return devlink_fmsg_put_value(fmsg, value, value_len, NLA_BINARY); +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_put); + +int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name, + bool value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_bool_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_bool_pair_put); + +int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name, + u8 value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_u8_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u8_pair_put); + +int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name, + u32 value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_u32_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u32_pair_put); + +int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name, + u64 value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_u64_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_u64_pair_put); + +int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name, + const char *value) +{ + int err; + + err = devlink_fmsg_pair_nest_start(fmsg, name); + if (err) + return err; + + err = devlink_fmsg_string_put(fmsg, value); + if (err) + return err; + + err = devlink_fmsg_pair_nest_end(fmsg); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_string_pair_put); + +int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name, + const void *value, u32 value_len) +{ + u32 data_size; + int end_err; + u32 offset; + int err; + + err = devlink_fmsg_binary_pair_nest_start(fmsg, name); + if (err) + return err; + + for (offset = 0; offset < value_len; offset += data_size) { + data_size = value_len - offset; + if (data_size > DEVLINK_FMSG_MAX_SIZE) + data_size = DEVLINK_FMSG_MAX_SIZE; + err = devlink_fmsg_binary_put(fmsg, value + offset, data_size); + if (err) + break; + /* Exit from loop with a break (instead of + * return) to make sure putting_binary is turned off in + * devlink_fmsg_binary_pair_nest_end + */ + } + + end_err = devlink_fmsg_binary_pair_nest_end(fmsg); + if (end_err) + err = end_err; + + return err; +} +EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_put); + +static int +devlink_fmsg_item_fill_type(struct devlink_fmsg_item *msg, struct sk_buff *skb) +{ + switch (msg->nla_type) { + case NLA_FLAG: + case NLA_U8: + case NLA_U32: + case NLA_U64: + case NLA_NUL_STRING: + case NLA_BINARY: + return nla_put_u8(skb, DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, + msg->nla_type); + default: + return -EINVAL; + } +} + +static int +devlink_fmsg_item_fill_data(struct devlink_fmsg_item *msg, struct sk_buff *skb) +{ + int attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; + u8 tmp; + + switch (msg->nla_type) { + case NLA_FLAG: + /* Always provide flag data, regardless of its value */ + tmp = *(bool *)msg->value; + + return nla_put_u8(skb, attrtype, tmp); + case NLA_U8: + return nla_put_u8(skb, attrtype, *(u8 *)msg->value); + case NLA_U32: + return nla_put_u32(skb, attrtype, *(u32 *)msg->value); + case NLA_U64: + return nla_put_u64_64bit(skb, attrtype, *(u64 *)msg->value, + DEVLINK_ATTR_PAD); + case NLA_NUL_STRING: + return nla_put_string(skb, attrtype, (char *)&msg->value); + case NLA_BINARY: + return nla_put(skb, attrtype, msg->len, (void *)&msg->value); + default: + return -EINVAL; + } +} + +static int +devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb, + int *start) +{ + struct devlink_fmsg_item *item; + struct nlattr *fmsg_nlattr; + int err = 0; + int i = 0; + + fmsg_nlattr = nla_nest_start_noflag(skb, DEVLINK_ATTR_FMSG); + if (!fmsg_nlattr) + return -EMSGSIZE; + + list_for_each_entry(item, &fmsg->item_list, list) { + if (i < *start) { + i++; + continue; + } + + switch (item->attrtype) { + case DEVLINK_ATTR_FMSG_OBJ_NEST_START: + case DEVLINK_ATTR_FMSG_PAIR_NEST_START: + case DEVLINK_ATTR_FMSG_ARR_NEST_START: + case DEVLINK_ATTR_FMSG_NEST_END: + err = nla_put_flag(skb, item->attrtype); + break; + case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA: + err = devlink_fmsg_item_fill_type(item, skb); + if (err) + break; + err = devlink_fmsg_item_fill_data(item, skb); + break; + case DEVLINK_ATTR_FMSG_OBJ_NAME: + err = nla_put_string(skb, item->attrtype, + (char *)&item->value); + break; + default: + err = -EINVAL; + break; + } + if (!err) + *start = ++i; + else + break; + } + + nla_nest_end(skb, fmsg_nlattr); + return err; +} + +static int devlink_fmsg_snd(struct devlink_fmsg *fmsg, + struct genl_info *info, + enum devlink_command cmd, int flags) +{ + struct nlmsghdr *nlh; + struct sk_buff *skb; + bool last = false; + int index = 0; + void *hdr; + int err; + + while (!last) { + int tmp_index = index; + + skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!skb) + return -ENOMEM; + + hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, + &devlink_nl_family, flags | NLM_F_MULTI, cmd); + if (!hdr) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + err = devlink_fmsg_prepare_skb(fmsg, skb, &index); + if (!err) + last = true; + else if (err != -EMSGSIZE || tmp_index == index) + goto nla_put_failure; + + genlmsg_end(skb, hdr); + err = genlmsg_reply(skb, info); + if (err) + return err; + } + + skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!skb) + return -ENOMEM; + nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, + NLMSG_DONE, 0, flags | NLM_F_MULTI); + if (!nlh) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + return genlmsg_reply(skb, info); + +nla_put_failure: + nlmsg_free(skb); + return err; +} + +int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, + struct netlink_callback *cb, + enum devlink_command cmd) +{ + struct devlink_nl_dump_state *state = devlink_dump_state(cb); + int index = state->idx; + int tmp_index = index; + void *hdr; + int err; + + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, + &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI, cmd); + if (!hdr) { + err = -EMSGSIZE; + goto nla_put_failure; + } + + err = devlink_fmsg_prepare_skb(fmsg, skb, &index); + if ((err && err != -EMSGSIZE) || tmp_index == index) + goto nla_put_failure; + + state->idx = index; + genlmsg_end(skb, hdr); + return skb->len; + +nla_put_failure: + genlmsg_cancel(skb, hdr); + return err; +} + +int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + struct devlink_fmsg *fmsg; + int err; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->diagnose) + return -EOPNOTSUPP; + + fmsg = devlink_fmsg_alloc(); + if (!fmsg) + return -ENOMEM; + + err = devlink_fmsg_obj_nest_start(fmsg); + if (err) + goto out; + + err = reporter->ops->diagnose(reporter, fmsg, info->extack); + if (err) + goto out; + + err = devlink_fmsg_obj_nest_end(fmsg); + if (err) + goto out; + + err = devlink_fmsg_snd(fmsg, info, + DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); + +out: + devlink_fmsg_free(fmsg); + return err; +} diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 03184dd3d271..e460bcf1d247 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -5372,597 +5372,6 @@ out_unlock: return err; } -struct devlink_fmsg_item { - struct list_head list; - int attrtype; - u8 nla_type; - u16 len; - int value[]; -}; - -struct devlink_fmsg { - struct list_head item_list; - bool putting_binary; /* This flag forces enclosing of binary data - * in an array brackets. It forces using - * of designated API: - * devlink_fmsg_binary_pair_nest_start() - * devlink_fmsg_binary_pair_nest_end() - */ -}; - -static struct devlink_fmsg *devlink_fmsg_alloc(void) -{ - struct devlink_fmsg *fmsg; - - fmsg = kzalloc(sizeof(*fmsg), GFP_KERNEL); - if (!fmsg) - return NULL; - - INIT_LIST_HEAD(&fmsg->item_list); - - return fmsg; -} - -void devlink_fmsg_free(struct devlink_fmsg *fmsg) -{ - struct devlink_fmsg_item *item, *tmp; - - list_for_each_entry_safe(item, tmp, &fmsg->item_list, list) { - list_del(&item->list); - kfree(item); - } - kfree(fmsg); -} - -static int devlink_fmsg_nest_common(struct devlink_fmsg *fmsg, - int attrtype) -{ - struct devlink_fmsg_item *item; - - item = kzalloc(sizeof(*item), GFP_KERNEL); - if (!item) - return -ENOMEM; - - item->attrtype = attrtype; - list_add_tail(&item->list, &fmsg->item_list); - - return 0; -} - -int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_OBJ_NEST_START); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_start); - -static int devlink_fmsg_nest_end(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_NEST_END); -} - -int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_end(fmsg); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_end); - -#define DEVLINK_FMSG_MAX_SIZE (GENLMSG_DEFAULT_SIZE - GENL_HDRLEN - NLA_HDRLEN) - -static int devlink_fmsg_put_name(struct devlink_fmsg *fmsg, const char *name) -{ - struct devlink_fmsg_item *item; - - if (fmsg->putting_binary) - return -EINVAL; - - if (strlen(name) + 1 > DEVLINK_FMSG_MAX_SIZE) - return -EMSGSIZE; - - item = kzalloc(sizeof(*item) + strlen(name) + 1, GFP_KERNEL); - if (!item) - return -ENOMEM; - - item->nla_type = NLA_NUL_STRING; - item->len = strlen(name) + 1; - item->attrtype = DEVLINK_ATTR_FMSG_OBJ_NAME; - memcpy(&item->value, name, item->len); - list_add_tail(&item->list, &fmsg->item_list); - - return 0; -} - -int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name) -{ - int err; - - if (fmsg->putting_binary) - return -EINVAL; - - err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_PAIR_NEST_START); - if (err) - return err; - - err = devlink_fmsg_put_name(fmsg, name); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_start); - -int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_nest_end(fmsg); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_end); - -int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg, - const char *name) -{ - int err; - - if (fmsg->putting_binary) - return -EINVAL; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_ARR_NEST_START); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_start); - -int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg) -{ - int err; - - if (fmsg->putting_binary) - return -EINVAL; - - err = devlink_fmsg_nest_end(fmsg); - if (err) - return err; - - err = devlink_fmsg_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_end); - -int devlink_fmsg_binary_pair_nest_start(struct devlink_fmsg *fmsg, - const char *name) -{ - int err; - - err = devlink_fmsg_arr_pair_nest_start(fmsg, name); - if (err) - return err; - - fmsg->putting_binary = true; - return err; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_start); - -int devlink_fmsg_binary_pair_nest_end(struct devlink_fmsg *fmsg) -{ - if (!fmsg->putting_binary) - return -EINVAL; - - fmsg->putting_binary = false; - return devlink_fmsg_arr_pair_nest_end(fmsg); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_nest_end); - -static int devlink_fmsg_put_value(struct devlink_fmsg *fmsg, - const void *value, u16 value_len, - u8 value_nla_type) -{ - struct devlink_fmsg_item *item; - - if (value_len > DEVLINK_FMSG_MAX_SIZE) - return -EMSGSIZE; - - item = kzalloc(sizeof(*item) + value_len, GFP_KERNEL); - if (!item) - return -ENOMEM; - - item->nla_type = value_nla_type; - item->len = value_len; - item->attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; - memcpy(&item->value, value, item->len); - list_add_tail(&item->list, &fmsg->item_list); - - return 0; -} - -static int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_FLAG); -} - -static int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U8); -} - -int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U32); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u32_put); - -static int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U64); -} - -int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value) -{ - if (fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, value, strlen(value) + 1, - NLA_NUL_STRING); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_string_put); - -int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value, - u16 value_len) -{ - if (!fmsg->putting_binary) - return -EINVAL; - - return devlink_fmsg_put_value(fmsg, value, value_len, NLA_BINARY); -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_put); - -int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name, - bool value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_bool_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_bool_pair_put); - -int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name, - u8 value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_u8_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u8_pair_put); - -int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name, - u32 value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_u32_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u32_pair_put); - -int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name, - u64 value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_u64_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_u64_pair_put); - -int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name, - const char *value) -{ - int err; - - err = devlink_fmsg_pair_nest_start(fmsg, name); - if (err) - return err; - - err = devlink_fmsg_string_put(fmsg, value); - if (err) - return err; - - err = devlink_fmsg_pair_nest_end(fmsg); - if (err) - return err; - - return 0; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_string_pair_put); - -int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name, - const void *value, u32 value_len) -{ - u32 data_size; - int end_err; - u32 offset; - int err; - - err = devlink_fmsg_binary_pair_nest_start(fmsg, name); - if (err) - return err; - - for (offset = 0; offset < value_len; offset += data_size) { - data_size = value_len - offset; - if (data_size > DEVLINK_FMSG_MAX_SIZE) - data_size = DEVLINK_FMSG_MAX_SIZE; - err = devlink_fmsg_binary_put(fmsg, value + offset, data_size); - if (err) - break; - /* Exit from loop with a break (instead of - * return) to make sure putting_binary is turned off in - * devlink_fmsg_binary_pair_nest_end - */ - } - - end_err = devlink_fmsg_binary_pair_nest_end(fmsg); - if (end_err) - err = end_err; - - return err; -} -EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_put); - -static int -devlink_fmsg_item_fill_type(struct devlink_fmsg_item *msg, struct sk_buff *skb) -{ - switch (msg->nla_type) { - case NLA_FLAG: - case NLA_U8: - case NLA_U32: - case NLA_U64: - case NLA_NUL_STRING: - case NLA_BINARY: - return nla_put_u8(skb, DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, - msg->nla_type); - default: - return -EINVAL; - } -} - -static int -devlink_fmsg_item_fill_data(struct devlink_fmsg_item *msg, struct sk_buff *skb) -{ - int attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA; - u8 tmp; - - switch (msg->nla_type) { - case NLA_FLAG: - /* Always provide flag data, regardless of its value */ - tmp = *(bool *) msg->value; - - return nla_put_u8(skb, attrtype, tmp); - case NLA_U8: - return nla_put_u8(skb, attrtype, *(u8 *) msg->value); - case NLA_U32: - return nla_put_u32(skb, attrtype, *(u32 *) msg->value); - case NLA_U64: - return nla_put_u64_64bit(skb, attrtype, *(u64 *) msg->value, - DEVLINK_ATTR_PAD); - case NLA_NUL_STRING: - return nla_put_string(skb, attrtype, (char *) &msg->value); - case NLA_BINARY: - return nla_put(skb, attrtype, msg->len, (void *) &msg->value); - default: - return -EINVAL; - } -} - -static int -devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb, - int *start) -{ - struct devlink_fmsg_item *item; - struct nlattr *fmsg_nlattr; - int err = 0; - int i = 0; - - fmsg_nlattr = nla_nest_start_noflag(skb, DEVLINK_ATTR_FMSG); - if (!fmsg_nlattr) - return -EMSGSIZE; - - list_for_each_entry(item, &fmsg->item_list, list) { - if (i < *start) { - i++; - continue; - } - - switch (item->attrtype) { - case DEVLINK_ATTR_FMSG_OBJ_NEST_START: - case DEVLINK_ATTR_FMSG_PAIR_NEST_START: - case DEVLINK_ATTR_FMSG_ARR_NEST_START: - case DEVLINK_ATTR_FMSG_NEST_END: - err = nla_put_flag(skb, item->attrtype); - break; - case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA: - err = devlink_fmsg_item_fill_type(item, skb); - if (err) - break; - err = devlink_fmsg_item_fill_data(item, skb); - break; - case DEVLINK_ATTR_FMSG_OBJ_NAME: - err = nla_put_string(skb, item->attrtype, - (char *) &item->value); - break; - default: - err = -EINVAL; - break; - } - if (!err) - *start = ++i; - else - break; - } - - nla_nest_end(skb, fmsg_nlattr); - return err; -} - -static int devlink_fmsg_snd(struct devlink_fmsg *fmsg, - struct genl_info *info, - enum devlink_command cmd, int flags) -{ - struct nlmsghdr *nlh; - struct sk_buff *skb; - bool last = false; - int index = 0; - void *hdr; - int err; - - while (!last) { - int tmp_index = index; - - skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!skb) - return -ENOMEM; - - hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq, - &devlink_nl_family, flags | NLM_F_MULTI, cmd); - if (!hdr) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - err = devlink_fmsg_prepare_skb(fmsg, skb, &index); - if (!err) - last = true; - else if (err != -EMSGSIZE || tmp_index == index) - goto nla_put_failure; - - genlmsg_end(skb, hdr); - err = genlmsg_reply(skb, info); - if (err) - return err; - } - - skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); - if (!skb) - return -ENOMEM; - nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq, - NLMSG_DONE, 0, flags | NLM_F_MULTI); - if (!nlh) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - return genlmsg_reply(skb, info); - -nla_put_failure: - nlmsg_free(skb); - return err; -} - -static int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, - struct netlink_callback *cb, - enum devlink_command cmd) -{ - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - int index = state->idx; - int tmp_index = index; - void *hdr; - int err; - - hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, - &devlink_nl_family, NLM_F_ACK | NLM_F_MULTI, cmd); - if (!hdr) { - err = -EMSGSIZE; - goto nla_put_failure; - } - - err = devlink_fmsg_prepare_skb(fmsg, skb, &index); - if ((err && err != -EMSGSIZE) || tmp_index == index) - goto nla_put_failure; - - state->idx = index; - genlmsg_end(skb, hdr); - return skb->len; - -nla_put_failure: - genlmsg_cancel(skb, hdr); - return err; -} - static void devlink_health_dump_clear(struct devlink_health_reporter *reporter) { @@ -6031,45 +5440,6 @@ devlink_health_reporter_get_from_cb(struct netlink_callback *cb) return reporter; } -static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - struct devlink_fmsg *fmsg; - int err; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->diagnose) - return -EOPNOTSUPP; - - fmsg = devlink_fmsg_alloc(); - if (!fmsg) - return -ENOMEM; - - err = devlink_fmsg_obj_nest_start(fmsg); - if (err) - goto out; - - err = reporter->ops->diagnose(reporter, fmsg, info->extack); - if (err) - goto out; - - err = devlink_fmsg_obj_nest_end(fmsg); - if (err) - goto out; - - err = devlink_fmsg_snd(fmsg, info, - DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0); - -out: - devlink_fmsg_free(fmsg); - return err; -} - static int devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) -- cgit v1.2.3 From 7004c6c45761143836d3c8122d91264320d87e8e Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:38:02 +0200 Subject: devlink: Move devlink health dump to health file Move devlink health report dump callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 4 ++ net/devlink/health.c | 122 +++++++++++++++++++++++++++++++++++++++++++ net/devlink/leftover.c | 123 -------------------------------------------- 3 files changed, 126 insertions(+), 123 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index a4b96f8a0ab4..ae7229742d66 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -278,3 +278,7 @@ int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, + struct netlink_callback *cb); +int devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, + struct genl_info *info); diff --git a/net/devlink/health.c b/net/devlink/health.c index fbeedc8df043..6991b9405f4f 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -5,6 +5,7 @@ */ #include +#include #include #include "devl_internal.h" @@ -507,6 +508,56 @@ devlink_health_reporter_recover(struct devlink_health_reporter *reporter, return 0; } +static void +devlink_health_dump_clear(struct devlink_health_reporter *reporter) +{ + if (!reporter->dump_fmsg) + return; + devlink_fmsg_free(reporter->dump_fmsg); + reporter->dump_fmsg = NULL; +} + +int devlink_health_do_dump(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack) +{ + int err; + + if (!reporter->ops->dump) + return 0; + + if (reporter->dump_fmsg) + return 0; + + reporter->dump_fmsg = devlink_fmsg_alloc(); + if (!reporter->dump_fmsg) { + err = -ENOMEM; + return err; + } + + err = devlink_fmsg_obj_nest_start(reporter->dump_fmsg); + if (err) + goto dump_err; + + err = reporter->ops->dump(reporter, reporter->dump_fmsg, + priv_ctx, extack); + if (err) + goto dump_err; + + err = devlink_fmsg_obj_nest_end(reporter->dump_fmsg); + if (err) + goto dump_err; + + reporter->dump_ts = jiffies; + reporter->dump_real_ts = ktime_get_real_ns(); + + return 0; + +dump_err: + devlink_health_dump_clear(reporter); + return err; +} + int devlink_health_report(struct devlink_health_reporter *reporter, const char *msg, void *priv_ctx) { @@ -1174,3 +1225,74 @@ out: devlink_fmsg_free(fmsg); return err; } + +static struct devlink_health_reporter * +devlink_health_reporter_get_from_cb(struct netlink_callback *cb) +{ + const struct genl_dumpit_info *info = genl_dumpit_info(cb); + struct devlink_health_reporter *reporter; + struct nlattr **attrs = info->attrs; + struct devlink *devlink; + + devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs); + if (IS_ERR(devlink)) + return NULL; + devl_unlock(devlink); + + reporter = devlink_health_reporter_get_from_attrs(devlink, attrs); + devlink_put(devlink); + return reporter; +} + +int devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct devlink_nl_dump_state *state = devlink_dump_state(cb); + struct devlink_health_reporter *reporter; + int err; + + reporter = devlink_health_reporter_get_from_cb(cb); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->dump) + return -EOPNOTSUPP; + + mutex_lock(&reporter->dump_lock); + if (!state->idx) { + err = devlink_health_do_dump(reporter, NULL, cb->extack); + if (err) + goto unlock; + state->dump_ts = reporter->dump_ts; + } + if (!reporter->dump_fmsg || state->dump_ts != reporter->dump_ts) { + NL_SET_ERR_MSG(cb->extack, "Dump trampled, please retry"); + err = -EAGAIN; + goto unlock; + } + + err = devlink_fmsg_dumpit(reporter->dump_fmsg, skb, cb, + DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET); +unlock: + mutex_unlock(&reporter->dump_lock); + return err; +} + +int devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->dump) + return -EOPNOTSUPP; + + mutex_lock(&reporter->dump_lock); + devlink_health_dump_clear(reporter); + mutex_unlock(&reporter->dump_lock); + return 0; +} diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index e460bcf1d247..55be664d14ad 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -5372,129 +5372,6 @@ out_unlock: return err; } -static void -devlink_health_dump_clear(struct devlink_health_reporter *reporter) -{ - if (!reporter->dump_fmsg) - return; - devlink_fmsg_free(reporter->dump_fmsg); - reporter->dump_fmsg = NULL; -} - -int devlink_health_do_dump(struct devlink_health_reporter *reporter, - void *priv_ctx, - struct netlink_ext_ack *extack) -{ - int err; - - if (!reporter->ops->dump) - return 0; - - if (reporter->dump_fmsg) - return 0; - - reporter->dump_fmsg = devlink_fmsg_alloc(); - if (!reporter->dump_fmsg) { - err = -ENOMEM; - return err; - } - - err = devlink_fmsg_obj_nest_start(reporter->dump_fmsg); - if (err) - goto dump_err; - - err = reporter->ops->dump(reporter, reporter->dump_fmsg, - priv_ctx, extack); - if (err) - goto dump_err; - - err = devlink_fmsg_obj_nest_end(reporter->dump_fmsg); - if (err) - goto dump_err; - - reporter->dump_ts = jiffies; - reporter->dump_real_ts = ktime_get_real_ns(); - - return 0; - -dump_err: - devlink_health_dump_clear(reporter); - return err; -} - -static struct devlink_health_reporter * -devlink_health_reporter_get_from_cb(struct netlink_callback *cb) -{ - const struct genl_dumpit_info *info = genl_dumpit_info(cb); - struct devlink_health_reporter *reporter; - struct nlattr **attrs = info->attrs; - struct devlink *devlink; - - devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs); - if (IS_ERR(devlink)) - return NULL; - devl_unlock(devlink); - - reporter = devlink_health_reporter_get_from_attrs(devlink, attrs); - devlink_put(devlink); - return reporter; -} - -static int -devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, - struct netlink_callback *cb) -{ - struct devlink_nl_dump_state *state = devlink_dump_state(cb); - struct devlink_health_reporter *reporter; - int err; - - reporter = devlink_health_reporter_get_from_cb(cb); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->dump) - return -EOPNOTSUPP; - - mutex_lock(&reporter->dump_lock); - if (!state->idx) { - err = devlink_health_do_dump(reporter, NULL, cb->extack); - if (err) - goto unlock; - state->dump_ts = reporter->dump_ts; - } - if (!reporter->dump_fmsg || state->dump_ts != reporter->dump_ts) { - NL_SET_ERR_MSG(cb->extack, "Dump trampled, please retry"); - err = -EAGAIN; - goto unlock; - } - - err = devlink_fmsg_dumpit(reporter->dump_fmsg, skb, cb, - DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET); -unlock: - mutex_unlock(&reporter->dump_lock); - return err; -} - -static int -devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->dump) - return -EOPNOTSUPP; - - mutex_lock(&reporter->dump_lock); - devlink_health_dump_clear(reporter); - mutex_unlock(&reporter->dump_lock); - return 0; -} - static int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, struct genl_info *info) { -- cgit v1.2.3 From c9311ee13f0ebd6d684793f88b91092672c21171 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:38:03 +0200 Subject: devlink: Move devlink health test to health file Move devlink health report test callback from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 2 ++ net/devlink/health.c | 16 ++++++++++++++++ net/devlink/leftover.c | 16 ---------------- 3 files changed, 18 insertions(+), 16 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index ae7229742d66..211f7ea38d6a 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -282,3 +282,5 @@ int devlink_nl_cmd_health_reporter_dump_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); int devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, struct genl_info *info); +int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, + struct genl_info *info); diff --git a/net/devlink/health.c b/net/devlink/health.c index 6991b9405f4f..38ad890bb947 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -1296,3 +1296,19 @@ int devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb, mutex_unlock(&reporter->dump_lock); return 0; } + +int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, + struct genl_info *info) +{ + struct devlink *devlink = info->user_ptr[0]; + struct devlink_health_reporter *reporter; + + reporter = devlink_health_reporter_get_from_info(devlink, info); + if (!reporter) + return -EINVAL; + + if (!reporter->ops->test) + return -EOPNOTSUPP; + + return reporter->ops->test(reporter, info->extack); +} diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 55be664d14ad..dffca2f9bfa7 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -5372,22 +5372,6 @@ out_unlock: return err; } -static int devlink_nl_cmd_health_reporter_test_doit(struct sk_buff *skb, - struct genl_info *info) -{ - struct devlink *devlink = info->user_ptr[0]; - struct devlink_health_reporter *reporter; - - reporter = devlink_health_reporter_get_from_info(devlink, info); - if (!reporter) - return -EINVAL; - - if (!reporter->ops->test) - return -EOPNOTSUPP; - - return reporter->ops->test(reporter, info->extack); -} - struct devlink_stats { u64_stats_t rx_bytes; u64_stats_t rx_packets; -- cgit v1.2.3 From 12af29e7790af811611409194f61d054e54365b9 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 14 Feb 2023 18:38:04 +0200 Subject: devlink: Move health common function to health file Now that all devlink health callbacks and related code are in file health.c move common health functions and devlink_health_reporter struct to be local in health.c file. Signed-off-by: Moshe Shemesh Reviewed-by: Jiri Pirko Reviewed-by: Jakub Kicinski Signed-off-by: Jakub Kicinski --- net/devlink/devl_internal.h | 47 --------------------------------------------- net/devlink/health.c | 45 ++++++++++++++++++++++++++++++------------- 2 files changed, 32 insertions(+), 60 deletions(-) (limited to 'net') diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 211f7ea38d6a..e133f423294a 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -200,53 +200,6 @@ int devlink_resources_validate(struct devlink *devlink, struct devlink_resource *resource, struct genl_info *info); -/* Health */ -struct devlink_health_reporter { - struct list_head list; - void *priv; - const struct devlink_health_reporter_ops *ops; - struct devlink *devlink; - struct devlink_port *devlink_port; - struct devlink_fmsg *dump_fmsg; - struct mutex dump_lock; /* lock parallel read/write from dump buffers */ - u64 graceful_period; - bool auto_recover; - bool auto_dump; - u8 health_state; - u64 dump_ts; - u64 dump_real_ts; - u64 error_count; - u64 recovery_count; - u64 last_recovery_ts; -}; - -struct devlink_health_reporter * -devlink_health_reporter_find_by_name(struct devlink *devlink, - const char *reporter_name); -struct devlink_health_reporter * -devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, - const char *reporter_name); -struct devlink_health_reporter * -devlink_health_reporter_get_from_attrs(struct devlink *devlink, - struct nlattr **attrs); -struct devlink_health_reporter * -devlink_health_reporter_get_from_info(struct devlink *devlink, - struct genl_info *info); -int -devlink_nl_health_reporter_fill(struct sk_buff *msg, - struct devlink_health_reporter *reporter, - enum devlink_command cmd, u32 portid, - u32 seq, int flags); -int devlink_health_do_dump(struct devlink_health_reporter *reporter, - void *priv_ctx, - struct netlink_ext_ack *extack); -int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, - struct netlink_callback *cb, - enum devlink_command cmd); - -struct devlink_fmsg *devlink_fmsg_alloc(void); -void devlink_fmsg_free(struct devlink_fmsg *fmsg); - /* Line cards */ struct devlink_linecard; diff --git a/net/devlink/health.c b/net/devlink/health.c index 38ad890bb947..0839706d5741 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -27,7 +27,7 @@ struct devlink_fmsg { */ }; -struct devlink_fmsg *devlink_fmsg_alloc(void) +static struct devlink_fmsg *devlink_fmsg_alloc(void) { struct devlink_fmsg *fmsg; @@ -40,7 +40,7 @@ struct devlink_fmsg *devlink_fmsg_alloc(void) return fmsg; } -void devlink_fmsg_free(struct devlink_fmsg *fmsg) +static void devlink_fmsg_free(struct devlink_fmsg *fmsg) { struct devlink_fmsg_item *item, *tmp; @@ -51,6 +51,25 @@ void devlink_fmsg_free(struct devlink_fmsg *fmsg) kfree(fmsg); } +struct devlink_health_reporter { + struct list_head list; + void *priv; + const struct devlink_health_reporter_ops *ops; + struct devlink *devlink; + struct devlink_port *devlink_port; + struct devlink_fmsg *dump_fmsg; + struct mutex dump_lock; /* lock parallel read/write from dump buffers */ + u64 graceful_period; + bool auto_recover; + bool auto_dump; + u8 health_state; + u64 dump_ts; + u64 dump_real_ts; + u64 error_count; + u64 recovery_count; + u64 last_recovery_ts; +}; + void * devlink_health_reporter_priv(struct devlink_health_reporter *reporter) { @@ -70,7 +89,7 @@ __devlink_health_reporter_find_by_name(struct list_head *reporter_list, return NULL; } -struct devlink_health_reporter * +static struct devlink_health_reporter * devlink_health_reporter_find_by_name(struct devlink *devlink, const char *reporter_name) { @@ -78,7 +97,7 @@ devlink_health_reporter_find_by_name(struct devlink *devlink, reporter_name); } -struct devlink_health_reporter * +static struct devlink_health_reporter * devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port, const char *reporter_name) { @@ -239,7 +258,7 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter) } EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy); -int +static int devlink_nl_health_reporter_fill(struct sk_buff *msg, struct devlink_health_reporter *reporter, enum devlink_command cmd, u32 portid, @@ -310,7 +329,7 @@ genlmsg_cancel: return -EMSGSIZE; } -struct devlink_health_reporter * +static struct devlink_health_reporter * devlink_health_reporter_get_from_attrs(struct devlink *devlink, struct nlattr **attrs) { @@ -330,7 +349,7 @@ devlink_health_reporter_get_from_attrs(struct devlink *devlink, reporter_name); } -struct devlink_health_reporter * +static struct devlink_health_reporter * devlink_health_reporter_get_from_info(struct devlink *devlink, struct genl_info *info) { @@ -517,9 +536,9 @@ devlink_health_dump_clear(struct devlink_health_reporter *reporter) reporter->dump_fmsg = NULL; } -int devlink_health_do_dump(struct devlink_health_reporter *reporter, - void *priv_ctx, - struct netlink_ext_ack *extack) +static int devlink_health_do_dump(struct devlink_health_reporter *reporter, + void *priv_ctx, + struct netlink_ext_ack *extack) { int err; @@ -1157,9 +1176,9 @@ nla_put_failure: return err; } -int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, - struct netlink_callback *cb, - enum devlink_command cmd) +static int devlink_fmsg_dumpit(struct devlink_fmsg *fmsg, struct sk_buff *skb, + struct netlink_callback *cb, + enum devlink_command cmd) { struct devlink_nl_dump_state *state = devlink_dump_state(cb); int index = state->idx; -- cgit v1.2.3 From 14ade6ba4120fb38fcb5033998882c5b3d591194 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Tue, 14 Feb 2023 10:57:40 -0500 Subject: net: msg_zerocopy: elide page accounting if RLIM_INFINITY MSG_ZEROCOPY ensures that pinned user pages do not exceed the limit. If no limit is set, skip this accounting as otherwise expensive atomic_long operations are called for no reason. This accounting is already skipped for privileged (CAP_IPC_LOCK) users. Rely on the same mechanism: if no mmp->user is set, mm_unaccount_pinned_pages does not decrement either. Tested by running tools/testing/selftests/net/msg_zerocopy.sh with an unprivileged user for the TXMODE binary: ip netns exec "${NS1}" sudo -u "{$USER}" "${BIN}" "-${IP}" ... Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20230214155740.3448763-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski --- net/core/skbuff.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 13ea10cf8544..98ebce9f6a51 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1406,14 +1406,18 @@ EXPORT_SYMBOL_GPL(skb_morph); int mm_account_pinned_pages(struct mmpin *mmp, size_t size) { - unsigned long max_pg, num_pg, new_pg, old_pg; + unsigned long max_pg, num_pg, new_pg, old_pg, rlim; struct user_struct *user; if (capable(CAP_IPC_LOCK) || !size) return 0; + rlim = rlimit(RLIMIT_MEMLOCK); + if (rlim == RLIM_INFINITY) + return 0; + num_pg = (size >> PAGE_SHIFT) + 2; /* worst case */ - max_pg = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + max_pg = rlim >> PAGE_SHIFT; user = mmp->user ? : current_user(); old_pg = atomic_long_read(&user->locked_vm); -- cgit v1.2.3 From 051d442098421c28c7951625652f61b1e15c4bd5 Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Tue, 14 Feb 2023 08:49:11 -0500 Subject: net/sched: Retire CBQ qdisc While this amazing qdisc has served us well over the years it has not been getting any tender love and care and has bitrotted over time. It has become mostly a shooting target for syzkaller lately. For this reason, we are retiring it. Goodbye CBQ - we loved you. Signed-off-by: Jamal Hadi Salim Acked-by: Jiri Pirko Signed-off-by: Paolo Abeni --- net/sched/Kconfig | 17 - net/sched/Makefile | 1 - net/sched/sch_cbq.c | 1727 -------------------- .../selftests/tc-testing/tc-tests/qdiscs/cbq.json | 184 --- 4 files changed, 1929 deletions(-) delete mode 100644 net/sched/sch_cbq.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/cbq.json (limited to 'net') diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 4f7b52f5a11c..960e30b6b746 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -45,23 +45,6 @@ if NET_SCHED comment "Queueing/Scheduling" -config NET_SCH_CBQ - tristate "Class Based Queueing (CBQ)" - help - Say Y here if you want to use the Class-Based Queueing (CBQ) packet - scheduling algorithm. This algorithm classifies the waiting packets - into a tree-like hierarchy of classes; the leaves of this tree are - in turn scheduled by separate algorithms. - - See the top of for more details. - - CBQ is a commonly used scheduler, so if you're unsure, you should - say Y here. Then say Y to all the queueing algorithms below that you - want to use as leaf disciplines. - - To compile this code as a module, choose M here: the - module will be called sch_cbq. - config NET_SCH_HTB tristate "Hierarchical Token Bucket (HTB)" help diff --git a/net/sched/Makefile b/net/sched/Makefile index 7911eec09837..9cc7d74b9c4a 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -33,7 +33,6 @@ obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o obj-$(CONFIG_NET_ACT_CT) += act_ct.o obj-$(CONFIG_NET_ACT_GATE) += act_gate.o obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o -obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o obj-$(CONFIG_NET_SCH_HFSC) += sch_hfsc.o obj-$(CONFIG_NET_SCH_RED) += sch_red.o diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c deleted file mode 100644 index 36db5f6782f2..000000000000 --- a/net/sched/sch_cbq.c +++ /dev/null @@ -1,1727 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * net/sched/sch_cbq.c Class-Based Queueing discipline. - * - * Authors: Alexey Kuznetsov, - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - - -/* Class-Based Queueing (CBQ) algorithm. - ======================================= - - Sources: [1] Sally Floyd and Van Jacobson, "Link-sharing and Resource - Management Models for Packet Networks", - IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995 - - [2] Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995 - - [3] Sally Floyd, "Notes on Class-Based Queueing: Setting - Parameters", 1996 - - [4] Sally Floyd and Michael Speer, "Experimental Results - for Class-Based Queueing", 1998, not published. - - ----------------------------------------------------------------------- - - Algorithm skeleton was taken from NS simulator cbq.cc. - If someone wants to check this code against the LBL version, - he should take into account that ONLY the skeleton was borrowed, - the implementation is different. Particularly: - - --- The WRR algorithm is different. Our version looks more - reasonable (I hope) and works when quanta are allowed to be - less than MTU, which is always the case when real time classes - have small rates. Note, that the statement of [3] is - incomplete, delay may actually be estimated even if class - per-round allotment is less than MTU. Namely, if per-round - allotment is W*r_i, and r_1+...+r_k = r < 1 - - delay_i <= ([MTU/(W*r_i)]*W*r + W*r + k*MTU)/B - - In the worst case we have IntServ estimate with D = W*r+k*MTU - and C = MTU*r. The proof (if correct at all) is trivial. - - - --- It seems that cbq-2.0 is not very accurate. At least, I cannot - interpret some places, which look like wrong translations - from NS. Anyone is advised to find these differences - and explain to me, why I am wrong 8). - - --- Linux has no EOI event, so that we cannot estimate true class - idle time. Workaround is to consider the next dequeue event - as sign that previous packet is finished. This is wrong because of - internal device queueing, but on a permanently loaded link it is true. - Moreover, combined with clock integrator, this scheme looks - very close to an ideal solution. */ - -struct cbq_sched_data; - - -struct cbq_class { - struct Qdisc_class_common common; - struct cbq_class *next_alive; /* next class with backlog in this priority band */ - -/* Parameters */ - unsigned char priority; /* class priority */ - unsigned char priority2; /* priority to be used after overlimit */ - unsigned char ewma_log; /* time constant for idle time calculation */ - - u32 defmap; - - /* Link-sharing scheduler parameters */ - long maxidle; /* Class parameters: see below. */ - long offtime; - long minidle; - u32 avpkt; - struct qdisc_rate_table *R_tab; - - /* General scheduler (WRR) parameters */ - long allot; - long quantum; /* Allotment per WRR round */ - long weight; /* Relative allotment: see below */ - - struct Qdisc *qdisc; /* Ptr to CBQ discipline */ - struct cbq_class *split; /* Ptr to split node */ - struct cbq_class *share; /* Ptr to LS parent in the class tree */ - struct cbq_class *tparent; /* Ptr to tree parent in the class tree */ - struct cbq_class *borrow; /* NULL if class is bandwidth limited; - parent otherwise */ - struct cbq_class *sibling; /* Sibling chain */ - struct cbq_class *children; /* Pointer to children chain */ - - struct Qdisc *q; /* Elementary queueing discipline */ - - -/* Variables */ - unsigned char cpriority; /* Effective priority */ - unsigned char delayed; - unsigned char level; /* level of the class in hierarchy: - 0 for leaf classes, and maximal - level of children + 1 for nodes. - */ - - psched_time_t last; /* Last end of service */ - psched_time_t undertime; - long avgidle; - long deficit; /* Saved deficit for WRR */ - psched_time_t penalized; - struct gnet_stats_basic_sync bstats; - struct gnet_stats_queue qstats; - struct net_rate_estimator __rcu *rate_est; - struct tc_cbq_xstats xstats; - - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - - int filters; - - struct cbq_class *defaults[TC_PRIO_MAX + 1]; -}; - -struct cbq_sched_data { - struct Qdisc_class_hash clhash; /* Hash table of all classes */ - int nclasses[TC_CBQ_MAXPRIO + 1]; - unsigned int quanta[TC_CBQ_MAXPRIO + 1]; - - struct cbq_class link; - - unsigned int activemask; - struct cbq_class *active[TC_CBQ_MAXPRIO + 1]; /* List of all classes - with backlog */ - -#ifdef CONFIG_NET_CLS_ACT - struct cbq_class *rx_class; -#endif - struct cbq_class *tx_class; - struct cbq_class *tx_borrowed; - int tx_len; - psched_time_t now; /* Cached timestamp */ - unsigned int pmask; - - struct qdisc_watchdog watchdog; /* Watchdog timer, - started when CBQ has - backlog, but cannot - transmit just now */ - psched_tdiff_t wd_expires; - int toplevel; - u32 hgenerator; -}; - - -#define L2T(cl, len) qdisc_l2t((cl)->R_tab, len) - -static inline struct cbq_class * -cbq_class_lookup(struct cbq_sched_data *q, u32 classid) -{ - struct Qdisc_class_common *clc; - - clc = qdisc_class_find(&q->clhash, classid); - if (clc == NULL) - return NULL; - return container_of(clc, struct cbq_class, common); -} - -#ifdef CONFIG_NET_CLS_ACT - -static struct cbq_class * -cbq_reclassify(struct sk_buff *skb, struct cbq_class *this) -{ - struct cbq_class *cl; - - for (cl = this->tparent; cl; cl = cl->tparent) { - struct cbq_class *new = cl->defaults[TC_PRIO_BESTEFFORT]; - - if (new != NULL && new != this) - return new; - } - return NULL; -} - -#endif - -/* Classify packet. The procedure is pretty complicated, but - * it allows us to combine link sharing and priority scheduling - * transparently. - * - * Namely, you can put link sharing rules (f.e. route based) at root of CBQ, - * so that it resolves to split nodes. Then packets are classified - * by logical priority, or a more specific classifier may be attached - * to the split node. - */ - -static struct cbq_class * -cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *head = &q->link; - struct cbq_class **defmap; - struct cbq_class *cl = NULL; - u32 prio = skb->priority; - struct tcf_proto *fl; - struct tcf_result res; - - /* - * Step 1. If skb->priority points to one of our classes, use it. - */ - if (TC_H_MAJ(prio ^ sch->handle) == 0 && - (cl = cbq_class_lookup(q, prio)) != NULL) - return cl; - - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; - for (;;) { - int result = 0; - defmap = head->defaults; - - fl = rcu_dereference_bh(head->filter_list); - /* - * Step 2+n. Apply classifier. - */ - result = tcf_classify(skb, NULL, fl, &res, true); - if (!fl || result < 0) - goto fallback; - if (result == TC_ACT_SHOT) - return NULL; - - cl = (void *)res.class; - if (!cl) { - if (TC_H_MAJ(res.classid)) - cl = cbq_class_lookup(q, res.classid); - else if ((cl = defmap[res.classid & TC_PRIO_MAX]) == NULL) - cl = defmap[TC_PRIO_BESTEFFORT]; - - if (cl == NULL) - goto fallback; - } - if (cl->level >= head->level) - goto fallback; -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - fallthrough; - case TC_ACT_RECLASSIFY: - return cbq_reclassify(skb, cl); - } -#endif - if (cl->level == 0) - return cl; - - /* - * Step 3+n. If classifier selected a link sharing class, - * apply agency specific classifier. - * Repeat this procedure until we hit a leaf node. - */ - head = cl; - } - -fallback: - cl = head; - - /* - * Step 4. No success... - */ - if (TC_H_MAJ(prio) == 0 && - !(cl = head->defaults[prio & TC_PRIO_MAX]) && - !(cl = head->defaults[TC_PRIO_BESTEFFORT])) - return head; - - return cl; -} - -/* - * A packet has just been enqueued on the empty class. - * cbq_activate_class adds it to the tail of active class list - * of its priority band. - */ - -static inline void cbq_activate_class(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - int prio = cl->cpriority; - struct cbq_class *cl_tail; - - cl_tail = q->active[prio]; - q->active[prio] = cl; - - if (cl_tail != NULL) { - cl->next_alive = cl_tail->next_alive; - cl_tail->next_alive = cl; - } else { - cl->next_alive = cl; - q->activemask |= (1<qdisc); - int prio = this->cpriority; - struct cbq_class *cl; - struct cbq_class *cl_prev = q->active[prio]; - - do { - cl = cl_prev->next_alive; - if (cl == this) { - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - - if (cl == q->active[prio]) { - q->active[prio] = cl_prev; - if (cl == q->active[prio]) { - q->active[prio] = NULL; - q->activemask &= ~(1<active[prio]); -} - -static void -cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl) -{ - int toplevel = q->toplevel; - - if (toplevel > cl->level) { - psched_time_t now = psched_get_time(); - - do { - if (cl->undertime < now) { - q->toplevel = cl->level; - return; - } - } while ((cl = cl->borrow) != NULL && toplevel > cl->level); - } -} - -static int -cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - int ret; - struct cbq_class *cl = cbq_classify(skb, sch, &ret); - -#ifdef CONFIG_NET_CLS_ACT - q->rx_class = cl; -#endif - if (cl == NULL) { - if (ret & __NET_XMIT_BYPASS) - qdisc_qstats_drop(sch); - __qdisc_drop(skb, to_free); - return ret; - } - - ret = qdisc_enqueue(skb, cl->q, to_free); - if (ret == NET_XMIT_SUCCESS) { - sch->q.qlen++; - cbq_mark_toplevel(q, cl); - if (!cl->next_alive) - cbq_activate_class(cl); - return ret; - } - - if (net_xmit_drop_count(ret)) { - qdisc_qstats_drop(sch); - cbq_mark_toplevel(q, cl); - cl->qstats.drops++; - } - return ret; -} - -/* Overlimit action: penalize leaf class by adding offtime */ -static void cbq_overlimit(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - psched_tdiff_t delay = cl->undertime - q->now; - - if (!cl->delayed) { - delay += cl->offtime; - - /* - * Class goes to sleep, so that it will have no - * chance to work avgidle. Let's forgive it 8) - * - * BTW cbq-2.0 has a crap in this - * place, apparently they forgot to shift it by cl->ewma_log. - */ - if (cl->avgidle < 0) - delay -= (-cl->avgidle) - ((-cl->avgidle) >> cl->ewma_log); - if (cl->avgidle < cl->minidle) - cl->avgidle = cl->minidle; - if (delay <= 0) - delay = 1; - cl->undertime = q->now + delay; - - cl->xstats.overactions++; - cl->delayed = 1; - } - if (q->wd_expires == 0 || q->wd_expires > delay) - q->wd_expires = delay; - - /* Dirty work! We must schedule wakeups based on - * real available rate, rather than leaf rate, - * which may be tiny (even zero). - */ - if (q->toplevel == TC_CBQ_MAXLEVEL) { - struct cbq_class *b; - psched_tdiff_t base_delay = q->wd_expires; - - for (b = cl->borrow; b; b = b->borrow) { - delay = b->undertime - q->now; - if (delay < base_delay) { - if (delay <= 0) - delay = 1; - base_delay = delay; - } - } - - q->wd_expires = base_delay; - } -} - -/* - * It is mission critical procedure. - * - * We "regenerate" toplevel cutoff, if transmitting class - * has backlog and it is not regulated. It is not part of - * original CBQ description, but looks more reasonable. - * Probably, it is wrong. This question needs further investigation. - */ - -static inline void -cbq_update_toplevel(struct cbq_sched_data *q, struct cbq_class *cl, - struct cbq_class *borrowed) -{ - if (cl && q->toplevel >= borrowed->level) { - if (cl->q->q.qlen > 1) { - do { - if (borrowed->undertime == PSCHED_PASTPERFECT) { - q->toplevel = borrowed->level; - return; - } - } while ((borrowed = borrowed->borrow) != NULL); - } -#if 0 - /* It is not necessary now. Uncommenting it - will save CPU cycles, but decrease fairness. - */ - q->toplevel = TC_CBQ_MAXLEVEL; -#endif - } -} - -static void -cbq_update(struct cbq_sched_data *q) -{ - struct cbq_class *this = q->tx_class; - struct cbq_class *cl = this; - int len = q->tx_len; - psched_time_t now; - - q->tx_class = NULL; - /* Time integrator. We calculate EOS time - * by adding expected packet transmission time. - */ - now = q->now + L2T(&q->link, len); - - for ( ; cl; cl = cl->share) { - long avgidle = cl->avgidle; - long idle; - - _bstats_update(&cl->bstats, len, 1); - - /* - * (now - last) is total time between packet right edges. - * (last_pktlen/rate) is "virtual" busy time, so that - * - * idle = (now - last) - last_pktlen/rate - */ - - idle = now - cl->last; - if ((unsigned long)idle > 128*1024*1024) { - avgidle = cl->maxidle; - } else { - idle -= L2T(cl, len); - - /* true_avgidle := (1-W)*true_avgidle + W*idle, - * where W=2^{-ewma_log}. But cl->avgidle is scaled: - * cl->avgidle == true_avgidle/W, - * hence: - */ - avgidle += idle - (avgidle>>cl->ewma_log); - } - - if (avgidle <= 0) { - /* Overlimit or at-limit */ - - if (avgidle < cl->minidle) - avgidle = cl->minidle; - - cl->avgidle = avgidle; - - /* Calculate expected time, when this class - * will be allowed to send. - * It will occur, when: - * (1-W)*true_avgidle + W*delay = 0, i.e. - * idle = (1/W - 1)*(-true_avgidle) - * or - * idle = (1 - W)*(-cl->avgidle); - */ - idle = (-avgidle) - ((-avgidle) >> cl->ewma_log); - - /* - * That is not all. - * To maintain the rate allocated to the class, - * we add to undertime virtual clock, - * necessary to complete transmitted packet. - * (len/phys_bandwidth has been already passed - * to the moment of cbq_update) - */ - - idle -= L2T(&q->link, len); - idle += L2T(cl, len); - - cl->undertime = now + idle; - } else { - /* Underlimit */ - - cl->undertime = PSCHED_PASTPERFECT; - if (avgidle > cl->maxidle) - cl->avgidle = cl->maxidle; - else - cl->avgidle = avgidle; - } - if ((s64)(now - cl->last) > 0) - cl->last = now; - } - - cbq_update_toplevel(q, this, q->tx_borrowed); -} - -static inline struct cbq_class * -cbq_under_limit(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - struct cbq_class *this_cl = cl; - - if (cl->tparent == NULL) - return cl; - - if (cl->undertime == PSCHED_PASTPERFECT || q->now >= cl->undertime) { - cl->delayed = 0; - return cl; - } - - do { - /* It is very suspicious place. Now overlimit - * action is generated for not bounded classes - * only if link is completely congested. - * Though it is in agree with ancestor-only paradigm, - * it looks very stupid. Particularly, - * it means that this chunk of code will either - * never be called or result in strong amplification - * of burstiness. Dangerous, silly, and, however, - * no another solution exists. - */ - cl = cl->borrow; - if (!cl) { - this_cl->qstats.overlimits++; - cbq_overlimit(this_cl); - return NULL; - } - if (cl->level > q->toplevel) - return NULL; - } while (cl->undertime != PSCHED_PASTPERFECT && q->now < cl->undertime); - - cl->delayed = 0; - return cl; -} - -static inline struct sk_buff * -cbq_dequeue_prio(struct Qdisc *sch, int prio) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl_tail, *cl_prev, *cl; - struct sk_buff *skb; - int deficit; - - cl_tail = cl_prev = q->active[prio]; - cl = cl_prev->next_alive; - - do { - deficit = 0; - - /* Start round */ - do { - struct cbq_class *borrow = cl; - - if (cl->q->q.qlen && - (borrow = cbq_under_limit(cl)) == NULL) - goto skip_class; - - if (cl->deficit <= 0) { - /* Class exhausted its allotment per - * this round. Switch to the next one. - */ - deficit = 1; - cl->deficit += cl->quantum; - goto next_class; - } - - skb = cl->q->dequeue(cl->q); - - /* Class did not give us any skb :-( - * It could occur even if cl->q->q.qlen != 0 - * f.e. if cl->q == "tbf" - */ - if (skb == NULL) - goto skip_class; - - cl->deficit -= qdisc_pkt_len(skb); - q->tx_class = cl; - q->tx_borrowed = borrow; - if (borrow != cl) { -#ifndef CBQ_XSTATS_BORROWS_BYTES - borrow->xstats.borrows++; - cl->xstats.borrows++; -#else - borrow->xstats.borrows += qdisc_pkt_len(skb); - cl->xstats.borrows += qdisc_pkt_len(skb); -#endif - } - q->tx_len = qdisc_pkt_len(skb); - - if (cl->deficit <= 0) { - q->active[prio] = cl; - cl = cl->next_alive; - cl->deficit += cl->quantum; - } - return skb; - -skip_class: - if (cl->q->q.qlen == 0 || prio != cl->cpriority) { - /* Class is empty or penalized. - * Unlink it from active chain. - */ - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - - /* Did cl_tail point to it? */ - if (cl == cl_tail) { - /* Repair it! */ - cl_tail = cl_prev; - - /* Was it the last class in this band? */ - if (cl == cl_tail) { - /* Kill the band! */ - q->active[prio] = NULL; - q->activemask &= ~(1<q->q.qlen) - cbq_activate_class(cl); - return NULL; - } - - q->active[prio] = cl_tail; - } - if (cl->q->q.qlen) - cbq_activate_class(cl); - - cl = cl_prev; - } - -next_class: - cl_prev = cl; - cl = cl->next_alive; - } while (cl_prev != cl_tail); - } while (deficit); - - q->active[prio] = cl_prev; - - return NULL; -} - -static inline struct sk_buff * -cbq_dequeue_1(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct sk_buff *skb; - unsigned int activemask; - - activemask = q->activemask & 0xFF; - while (activemask) { - int prio = ffz(~activemask); - activemask &= ~(1<tx_class) - cbq_update(q); - - q->now = now; - - for (;;) { - q->wd_expires = 0; - - skb = cbq_dequeue_1(sch); - if (skb) { - qdisc_bstats_update(sch, skb); - sch->q.qlen--; - return skb; - } - - /* All the classes are overlimit. - * - * It is possible, if: - * - * 1. Scheduler is empty. - * 2. Toplevel cutoff inhibited borrowing. - * 3. Root class is overlimit. - * - * Reset 2d and 3d conditions and retry. - * - * Note, that NS and cbq-2.0 are buggy, peeking - * an arbitrary class is appropriate for ancestor-only - * sharing, but not for toplevel algorithm. - * - * Our version is better, but slower, because it requires - * two passes, but it is unavoidable with top-level sharing. - */ - - if (q->toplevel == TC_CBQ_MAXLEVEL && - q->link.undertime == PSCHED_PASTPERFECT) - break; - - q->toplevel = TC_CBQ_MAXLEVEL; - q->link.undertime = PSCHED_PASTPERFECT; - } - - /* No packets in scheduler or nobody wants to give them to us :-( - * Sigh... start watchdog timer in the last case. - */ - - if (sch->q.qlen) { - qdisc_qstats_overlimit(sch); - if (q->wd_expires) - qdisc_watchdog_schedule(&q->watchdog, - now + q->wd_expires); - } - return NULL; -} - -/* CBQ class maintenance routines */ - -static void cbq_adjust_levels(struct cbq_class *this) -{ - if (this == NULL) - return; - - do { - int level = 0; - struct cbq_class *cl; - - cl = this->children; - if (cl) { - do { - if (cl->level > level) - level = cl->level; - } while ((cl = cl->sibling) != this->children); - } - this->level = level + 1; - } while ((this = this->tparent) != NULL); -} - -static void cbq_normalize_quanta(struct cbq_sched_data *q, int prio) -{ - struct cbq_class *cl; - unsigned int h; - - if (q->quanta[prio] == 0) - return; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - /* BUGGGG... Beware! This expression suffer of - * arithmetic overflows! - */ - if (cl->priority == prio) { - cl->quantum = (cl->weight*cl->allot*q->nclasses[prio])/ - q->quanta[prio]; - } - if (cl->quantum <= 0 || - cl->quantum > 32*qdisc_dev(cl->qdisc)->mtu) { - pr_warn("CBQ: class %08x has bad quantum==%ld, repaired.\n", - cl->common.classid, cl->quantum); - cl->quantum = qdisc_dev(cl->qdisc)->mtu/2 + 1; - } - } - } -} - -static void cbq_sync_defmap(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - struct cbq_class *split = cl->split; - unsigned int h; - int i; - - if (split == NULL) - return; - - for (i = 0; i <= TC_PRIO_MAX; i++) { - if (split->defaults[i] == cl && !(cl->defmap & (1<defaults[i] = NULL; - } - - for (i = 0; i <= TC_PRIO_MAX; i++) { - int level = split->level; - - if (split->defaults[i]) - continue; - - for (h = 0; h < q->clhash.hashsize; h++) { - struct cbq_class *c; - - hlist_for_each_entry(c, &q->clhash.hash[h], - common.hnode) { - if (c->split == split && c->level < level && - c->defmap & (1<defaults[i] = c; - level = c->level; - } - } - } - } -} - -static void cbq_change_defmap(struct cbq_class *cl, u32 splitid, u32 def, u32 mask) -{ - struct cbq_class *split = NULL; - - if (splitid == 0) { - split = cl->split; - if (!split) - return; - splitid = split->common.classid; - } - - if (split == NULL || split->common.classid != splitid) { - for (split = cl->tparent; split; split = split->tparent) - if (split->common.classid == splitid) - break; - } - - if (split == NULL) - return; - - if (cl->split != split) { - cl->defmap = 0; - cbq_sync_defmap(cl); - cl->split = split; - cl->defmap = def & mask; - } else - cl->defmap = (cl->defmap & ~mask) | (def & mask); - - cbq_sync_defmap(cl); -} - -static void cbq_unlink_class(struct cbq_class *this) -{ - struct cbq_class *cl, **clp; - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - - qdisc_class_hash_remove(&q->clhash, &this->common); - - if (this->tparent) { - clp = &this->sibling; - cl = *clp; - do { - if (cl == this) { - *clp = cl->sibling; - break; - } - clp = &cl->sibling; - } while ((cl = *clp) != this->sibling); - - if (this->tparent->children == this) { - this->tparent->children = this->sibling; - if (this->sibling == this) - this->tparent->children = NULL; - } - } else { - WARN_ON(this->sibling != this); - } -} - -static void cbq_link_class(struct cbq_class *this) -{ - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - struct cbq_class *parent = this->tparent; - - this->sibling = this; - qdisc_class_hash_insert(&q->clhash, &this->common); - - if (parent == NULL) - return; - - if (parent->children == NULL) { - parent->children = this; - } else { - this->sibling = parent->children->sibling; - parent->children->sibling = this; - } -} - -static void -cbq_reset(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl; - int prio; - unsigned int h; - - q->activemask = 0; - q->pmask = 0; - q->tx_class = NULL; - q->tx_borrowed = NULL; - qdisc_watchdog_cancel(&q->watchdog); - q->toplevel = TC_CBQ_MAXLEVEL; - q->now = psched_get_time(); - - for (prio = 0; prio <= TC_CBQ_MAXPRIO; prio++) - q->active[prio] = NULL; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - qdisc_reset(cl->q); - - cl->next_alive = NULL; - cl->undertime = PSCHED_PASTPERFECT; - cl->avgidle = cl->maxidle; - cl->deficit = cl->quantum; - cl->cpriority = cl->priority; - } - } -} - - -static void cbq_set_lss(struct cbq_class *cl, struct tc_cbq_lssopt *lss) -{ - if (lss->change & TCF_CBQ_LSS_FLAGS) { - cl->share = (lss->flags & TCF_CBQ_LSS_ISOLATED) ? NULL : cl->tparent; - cl->borrow = (lss->flags & TCF_CBQ_LSS_BOUNDED) ? NULL : cl->tparent; - } - if (lss->change & TCF_CBQ_LSS_EWMA) - cl->ewma_log = lss->ewma_log; - if (lss->change & TCF_CBQ_LSS_AVPKT) - cl->avpkt = lss->avpkt; - if (lss->change & TCF_CBQ_LSS_MINIDLE) - cl->minidle = -(long)lss->minidle; - if (lss->change & TCF_CBQ_LSS_MAXIDLE) { - cl->maxidle = lss->maxidle; - cl->avgidle = lss->maxidle; - } - if (lss->change & TCF_CBQ_LSS_OFFTIME) - cl->offtime = lss->offtime; -} - -static void cbq_rmprio(struct cbq_sched_data *q, struct cbq_class *cl) -{ - q->nclasses[cl->priority]--; - q->quanta[cl->priority] -= cl->weight; - cbq_normalize_quanta(q, cl->priority); -} - -static void cbq_addprio(struct cbq_sched_data *q, struct cbq_class *cl) -{ - q->nclasses[cl->priority]++; - q->quanta[cl->priority] += cl->weight; - cbq_normalize_quanta(q, cl->priority); -} - -static int cbq_set_wrr(struct cbq_class *cl, struct tc_cbq_wrropt *wrr) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - - if (wrr->allot) - cl->allot = wrr->allot; - if (wrr->weight) - cl->weight = wrr->weight; - if (wrr->priority) { - cl->priority = wrr->priority - 1; - cl->cpriority = cl->priority; - if (cl->priority >= cl->priority2) - cl->priority2 = TC_CBQ_MAXPRIO - 1; - } - - cbq_addprio(q, cl); - return 0; -} - -static int cbq_set_fopt(struct cbq_class *cl, struct tc_cbq_fopt *fopt) -{ - cbq_change_defmap(cl, fopt->split, fopt->defmap, fopt->defchange); - return 0; -} - -static const struct nla_policy cbq_policy[TCA_CBQ_MAX + 1] = { - [TCA_CBQ_LSSOPT] = { .len = sizeof(struct tc_cbq_lssopt) }, - [TCA_CBQ_WRROPT] = { .len = sizeof(struct tc_cbq_wrropt) }, - [TCA_CBQ_FOPT] = { .len = sizeof(struct tc_cbq_fopt) }, - [TCA_CBQ_OVL_STRATEGY] = { .len = sizeof(struct tc_cbq_ovl) }, - [TCA_CBQ_RATE] = { .len = sizeof(struct tc_ratespec) }, - [TCA_CBQ_RTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, - [TCA_CBQ_POLICE] = { .len = sizeof(struct tc_cbq_police) }, -}; - -static int cbq_opt_parse(struct nlattr *tb[TCA_CBQ_MAX + 1], - struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - int err; - - if (!opt) { - NL_SET_ERR_MSG(extack, "CBQ options are required for this operation"); - return -EINVAL; - } - - err = nla_parse_nested_deprecated(tb, TCA_CBQ_MAX, opt, - cbq_policy, extack); - if (err < 0) - return err; - - if (tb[TCA_CBQ_WRROPT]) { - const struct tc_cbq_wrropt *wrr = nla_data(tb[TCA_CBQ_WRROPT]); - - if (wrr->priority > TC_CBQ_MAXPRIO) { - NL_SET_ERR_MSG(extack, "priority is bigger than TC_CBQ_MAXPRIO"); - err = -EINVAL; - } - } - return err; -} - -static int cbq_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct nlattr *tb[TCA_CBQ_MAX + 1]; - struct tc_ratespec *r; - int err; - - qdisc_watchdog_init(&q->watchdog, sch); - - err = cbq_opt_parse(tb, opt, extack); - if (err < 0) - return err; - - if (!tb[TCA_CBQ_RTAB] || !tb[TCA_CBQ_RATE]) { - NL_SET_ERR_MSG(extack, "Rate specification missing or incomplete"); - return -EINVAL; - } - - r = nla_data(tb[TCA_CBQ_RATE]); - - q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB], extack); - if (!q->link.R_tab) - return -EINVAL; - - err = tcf_block_get(&q->link.block, &q->link.filter_list, sch, extack); - if (err) - goto put_rtab; - - err = qdisc_class_hash_init(&q->clhash); - if (err < 0) - goto put_block; - - q->link.sibling = &q->link; - q->link.common.classid = sch->handle; - q->link.qdisc = sch; - q->link.q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - sch->handle, NULL); - if (!q->link.q) - q->link.q = &noop_qdisc; - else - qdisc_hash_add(q->link.q, true); - - q->link.priority = TC_CBQ_MAXPRIO - 1; - q->link.priority2 = TC_CBQ_MAXPRIO - 1; - q->link.cpriority = TC_CBQ_MAXPRIO - 1; - q->link.allot = psched_mtu(qdisc_dev(sch)); - q->link.quantum = q->link.allot; - q->link.weight = q->link.R_tab->rate.rate; - - q->link.ewma_log = TC_CBQ_DEF_EWMA; - q->link.avpkt = q->link.allot/2; - q->link.minidle = -0x7FFFFFFF; - - q->toplevel = TC_CBQ_MAXLEVEL; - q->now = psched_get_time(); - - cbq_link_class(&q->link); - - if (tb[TCA_CBQ_LSSOPT]) - cbq_set_lss(&q->link, nla_data(tb[TCA_CBQ_LSSOPT])); - - cbq_addprio(q, &q->link); - return 0; - -put_block: - tcf_block_put(q->link.block); - -put_rtab: - qdisc_put_rtab(q->link.R_tab); - return err; -} - -static int cbq_dump_rate(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - - if (nla_put(skb, TCA_CBQ_RATE, sizeof(cl->R_tab->rate), &cl->R_tab->rate)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_lss(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_lssopt opt; - - opt.flags = 0; - if (cl->borrow == NULL) - opt.flags |= TCF_CBQ_LSS_BOUNDED; - if (cl->share == NULL) - opt.flags |= TCF_CBQ_LSS_ISOLATED; - opt.ewma_log = cl->ewma_log; - opt.level = cl->level; - opt.avpkt = cl->avpkt; - opt.maxidle = cl->maxidle; - opt.minidle = (u32)(-cl->minidle); - opt.offtime = cl->offtime; - opt.change = ~0; - if (nla_put(skb, TCA_CBQ_LSSOPT, sizeof(opt), &opt)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_wrr(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_wrropt opt; - - memset(&opt, 0, sizeof(opt)); - opt.flags = 0; - opt.allot = cl->allot; - opt.priority = cl->priority + 1; - opt.cpriority = cl->cpriority + 1; - opt.weight = cl->weight; - if (nla_put(skb, TCA_CBQ_WRROPT, sizeof(opt), &opt)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_fopt(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_fopt opt; - - if (cl->split || cl->defmap) { - opt.split = cl->split ? cl->split->common.classid : 0; - opt.defmap = cl->defmap; - opt.defchange = ~0; - if (nla_put(skb, TCA_CBQ_FOPT, sizeof(opt), &opt)) - goto nla_put_failure; - } - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_attr(struct sk_buff *skb, struct cbq_class *cl) -{ - if (cbq_dump_lss(skb, cl) < 0 || - cbq_dump_rate(skb, cl) < 0 || - cbq_dump_wrr(skb, cl) < 0 || - cbq_dump_fopt(skb, cl) < 0) - return -1; - return 0; -} - -static int cbq_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct nlattr *nest; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - if (cbq_dump_attr(skb, &q->link) < 0) - goto nla_put_failure; - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static int -cbq_dump_stats(struct Qdisc *sch, struct gnet_dump *d) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - q->link.xstats.avgidle = q->link.avgidle; - return gnet_stats_copy_app(d, &q->link.xstats, sizeof(q->link.xstats)); -} - -static int -cbq_dump_class(struct Qdisc *sch, unsigned long arg, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - struct nlattr *nest; - - if (cl->tparent) - tcm->tcm_parent = cl->tparent->common.classid; - else - tcm->tcm_parent = TC_H_ROOT; - tcm->tcm_handle = cl->common.classid; - tcm->tcm_info = cl->q->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - if (cbq_dump_attr(skb, cl) < 0) - goto nla_put_failure; - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static int -cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg, - struct gnet_dump *d) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - __u32 qlen; - - cl->xstats.avgidle = cl->avgidle; - cl->xstats.undertime = 0; - qdisc_qstats_qlen_backlog(cl->q, &qlen, &cl->qstats.backlog); - - if (cl->undertime != PSCHED_PASTPERFECT) - cl->xstats.undertime = cl->undertime - q->now; - - if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 || - gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || - gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0) - return -1; - - return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats)); -} - -static int cbq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, - struct Qdisc **old, struct netlink_ext_ack *extack) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - if (new == NULL) { - new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - cl->common.classid, extack); - if (new == NULL) - return -ENOBUFS; - } - - *old = qdisc_replace(sch, new, &cl->q); - return 0; -} - -static struct Qdisc *cbq_leaf(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - return cl->q; -} - -static void cbq_qlen_notify(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - cbq_deactivate_class(cl); -} - -static unsigned long cbq_find(struct Qdisc *sch, u32 classid) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - return (unsigned long)cbq_class_lookup(q, classid); -} - -static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - WARN_ON(cl->filters); - - tcf_block_put(cl->block); - qdisc_put(cl->q); - qdisc_put_rtab(cl->R_tab); - gen_kill_estimator(&cl->rate_est); - if (cl != &q->link) - kfree(cl); -} - -static void cbq_destroy(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct hlist_node *next; - struct cbq_class *cl; - unsigned int h; - -#ifdef CONFIG_NET_CLS_ACT - q->rx_class = NULL; -#endif - /* - * Filters must be destroyed first because we don't destroy the - * classes from root to leafs which means that filters can still - * be bound to classes which have been destroyed already. --TGR '04 - */ - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - tcf_block_put(cl->block); - cl->block = NULL; - } - } - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry_safe(cl, next, &q->clhash.hash[h], - common.hnode) - cbq_destroy_class(sch, cl); - } - qdisc_class_hash_destroy(&q->clhash); -} - -static int -cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **tca, - unsigned long *arg, struct netlink_ext_ack *extack) -{ - int err; - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)*arg; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_CBQ_MAX + 1]; - struct cbq_class *parent; - struct qdisc_rate_table *rtab = NULL; - - err = cbq_opt_parse(tb, opt, extack); - if (err < 0) - return err; - - if (tb[TCA_CBQ_OVL_STRATEGY] || tb[TCA_CBQ_POLICE]) { - NL_SET_ERR_MSG(extack, "Neither overlimit strategy nor policing attributes can be used for changing class params"); - return -EOPNOTSUPP; - } - - if (cl) { - /* Check parent */ - if (parentid) { - if (cl->tparent && - cl->tparent->common.classid != parentid) { - NL_SET_ERR_MSG(extack, "Invalid parent id"); - return -EINVAL; - } - if (!cl->tparent && parentid != TC_H_ROOT) { - NL_SET_ERR_MSG(extack, "Parent must be root"); - return -EINVAL; - } - } - - if (tb[TCA_CBQ_RATE]) { - rtab = qdisc_get_rtab(nla_data(tb[TCA_CBQ_RATE]), - tb[TCA_CBQ_RTAB], extack); - if (rtab == NULL) - return -EINVAL; - } - - if (tca[TCA_RATE]) { - err = gen_replace_estimator(&cl->bstats, NULL, - &cl->rate_est, - NULL, - true, - tca[TCA_RATE]); - if (err) { - NL_SET_ERR_MSG(extack, "Failed to replace specified rate estimator"); - qdisc_put_rtab(rtab); - return err; - } - } - - /* Change class parameters */ - sch_tree_lock(sch); - - if (cl->next_alive != NULL) - cbq_deactivate_class(cl); - - if (rtab) { - qdisc_put_rtab(cl->R_tab); - cl->R_tab = rtab; - } - - if (tb[TCA_CBQ_LSSOPT]) - cbq_set_lss(cl, nla_data(tb[TCA_CBQ_LSSOPT])); - - if (tb[TCA_CBQ_WRROPT]) { - cbq_rmprio(q, cl); - cbq_set_wrr(cl, nla_data(tb[TCA_CBQ_WRROPT])); - } - - if (tb[TCA_CBQ_FOPT]) - cbq_set_fopt(cl, nla_data(tb[TCA_CBQ_FOPT])); - - if (cl->q->q.qlen) - cbq_activate_class(cl); - - sch_tree_unlock(sch); - - return 0; - } - - if (parentid == TC_H_ROOT) - return -EINVAL; - - if (!tb[TCA_CBQ_WRROPT] || !tb[TCA_CBQ_RATE] || !tb[TCA_CBQ_LSSOPT]) { - NL_SET_ERR_MSG(extack, "One of the following attributes MUST be specified: WRR, rate or link sharing"); - return -EINVAL; - } - - rtab = qdisc_get_rtab(nla_data(tb[TCA_CBQ_RATE]), tb[TCA_CBQ_RTAB], - extack); - if (rtab == NULL) - return -EINVAL; - - if (classid) { - err = -EINVAL; - if (TC_H_MAJ(classid ^ sch->handle) || - cbq_class_lookup(q, classid)) { - NL_SET_ERR_MSG(extack, "Specified class not found"); - goto failure; - } - } else { - int i; - classid = TC_H_MAKE(sch->handle, 0x8000); - - for (i = 0; i < 0x8000; i++) { - if (++q->hgenerator >= 0x8000) - q->hgenerator = 1; - if (cbq_class_lookup(q, classid|q->hgenerator) == NULL) - break; - } - err = -ENOSR; - if (i >= 0x8000) { - NL_SET_ERR_MSG(extack, "Unable to generate classid"); - goto failure; - } - classid = classid|q->hgenerator; - } - - parent = &q->link; - if (parentid) { - parent = cbq_class_lookup(q, parentid); - err = -EINVAL; - if (!parent) { - NL_SET_ERR_MSG(extack, "Failed to find parentid"); - goto failure; - } - } - - err = -ENOBUFS; - cl = kzalloc(sizeof(*cl), GFP_KERNEL); - if (cl == NULL) - goto failure; - - gnet_stats_basic_sync_init(&cl->bstats); - err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); - if (err) { - kfree(cl); - goto failure; - } - - if (tca[TCA_RATE]) { - err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est, - NULL, true, tca[TCA_RATE]); - if (err) { - NL_SET_ERR_MSG(extack, "Couldn't create new estimator"); - tcf_block_put(cl->block); - kfree(cl); - goto failure; - } - } - - cl->R_tab = rtab; - rtab = NULL; - cl->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid, - NULL); - if (!cl->q) - cl->q = &noop_qdisc; - else - qdisc_hash_add(cl->q, true); - - cl->common.classid = classid; - cl->tparent = parent; - cl->qdisc = sch; - cl->allot = parent->allot; - cl->quantum = cl->allot; - cl->weight = cl->R_tab->rate.rate; - - sch_tree_lock(sch); - cbq_link_class(cl); - cl->borrow = cl->tparent; - if (cl->tparent != &q->link) - cl->share = cl->tparent; - cbq_adjust_levels(parent); - cl->minidle = -0x7FFFFFFF; - cbq_set_lss(cl, nla_data(tb[TCA_CBQ_LSSOPT])); - cbq_set_wrr(cl, nla_data(tb[TCA_CBQ_WRROPT])); - if (cl->ewma_log == 0) - cl->ewma_log = q->link.ewma_log; - if (cl->maxidle == 0) - cl->maxidle = q->link.maxidle; - if (cl->avpkt == 0) - cl->avpkt = q->link.avpkt; - if (tb[TCA_CBQ_FOPT]) - cbq_set_fopt(cl, nla_data(tb[TCA_CBQ_FOPT])); - sch_tree_unlock(sch); - - qdisc_class_hash_grow(sch, &q->clhash); - - *arg = (unsigned long)cl; - return 0; - -failure: - qdisc_put_rtab(rtab); - return err; -} - -static int cbq_delete(struct Qdisc *sch, unsigned long arg, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - - if (cl->filters || cl->children || cl == &q->link) - return -EBUSY; - - sch_tree_lock(sch); - - qdisc_purge_queue(cl->q); - - if (cl->next_alive) - cbq_deactivate_class(cl); - - if (q->tx_borrowed == cl) - q->tx_borrowed = q->tx_class; - if (q->tx_class == cl) { - q->tx_class = NULL; - q->tx_borrowed = NULL; - } -#ifdef CONFIG_NET_CLS_ACT - if (q->rx_class == cl) - q->rx_class = NULL; -#endif - - cbq_unlink_class(cl); - cbq_adjust_levels(cl->tparent); - cl->defmap = 0; - cbq_sync_defmap(cl); - - cbq_rmprio(q, cl); - sch_tree_unlock(sch); - - cbq_destroy_class(sch, cl); - return 0; -} - -static struct tcf_block *cbq_tcf_block(struct Qdisc *sch, unsigned long arg, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - - if (cl == NULL) - cl = &q->link; - - return cl->block; -} - -static unsigned long cbq_bind_filter(struct Qdisc *sch, unsigned long parent, - u32 classid) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *p = (struct cbq_class *)parent; - struct cbq_class *cl = cbq_class_lookup(q, classid); - - if (cl) { - if (p && p->level <= cl->level) - return 0; - cl->filters++; - return (unsigned long)cl; - } - return 0; -} - -static void cbq_unbind_filter(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - cl->filters--; -} - -static void cbq_walk(struct Qdisc *sch, struct qdisc_walker *arg) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl; - unsigned int h; - - if (arg->stop) - return; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - if (!tc_qdisc_stats_dump(sch, (unsigned long)cl, arg)) - return; - } - } -} - -static const struct Qdisc_class_ops cbq_class_ops = { - .graft = cbq_graft, - .leaf = cbq_leaf, - .qlen_notify = cbq_qlen_notify, - .find = cbq_find, - .change = cbq_change_class, - .delete = cbq_delete, - .walk = cbq_walk, - .tcf_block = cbq_tcf_block, - .bind_tcf = cbq_bind_filter, - .unbind_tcf = cbq_unbind_filter, - .dump = cbq_dump_class, - .dump_stats = cbq_dump_class_stats, -}; - -static struct Qdisc_ops cbq_qdisc_ops __read_mostly = { - .next = NULL, - .cl_ops = &cbq_class_ops, - .id = "cbq", - .priv_size = sizeof(struct cbq_sched_data), - .enqueue = cbq_enqueue, - .dequeue = cbq_dequeue, - .peek = qdisc_peek_dequeued, - .init = cbq_init, - .reset = cbq_reset, - .destroy = cbq_destroy, - .change = NULL, - .dump = cbq_dump, - .dump_stats = cbq_dump_stats, - .owner = THIS_MODULE, -}; - -static int __init cbq_module_init(void) -{ - return register_qdisc(&cbq_qdisc_ops); -} -static void __exit cbq_module_exit(void) -{ - unregister_qdisc(&cbq_qdisc_ops); -} -module_init(cbq_module_init) -module_exit(cbq_module_exit) -MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/cbq.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/cbq.json deleted file mode 100644 index 1ab21c83a122..000000000000 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/cbq.json +++ /dev/null @@ -1,184 +0,0 @@ -[ - { - "id": "3460", - "name": "Create CBQ with default setting", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "0592", - "name": "Create CBQ with mpu", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000 mpu 1000", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "4684", - "name": "Create CBQ with valid cell num", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000 cell 128", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "4345", - "name": "Create CBQ with invalid cell num", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000 cell 100", - "expExitCode": "1", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "0", - "teardown": [ - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "4525", - "name": "Create CBQ with valid ewma", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000 ewma 16", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "6784", - "name": "Create CBQ with invalid ewma", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000 ewma 128", - "expExitCode": "1", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "0", - "teardown": [ - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "5468", - "name": "Delete CBQ with handle", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true", - "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000" - ], - "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 1: root", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc cbq 1: root refcnt [0-9]+ rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "0", - "teardown": [ - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "492a", - "name": "Show CBQ class", - "category": [ - "qdisc", - "cbq" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root cbq bandwidth 10000 avpkt 9000", - "expExitCode": "0", - "verifyCmd": "$TC class show dev $DUMMY", - "matchPattern": "class cbq 1: root rate 10Kbit \\(bounded,isolated\\) prio no-transmit", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - } -] -- cgit v1.2.3 From fb38306ceb9e770adfb5ffa6e3c64047b55f7a07 Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Tue, 14 Feb 2023 08:49:12 -0500 Subject: net/sched: Retire ATM qdisc The ATM qdisc has served us well over the years but has not been getting much TLC due to lack of known users. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim Acked-by: Jiri Pirko Signed-off-by: Paolo Abeni --- net/sched/Kconfig | 14 - net/sched/Makefile | 1 - net/sched/sch_atm.c | 706 --------------------- .../selftests/tc-testing/tc-tests/qdiscs/atm.json | 94 --- 4 files changed, 815 deletions(-) delete mode 100644 net/sched/sch_atm.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/atm.json (limited to 'net') diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 960e30b6b746..869321db12f7 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -68,20 +68,6 @@ config NET_SCH_HFSC To compile this code as a module, choose M here: the module will be called sch_hfsc. -config NET_SCH_ATM - tristate "ATM Virtual Circuits (ATM)" - depends on ATM - help - Say Y here if you want to use the ATM pseudo-scheduler. This - provides a framework for invoking classifiers, which in turn - select classes of this queuing discipline. Each class maps - the flow(s) it is handling to a given virtual circuit. - - See the top of for more details. - - To compile this code as a module, choose M here: the - module will be called sch_atm. - config NET_SCH_PRIO tristate "Multi Band Priority Queueing (PRIO)" help diff --git a/net/sched/Makefile b/net/sched/Makefile index 9cc7d74b9c4a..d2612b47530c 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -45,7 +45,6 @@ obj-$(CONFIG_NET_SCH_TBF) += sch_tbf.o obj-$(CONFIG_NET_SCH_TEQL) += sch_teql.o obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o obj-$(CONFIG_NET_SCH_MULTIQ) += sch_multiq.o -obj-$(CONFIG_NET_SCH_ATM) += sch_atm.o obj-$(CONFIG_NET_SCH_NETEM) += sch_netem.o obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c deleted file mode 100644 index 4a981ca90b0b..000000000000 --- a/net/sched/sch_atm.c +++ /dev/null @@ -1,706 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* net/sched/sch_atm.c - ATM VC selection "queueing discipline" */ - -/* Written 1998-2000 by Werner Almesberger, EPFL ICA */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include /* for fput */ -#include -#include -#include - -/* - * The ATM queuing discipline provides a framework for invoking classifiers - * (aka "filters"), which in turn select classes of this queuing discipline. - * Each class maps the flow(s) it is handling to a given VC. Multiple classes - * may share the same VC. - * - * When creating a class, VCs are specified by passing the number of the open - * socket descriptor by which the calling process references the VC. The kernel - * keeps the VC open at least until all classes using it are removed. - * - * In this file, most functions are named atm_tc_* to avoid confusion with all - * the atm_* in net/atm. This naming convention differs from what's used in the - * rest of net/sched. - * - * Known bugs: - * - sometimes messes up the IP stack - * - any manipulations besides the few operations described in the README, are - * untested and likely to crash the system - * - should lock the flow while there is data in the queue (?) - */ - -#define VCC2FLOW(vcc) ((struct atm_flow_data *) ((vcc)->user_back)) - -struct atm_flow_data { - struct Qdisc_class_common common; - struct Qdisc *q; /* FIFO, TBF, etc. */ - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */ - void (*old_pop)(struct atm_vcc *vcc, - struct sk_buff *skb); /* chaining */ - struct atm_qdisc_data *parent; /* parent qdisc */ - struct socket *sock; /* for closing */ - int ref; /* reference count */ - struct gnet_stats_basic_sync bstats; - struct gnet_stats_queue qstats; - struct list_head list; - struct atm_flow_data *excess; /* flow for excess traffic; - NULL to set CLP instead */ - int hdr_len; - unsigned char hdr[]; /* header data; MUST BE LAST */ -}; - -struct atm_qdisc_data { - struct atm_flow_data link; /* unclassified skbs go here */ - struct list_head flows; /* NB: "link" is also on this - list */ - struct tasklet_struct task; /* dequeue tasklet */ -}; - -/* ------------------------- Class/flow operations ------------------------- */ - -static inline struct atm_flow_data *lookup_flow(struct Qdisc *sch, u32 classid) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - list_for_each_entry(flow, &p->flows, list) { - if (flow->common.classid == classid) - return flow; - } - return NULL; -} - -static int atm_tc_graft(struct Qdisc *sch, unsigned long arg, - struct Qdisc *new, struct Qdisc **old, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - pr_debug("atm_tc_graft(sch %p,[qdisc %p],flow %p,new %p,old %p)\n", - sch, p, flow, new, old); - if (list_empty(&flow->list)) - return -EINVAL; - if (!new) - new = &noop_qdisc; - *old = flow->q; - flow->q = new; - if (*old) - qdisc_reset(*old); - return 0; -} - -static struct Qdisc *atm_tc_leaf(struct Qdisc *sch, unsigned long cl) -{ - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_leaf(sch %p,flow %p)\n", sch, flow); - return flow ? flow->q : NULL; -} - -static unsigned long atm_tc_find(struct Qdisc *sch, u32 classid) -{ - struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid); - flow = lookup_flow(sch, classid); - pr_debug("%s: flow %p\n", __func__, flow); - return (unsigned long)flow; -} - -static unsigned long atm_tc_bind_filter(struct Qdisc *sch, - unsigned long parent, u32 classid) -{ - struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid); - flow = lookup_flow(sch, classid); - if (flow) - flow->ref++; - pr_debug("%s: flow %p\n", __func__, flow); - return (unsigned long)flow; -} - -/* - * atm_tc_put handles all destructions, including the ones that are explicitly - * requested (atm_tc_destroy, etc.). The assumption here is that we never drop - * anything that still seems to be in use. - */ -static void atm_tc_put(struct Qdisc *sch, unsigned long cl) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_put(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - if (--flow->ref) - return; - pr_debug("atm_tc_put: destroying\n"); - list_del_init(&flow->list); - pr_debug("atm_tc_put: qdisc %p\n", flow->q); - qdisc_put(flow->q); - tcf_block_put(flow->block); - if (flow->sock) { - pr_debug("atm_tc_put: f_count %ld\n", - file_count(flow->sock->file)); - flow->vcc->pop = flow->old_pop; - sockfd_put(flow->sock); - } - if (flow->excess) - atm_tc_put(sch, (unsigned long)flow->excess); - if (flow != &p->link) - kfree(flow); - /* - * If flow == &p->link, the qdisc no longer works at this point and - * needs to be removed. (By the caller of atm_tc_put.) - */ -} - -static void sch_atm_pop(struct atm_vcc *vcc, struct sk_buff *skb) -{ - struct atm_qdisc_data *p = VCC2FLOW(vcc)->parent; - - pr_debug("sch_atm_pop(vcc %p,skb %p,[qdisc %p])\n", vcc, skb, p); - VCC2FLOW(vcc)->old_pop(vcc, skb); - tasklet_schedule(&p->task); -} - -static const u8 llc_oui_ip[] = { - 0xaa, /* DSAP: non-ISO */ - 0xaa, /* SSAP: non-ISO */ - 0x03, /* Ctrl: Unnumbered Information Command PDU */ - 0x00, /* OUI: EtherType */ - 0x00, 0x00, - 0x08, 0x00 -}; /* Ethertype IP (0800) */ - -static const struct nla_policy atm_policy[TCA_ATM_MAX + 1] = { - [TCA_ATM_FD] = { .type = NLA_U32 }, - [TCA_ATM_EXCESS] = { .type = NLA_U32 }, -}; - -static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent, - struct nlattr **tca, unsigned long *arg, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)*arg; - struct atm_flow_data *excess = NULL; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_ATM_MAX + 1]; - struct socket *sock; - int fd, error, hdr_len; - void *hdr; - - pr_debug("atm_tc_change(sch %p,[qdisc %p],classid %x,parent %x," - "flow %p,opt %p)\n", sch, p, classid, parent, flow, opt); - /* - * The concept of parents doesn't apply for this qdisc. - */ - if (parent && parent != TC_H_ROOT && parent != sch->handle) - return -EINVAL; - /* - * ATM classes cannot be changed. In order to change properties of the - * ATM connection, that socket needs to be modified directly (via the - * native ATM API. In order to send a flow to a different VC, the old - * class needs to be removed and a new one added. (This may be changed - * later.) - */ - if (flow) - return -EBUSY; - if (opt == NULL) - return -EINVAL; - - error = nla_parse_nested_deprecated(tb, TCA_ATM_MAX, opt, atm_policy, - NULL); - if (error < 0) - return error; - - if (!tb[TCA_ATM_FD]) - return -EINVAL; - fd = nla_get_u32(tb[TCA_ATM_FD]); - pr_debug("atm_tc_change: fd %d\n", fd); - if (tb[TCA_ATM_HDR]) { - hdr_len = nla_len(tb[TCA_ATM_HDR]); - hdr = nla_data(tb[TCA_ATM_HDR]); - } else { - hdr_len = RFC1483LLC_LEN; - hdr = NULL; /* default LLC/SNAP for IP */ - } - if (!tb[TCA_ATM_EXCESS]) - excess = NULL; - else { - excess = (struct atm_flow_data *) - atm_tc_find(sch, nla_get_u32(tb[TCA_ATM_EXCESS])); - if (!excess) - return -ENOENT; - } - pr_debug("atm_tc_change: type %d, payload %d, hdr_len %d\n", - opt->nla_type, nla_len(opt), hdr_len); - sock = sockfd_lookup(fd, &error); - if (!sock) - return error; /* f_count++ */ - pr_debug("atm_tc_change: f_count %ld\n", file_count(sock->file)); - if (sock->ops->family != PF_ATMSVC && sock->ops->family != PF_ATMPVC) { - error = -EPROTOTYPE; - goto err_out; - } - /* @@@ should check if the socket is really operational or we'll crash - on vcc->send */ - if (classid) { - if (TC_H_MAJ(classid ^ sch->handle)) { - pr_debug("atm_tc_change: classid mismatch\n"); - error = -EINVAL; - goto err_out; - } - } else { - int i; - unsigned long cl; - - for (i = 1; i < 0x8000; i++) { - classid = TC_H_MAKE(sch->handle, 0x8000 | i); - cl = atm_tc_find(sch, classid); - if (!cl) - break; - } - } - pr_debug("atm_tc_change: new id %x\n", classid); - flow = kzalloc(sizeof(struct atm_flow_data) + hdr_len, GFP_KERNEL); - pr_debug("atm_tc_change: flow %p\n", flow); - if (!flow) { - error = -ENOBUFS; - goto err_out; - } - - error = tcf_block_get(&flow->block, &flow->filter_list, sch, - extack); - if (error) { - kfree(flow); - goto err_out; - } - - flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid, - extack); - if (!flow->q) - flow->q = &noop_qdisc; - pr_debug("atm_tc_change: qdisc %p\n", flow->q); - flow->sock = sock; - flow->vcc = ATM_SD(sock); /* speedup */ - flow->vcc->user_back = flow; - pr_debug("atm_tc_change: vcc %p\n", flow->vcc); - flow->old_pop = flow->vcc->pop; - flow->parent = p; - flow->vcc->pop = sch_atm_pop; - flow->common.classid = classid; - flow->ref = 1; - flow->excess = excess; - list_add(&flow->list, &p->link.list); - flow->hdr_len = hdr_len; - if (hdr) - memcpy(flow->hdr, hdr, hdr_len); - else - memcpy(flow->hdr, llc_oui_ip, sizeof(llc_oui_ip)); - *arg = (unsigned long)flow; - return 0; -err_out: - sockfd_put(sock); - return error; -} - -static int atm_tc_delete(struct Qdisc *sch, unsigned long arg, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - if (list_empty(&flow->list)) - return -EINVAL; - if (rcu_access_pointer(flow->filter_list) || flow == &p->link) - return -EBUSY; - /* - * Reference count must be 2: one for "keepalive" (set at class - * creation), and one for the reference held when calling delete. - */ - if (flow->ref < 2) { - pr_err("atm_tc_delete: flow->ref == %d\n", flow->ref); - return -EINVAL; - } - if (flow->ref > 2) - return -EBUSY; /* catch references via excess, etc. */ - atm_tc_put(sch, arg); - return 0; -} - -static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("atm_tc_walk(sch %p,[qdisc %p],walker %p)\n", sch, p, walker); - if (walker->stop) - return; - list_for_each_entry(flow, &p->flows, list) { - if (!tc_qdisc_stats_dump(sch, (unsigned long)flow, walker)) - break; - } -} - -static struct tcf_block *atm_tc_tcf_block(struct Qdisc *sch, unsigned long cl, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_find_tcf(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - return flow ? flow->block : p->link.block; -} - -/* --------------------------- Qdisc operations ---------------------------- */ - -static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - struct tcf_result res; - int result; - int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; - - pr_debug("atm_tc_enqueue(skb %p,sch %p,[qdisc %p])\n", skb, sch, p); - result = TC_ACT_OK; /* be nice to gcc */ - flow = NULL; - if (TC_H_MAJ(skb->priority) != sch->handle || - !(flow = (struct atm_flow_data *)atm_tc_find(sch, skb->priority))) { - struct tcf_proto *fl; - - list_for_each_entry(flow, &p->flows, list) { - fl = rcu_dereference_bh(flow->filter_list); - if (fl) { - result = tcf_classify(skb, NULL, fl, &res, true); - if (result < 0) - continue; - if (result == TC_ACT_SHOT) - goto done; - - flow = (struct atm_flow_data *)res.class; - if (!flow) - flow = lookup_flow(sch, res.classid); - goto drop; - } - } - flow = NULL; -done: - ; - } - if (!flow) { - flow = &p->link; - } else { - if (flow->vcc) - ATM_SKB(skb)->atm_options = flow->vcc->atm_options; - /*@@@ looks good ... but it's not supposed to work :-) */ -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - __qdisc_drop(skb, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - case TC_ACT_SHOT: - __qdisc_drop(skb, to_free); - goto drop; - case TC_ACT_RECLASSIFY: - if (flow->excess) - flow = flow->excess; - else - ATM_SKB(skb)->atm_options |= ATM_ATMOPT_CLP; - break; - } -#endif - } - - ret = qdisc_enqueue(skb, flow->q, to_free); - if (ret != NET_XMIT_SUCCESS) { -drop: __maybe_unused - if (net_xmit_drop_count(ret)) { - qdisc_qstats_drop(sch); - if (flow) - flow->qstats.drops++; - } - return ret; - } - /* - * Okay, this may seem weird. We pretend we've dropped the packet if - * it goes via ATM. The reason for this is that the outer qdisc - * expects to be able to q->dequeue the packet later on if we return - * success at this place. Also, sch->q.qdisc needs to reflect whether - * there is a packet egligible for dequeuing or not. Note that the - * statistics of the outer qdisc are necessarily wrong because of all - * this. There's currently no correct solution for this. - */ - if (flow == &p->link) { - sch->q.qlen++; - return NET_XMIT_SUCCESS; - } - tasklet_schedule(&p->task); - return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; -} - -/* - * Dequeue packets and send them over ATM. Note that we quite deliberately - * avoid checking net_device's flow control here, simply because sch_atm - * uses its own channels, which have nothing to do with any CLIP/LANE/or - * non-ATM interfaces. - */ - -static void sch_atm_dequeue(struct tasklet_struct *t) -{ - struct atm_qdisc_data *p = from_tasklet(p, t, task); - struct Qdisc *sch = qdisc_from_priv(p); - struct atm_flow_data *flow; - struct sk_buff *skb; - - pr_debug("sch_atm_dequeue(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) { - if (flow == &p->link) - continue; - /* - * If traffic is properly shaped, this won't generate nasty - * little bursts. Otherwise, it may ... (but that's okay) - */ - while ((skb = flow->q->ops->peek(flow->q))) { - if (!atm_may_send(flow->vcc, skb->truesize)) - break; - - skb = qdisc_dequeue_peeked(flow->q); - if (unlikely(!skb)) - break; - - qdisc_bstats_update(sch, skb); - bstats_update(&flow->bstats, skb); - pr_debug("atm_tc_dequeue: sending on class %p\n", flow); - /* remove any LL header somebody else has attached */ - skb_pull(skb, skb_network_offset(skb)); - if (skb_headroom(skb) < flow->hdr_len) { - struct sk_buff *new; - - new = skb_realloc_headroom(skb, flow->hdr_len); - dev_kfree_skb(skb); - if (!new) - continue; - skb = new; - } - pr_debug("sch_atm_dequeue: ip %p, data %p\n", - skb_network_header(skb), skb->data); - ATM_SKB(skb)->vcc = flow->vcc; - memcpy(skb_push(skb, flow->hdr_len), flow->hdr, - flow->hdr_len); - refcount_add(skb->truesize, - &sk_atm(flow->vcc)->sk_wmem_alloc); - /* atm.atm_options are already set by atm_tc_enqueue */ - flow->vcc->send(flow->vcc, skb); - } - } -} - -static struct sk_buff *atm_tc_dequeue(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct sk_buff *skb; - - pr_debug("atm_tc_dequeue(sch %p,[qdisc %p])\n", sch, p); - tasklet_schedule(&p->task); - skb = qdisc_dequeue_peeked(p->link.q); - if (skb) - sch->q.qlen--; - return skb; -} - -static struct sk_buff *atm_tc_peek(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - - pr_debug("atm_tc_peek(sch %p,[qdisc %p])\n", sch, p); - - return p->link.q->ops->peek(p->link.q); -} - -static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - int err; - - pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt); - INIT_LIST_HEAD(&p->flows); - INIT_LIST_HEAD(&p->link.list); - gnet_stats_basic_sync_init(&p->link.bstats); - list_add(&p->link.list, &p->flows); - p->link.q = qdisc_create_dflt(sch->dev_queue, - &pfifo_qdisc_ops, sch->handle, extack); - if (!p->link.q) - p->link.q = &noop_qdisc; - pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q); - p->link.vcc = NULL; - p->link.sock = NULL; - p->link.common.classid = sch->handle; - p->link.ref = 1; - - err = tcf_block_get(&p->link.block, &p->link.filter_list, sch, - extack); - if (err) - return err; - - tasklet_setup(&p->task, sch_atm_dequeue); - return 0; -} - -static void atm_tc_reset(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("atm_tc_reset(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) - qdisc_reset(flow->q); -} - -static void atm_tc_destroy(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow, *tmp; - - pr_debug("atm_tc_destroy(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) { - tcf_block_put(flow->block); - flow->block = NULL; - } - - list_for_each_entry_safe(flow, tmp, &p->flows, list) { - if (flow->ref > 1) - pr_err("atm_destroy: %p->ref = %d\n", flow, flow->ref); - atm_tc_put(sch, (unsigned long)flow); - } - tasklet_kill(&p->task); -} - -static int atm_tc_dump_class(struct Qdisc *sch, unsigned long cl, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - struct nlattr *nest; - - pr_debug("atm_tc_dump_class(sch %p,[qdisc %p],flow %p,skb %p,tcm %p)\n", - sch, p, flow, skb, tcm); - if (list_empty(&flow->list)) - return -EINVAL; - tcm->tcm_handle = flow->common.classid; - tcm->tcm_info = flow->q->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - - if (nla_put(skb, TCA_ATM_HDR, flow->hdr_len, flow->hdr)) - goto nla_put_failure; - if (flow->vcc) { - struct sockaddr_atmpvc pvc; - int state; - - memset(&pvc, 0, sizeof(pvc)); - pvc.sap_family = AF_ATMPVC; - pvc.sap_addr.itf = flow->vcc->dev ? flow->vcc->dev->number : -1; - pvc.sap_addr.vpi = flow->vcc->vpi; - pvc.sap_addr.vci = flow->vcc->vci; - if (nla_put(skb, TCA_ATM_ADDR, sizeof(pvc), &pvc)) - goto nla_put_failure; - state = ATM_VF2VS(flow->vcc->flags); - if (nla_put_u32(skb, TCA_ATM_STATE, state)) - goto nla_put_failure; - } - if (flow->excess) { - if (nla_put_u32(skb, TCA_ATM_EXCESS, flow->common.classid)) - goto nla_put_failure; - } else { - if (nla_put_u32(skb, TCA_ATM_EXCESS, 0)) - goto nla_put_failure; - } - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} -static int -atm_tc_dump_class_stats(struct Qdisc *sch, unsigned long arg, - struct gnet_dump *d) -{ - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - if (gnet_stats_copy_basic(d, NULL, &flow->bstats, true) < 0 || - gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0) - return -1; - - return 0; -} - -static int atm_tc_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - return 0; -} - -static const struct Qdisc_class_ops atm_class_ops = { - .graft = atm_tc_graft, - .leaf = atm_tc_leaf, - .find = atm_tc_find, - .change = atm_tc_change, - .delete = atm_tc_delete, - .walk = atm_tc_walk, - .tcf_block = atm_tc_tcf_block, - .bind_tcf = atm_tc_bind_filter, - .unbind_tcf = atm_tc_put, - .dump = atm_tc_dump_class, - .dump_stats = atm_tc_dump_class_stats, -}; - -static struct Qdisc_ops atm_qdisc_ops __read_mostly = { - .cl_ops = &atm_class_ops, - .id = "atm", - .priv_size = sizeof(struct atm_qdisc_data), - .enqueue = atm_tc_enqueue, - .dequeue = atm_tc_dequeue, - .peek = atm_tc_peek, - .init = atm_tc_init, - .reset = atm_tc_reset, - .destroy = atm_tc_destroy, - .dump = atm_tc_dump, - .owner = THIS_MODULE, -}; - -static int __init atm_init(void) -{ - return register_qdisc(&atm_qdisc_ops); -} - -static void __exit atm_exit(void) -{ - unregister_qdisc(&atm_qdisc_ops); -} - -module_init(atm_init) -module_exit(atm_exit) -MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/atm.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/atm.json deleted file mode 100644 index f5bc8670a67d..000000000000 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/atm.json +++ /dev/null @@ -1,94 +0,0 @@ -[ - { - "id": "7628", - "name": "Create ATM with default setting", - "category": [ - "qdisc", - "atm" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root atm", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc atm 1: root refcnt", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "390a", - "name": "Delete ATM with valid handle", - "category": [ - "qdisc", - "atm" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true", - "$TC qdisc add dev $DUMMY handle 1: root atm" - ], - "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 1: root", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc atm 1: root refcnt", - "matchCount": "0", - "teardown": [ - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "32a0", - "name": "Show ATM class", - "category": [ - "qdisc", - "atm" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root atm", - "expExitCode": "0", - "verifyCmd": "$TC class show dev $DUMMY", - "matchPattern": "class atm 1: parent 1:", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "6310", - "name": "Dump ATM stats", - "category": [ - "qdisc", - "atm" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root atm", - "expExitCode": "0", - "verifyCmd": "$TC -s qdisc show dev $DUMMY", - "matchPattern": "qdisc atm 1: root refcnt", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - } -] -- cgit v1.2.3 From bbe77c14ee6185a61ba6d5e435c1cbb489d2a9ed Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Tue, 14 Feb 2023 08:49:13 -0500 Subject: net/sched: Retire dsmark qdisc The dsmark qdisc has served us well over the years for diffserv but has not been getting much attention due to other more popular approaches to do diffserv services. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim Acked-by: Jiri Pirko Signed-off-by: Paolo Abeni --- net/sched/Kconfig | 11 - net/sched/Makefile | 1 - net/sched/sch_dsmark.c | 518 --------------------- .../tc-testing/tc-tests/qdiscs/dsmark.json | 140 ------ 4 files changed, 670 deletions(-) delete mode 100644 net/sched/sch_dsmark.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dsmark.json (limited to 'net') diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 869321db12f7..d909f289a67f 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -192,17 +192,6 @@ config NET_SCH_GRED To compile this code as a module, choose M here: the module will be called sch_gred. -config NET_SCH_DSMARK - tristate "Differentiated Services marker (DSMARK)" - help - Say Y if you want to schedule packets according to the - Differentiated Services architecture proposed in RFC 2475. - Technical information on this method, with pointers to associated - RFCs, is available at . - - To compile this code as a module, choose M here: the - module will be called sch_dsmark. - config NET_SCH_NETEM tristate "Network emulator (NETEM)" help diff --git a/net/sched/Makefile b/net/sched/Makefile index d2612b47530c..0852e989af96 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -38,7 +38,6 @@ obj-$(CONFIG_NET_SCH_HFSC) += sch_hfsc.o obj-$(CONFIG_NET_SCH_RED) += sch_red.o obj-$(CONFIG_NET_SCH_GRED) += sch_gred.o obj-$(CONFIG_NET_SCH_INGRESS) += sch_ingress.o -obj-$(CONFIG_NET_SCH_DSMARK) += sch_dsmark.o obj-$(CONFIG_NET_SCH_SFB) += sch_sfb.o obj-$(CONFIG_NET_SCH_SFQ) += sch_sfq.o obj-$(CONFIG_NET_SCH_TBF) += sch_tbf.o diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c deleted file mode 100644 index 401ffaf87d62..000000000000 --- a/net/sched/sch_dsmark.c +++ /dev/null @@ -1,518 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* net/sched/sch_dsmark.c - Differentiated Services field marker */ - -/* Written 1998-2000 by Werner Almesberger, EPFL ICA */ - - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * classid class marking - * ------- ----- ------- - * n/a 0 n/a - * x:0 1 use entry [0] - * ... ... ... - * x:y y>0 y+1 use entry [y] - * ... ... ... - * x:indices-1 indices use entry [indices-1] - * ... ... ... - * x:y y+1 use entry [y & (indices-1)] - * ... ... ... - * 0xffff 0x10000 use entry [indices-1] - */ - - -#define NO_DEFAULT_INDEX (1 << 16) - -struct mask_value { - u8 mask; - u8 value; -}; - -struct dsmark_qdisc_data { - struct Qdisc *q; - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - struct mask_value *mv; - u16 indices; - u8 set_tc_index; - u32 default_index; /* index range is 0...0xffff */ -#define DSMARK_EMBEDDED_SZ 16 - struct mask_value embedded[DSMARK_EMBEDDED_SZ]; -}; - -static inline int dsmark_valid_index(struct dsmark_qdisc_data *p, u16 index) -{ - return index <= p->indices && index > 0; -} - -/* ------------------------- Class/flow operations ------------------------- */ - -static int dsmark_graft(struct Qdisc *sch, unsigned long arg, - struct Qdisc *new, struct Qdisc **old, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p],new %p,old %p)\n", - __func__, sch, p, new, old); - - if (new == NULL) { - new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - sch->handle, NULL); - if (new == NULL) - new = &noop_qdisc; - } - - *old = qdisc_replace(sch, new, &p->q); - return 0; -} - -static struct Qdisc *dsmark_leaf(struct Qdisc *sch, unsigned long arg) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - return p->q; -} - -static unsigned long dsmark_find(struct Qdisc *sch, u32 classid) -{ - return TC_H_MIN(classid) + 1; -} - -static unsigned long dsmark_bind_filter(struct Qdisc *sch, - unsigned long parent, u32 classid) -{ - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", - __func__, sch, qdisc_priv(sch), classid); - - return dsmark_find(sch, classid); -} - -static void dsmark_unbind_filter(struct Qdisc *sch, unsigned long cl) -{ -} - -static const struct nla_policy dsmark_policy[TCA_DSMARK_MAX + 1] = { - [TCA_DSMARK_INDICES] = { .type = NLA_U16 }, - [TCA_DSMARK_DEFAULT_INDEX] = { .type = NLA_U16 }, - [TCA_DSMARK_SET_TC_INDEX] = { .type = NLA_FLAG }, - [TCA_DSMARK_MASK] = { .type = NLA_U8 }, - [TCA_DSMARK_VALUE] = { .type = NLA_U8 }, -}; - -static int dsmark_change(struct Qdisc *sch, u32 classid, u32 parent, - struct nlattr **tca, unsigned long *arg, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_DSMARK_MAX + 1]; - int err = -EINVAL; - - pr_debug("%s(sch %p,[qdisc %p],classid %x,parent %x), arg 0x%lx\n", - __func__, sch, p, classid, parent, *arg); - - if (!dsmark_valid_index(p, *arg)) { - err = -ENOENT; - goto errout; - } - - if (!opt) - goto errout; - - err = nla_parse_nested_deprecated(tb, TCA_DSMARK_MAX, opt, - dsmark_policy, NULL); - if (err < 0) - goto errout; - - if (tb[TCA_DSMARK_VALUE]) - p->mv[*arg - 1].value = nla_get_u8(tb[TCA_DSMARK_VALUE]); - - if (tb[TCA_DSMARK_MASK]) - p->mv[*arg - 1].mask = nla_get_u8(tb[TCA_DSMARK_MASK]); - - err = 0; - -errout: - return err; -} - -static int dsmark_delete(struct Qdisc *sch, unsigned long arg, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - if (!dsmark_valid_index(p, arg)) - return -EINVAL; - - p->mv[arg - 1].mask = 0xff; - p->mv[arg - 1].value = 0; - - return 0; -} - -static void dsmark_walk(struct Qdisc *sch, struct qdisc_walker *walker) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - int i; - - pr_debug("%s(sch %p,[qdisc %p],walker %p)\n", - __func__, sch, p, walker); - - if (walker->stop) - return; - - for (i = 0; i < p->indices; i++) { - if (p->mv[i].mask == 0xff && !p->mv[i].value) { - walker->count++; - continue; - } - if (!tc_qdisc_stats_dump(sch, i + 1, walker)) - break; - } -} - -static struct tcf_block *dsmark_tcf_block(struct Qdisc *sch, unsigned long cl, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - return p->block; -} - -/* --------------------------- Qdisc operations ---------------------------- */ - -static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - unsigned int len = qdisc_pkt_len(skb); - struct dsmark_qdisc_data *p = qdisc_priv(sch); - int err; - - pr_debug("%s(skb %p,sch %p,[qdisc %p])\n", __func__, skb, sch, p); - - if (p->set_tc_index) { - int wlen = skb_network_offset(skb); - - switch (skb_protocol(skb, true)) { - case htons(ETH_P_IP): - wlen += sizeof(struct iphdr); - if (!pskb_may_pull(skb, wlen) || - skb_try_make_writable(skb, wlen)) - goto drop; - - skb->tc_index = ipv4_get_dsfield(ip_hdr(skb)) - & ~INET_ECN_MASK; - break; - - case htons(ETH_P_IPV6): - wlen += sizeof(struct ipv6hdr); - if (!pskb_may_pull(skb, wlen) || - skb_try_make_writable(skb, wlen)) - goto drop; - - skb->tc_index = ipv6_get_dsfield(ipv6_hdr(skb)) - & ~INET_ECN_MASK; - break; - default: - skb->tc_index = 0; - break; - } - } - - if (TC_H_MAJ(skb->priority) == sch->handle) - skb->tc_index = TC_H_MIN(skb->priority); - else { - struct tcf_result res; - struct tcf_proto *fl = rcu_dereference_bh(p->filter_list); - int result = tcf_classify(skb, NULL, fl, &res, false); - - pr_debug("result %d class 0x%04x\n", result, res.classid); - - switch (result) { -#ifdef CONFIG_NET_CLS_ACT - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - __qdisc_drop(skb, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - - case TC_ACT_SHOT: - goto drop; -#endif - case TC_ACT_OK: - skb->tc_index = TC_H_MIN(res.classid); - break; - - default: - if (p->default_index != NO_DEFAULT_INDEX) - skb->tc_index = p->default_index; - break; - } - } - - err = qdisc_enqueue(skb, p->q, to_free); - if (err != NET_XMIT_SUCCESS) { - if (net_xmit_drop_count(err)) - qdisc_qstats_drop(sch); - return err; - } - - sch->qstats.backlog += len; - sch->q.qlen++; - - return NET_XMIT_SUCCESS; - -drop: - qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; -} - -static struct sk_buff *dsmark_dequeue(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct sk_buff *skb; - u32 index; - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - skb = qdisc_dequeue_peeked(p->q); - if (skb == NULL) - return NULL; - - qdisc_bstats_update(sch, skb); - qdisc_qstats_backlog_dec(sch, skb); - sch->q.qlen--; - - index = skb->tc_index & (p->indices - 1); - pr_debug("index %d->%d\n", skb->tc_index, index); - - switch (skb_protocol(skb, true)) { - case htons(ETH_P_IP): - ipv4_change_dsfield(ip_hdr(skb), p->mv[index].mask, - p->mv[index].value); - break; - case htons(ETH_P_IPV6): - ipv6_change_dsfield(ipv6_hdr(skb), p->mv[index].mask, - p->mv[index].value); - break; - default: - /* - * Only complain if a change was actually attempted. - * This way, we can send non-IP traffic through dsmark - * and don't need yet another qdisc as a bypass. - */ - if (p->mv[index].mask != 0xff || p->mv[index].value) - pr_warn("%s: unsupported protocol %d\n", - __func__, ntohs(skb_protocol(skb, true))); - break; - } - - return skb; -} - -static struct sk_buff *dsmark_peek(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - return p->q->ops->peek(p->q); -} - -static int dsmark_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *tb[TCA_DSMARK_MAX + 1]; - int err = -EINVAL; - u32 default_index = NO_DEFAULT_INDEX; - u16 indices; - int i; - - pr_debug("%s(sch %p,[qdisc %p],opt %p)\n", __func__, sch, p, opt); - - if (!opt) - goto errout; - - err = tcf_block_get(&p->block, &p->filter_list, sch, extack); - if (err) - return err; - - err = nla_parse_nested_deprecated(tb, TCA_DSMARK_MAX, opt, - dsmark_policy, NULL); - if (err < 0) - goto errout; - - err = -EINVAL; - if (!tb[TCA_DSMARK_INDICES]) - goto errout; - indices = nla_get_u16(tb[TCA_DSMARK_INDICES]); - - if (hweight32(indices) != 1) - goto errout; - - if (tb[TCA_DSMARK_DEFAULT_INDEX]) - default_index = nla_get_u16(tb[TCA_DSMARK_DEFAULT_INDEX]); - - if (indices <= DSMARK_EMBEDDED_SZ) - p->mv = p->embedded; - else - p->mv = kmalloc_array(indices, sizeof(*p->mv), GFP_KERNEL); - if (!p->mv) { - err = -ENOMEM; - goto errout; - } - for (i = 0; i < indices; i++) { - p->mv[i].mask = 0xff; - p->mv[i].value = 0; - } - p->indices = indices; - p->default_index = default_index; - p->set_tc_index = nla_get_flag(tb[TCA_DSMARK_SET_TC_INDEX]); - - p->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, sch->handle, - NULL); - if (p->q == NULL) - p->q = &noop_qdisc; - else - qdisc_hash_add(p->q, true); - - pr_debug("%s: qdisc %p\n", __func__, p->q); - - err = 0; -errout: - return err; -} - -static void dsmark_reset(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - if (p->q) - qdisc_reset(p->q); -} - -static void dsmark_destroy(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - tcf_block_put(p->block); - qdisc_put(p->q); - if (p->mv != p->embedded) - kfree(p->mv); -} - -static int dsmark_dump_class(struct Qdisc *sch, unsigned long cl, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opts = NULL; - - pr_debug("%s(sch %p,[qdisc %p],class %ld\n", __func__, sch, p, cl); - - if (!dsmark_valid_index(p, cl)) - return -EINVAL; - - tcm->tcm_handle = TC_H_MAKE(TC_H_MAJ(sch->handle), cl - 1); - tcm->tcm_info = p->q->handle; - - opts = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (opts == NULL) - goto nla_put_failure; - if (nla_put_u8(skb, TCA_DSMARK_MASK, p->mv[cl - 1].mask) || - nla_put_u8(skb, TCA_DSMARK_VALUE, p->mv[cl - 1].value)) - goto nla_put_failure; - - return nla_nest_end(skb, opts); - -nla_put_failure: - nla_nest_cancel(skb, opts); - return -EMSGSIZE; -} - -static int dsmark_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opts = NULL; - - opts = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (opts == NULL) - goto nla_put_failure; - if (nla_put_u16(skb, TCA_DSMARK_INDICES, p->indices)) - goto nla_put_failure; - - if (p->default_index != NO_DEFAULT_INDEX && - nla_put_u16(skb, TCA_DSMARK_DEFAULT_INDEX, p->default_index)) - goto nla_put_failure; - - if (p->set_tc_index && - nla_put_flag(skb, TCA_DSMARK_SET_TC_INDEX)) - goto nla_put_failure; - - return nla_nest_end(skb, opts); - -nla_put_failure: - nla_nest_cancel(skb, opts); - return -EMSGSIZE; -} - -static const struct Qdisc_class_ops dsmark_class_ops = { - .graft = dsmark_graft, - .leaf = dsmark_leaf, - .find = dsmark_find, - .change = dsmark_change, - .delete = dsmark_delete, - .walk = dsmark_walk, - .tcf_block = dsmark_tcf_block, - .bind_tcf = dsmark_bind_filter, - .unbind_tcf = dsmark_unbind_filter, - .dump = dsmark_dump_class, -}; - -static struct Qdisc_ops dsmark_qdisc_ops __read_mostly = { - .next = NULL, - .cl_ops = &dsmark_class_ops, - .id = "dsmark", - .priv_size = sizeof(struct dsmark_qdisc_data), - .enqueue = dsmark_enqueue, - .dequeue = dsmark_dequeue, - .peek = dsmark_peek, - .init = dsmark_init, - .reset = dsmark_reset, - .destroy = dsmark_destroy, - .change = NULL, - .dump = dsmark_dump, - .owner = THIS_MODULE, -}; - -static int __init dsmark_module_init(void) -{ - return register_qdisc(&dsmark_qdisc_ops); -} - -static void __exit dsmark_module_exit(void) -{ - unregister_qdisc(&dsmark_qdisc_ops); -} - -module_init(dsmark_module_init) -module_exit(dsmark_module_exit) - -MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dsmark.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dsmark.json deleted file mode 100644 index c030795f9c37..000000000000 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/dsmark.json +++ /dev/null @@ -1,140 +0,0 @@ -[ - { - "id": "6345", - "name": "Create DSMARK with default setting", - "category": [ - "qdisc", - "dsmark" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root dsmark indices 1024", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc dsmark 1: root refcnt [0-9]+ indices 0x0400", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "3462", - "name": "Create DSMARK with default_index setting", - "category": [ - "qdisc", - "dsmark" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root dsmark indices 1024 default_index 512", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc dsmark 1: root refcnt [0-9]+ indices 0x0400 default_index 0x0200", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "ca95", - "name": "Create DSMARK with set_tc_index flag", - "category": [ - "qdisc", - "dsmark" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root dsmark indices 1024 set_tc_index", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc dsmark 1: root refcnt [0-9]+ indices 0x0400 set_tc_index", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "a950", - "name": "Create DSMARK with multiple setting", - "category": [ - "qdisc", - "dsmark" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root dsmark indices 1024 default_index 1024 set_tc_index", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc dsmark 1: root refcnt [0-9]+ indices 0x0400 default_index 0x0400 set_tc_index", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "4092", - "name": "Delete DSMARK with handle", - "category": [ - "qdisc", - "dsmark" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true", - "$TC qdisc add dev $DUMMY handle 1: root dsmark indices 1024 default_index 1024" - ], - "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 1: root", - "expExitCode": "0", - "verifyCmd": "$TC qdisc show dev $DUMMY", - "matchPattern": "qdisc dsmark 1: root refcnt [0-9]+ indices 0x0400", - "matchCount": "0", - "teardown": [ - "$IP link del dev $DUMMY type dummy" - ] - }, - { - "id": "5930", - "name": "Show DSMARK class", - "category": [ - "qdisc", - "dsmark" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$IP link add dev $DUMMY type dummy || /bin/true" - ], - "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root dsmark indices 1024", - "expExitCode": "0", - "verifyCmd": "$TC class show dev $DUMMY", - "matchPattern": "class dsmark 1:", - "matchCount": "0", - "teardown": [ - "$TC qdisc del dev $DUMMY handle 1: root", - "$IP link del dev $DUMMY type dummy" - ] - } -] -- cgit v1.2.3 From 8c710f75256bb3cf05ac7b1672c82b92c43f3d28 Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Tue, 14 Feb 2023 08:49:14 -0500 Subject: net/sched: Retire tcindex classifier The tcindex classifier has served us well for about a quarter of a century but has not been getting much TLC due to lack of known users. Most recently it has become easy prey to syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim Acked-by: Jiri Pirko Signed-off-by: Paolo Abeni --- include/net/tc_wrapper.h | 5 - net/sched/Kconfig | 11 - net/sched/Makefile | 1 - net/sched/cls_tcindex.c | 716 --------------------- .../tc-testing/tc-tests/filters/tcindex.json | 227 ------- 5 files changed, 960 deletions(-) delete mode 100644 net/sched/cls_tcindex.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/filters/tcindex.json (limited to 'net') diff --git a/include/net/tc_wrapper.h b/include/net/tc_wrapper.h index d323fffb839a..8ba241760d0a 100644 --- a/include/net/tc_wrapper.h +++ b/include/net/tc_wrapper.h @@ -154,7 +154,6 @@ TC_INDIRECT_FILTER_DECLARE(mall_classify); TC_INDIRECT_FILTER_DECLARE(route4_classify); TC_INDIRECT_FILTER_DECLARE(rsvp_classify); TC_INDIRECT_FILTER_DECLARE(rsvp6_classify); -TC_INDIRECT_FILTER_DECLARE(tcindex_classify); TC_INDIRECT_FILTER_DECLARE(u32_classify); static inline int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp, @@ -207,10 +206,6 @@ static inline int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp, if (tp->classify == rsvp6_classify) return rsvp6_classify(skb, tp, res); #endif -#if IS_BUILTIN(CONFIG_NET_CLS_TCINDEX) - if (tp->classify == tcindex_classify) - return tcindex_classify(skb, tp, res); -#endif skip: return tp->classify(skb, tp, res); diff --git a/net/sched/Kconfig b/net/sched/Kconfig index d909f289a67f..111883853f08 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -468,17 +468,6 @@ config NET_CLS_BASIC To compile this code as a module, choose M here: the module will be called cls_basic. -config NET_CLS_TCINDEX - tristate "Traffic-Control Index (TCINDEX)" - select NET_CLS - help - Say Y here if you want to be able to classify packets based on - traffic control indices. You will want this feature if you want - to implement Differentiated Services together with DSMARK. - - To compile this code as a module, choose M here: the - module will be called cls_tcindex. - config NET_CLS_ROUTE4 tristate "Routing decision (ROUTE)" depends on INET diff --git a/net/sched/Makefile b/net/sched/Makefile index 0852e989af96..ea236d258c16 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -68,7 +68,6 @@ obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o obj-$(CONFIG_NET_CLS_FW) += cls_fw.o obj-$(CONFIG_NET_CLS_RSVP) += cls_rsvp.o -obj-$(CONFIG_NET_CLS_TCINDEX) += cls_tcindex.o obj-$(CONFIG_NET_CLS_RSVP6) += cls_rsvp6.o obj-$(CONFIG_NET_CLS_BASIC) += cls_basic.o obj-$(CONFIG_NET_CLS_FLOW) += cls_flow.o diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c deleted file mode 100644 index ee2a050c887b..000000000000 --- a/net/sched/cls_tcindex.c +++ /dev/null @@ -1,716 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * net/sched/cls_tcindex.c Packet classifier for skb->tc_index - * - * Written 1998,1999 by Werner Almesberger, EPFL ICA - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -/* - * Passing parameters to the root seems to be done more awkwardly than really - * necessary. At least, u32 doesn't seem to use such dirty hacks. To be - * verified. FIXME. - */ - -#define PERFECT_HASH_THRESHOLD 64 /* use perfect hash if not bigger */ -#define DEFAULT_HASH_SIZE 64 /* optimized for diffserv */ - - -struct tcindex_data; - -struct tcindex_filter_result { - struct tcf_exts exts; - struct tcf_result res; - struct tcindex_data *p; - struct rcu_work rwork; -}; - -struct tcindex_filter { - u16 key; - struct tcindex_filter_result result; - struct tcindex_filter __rcu *next; - struct rcu_work rwork; -}; - - -struct tcindex_data { - struct tcindex_filter_result *perfect; /* perfect hash; NULL if none */ - struct tcindex_filter __rcu **h; /* imperfect hash; */ - struct tcf_proto *tp; - u16 mask; /* AND key with mask */ - u32 shift; /* shift ANDed key to the right */ - u32 hash; /* hash table size; 0 if undefined */ - u32 alloc_hash; /* allocated size */ - u32 fall_through; /* 0: only classify if explicit match */ - refcount_t refcnt; /* a temporary refcnt for perfect hash */ - struct rcu_work rwork; -}; - -static inline int tcindex_filter_is_set(struct tcindex_filter_result *r) -{ - return tcf_exts_has_actions(&r->exts) || r->res.classid; -} - -static void tcindex_data_get(struct tcindex_data *p) -{ - refcount_inc(&p->refcnt); -} - -static void tcindex_data_put(struct tcindex_data *p) -{ - if (refcount_dec_and_test(&p->refcnt)) { - kfree(p->perfect); - kfree(p->h); - kfree(p); - } -} - -static struct tcindex_filter_result *tcindex_lookup(struct tcindex_data *p, - u16 key) -{ - if (p->perfect) { - struct tcindex_filter_result *f = p->perfect + key; - - return tcindex_filter_is_set(f) ? f : NULL; - } else if (p->h) { - struct tcindex_filter __rcu **fp; - struct tcindex_filter *f; - - fp = &p->h[key % p->hash]; - for (f = rcu_dereference_bh_rtnl(*fp); - f; - fp = &f->next, f = rcu_dereference_bh_rtnl(*fp)) - if (f->key == key) - return &f->result; - } - - return NULL; -} - -TC_INDIRECT_SCOPE int tcindex_classify(struct sk_buff *skb, - const struct tcf_proto *tp, - struct tcf_result *res) -{ - struct tcindex_data *p = rcu_dereference_bh(tp->root); - struct tcindex_filter_result *f; - int key = (skb->tc_index & p->mask) >> p->shift; - - pr_debug("tcindex_classify(skb %p,tp %p,res %p),p %p\n", - skb, tp, res, p); - - f = tcindex_lookup(p, key); - if (!f) { - struct Qdisc *q = tcf_block_q(tp->chain->block); - - if (!p->fall_through) - return -1; - res->classid = TC_H_MAKE(TC_H_MAJ(q->handle), key); - res->class = 0; - pr_debug("alg 0x%x\n", res->classid); - return 0; - } - *res = f->res; - pr_debug("map 0x%x\n", res->classid); - - return tcf_exts_exec(skb, &f->exts, res); -} - - -static void *tcindex_get(struct tcf_proto *tp, u32 handle) -{ - struct tcindex_data *p = rtnl_dereference(tp->root); - struct tcindex_filter_result *r; - - pr_debug("tcindex_get(tp %p,handle 0x%08x)\n", tp, handle); - if (p->perfect && handle >= p->alloc_hash) - return NULL; - r = tcindex_lookup(p, handle); - return r && tcindex_filter_is_set(r) ? r : NULL; -} - -static int tcindex_init(struct tcf_proto *tp) -{ - struct tcindex_data *p; - - pr_debug("tcindex_init(tp %p)\n", tp); - p = kzalloc(sizeof(struct tcindex_data), GFP_KERNEL); - if (!p) - return -ENOMEM; - - p->mask = 0xffff; - p->hash = DEFAULT_HASH_SIZE; - p->fall_through = 1; - refcount_set(&p->refcnt, 1); /* Paired with tcindex_destroy_work() */ - - rcu_assign_pointer(tp->root, p); - return 0; -} - -static void __tcindex_destroy_rexts(struct tcindex_filter_result *r) -{ - tcf_exts_destroy(&r->exts); - tcf_exts_put_net(&r->exts); - tcindex_data_put(r->p); -} - -static void tcindex_destroy_rexts_work(struct work_struct *work) -{ - struct tcindex_filter_result *r; - - r = container_of(to_rcu_work(work), - struct tcindex_filter_result, - rwork); - rtnl_lock(); - __tcindex_destroy_rexts(r); - rtnl_unlock(); -} - -static void __tcindex_destroy_fexts(struct tcindex_filter *f) -{ - tcf_exts_destroy(&f->result.exts); - tcf_exts_put_net(&f->result.exts); - kfree(f); -} - -static void tcindex_destroy_fexts_work(struct work_struct *work) -{ - struct tcindex_filter *f = container_of(to_rcu_work(work), - struct tcindex_filter, - rwork); - - rtnl_lock(); - __tcindex_destroy_fexts(f); - rtnl_unlock(); -} - -static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last, - bool rtnl_held, struct netlink_ext_ack *extack) -{ - struct tcindex_data *p = rtnl_dereference(tp->root); - struct tcindex_filter_result *r = arg; - struct tcindex_filter __rcu **walk; - struct tcindex_filter *f = NULL; - - pr_debug("tcindex_delete(tp %p,arg %p),p %p\n", tp, arg, p); - if (p->perfect) { - if (!r->res.class) - return -ENOENT; - } else { - int i; - - for (i = 0; i < p->hash; i++) { - walk = p->h + i; - for (f = rtnl_dereference(*walk); f; - walk = &f->next, f = rtnl_dereference(*walk)) { - if (&f->result == r) - goto found; - } - } - return -ENOENT; - -found: - rcu_assign_pointer(*walk, rtnl_dereference(f->next)); - } - tcf_unbind_filter(tp, &r->res); - /* all classifiers are required to call tcf_exts_destroy() after rcu - * grace period, since converted-to-rcu actions are relying on that - * in cleanup() callback - */ - if (f) { - if (tcf_exts_get_net(&f->result.exts)) - tcf_queue_work(&f->rwork, tcindex_destroy_fexts_work); - else - __tcindex_destroy_fexts(f); - } else { - tcindex_data_get(p); - - if (tcf_exts_get_net(&r->exts)) - tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work); - else - __tcindex_destroy_rexts(r); - } - - *last = false; - return 0; -} - -static void tcindex_destroy_work(struct work_struct *work) -{ - struct tcindex_data *p = container_of(to_rcu_work(work), - struct tcindex_data, - rwork); - - tcindex_data_put(p); -} - -static inline int -valid_perfect_hash(struct tcindex_data *p) -{ - return p->hash > (p->mask >> p->shift); -} - -static const struct nla_policy tcindex_policy[TCA_TCINDEX_MAX + 1] = { - [TCA_TCINDEX_HASH] = { .type = NLA_U32 }, - [TCA_TCINDEX_MASK] = { .type = NLA_U16 }, - [TCA_TCINDEX_SHIFT] = { .type = NLA_U32 }, - [TCA_TCINDEX_FALL_THROUGH] = { .type = NLA_U32 }, - [TCA_TCINDEX_CLASSID] = { .type = NLA_U32 }, -}; - -static int tcindex_filter_result_init(struct tcindex_filter_result *r, - struct tcindex_data *p, - struct net *net) -{ - memset(r, 0, sizeof(*r)); - r->p = p; - return tcf_exts_init(&r->exts, net, TCA_TCINDEX_ACT, - TCA_TCINDEX_POLICE); -} - -static void tcindex_free_perfect_hash(struct tcindex_data *cp); - -static void tcindex_partial_destroy_work(struct work_struct *work) -{ - struct tcindex_data *p = container_of(to_rcu_work(work), - struct tcindex_data, - rwork); - - rtnl_lock(); - if (p->perfect) - tcindex_free_perfect_hash(p); - kfree(p); - rtnl_unlock(); -} - -static void tcindex_free_perfect_hash(struct tcindex_data *cp) -{ - int i; - - for (i = 0; i < cp->hash; i++) - tcf_exts_destroy(&cp->perfect[i].exts); - kfree(cp->perfect); -} - -static int tcindex_alloc_perfect_hash(struct net *net, struct tcindex_data *cp) -{ - int i, err = 0; - - cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result), - GFP_KERNEL | __GFP_NOWARN); - if (!cp->perfect) - return -ENOMEM; - - for (i = 0; i < cp->hash; i++) { - err = tcf_exts_init(&cp->perfect[i].exts, net, - TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); - if (err < 0) - goto errout; - cp->perfect[i].p = cp; - } - - return 0; - -errout: - tcindex_free_perfect_hash(cp); - return err; -} - -static int -tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, - u32 handle, struct tcindex_data *p, - struct tcindex_filter_result *r, struct nlattr **tb, - struct nlattr *est, u32 flags, struct netlink_ext_ack *extack) -{ - struct tcindex_filter_result new_filter_result; - struct tcindex_data *cp = NULL, *oldp; - struct tcindex_filter *f = NULL; /* make gcc behave */ - struct tcf_result cr = {}; - int err, balloc = 0; - struct tcf_exts e; - - err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); - if (err < 0) - return err; - err = tcf_exts_validate(net, tp, tb, est, &e, flags, extack); - if (err < 0) - goto errout; - - err = -ENOMEM; - /* tcindex_data attributes must look atomic to classifier/lookup so - * allocate new tcindex data and RCU assign it onto root. Keeping - * perfect hash and hash pointers from old data. - */ - cp = kzalloc(sizeof(*cp), GFP_KERNEL); - if (!cp) - goto errout; - - cp->mask = p->mask; - cp->shift = p->shift; - cp->hash = p->hash; - cp->alloc_hash = p->alloc_hash; - cp->fall_through = p->fall_through; - cp->tp = tp; - refcount_set(&cp->refcnt, 1); /* Paired with tcindex_destroy_work() */ - - if (tb[TCA_TCINDEX_HASH]) - cp->hash = nla_get_u32(tb[TCA_TCINDEX_HASH]); - - if (tb[TCA_TCINDEX_MASK]) - cp->mask = nla_get_u16(tb[TCA_TCINDEX_MASK]); - - if (tb[TCA_TCINDEX_SHIFT]) { - cp->shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]); - if (cp->shift > 16) { - err = -EINVAL; - goto errout; - } - } - if (!cp->hash) { - /* Hash not specified, use perfect hash if the upper limit - * of the hashing index is below the threshold. - */ - if ((cp->mask >> cp->shift) < PERFECT_HASH_THRESHOLD) - cp->hash = (cp->mask >> cp->shift) + 1; - else - cp->hash = DEFAULT_HASH_SIZE; - } - - if (p->perfect) { - int i; - - if (tcindex_alloc_perfect_hash(net, cp) < 0) - goto errout; - cp->alloc_hash = cp->hash; - for (i = 0; i < min(cp->hash, p->hash); i++) - cp->perfect[i].res = p->perfect[i].res; - balloc = 1; - } - cp->h = p->h; - - err = tcindex_filter_result_init(&new_filter_result, cp, net); - if (err < 0) - goto errout_alloc; - if (r) - cr = r->res; - - err = -EBUSY; - - /* Hash already allocated, make sure that we still meet the - * requirements for the allocated hash. - */ - if (cp->perfect) { - if (!valid_perfect_hash(cp) || - cp->hash > cp->alloc_hash) - goto errout_alloc; - } else if (cp->h && cp->hash != cp->alloc_hash) { - goto errout_alloc; - } - - err = -EINVAL; - if (tb[TCA_TCINDEX_FALL_THROUGH]) - cp->fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]); - - if (!cp->perfect && !cp->h) - cp->alloc_hash = cp->hash; - - /* Note: this could be as restrictive as if (handle & ~(mask >> shift)) - * but then, we'd fail handles that may become valid after some future - * mask change. While this is extremely unlikely to ever matter, - * the check below is safer (and also more backwards-compatible). - */ - if (cp->perfect || valid_perfect_hash(cp)) - if (handle >= cp->alloc_hash) - goto errout_alloc; - - - err = -ENOMEM; - if (!cp->perfect && !cp->h) { - if (valid_perfect_hash(cp)) { - if (tcindex_alloc_perfect_hash(net, cp) < 0) - goto errout_alloc; - balloc = 1; - } else { - struct tcindex_filter __rcu **hash; - - hash = kcalloc(cp->hash, - sizeof(struct tcindex_filter *), - GFP_KERNEL); - - if (!hash) - goto errout_alloc; - - cp->h = hash; - balloc = 2; - } - } - - if (cp->perfect) - r = cp->perfect + handle; - else - r = tcindex_lookup(cp, handle) ? : &new_filter_result; - - if (r == &new_filter_result) { - f = kzalloc(sizeof(*f), GFP_KERNEL); - if (!f) - goto errout_alloc; - f->key = handle; - f->next = NULL; - err = tcindex_filter_result_init(&f->result, cp, net); - if (err < 0) { - kfree(f); - goto errout_alloc; - } - } - - if (tb[TCA_TCINDEX_CLASSID]) { - cr.classid = nla_get_u32(tb[TCA_TCINDEX_CLASSID]); - tcf_bind_filter(tp, &cr, base); - } - - oldp = p; - r->res = cr; - tcf_exts_change(&r->exts, &e); - - rcu_assign_pointer(tp->root, cp); - - if (r == &new_filter_result) { - struct tcindex_filter *nfp; - struct tcindex_filter __rcu **fp; - - f->result.res = r->res; - tcf_exts_change(&f->result.exts, &r->exts); - - fp = cp->h + (handle % cp->hash); - for (nfp = rtnl_dereference(*fp); - nfp; - fp = &nfp->next, nfp = rtnl_dereference(*fp)) - ; /* nothing */ - - rcu_assign_pointer(*fp, f); - } else { - tcf_exts_destroy(&new_filter_result.exts); - } - - if (oldp) - tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work); - return 0; - -errout_alloc: - if (balloc == 1) - tcindex_free_perfect_hash(cp); - else if (balloc == 2) - kfree(cp->h); - tcf_exts_destroy(&new_filter_result.exts); -errout: - kfree(cp); - tcf_exts_destroy(&e); - return err; -} - -static int -tcindex_change(struct net *net, struct sk_buff *in_skb, - struct tcf_proto *tp, unsigned long base, u32 handle, - struct nlattr **tca, void **arg, u32 flags, - struct netlink_ext_ack *extack) -{ - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_TCINDEX_MAX + 1]; - struct tcindex_data *p = rtnl_dereference(tp->root); - struct tcindex_filter_result *r = *arg; - int err; - - pr_debug("tcindex_change(tp %p,handle 0x%08x,tca %p,arg %p),opt %p," - "p %p,r %p,*arg %p\n", - tp, handle, tca, arg, opt, p, r, *arg); - - if (!opt) - return 0; - - err = nla_parse_nested_deprecated(tb, TCA_TCINDEX_MAX, opt, - tcindex_policy, NULL); - if (err < 0) - return err; - - return tcindex_set_parms(net, tp, base, handle, p, r, tb, - tca[TCA_RATE], flags, extack); -} - -static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker, - bool rtnl_held) -{ - struct tcindex_data *p = rtnl_dereference(tp->root); - struct tcindex_filter *f, *next; - int i; - - pr_debug("tcindex_walk(tp %p,walker %p),p %p\n", tp, walker, p); - if (p->perfect) { - for (i = 0; i < p->hash; i++) { - if (!p->perfect[i].res.class) - continue; - if (!tc_cls_stats_dump(tp, walker, p->perfect + i)) - return; - } - } - if (!p->h) - return; - for (i = 0; i < p->hash; i++) { - for (f = rtnl_dereference(p->h[i]); f; f = next) { - next = rtnl_dereference(f->next); - if (!tc_cls_stats_dump(tp, walker, &f->result)) - return; - } - } -} - -static void tcindex_destroy(struct tcf_proto *tp, bool rtnl_held, - struct netlink_ext_ack *extack) -{ - struct tcindex_data *p = rtnl_dereference(tp->root); - int i; - - pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p); - - if (p->perfect) { - for (i = 0; i < p->hash; i++) { - struct tcindex_filter_result *r = p->perfect + i; - - /* tcf_queue_work() does not guarantee the ordering we - * want, so we have to take this refcnt temporarily to - * ensure 'p' is freed after all tcindex_filter_result - * here. Imperfect hash does not need this, because it - * uses linked lists rather than an array. - */ - tcindex_data_get(p); - - tcf_unbind_filter(tp, &r->res); - if (tcf_exts_get_net(&r->exts)) - tcf_queue_work(&r->rwork, - tcindex_destroy_rexts_work); - else - __tcindex_destroy_rexts(r); - } - } - - for (i = 0; p->h && i < p->hash; i++) { - struct tcindex_filter *f, *next; - bool last; - - for (f = rtnl_dereference(p->h[i]); f; f = next) { - next = rtnl_dereference(f->next); - tcindex_delete(tp, &f->result, &last, rtnl_held, NULL); - } - } - - tcf_queue_work(&p->rwork, tcindex_destroy_work); -} - - -static int tcindex_dump(struct net *net, struct tcf_proto *tp, void *fh, - struct sk_buff *skb, struct tcmsg *t, bool rtnl_held) -{ - struct tcindex_data *p = rtnl_dereference(tp->root); - struct tcindex_filter_result *r = fh; - struct nlattr *nest; - - pr_debug("tcindex_dump(tp %p,fh %p,skb %p,t %p),p %p,r %p\n", - tp, fh, skb, t, p, r); - pr_debug("p->perfect %p p->h %p\n", p->perfect, p->h); - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - - if (!fh) { - t->tcm_handle = ~0; /* whatever ... */ - if (nla_put_u32(skb, TCA_TCINDEX_HASH, p->hash) || - nla_put_u16(skb, TCA_TCINDEX_MASK, p->mask) || - nla_put_u32(skb, TCA_TCINDEX_SHIFT, p->shift) || - nla_put_u32(skb, TCA_TCINDEX_FALL_THROUGH, p->fall_through)) - goto nla_put_failure; - nla_nest_end(skb, nest); - } else { - if (p->perfect) { - t->tcm_handle = r - p->perfect; - } else { - struct tcindex_filter *f; - struct tcindex_filter __rcu **fp; - int i; - - t->tcm_handle = 0; - for (i = 0; !t->tcm_handle && i < p->hash; i++) { - fp = &p->h[i]; - for (f = rtnl_dereference(*fp); - !t->tcm_handle && f; - fp = &f->next, f = rtnl_dereference(*fp)) { - if (&f->result == r) - t->tcm_handle = f->key; - } - } - } - pr_debug("handle = %d\n", t->tcm_handle); - if (r->res.class && - nla_put_u32(skb, TCA_TCINDEX_CLASSID, r->res.classid)) - goto nla_put_failure; - - if (tcf_exts_dump(skb, &r->exts) < 0) - goto nla_put_failure; - nla_nest_end(skb, nest); - - if (tcf_exts_dump_stats(skb, &r->exts) < 0) - goto nla_put_failure; - } - - return skb->len; - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static void tcindex_bind_class(void *fh, u32 classid, unsigned long cl, - void *q, unsigned long base) -{ - struct tcindex_filter_result *r = fh; - - tc_cls_bind_class(classid, cl, q, &r->res, base); -} - -static struct tcf_proto_ops cls_tcindex_ops __read_mostly = { - .kind = "tcindex", - .classify = tcindex_classify, - .init = tcindex_init, - .destroy = tcindex_destroy, - .get = tcindex_get, - .change = tcindex_change, - .delete = tcindex_delete, - .walk = tcindex_walk, - .dump = tcindex_dump, - .bind_class = tcindex_bind_class, - .owner = THIS_MODULE, -}; - -static int __init init_tcindex(void) -{ - return register_tcf_proto_ops(&cls_tcindex_ops); -} - -static void __exit exit_tcindex(void) -{ - unregister_tcf_proto_ops(&cls_tcindex_ops); -} - -module_init(init_tcindex) -module_exit(exit_tcindex) -MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/tcindex.json b/tools/testing/selftests/tc-testing/tc-tests/filters/tcindex.json deleted file mode 100644 index 44901db70376..000000000000 --- a/tools/testing/selftests/tc-testing/tc-tests/filters/tcindex.json +++ /dev/null @@ -1,227 +0,0 @@ -[ - { - "id": "8293", - "name": "Add tcindex filter with default action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "^filter parent ffff: protocol ip pref 1 tcindex chain 0 handle 0x0001 classid 1:1", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "7281", - "name": "Add tcindex filter with hash size and pass action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex hash 32 fall_through classid 1:1 action pass", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "^filter parent ffff: protocol ip pref.*tcindex chain [0-9]+ handle 0x0001 classid 1:1.*action order [0-9]+: gact action pass", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "b294", - "name": "Add tcindex filter with mask shift and reclassify action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex hash 32 mask 1 shift 2 fall_through classid 1:1 action reclassify", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "^filter parent ffff: protocol ip pref.*tcindex chain [0-9]+ handle 0x0001 classid 1:1.*action order [0-9]+: gact action reclassify", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "0532", - "name": "Add tcindex filter with pass_on and continue actions", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex hash 32 mask 1 shift 2 pass_on classid 1:1 action continue", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "^filter parent ffff: protocol ip pref.*tcindex chain [0-9]+ handle 0x0001 classid 1:1.*action order [0-9]+: gact action continue", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "d473", - "name": "Add tcindex filter with pipe action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex hash 32 mask 1 shift 2 fall_through classid 1:1 action pipe", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "^filter parent ffff: protocol ip pref.*tcindex chain [0-9]+ handle 0x0001 classid 1:1.*action order [0-9]+: gact action pipe", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "2940", - "name": "Add tcindex filter with miltiple actions", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 7 tcindex hash 32 mask 1 shift 2 fall_through classid 1:1 action skbedit mark 7 pipe action gact drop", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 7 protocol ip tcindex", - "matchPattern": "^filter parent ffff: protocol ip pref 7 tcindex.*handle 0x0001.*action.*skbedit.*mark 7 pipe.*action.*gact action drop", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "1893", - "name": "List tcindex filters", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress", - "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1", - "$TC filter add dev $DEV1 parent ffff: handle 2 protocol ip prio 1 tcindex classid 1:1" - ], - "cmdUnderTest": "$TC filter show dev $DEV1 parent ffff:", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "handle 0x000[0-9]+ classid 1:1", - "matchCount": "2", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "2041", - "name": "Change tcindex filter with pass action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress", - "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1 action drop" - ], - "cmdUnderTest": "$TC filter change dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1 action pass", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "handle 0x0001 classid 1:1.*action order [0-9]+: gact action pass", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "9203", - "name": "Replace tcindex filter with pass action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress", - "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1 action drop" - ], - "cmdUnderTest": "$TC filter replace dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1 action pass", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "handle 0x0001 classid 1:1.*action order [0-9]+: gact action pass", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "7957", - "name": "Delete tcindex filter with drop action", - "category": [ - "filter", - "tcindex" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress", - "$TC filter add dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1 action drop" - ], - "cmdUnderTest": "$TC filter del dev $DEV1 parent ffff: handle 1 protocol ip prio 1 tcindex classid 1:1 action drop", - "expExitCode": "0", - "verifyCmd": "$TC filter get dev $DEV1 parent ffff: handle 1 prio 1 protocol ip tcindex", - "matchPattern": "handle 0x0001 classid 1:1.*action order [0-9]+: gact action drop", - "matchCount": "0", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - } -] -- cgit v1.2.3 From 265b4da82dbf5df04bee5a5d46b7474b1aaf326a Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Tue, 14 Feb 2023 08:49:15 -0500 Subject: net/sched: Retire rsvp classifier The rsvp classifier has served us well for about a quarter of a century but has has not been getting much maintenance attention due to lack of known users. Signed-off-by: Jamal Hadi Salim Acked-by: Jiri Pirko Signed-off-by: Paolo Abeni --- include/net/tc_wrapper.h | 10 - net/sched/Kconfig | 28 - net/sched/Makefile | 2 - net/sched/cls_rsvp.c | 26 - net/sched/cls_rsvp.h | 764 --------------------- net/sched/cls_rsvp6.c | 26 - .../tc-testing/tc-tests/filters/rsvp.json | 203 ------ 7 files changed, 1059 deletions(-) delete mode 100644 net/sched/cls_rsvp.c delete mode 100644 net/sched/cls_rsvp.h delete mode 100644 net/sched/cls_rsvp6.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/filters/rsvp.json (limited to 'net') diff --git a/include/net/tc_wrapper.h b/include/net/tc_wrapper.h index 8ba241760d0a..a6d481b5bcbc 100644 --- a/include/net/tc_wrapper.h +++ b/include/net/tc_wrapper.h @@ -152,8 +152,6 @@ TC_INDIRECT_FILTER_DECLARE(flow_classify); TC_INDIRECT_FILTER_DECLARE(fw_classify); TC_INDIRECT_FILTER_DECLARE(mall_classify); TC_INDIRECT_FILTER_DECLARE(route4_classify); -TC_INDIRECT_FILTER_DECLARE(rsvp_classify); -TC_INDIRECT_FILTER_DECLARE(rsvp6_classify); TC_INDIRECT_FILTER_DECLARE(u32_classify); static inline int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp, @@ -198,14 +196,6 @@ static inline int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp, if (tp->classify == route4_classify) return route4_classify(skb, tp, res); #endif -#if IS_BUILTIN(CONFIG_NET_CLS_RSVP) - if (tp->classify == rsvp_classify) - return rsvp_classify(skb, tp, res); -#endif -#if IS_BUILTIN(CONFIG_NET_CLS_RSVP6) - if (tp->classify == rsvp6_classify) - return rsvp6_classify(skb, tp, res); -#endif skip: return tp->classify(skb, tp, res); diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 111883853f08..4b95cb1ac435 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -513,34 +513,6 @@ config CLS_U32_MARK help Say Y here to be able to use netfilter marks as u32 key. -config NET_CLS_RSVP - tristate "IPv4 Resource Reservation Protocol (RSVP)" - select NET_CLS - help - The Resource Reservation Protocol (RSVP) permits end systems to - request a minimum and maximum data flow rate for a connection; this - is important for real time data such as streaming sound or video. - - Say Y here if you want to be able to classify outgoing packets based - on their RSVP requests. - - To compile this code as a module, choose M here: the - module will be called cls_rsvp. - -config NET_CLS_RSVP6 - tristate "IPv6 Resource Reservation Protocol (RSVP6)" - select NET_CLS - help - The Resource Reservation Protocol (RSVP) permits end systems to - request a minimum and maximum data flow rate for a connection; this - is important for real time data such as streaming sound or video. - - Say Y here if you want to be able to classify outgoing packets based - on their RSVP requests and you are using the IPv6 protocol. - - To compile this code as a module, choose M here: the - module will be called cls_rsvp6. - config NET_CLS_FLOW tristate "Flow classifier" select NET_CLS diff --git a/net/sched/Makefile b/net/sched/Makefile index ea236d258c16..b5fd49641d91 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -67,8 +67,6 @@ obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o obj-$(CONFIG_NET_CLS_U32) += cls_u32.o obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o obj-$(CONFIG_NET_CLS_FW) += cls_fw.o -obj-$(CONFIG_NET_CLS_RSVP) += cls_rsvp.o -obj-$(CONFIG_NET_CLS_RSVP6) += cls_rsvp6.o obj-$(CONFIG_NET_CLS_BASIC) += cls_basic.o obj-$(CONFIG_NET_CLS_FLOW) += cls_flow.o obj-$(CONFIG_NET_CLS_CGROUP) += cls_cgroup.o diff --git a/net/sched/cls_rsvp.c b/net/sched/cls_rsvp.c deleted file mode 100644 index 03d8619bd9c6..000000000000 --- a/net/sched/cls_rsvp.c +++ /dev/null @@ -1,26 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * net/sched/cls_rsvp.c Special RSVP packet classifier for IPv4. - * - * Authors: Alexey Kuznetsov, - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define RSVP_DST_LEN 1 -#define RSVP_ID "rsvp" -#define RSVP_OPS cls_rsvp_ops -#define RSVP_CLS rsvp_classify - -#include "cls_rsvp.h" -MODULE_LICENSE("GPL"); diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h deleted file mode 100644 index 869efba9f834..000000000000 --- a/net/sched/cls_rsvp.h +++ /dev/null @@ -1,764 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * net/sched/cls_rsvp.h Template file for RSVPv[46] classifiers. - * - * Authors: Alexey Kuznetsov, - */ - -/* - Comparing to general packet classification problem, - RSVP needs only several relatively simple rules: - - * (dst, protocol) are always specified, - so that we are able to hash them. - * src may be exact, or may be wildcard, so that - we can keep a hash table plus one wildcard entry. - * source port (or flow label) is important only if src is given. - - IMPLEMENTATION. - - We use a two level hash table: The top level is keyed by - destination address and protocol ID, every bucket contains a list - of "rsvp sessions", identified by destination address, protocol and - DPI(="Destination Port ID"): triple (key, mask, offset). - - Every bucket has a smaller hash table keyed by source address - (cf. RSVP flowspec) and one wildcard entry for wildcard reservations. - Every bucket is again a list of "RSVP flows", selected by - source address and SPI(="Source Port ID" here rather than - "security parameter index"): triple (key, mask, offset). - - - NOTE 1. All the packets with IPv6 extension headers (but AH and ESP) - and all fragmented packets go to the best-effort traffic class. - - - NOTE 2. Two "port id"'s seems to be redundant, rfc2207 requires - only one "Generalized Port Identifier". So that for classic - ah, esp (and udp,tcp) both *pi should coincide or one of them - should be wildcard. - - At first sight, this redundancy is just a waste of CPU - resources. But DPI and SPI add the possibility to assign different - priorities to GPIs. Look also at note 4 about tunnels below. - - - NOTE 3. One complication is the case of tunneled packets. - We implement it as following: if the first lookup - matches a special session with "tunnelhdr" value not zero, - flowid doesn't contain the true flow ID, but the tunnel ID (1...255). - In this case, we pull tunnelhdr bytes and restart lookup - with tunnel ID added to the list of keys. Simple and stupid 8)8) - It's enough for PIMREG and IPIP. - - - NOTE 4. Two GPIs make it possible to parse even GRE packets. - F.e. DPI can select ETH_P_IP (and necessary flags to make - tunnelhdr correct) in GRE protocol field and SPI matches - GRE key. Is it not nice? 8)8) - - - Well, as result, despite its simplicity, we get a pretty - powerful classification engine. */ - - -struct rsvp_head { - u32 tmap[256/32]; - u32 hgenerator; - u8 tgenerator; - struct rsvp_session __rcu *ht[256]; - struct rcu_head rcu; -}; - -struct rsvp_session { - struct rsvp_session __rcu *next; - __be32 dst[RSVP_DST_LEN]; - struct tc_rsvp_gpi dpi; - u8 protocol; - u8 tunnelid; - /* 16 (src,sport) hash slots, and one wildcard source slot */ - struct rsvp_filter __rcu *ht[16 + 1]; - struct rcu_head rcu; -}; - - -struct rsvp_filter { - struct rsvp_filter __rcu *next; - __be32 src[RSVP_DST_LEN]; - struct tc_rsvp_gpi spi; - u8 tunnelhdr; - - struct tcf_result res; - struct tcf_exts exts; - - u32 handle; - struct rsvp_session *sess; - struct rcu_work rwork; -}; - -static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid) -{ - unsigned int h = (__force __u32)dst[RSVP_DST_LEN - 1]; - - h ^= h>>16; - h ^= h>>8; - return (h ^ protocol ^ tunnelid) & 0xFF; -} - -static inline unsigned int hash_src(__be32 *src) -{ - unsigned int h = (__force __u32)src[RSVP_DST_LEN-1]; - - h ^= h>>16; - h ^= h>>8; - h ^= h>>4; - return h & 0xF; -} - -#define RSVP_APPLY_RESULT() \ -{ \ - int r = tcf_exts_exec(skb, &f->exts, res); \ - if (r < 0) \ - continue; \ - else if (r > 0) \ - return r; \ -} - -TC_INDIRECT_SCOPE int RSVP_CLS(struct sk_buff *skb, const struct tcf_proto *tp, - struct tcf_result *res) -{ - struct rsvp_head *head = rcu_dereference_bh(tp->root); - struct rsvp_session *s; - struct rsvp_filter *f; - unsigned int h1, h2; - __be32 *dst, *src; - u8 protocol; - u8 tunnelid = 0; - u8 *xprt; -#if RSVP_DST_LEN == 4 - struct ipv6hdr *nhptr; - - if (!pskb_network_may_pull(skb, sizeof(*nhptr))) - return -1; - nhptr = ipv6_hdr(skb); -#else - struct iphdr *nhptr; - - if (!pskb_network_may_pull(skb, sizeof(*nhptr))) - return -1; - nhptr = ip_hdr(skb); -#endif -restart: - -#if RSVP_DST_LEN == 4 - src = &nhptr->saddr.s6_addr32[0]; - dst = &nhptr->daddr.s6_addr32[0]; - protocol = nhptr->nexthdr; - xprt = ((u8 *)nhptr) + sizeof(struct ipv6hdr); -#else - src = &nhptr->saddr; - dst = &nhptr->daddr; - protocol = nhptr->protocol; - xprt = ((u8 *)nhptr) + (nhptr->ihl<<2); - if (ip_is_fragment(nhptr)) - return -1; -#endif - - h1 = hash_dst(dst, protocol, tunnelid); - h2 = hash_src(src); - - for (s = rcu_dereference_bh(head->ht[h1]); s; - s = rcu_dereference_bh(s->next)) { - if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] && - protocol == s->protocol && - !(s->dpi.mask & - (*(u32 *)(xprt + s->dpi.offset) ^ s->dpi.key)) && -#if RSVP_DST_LEN == 4 - dst[0] == s->dst[0] && - dst[1] == s->dst[1] && - dst[2] == s->dst[2] && -#endif - tunnelid == s->tunnelid) { - - for (f = rcu_dereference_bh(s->ht[h2]); f; - f = rcu_dereference_bh(f->next)) { - if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] && - !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key)) -#if RSVP_DST_LEN == 4 - && - src[0] == f->src[0] && - src[1] == f->src[1] && - src[2] == f->src[2] -#endif - ) { - *res = f->res; - RSVP_APPLY_RESULT(); - -matched: - if (f->tunnelhdr == 0) - return 0; - - tunnelid = f->res.classid; - nhptr = (void *)(xprt + f->tunnelhdr - sizeof(*nhptr)); - goto restart; - } - } - - /* And wildcard bucket... */ - for (f = rcu_dereference_bh(s->ht[16]); f; - f = rcu_dereference_bh(f->next)) { - *res = f->res; - RSVP_APPLY_RESULT(); - goto matched; - } - return -1; - } - } - return -1; -} - -static void rsvp_replace(struct tcf_proto *tp, struct rsvp_filter *n, u32 h) -{ - struct rsvp_head *head = rtnl_dereference(tp->root); - struct rsvp_session *s; - struct rsvp_filter __rcu **ins; - struct rsvp_filter *pins; - unsigned int h1 = h & 0xFF; - unsigned int h2 = (h >> 8) & 0xFF; - - for (s = rtnl_dereference(head->ht[h1]); s; - s = rtnl_dereference(s->next)) { - for (ins = &s->ht[h2], pins = rtnl_dereference(*ins); ; - ins = &pins->next, pins = rtnl_dereference(*ins)) { - if (pins->handle == h) { - RCU_INIT_POINTER(n->next, pins->next); - rcu_assign_pointer(*ins, n); - return; - } - } - } - - /* Something went wrong if we are trying to replace a non-existent - * node. Mind as well halt instead of silently failing. - */ - BUG_ON(1); -} - -static void *rsvp_get(struct tcf_proto *tp, u32 handle) -{ - struct rsvp_head *head = rtnl_dereference(tp->root); - struct rsvp_session *s; - struct rsvp_filter *f; - unsigned int h1 = handle & 0xFF; - unsigned int h2 = (handle >> 8) & 0xFF; - - if (h2 > 16) - return NULL; - - for (s = rtnl_dereference(head->ht[h1]); s; - s = rtnl_dereference(s->next)) { - for (f = rtnl_dereference(s->ht[h2]); f; - f = rtnl_dereference(f->next)) { - if (f->handle == handle) - return f; - } - } - return NULL; -} - -static int rsvp_init(struct tcf_proto *tp) -{ - struct rsvp_head *data; - - data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL); - if (data) { - rcu_assign_pointer(tp->root, data); - return 0; - } - return -ENOBUFS; -} - -static void __rsvp_delete_filter(struct rsvp_filter *f) -{ - tcf_exts_destroy(&f->exts); - tcf_exts_put_net(&f->exts); - kfree(f); -} - -static void rsvp_delete_filter_work(struct work_struct *work) -{ - struct rsvp_filter *f = container_of(to_rcu_work(work), - struct rsvp_filter, - rwork); - rtnl_lock(); - __rsvp_delete_filter(f); - rtnl_unlock(); -} - -static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f) -{ - tcf_unbind_filter(tp, &f->res); - /* all classifiers are required to call tcf_exts_destroy() after rcu - * grace period, since converted-to-rcu actions are relying on that - * in cleanup() callback - */ - if (tcf_exts_get_net(&f->exts)) - tcf_queue_work(&f->rwork, rsvp_delete_filter_work); - else - __rsvp_delete_filter(f); -} - -static void rsvp_destroy(struct tcf_proto *tp, bool rtnl_held, - struct netlink_ext_ack *extack) -{ - struct rsvp_head *data = rtnl_dereference(tp->root); - int h1, h2; - - if (data == NULL) - return; - - for (h1 = 0; h1 < 256; h1++) { - struct rsvp_session *s; - - while ((s = rtnl_dereference(data->ht[h1])) != NULL) { - RCU_INIT_POINTER(data->ht[h1], s->next); - - for (h2 = 0; h2 <= 16; h2++) { - struct rsvp_filter *f; - - while ((f = rtnl_dereference(s->ht[h2])) != NULL) { - rcu_assign_pointer(s->ht[h2], f->next); - rsvp_delete_filter(tp, f); - } - } - kfree_rcu(s, rcu); - } - } - kfree_rcu(data, rcu); -} - -static int rsvp_delete(struct tcf_proto *tp, void *arg, bool *last, - bool rtnl_held, struct netlink_ext_ack *extack) -{ - struct rsvp_head *head = rtnl_dereference(tp->root); - struct rsvp_filter *nfp, *f = arg; - struct rsvp_filter __rcu **fp; - unsigned int h = f->handle; - struct rsvp_session __rcu **sp; - struct rsvp_session *nsp, *s = f->sess; - int i, h1; - - fp = &s->ht[(h >> 8) & 0xFF]; - for (nfp = rtnl_dereference(*fp); nfp; - fp = &nfp->next, nfp = rtnl_dereference(*fp)) { - if (nfp == f) { - RCU_INIT_POINTER(*fp, f->next); - rsvp_delete_filter(tp, f); - - /* Strip tree */ - - for (i = 0; i <= 16; i++) - if (s->ht[i]) - goto out; - - /* OK, session has no flows */ - sp = &head->ht[h & 0xFF]; - for (nsp = rtnl_dereference(*sp); nsp; - sp = &nsp->next, nsp = rtnl_dereference(*sp)) { - if (nsp == s) { - RCU_INIT_POINTER(*sp, s->next); - kfree_rcu(s, rcu); - goto out; - } - } - - break; - } - } - -out: - *last = true; - for (h1 = 0; h1 < 256; h1++) { - if (rcu_access_pointer(head->ht[h1])) { - *last = false; - break; - } - } - - return 0; -} - -static unsigned int gen_handle(struct tcf_proto *tp, unsigned salt) -{ - struct rsvp_head *data = rtnl_dereference(tp->root); - int i = 0xFFFF; - - while (i-- > 0) { - u32 h; - - if ((data->hgenerator += 0x10000) == 0) - data->hgenerator = 0x10000; - h = data->hgenerator|salt; - if (!rsvp_get(tp, h)) - return h; - } - return 0; -} - -static int tunnel_bts(struct rsvp_head *data) -{ - int n = data->tgenerator >> 5; - u32 b = 1 << (data->tgenerator & 0x1F); - - if (data->tmap[n] & b) - return 0; - data->tmap[n] |= b; - return 1; -} - -static void tunnel_recycle(struct rsvp_head *data) -{ - struct rsvp_session __rcu **sht = data->ht; - u32 tmap[256/32]; - int h1, h2; - - memset(tmap, 0, sizeof(tmap)); - - for (h1 = 0; h1 < 256; h1++) { - struct rsvp_session *s; - for (s = rtnl_dereference(sht[h1]); s; - s = rtnl_dereference(s->next)) { - for (h2 = 0; h2 <= 16; h2++) { - struct rsvp_filter *f; - - for (f = rtnl_dereference(s->ht[h2]); f; - f = rtnl_dereference(f->next)) { - if (f->tunnelhdr == 0) - continue; - data->tgenerator = f->res.classid; - tunnel_bts(data); - } - } - } - } - - memcpy(data->tmap, tmap, sizeof(tmap)); -} - -static u32 gen_tunnel(struct rsvp_head *data) -{ - int i, k; - - for (k = 0; k < 2; k++) { - for (i = 255; i > 0; i--) { - if (++data->tgenerator == 0) - data->tgenerator = 1; - if (tunnel_bts(data)) - return data->tgenerator; - } - tunnel_recycle(data); - } - return 0; -} - -static const struct nla_policy rsvp_policy[TCA_RSVP_MAX + 1] = { - [TCA_RSVP_CLASSID] = { .type = NLA_U32 }, - [TCA_RSVP_DST] = { .len = RSVP_DST_LEN * sizeof(u32) }, - [TCA_RSVP_SRC] = { .len = RSVP_DST_LEN * sizeof(u32) }, - [TCA_RSVP_PINFO] = { .len = sizeof(struct tc_rsvp_pinfo) }, -}; - -static int rsvp_change(struct net *net, struct sk_buff *in_skb, - struct tcf_proto *tp, unsigned long base, - u32 handle, struct nlattr **tca, - void **arg, u32 flags, - struct netlink_ext_ack *extack) -{ - struct rsvp_head *data = rtnl_dereference(tp->root); - struct rsvp_filter *f, *nfp; - struct rsvp_filter __rcu **fp; - struct rsvp_session *nsp, *s; - struct rsvp_session __rcu **sp; - struct tc_rsvp_pinfo *pinfo = NULL; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_RSVP_MAX + 1]; - struct tcf_exts e; - unsigned int h1, h2; - __be32 *dst; - int err; - - if (opt == NULL) - return handle ? -EINVAL : 0; - - err = nla_parse_nested_deprecated(tb, TCA_RSVP_MAX, opt, rsvp_policy, - NULL); - if (err < 0) - return err; - - err = tcf_exts_init(&e, net, TCA_RSVP_ACT, TCA_RSVP_POLICE); - if (err < 0) - return err; - err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, flags, - extack); - if (err < 0) - goto errout2; - - f = *arg; - if (f) { - /* Node exists: adjust only classid */ - struct rsvp_filter *n; - - if (f->handle != handle && handle) - goto errout2; - - n = kmemdup(f, sizeof(*f), GFP_KERNEL); - if (!n) { - err = -ENOMEM; - goto errout2; - } - - err = tcf_exts_init(&n->exts, net, TCA_RSVP_ACT, - TCA_RSVP_POLICE); - if (err < 0) { - kfree(n); - goto errout2; - } - - if (tb[TCA_RSVP_CLASSID]) { - n->res.classid = nla_get_u32(tb[TCA_RSVP_CLASSID]); - tcf_bind_filter(tp, &n->res, base); - } - - tcf_exts_change(&n->exts, &e); - rsvp_replace(tp, n, handle); - return 0; - } - - /* Now more serious part... */ - err = -EINVAL; - if (handle) - goto errout2; - if (tb[TCA_RSVP_DST] == NULL) - goto errout2; - - err = -ENOBUFS; - f = kzalloc(sizeof(struct rsvp_filter), GFP_KERNEL); - if (f == NULL) - goto errout2; - - err = tcf_exts_init(&f->exts, net, TCA_RSVP_ACT, TCA_RSVP_POLICE); - if (err < 0) - goto errout; - h2 = 16; - if (tb[TCA_RSVP_SRC]) { - memcpy(f->src, nla_data(tb[TCA_RSVP_SRC]), sizeof(f->src)); - h2 = hash_src(f->src); - } - if (tb[TCA_RSVP_PINFO]) { - pinfo = nla_data(tb[TCA_RSVP_PINFO]); - f->spi = pinfo->spi; - f->tunnelhdr = pinfo->tunnelhdr; - } - if (tb[TCA_RSVP_CLASSID]) - f->res.classid = nla_get_u32(tb[TCA_RSVP_CLASSID]); - - dst = nla_data(tb[TCA_RSVP_DST]); - h1 = hash_dst(dst, pinfo ? pinfo->protocol : 0, pinfo ? pinfo->tunnelid : 0); - - err = -ENOMEM; - if ((f->handle = gen_handle(tp, h1 | (h2<<8))) == 0) - goto errout; - - if (f->tunnelhdr) { - err = -EINVAL; - if (f->res.classid > 255) - goto errout; - - err = -ENOMEM; - if (f->res.classid == 0 && - (f->res.classid = gen_tunnel(data)) == 0) - goto errout; - } - - for (sp = &data->ht[h1]; - (s = rtnl_dereference(*sp)) != NULL; - sp = &s->next) { - if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN-1] && - pinfo && pinfo->protocol == s->protocol && - memcmp(&pinfo->dpi, &s->dpi, sizeof(s->dpi)) == 0 && -#if RSVP_DST_LEN == 4 - dst[0] == s->dst[0] && - dst[1] == s->dst[1] && - dst[2] == s->dst[2] && -#endif - pinfo->tunnelid == s->tunnelid) { - -insert: - /* OK, we found appropriate session */ - - fp = &s->ht[h2]; - - f->sess = s; - if (f->tunnelhdr == 0) - tcf_bind_filter(tp, &f->res, base); - - tcf_exts_change(&f->exts, &e); - - fp = &s->ht[h2]; - for (nfp = rtnl_dereference(*fp); nfp; - fp = &nfp->next, nfp = rtnl_dereference(*fp)) { - __u32 mask = nfp->spi.mask & f->spi.mask; - - if (mask != f->spi.mask) - break; - } - RCU_INIT_POINTER(f->next, nfp); - rcu_assign_pointer(*fp, f); - - *arg = f; - return 0; - } - } - - /* No session found. Create new one. */ - - err = -ENOBUFS; - s = kzalloc(sizeof(struct rsvp_session), GFP_KERNEL); - if (s == NULL) - goto errout; - memcpy(s->dst, dst, sizeof(s->dst)); - - if (pinfo) { - s->dpi = pinfo->dpi; - s->protocol = pinfo->protocol; - s->tunnelid = pinfo->tunnelid; - } - sp = &data->ht[h1]; - for (nsp = rtnl_dereference(*sp); nsp; - sp = &nsp->next, nsp = rtnl_dereference(*sp)) { - if ((nsp->dpi.mask & s->dpi.mask) != s->dpi.mask) - break; - } - RCU_INIT_POINTER(s->next, nsp); - rcu_assign_pointer(*sp, s); - - goto insert; - -errout: - tcf_exts_destroy(&f->exts); - kfree(f); -errout2: - tcf_exts_destroy(&e); - return err; -} - -static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg, - bool rtnl_held) -{ - struct rsvp_head *head = rtnl_dereference(tp->root); - unsigned int h, h1; - - if (arg->stop) - return; - - for (h = 0; h < 256; h++) { - struct rsvp_session *s; - - for (s = rtnl_dereference(head->ht[h]); s; - s = rtnl_dereference(s->next)) { - for (h1 = 0; h1 <= 16; h1++) { - struct rsvp_filter *f; - - for (f = rtnl_dereference(s->ht[h1]); f; - f = rtnl_dereference(f->next)) { - if (!tc_cls_stats_dump(tp, arg, f)) - return; - } - } - } - } -} - -static int rsvp_dump(struct net *net, struct tcf_proto *tp, void *fh, - struct sk_buff *skb, struct tcmsg *t, bool rtnl_held) -{ - struct rsvp_filter *f = fh; - struct rsvp_session *s; - struct nlattr *nest; - struct tc_rsvp_pinfo pinfo; - - if (f == NULL) - return skb->len; - s = f->sess; - - t->tcm_handle = f->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - - if (nla_put(skb, TCA_RSVP_DST, sizeof(s->dst), &s->dst)) - goto nla_put_failure; - pinfo.dpi = s->dpi; - pinfo.spi = f->spi; - pinfo.protocol = s->protocol; - pinfo.tunnelid = s->tunnelid; - pinfo.tunnelhdr = f->tunnelhdr; - pinfo.pad = 0; - if (nla_put(skb, TCA_RSVP_PINFO, sizeof(pinfo), &pinfo)) - goto nla_put_failure; - if (f->res.classid && - nla_put_u32(skb, TCA_RSVP_CLASSID, f->res.classid)) - goto nla_put_failure; - if (((f->handle >> 8) & 0xFF) != 16 && - nla_put(skb, TCA_RSVP_SRC, sizeof(f->src), f->src)) - goto nla_put_failure; - - if (tcf_exts_dump(skb, &f->exts) < 0) - goto nla_put_failure; - - nla_nest_end(skb, nest); - - if (tcf_exts_dump_stats(skb, &f->exts) < 0) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static void rsvp_bind_class(void *fh, u32 classid, unsigned long cl, void *q, - unsigned long base) -{ - struct rsvp_filter *f = fh; - - tc_cls_bind_class(classid, cl, q, &f->res, base); -} - -static struct tcf_proto_ops RSVP_OPS __read_mostly = { - .kind = RSVP_ID, - .classify = RSVP_CLS, - .init = rsvp_init, - .destroy = rsvp_destroy, - .get = rsvp_get, - .change = rsvp_change, - .delete = rsvp_delete, - .walk = rsvp_walk, - .dump = rsvp_dump, - .bind_class = rsvp_bind_class, - .owner = THIS_MODULE, -}; - -static int __init init_rsvp(void) -{ - return register_tcf_proto_ops(&RSVP_OPS); -} - -static void __exit exit_rsvp(void) -{ - unregister_tcf_proto_ops(&RSVP_OPS); -} - -module_init(init_rsvp) -module_exit(exit_rsvp) diff --git a/net/sched/cls_rsvp6.c b/net/sched/cls_rsvp6.c deleted file mode 100644 index e627cc32d633..000000000000 --- a/net/sched/cls_rsvp6.c +++ /dev/null @@ -1,26 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * net/sched/cls_rsvp6.c Special RSVP packet classifier for IPv6. - * - * Authors: Alexey Kuznetsov, - */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define RSVP_DST_LEN 4 -#define RSVP_ID "rsvp6" -#define RSVP_OPS cls_rsvp6_ops -#define RSVP_CLS rsvp6_classify - -#include "cls_rsvp.h" -MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/rsvp.json b/tools/testing/selftests/tc-testing/tc-tests/filters/rsvp.json deleted file mode 100644 index bdcbaa4c5663..000000000000 --- a/tools/testing/selftests/tc-testing/tc-tests/filters/rsvp.json +++ /dev/null @@ -1,203 +0,0 @@ -[ - { - "id": "2141", - "name": "Add rsvp filter with tcp proto and specific IP address", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto tcp session 198.168.10.64", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*session 198.168.10.64 ipproto tcp", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "5267", - "name": "Add rsvp filter with udp proto and specific IP address", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*session 1.1.1.1 ipproto udp", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "2819", - "name": "Add rsvp filter with src ip and src port", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1 sender 2.2.2.2/5021 classid 1:1", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*flowid 1:1 session 1.1.1.1 ipproto udp sender 2.2.2.2/5021", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "c967", - "name": "Add rsvp filter with tunnelid and continue action", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1 tunnelid 2 classid 1:1 action continue", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*flowid 1:1 session 1.1.1.1 ipproto udp tunnelid 2.*action order [0-9]+: gact action continue", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "5463", - "name": "Add rsvp filter with tunnel and pipe action", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1 tunnel 2 skip 1 action pipe", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*tunnel 2 skip 1 session 1.1.1.1 ipproto udp.*action order [0-9]+: gact action pipe", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "2332", - "name": "Add rsvp filter with miltiple actions", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 7 rsvp ipproto udp session 1.1.1.1 classid 1:1 action skbedit mark 7 pipe action gact drop", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*flowid 1:1 session 1.1.1.1 ipproto udp.*action order [0-9]+: skbedit mark 7 pipe.*action order [0-9]+: gact action drop", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "8879", - "name": "Add rsvp filter with tunnel and skp flag", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress" - ], - "cmdUnderTest": "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1 tunnel 2 skip 1 action pipe", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*tunnel 2 skip 1 session 1.1.1.1 ipproto udp.*action order [0-9]+: gact action pipe", - "matchCount": "1", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "8261", - "name": "List rsvp filters", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress", - "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1/1234 classid 1:1", - "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto tcp session 2.2.2.2/1234 classid 2:1" - ], - "cmdUnderTest": "$TC filter show dev $DEV1 parent ffff:", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "^filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh", - "matchCount": "2", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - }, - { - "id": "8989", - "name": "Delete rsvp filter", - "category": [ - "filter", - "rsvp" - ], - "plugins": { - "requires": "nsPlugin" - }, - "setup": [ - "$TC qdisc add dev $DEV1 ingress", - "$TC filter add dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1/1234 tunnelid 9 classid 2:1" - ], - "cmdUnderTest": "$TC filter del dev $DEV1 parent ffff: protocol ip prio 1 rsvp ipproto udp session 1.1.1.1/1234 tunnelid 9 classid 2:1", - "expExitCode": "0", - "verifyCmd": "$TC filter show dev $DEV1 parent ffff:", - "matchPattern": "filter protocol ip pref [0-9]+ rsvp chain [0-9]+ fh 0x.*flowid 2:1 session 1.1.1.1/1234 ipproto udp tunnelid 9", - "matchCount": "0", - "teardown": [ - "$TC qdisc del dev $DEV1 ingress" - ] - } -] -- cgit v1.2.3 From 802dcbd6f30feaa7c96a1fb4ecb1db57082df9d7 Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Tue, 14 Feb 2023 13:01:16 -0800 Subject: net/core: print message for allmulticast When the user sets or clears the IFF_ALLMULTI flag in the netdev, there are no log messages printed to the kernel log to indicate anything happened. This is inexplicably different from most other dev->flags changes, and could suprise the user. Typically this occurs from user-space when a user: ip link set eth0 allmulticast However, other devices like bridge set allmulticast as well, and many other flows might trigger entry into allmulticast as well. The new message uses the standard netdev_info print and looks like: [ 413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode [ 415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode Signed-off-by: Jesse Brandeburg Signed-off-by: Paolo Abeni --- net/core/dev.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/core/dev.c b/net/core/dev.c index 7307a0c15c9f..ad1e6482e1c1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -8391,6 +8391,8 @@ static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify) } } if (dev->flags ^ old_flags) { + netdev_info(dev, "%s allmulticast mode\n", + dev->flags & IFF_ALLMULTI ? "entered" : "left"); dev_change_rx_flags(dev, IFF_ALLMULTI); dev_set_rx_mode(dev); if (notify) -- cgit v1.2.3 From 3ba0bf47edf955d6f52fdb16b54acd1932cb9445 Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Tue, 14 Feb 2023 13:01:17 -0800 Subject: net/core: refactor promiscuous mode message The kernel stack can be more consistent by printing the IFF_PROMISC aka promiscuous enable/disable messages with the standard netdev_info message which can include bus and driver info as well as the device. typical command usage from user space looks like: ip link set eth0 promisc But lots of utilities such as bridge, tcpdump, etc put the interface into promiscuous mode. old message: [ 406.034418] device eth0 entered promiscuous mode [ 408.424703] device eth0 left promiscuous mode new message: [ 406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode [ 408.424715] ice 0000:17:00.0 eth0: left promiscuous mode Signed-off-by: Jesse Brandeburg Signed-off-by: Paolo Abeni --- net/core/dev.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/core/dev.c b/net/core/dev.c index ad1e6482e1c1..357081b0113c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -8321,9 +8321,8 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify) } } if (dev->flags != old_flags) { - pr_info("device %s %s promiscuous mode\n", - dev->name, - dev->flags & IFF_PROMISC ? "entered" : "left"); + netdev_info(dev, "%s promiscuous mode\n", + dev->flags & IFF_PROMISC ? "entered" : "left"); if (audit_enabled) { current_uid_gid(&uid, &gid); audit_log(audit_context(), GFP_ATOMIC, -- cgit v1.2.3 From 7d12057b45fbc1e1315d327dca13e8d6b5019113 Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Tue, 14 Feb 2023 18:15:31 -0300 Subject: net/sched: act_nat: transition to percpu stats and rcu The tc action act_nat was using shared stats and taking the per action lock in the datapath. Improve it by using percpu stats and rcu. perf before: - 10.48% tcf_nat_act - 81.83% _raw_spin_lock 81.08% native_queued_spin_lock_slowpath perf after: - 0.48% tcf_nat_act tdc results: 1..27 ok 1 7565 - Add nat action on ingress with default control action ok 2 fd79 - Add nat action on ingress with pipe control action ok 3 eab9 - Add nat action on ingress with continue control action ok 4 c53a - Add nat action on ingress with reclassify control action ok 5 76c9 - Add nat action on ingress with jump control action ok 6 24c6 - Add nat action on ingress with drop control action ok 7 2120 - Add nat action on ingress with maximum index value ok 8 3e9d - Add nat action on ingress with invalid index value ok 9 f6c9 - Add nat action on ingress with invalid IP address ok 10 be25 - Add nat action on ingress with invalid argument ok 11 a7bd - Add nat action on ingress with DEFAULT IP address ok 12 ee1e - Add nat action on ingress with ANY IP address ok 13 1de8 - Add nat action on ingress with ALL IP address ok 14 8dba - Add nat action on egress with default control action ok 15 19a7 - Add nat action on egress with pipe control action ok 16 f1d9 - Add nat action on egress with continue control action ok 17 6d4a - Add nat action on egress with reclassify control action ok 18 b313 - Add nat action on egress with jump control action ok 19 d9fc - Add nat action on egress with drop control action ok 20 a895 - Add nat action on egress with DEFAULT IP address ok 21 2572 - Add nat action on egress with ANY IP address ok 22 37f3 - Add nat action on egress with ALL IP address ok 23 6054 - Add nat action on egress with cookie ok 24 79d6 - Add nat action on ingress with cookie ok 25 4b12 - Replace nat action with invalid goto chain control ok 26 b811 - Delete nat action with valid index ok 27 a521 - Delete nat action with invalid index Reviewed-by: Jamal Hadi Salim Signed-off-by: Pedro Tammela Signed-off-by: Paolo Abeni --- include/net/tc_act/tc_nat.h | 10 +++++-- net/sched/act_nat.c | 72 ++++++++++++++++++++++++++++++--------------- 2 files changed, 56 insertions(+), 26 deletions(-) (limited to 'net') diff --git a/include/net/tc_act/tc_nat.h b/include/net/tc_act/tc_nat.h index c14407160812..c869274ac529 100644 --- a/include/net/tc_act/tc_nat.h +++ b/include/net/tc_act/tc_nat.h @@ -5,13 +5,17 @@ #include #include -struct tcf_nat { - struct tc_action common; - +struct tcf_nat_parms { __be32 old_addr; __be32 new_addr; __be32 mask; u32 flags; + struct rcu_head rcu; +}; + +struct tcf_nat { + struct tc_action common; + struct tcf_nat_parms __rcu *parms; }; #define to_tcf_nat(a) ((struct tcf_nat *)a) diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c index 74c74be33048..4184af5abbf3 100644 --- a/net/sched/act_nat.c +++ b/net/sched/act_nat.c @@ -38,6 +38,7 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, { struct tc_action_net *tn = net_generic(net, act_nat_ops.net_id); bool bind = flags & TCA_ACT_FLAGS_BIND; + struct tcf_nat_parms *nparm, *oparm; struct nlattr *tb[TCA_NAT_MAX + 1]; struct tcf_chain *goto_ch = NULL; struct tc_nat *parm; @@ -59,8 +60,8 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, index = parm->index; err = tcf_idr_check_alloc(tn, &index, a, bind); if (!err) { - ret = tcf_idr_create(tn, index, est, a, - &act_nat_ops, bind, false, flags); + ret = tcf_idr_create_from_flags(tn, index, est, a, &act_nat_ops, + bind, flags); if (ret) { tcf_idr_cleanup(tn, index); return ret; @@ -79,19 +80,31 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); if (err < 0) goto release_idr; + + nparm = kzalloc(sizeof(*nparm), GFP_KERNEL); + if (!nparm) { + err = -ENOMEM; + goto release_idr; + } + + nparm->old_addr = parm->old_addr; + nparm->new_addr = parm->new_addr; + nparm->mask = parm->mask; + nparm->flags = parm->flags; + p = to_tcf_nat(*a); spin_lock_bh(&p->tcf_lock); - p->old_addr = parm->old_addr; - p->new_addr = parm->new_addr; - p->mask = parm->mask; - p->flags = parm->flags; - goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); + oparm = rcu_replace_pointer(p->parms, nparm, lockdep_is_held(&p->tcf_lock)); spin_unlock_bh(&p->tcf_lock); + if (goto_ch) tcf_chain_put_by_act(goto_ch); + if (oparm) + kfree_rcu(oparm, rcu); + return ret; release_idr: tcf_idr_release(*a, bind); @@ -103,6 +116,7 @@ TC_INDIRECT_SCOPE int tcf_nat_act(struct sk_buff *skb, struct tcf_result *res) { struct tcf_nat *p = to_tcf_nat(a); + struct tcf_nat_parms *parms; struct iphdr *iph; __be32 old_addr; __be32 new_addr; @@ -113,18 +127,16 @@ TC_INDIRECT_SCOPE int tcf_nat_act(struct sk_buff *skb, int ihl; int noff; - spin_lock(&p->tcf_lock); - tcf_lastuse_update(&p->tcf_tm); - old_addr = p->old_addr; - new_addr = p->new_addr; - mask = p->mask; - egress = p->flags & TCA_NAT_FLAG_EGRESS; - action = p->tcf_action; + tcf_action_update_bstats(&p->common, skb); - bstats_update(&p->tcf_bstats, skb); + action = READ_ONCE(p->tcf_action); - spin_unlock(&p->tcf_lock); + parms = rcu_dereference_bh(p->parms); + old_addr = parms->old_addr; + new_addr = parms->new_addr; + mask = parms->mask; + egress = parms->flags & TCA_NAT_FLAG_EGRESS; if (unlikely(action == TC_ACT_SHOT)) goto drop; @@ -248,9 +260,7 @@ out: return action; drop: - spin_lock(&p->tcf_lock); - p->tcf_qstats.drops++; - spin_unlock(&p->tcf_lock); + tcf_action_inc_drop_qstats(&p->common); return TC_ACT_SHOT; } @@ -264,15 +274,20 @@ static int tcf_nat_dump(struct sk_buff *skb, struct tc_action *a, .refcnt = refcount_read(&p->tcf_refcnt) - ref, .bindcnt = atomic_read(&p->tcf_bindcnt) - bind, }; + struct tcf_nat_parms *parms; struct tcf_t t; spin_lock_bh(&p->tcf_lock); - opt.old_addr = p->old_addr; - opt.new_addr = p->new_addr; - opt.mask = p->mask; - opt.flags = p->flags; + opt.action = p->tcf_action; + parms = rcu_dereference_protected(p->parms, lockdep_is_held(&p->tcf_lock)); + + opt.old_addr = parms->old_addr; + opt.new_addr = parms->new_addr; + opt.mask = parms->mask; + opt.flags = parms->flags; + if (nla_put(skb, TCA_NAT_PARMS, sizeof(opt), &opt)) goto nla_put_failure; @@ -289,6 +304,16 @@ nla_put_failure: return -1; } +static void tcf_nat_cleanup(struct tc_action *a) +{ + struct tcf_nat *p = to_tcf_nat(a); + struct tcf_nat_parms *parms; + + parms = rcu_dereference_protected(p->parms, 1); + if (parms) + kfree_rcu(parms, rcu); +} + static struct tc_action_ops act_nat_ops = { .kind = "nat", .id = TCA_ID_NAT, @@ -296,6 +321,7 @@ static struct tc_action_ops act_nat_ops = { .act = tcf_nat_act, .dump = tcf_nat_dump, .init = tcf_nat_init, + .cleanup = tcf_nat_cleanup, .size = sizeof(struct tcf_nat), }; -- cgit v1.2.3 From 288864effe33885988d53faf7830b35cb9a84c7a Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Tue, 14 Feb 2023 18:15:32 -0300 Subject: net/sched: act_connmark: transition to percpu stats and rcu The tc action act_connmark was using shared stats and taking the per action lock in the datapath. Improve it by using percpu stats and rcu. perf before: - 13.55% tcf_connmark_act - 81.18% _raw_spin_lock 80.46% native_queued_spin_lock_slowpath perf after: - 2.85% tcf_connmark_act tdc results: 1..15 ok 1 2002 - Add valid connmark action with defaults ok 2 56a5 - Add valid connmark action with control pass ok 3 7c66 - Add valid connmark action with control drop ok 4 a913 - Add valid connmark action with control pipe ok 5 bdd8 - Add valid connmark action with control reclassify ok 6 b8be - Add valid connmark action with control continue ok 7 d8a6 - Add valid connmark action with control jump ok 8 aae8 - Add valid connmark action with zone argument ok 9 2f0b - Add valid connmark action with invalid zone argument ok 10 9305 - Add connmark action with unsupported argument ok 11 71ca - Add valid connmark action and replace it ok 12 5f8f - Add valid connmark action with cookie ok 13 c506 - Replace connmark with invalid goto chain control ok 14 6571 - Delete connmark action with valid index ok 15 3426 - Delete connmark action with invalid index Reviewed-by: Jamal Hadi Salim Signed-off-by: Pedro Tammela Signed-off-by: Paolo Abeni --- include/net/tc_act/tc_connmark.h | 9 +++- net/sched/act_connmark.c | 107 +++++++++++++++++++++++++-------------- 2 files changed, 75 insertions(+), 41 deletions(-) (limited to 'net') diff --git a/include/net/tc_act/tc_connmark.h b/include/net/tc_act/tc_connmark.h index 1f4cb477bb5d..e8dd77a96748 100644 --- a/include/net/tc_act/tc_connmark.h +++ b/include/net/tc_act/tc_connmark.h @@ -4,10 +4,15 @@ #include -struct tcf_connmark_info { - struct tc_action common; +struct tcf_connmark_parms { struct net *net; u16 zone; + struct rcu_head rcu; +}; + +struct tcf_connmark_info { + struct tc_action common; + struct tcf_connmark_parms __rcu *parms; }; #define to_connmark(a) ((struct tcf_connmark_info *)a) diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c index 7e63ff7e3ed7..8dabfb52ea3d 100644 --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -36,13 +36,15 @@ TC_INDIRECT_SCOPE int tcf_connmark_act(struct sk_buff *skb, struct nf_conntrack_tuple tuple; enum ip_conntrack_info ctinfo; struct tcf_connmark_info *ca = to_connmark(a); + struct tcf_connmark_parms *parms; struct nf_conntrack_zone zone; struct nf_conn *c; int proto; - spin_lock(&ca->tcf_lock); tcf_lastuse_update(&ca->tcf_tm); - bstats_update(&ca->tcf_bstats, skb); + tcf_action_update_bstats(&ca->common, skb); + + parms = rcu_dereference_bh(ca->parms); switch (skb_protocol(skb, true)) { case htons(ETH_P_IP): @@ -64,31 +66,29 @@ TC_INDIRECT_SCOPE int tcf_connmark_act(struct sk_buff *skb, c = nf_ct_get(skb, &ctinfo); if (c) { skb->mark = READ_ONCE(c->mark); - /* using overlimits stats to count how many packets marked */ - ca->tcf_qstats.overlimits++; - goto out; + goto count; } - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), - proto, ca->net, &tuple)) + if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, parms->net, + &tuple)) goto out; - zone.id = ca->zone; + zone.id = parms->zone; zone.dir = NF_CT_DEFAULT_ZONE_DIR; - thash = nf_conntrack_find_get(ca->net, &zone, &tuple); + thash = nf_conntrack_find_get(parms->net, &zone, &tuple); if (!thash) goto out; c = nf_ct_tuplehash_to_ctrack(thash); - /* using overlimits stats to count how many packets marked */ - ca->tcf_qstats.overlimits++; skb->mark = READ_ONCE(c->mark); nf_ct_put(c); +count: + /* using overlimits stats to count how many packets marked */ + tcf_action_inc_overlimit_qstats(&ca->common); out: - spin_unlock(&ca->tcf_lock); - return ca->tcf_action; + return READ_ONCE(ca->tcf_action); } static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = { @@ -101,6 +101,7 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla, struct netlink_ext_ack *extack) { struct tc_action_net *tn = net_generic(net, act_connmark_ops.net_id); + struct tcf_connmark_parms *nparms, *oparms; struct nlattr *tb[TCA_CONNMARK_MAX + 1]; bool bind = flags & TCA_ACT_FLAGS_BIND; struct tcf_chain *goto_ch = NULL; @@ -120,52 +121,66 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla, if (!tb[TCA_CONNMARK_PARMS]) return -EINVAL; + nparms = kzalloc(sizeof(*nparms), GFP_KERNEL); + if (!nparms) + return -ENOMEM; + parm = nla_data(tb[TCA_CONNMARK_PARMS]); index = parm->index; ret = tcf_idr_check_alloc(tn, &index, a, bind); if (!ret) { - ret = tcf_idr_create(tn, index, est, a, - &act_connmark_ops, bind, false, flags); + ret = tcf_idr_create_from_flags(tn, index, est, a, + &act_connmark_ops, bind, flags); if (ret) { tcf_idr_cleanup(tn, index); - return ret; + err = ret; + goto out_free; } ci = to_connmark(*a); - err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, - extack); - if (err < 0) - goto release_idr; - tcf_action_set_ctrlact(*a, parm->action, goto_ch); - ci->net = net; - ci->zone = parm->zone; + + nparms->net = net; + nparms->zone = parm->zone; ret = ACT_P_CREATED; } else if (ret > 0) { ci = to_connmark(*a); - if (bind) - return 0; - if (!(flags & TCA_ACT_FLAGS_REPLACE)) { - tcf_idr_release(*a, bind); - return -EEXIST; + if (bind) { + err = 0; + goto out_free; } - err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, - extack); - if (err < 0) + if (!(flags & TCA_ACT_FLAGS_REPLACE)) { + err = -EEXIST; goto release_idr; - /* replacing action and zone */ - spin_lock_bh(&ci->tcf_lock); - goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); - ci->zone = parm->zone; - spin_unlock_bh(&ci->tcf_lock); - if (goto_ch) - tcf_chain_put_by_act(goto_ch); + } + + nparms->net = rtnl_dereference(ci->parms)->net; + nparms->zone = parm->zone; + ret = 0; } + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack); + if (err < 0) + goto release_idr; + + spin_lock_bh(&ci->tcf_lock); + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch); + oparms = rcu_replace_pointer(ci->parms, nparms, lockdep_is_held(&ci->tcf_lock)); + spin_unlock_bh(&ci->tcf_lock); + + if (goto_ch) + tcf_chain_put_by_act(goto_ch); + + if (oparms) + kfree_rcu(oparms, rcu); + return ret; + release_idr: tcf_idr_release(*a, bind); +out_free: + kfree(nparms); return err; } @@ -179,11 +194,14 @@ static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a, .refcnt = refcount_read(&ci->tcf_refcnt) - ref, .bindcnt = atomic_read(&ci->tcf_bindcnt) - bind, }; + struct tcf_connmark_parms *parms; struct tcf_t t; spin_lock_bh(&ci->tcf_lock); + parms = rcu_dereference_protected(ci->parms, lockdep_is_held(&ci->tcf_lock)); + opt.action = ci->tcf_action; - opt.zone = ci->zone; + opt.zone = parms->zone; if (nla_put(skb, TCA_CONNMARK_PARMS, sizeof(opt), &opt)) goto nla_put_failure; @@ -201,6 +219,16 @@ nla_put_failure: return -1; } +static void tcf_connmark_cleanup(struct tc_action *a) +{ + struct tcf_connmark_info *ci = to_connmark(a); + struct tcf_connmark_parms *parms; + + parms = rcu_dereference_protected(ci->parms, 1); + if (parms) + kfree_rcu(parms, rcu); +} + static struct tc_action_ops act_connmark_ops = { .kind = "connmark", .id = TCA_ID_CONNMARK, @@ -208,6 +236,7 @@ static struct tc_action_ops act_connmark_ops = { .act = tcf_connmark_act, .dump = tcf_connmark_dump, .init = tcf_connmark_init, + .cleanup = tcf_connmark_cleanup, .size = sizeof(struct tcf_connmark_info), }; -- cgit v1.2.3 From 7afd073e5521bfc1045096802bc4dc640b63ed54 Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Tue, 14 Feb 2023 18:15:33 -0300 Subject: net/sched: act_gate: use percpu stats The tc action act_gate was using shared stats, move it to percpu stats. tdc results: 1..12 ok 1 5153 - Add gate action with priority and sched-entry ok 2 7189 - Add gate action with base-time ok 3 a721 - Add gate action with cycle-time ok 4 c029 - Add gate action with cycle-time-ext ok 5 3719 - Replace gate base-time action ok 6 d821 - Delete gate action with valid index ok 7 3128 - Delete gate action with invalid index ok 8 7837 - List gate actions ok 9 9273 - Flush gate actions ok 10 c829 - Add gate action with duplicate index ok 11 3043 - Add gate action with invalid index ok 12 2930 - Add gate action with cookie Reviewed-by: Jamal Hadi Salim Signed-off-by: Pedro Tammela Signed-off-by: Paolo Abeni --- net/sched/act_gate.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) (limited to 'net') diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c index 9b8def0be41e..c9a811f4c7ee 100644 --- a/net/sched/act_gate.c +++ b/net/sched/act_gate.c @@ -119,35 +119,37 @@ TC_INDIRECT_SCOPE int tcf_gate_act(struct sk_buff *skb, struct tcf_result *res) { struct tcf_gate *gact = to_gate(a); - - spin_lock(&gact->tcf_lock); + int action = READ_ONCE(gact->tcf_action); tcf_lastuse_update(&gact->tcf_tm); - bstats_update(&gact->tcf_bstats, skb); + tcf_action_update_bstats(&gact->common, skb); + spin_lock(&gact->tcf_lock); if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) { spin_unlock(&gact->tcf_lock); - return gact->tcf_action; + return action; } - if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN)) + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN)) { + spin_unlock(&gact->tcf_lock); goto drop; + } if (gact->current_max_octets >= 0) { gact->current_entry_octets += qdisc_pkt_len(skb); if (gact->current_entry_octets > gact->current_max_octets) { - gact->tcf_qstats.overlimits++; - goto drop; + spin_unlock(&gact->tcf_lock); + goto overlimit; } } - spin_unlock(&gact->tcf_lock); - return gact->tcf_action; -drop: - gact->tcf_qstats.drops++; - spin_unlock(&gact->tcf_lock); + return action; +overlimit: + tcf_action_inc_overlimit_qstats(&gact->common); +drop: + tcf_action_inc_drop_qstats(&gact->common); return TC_ACT_SHOT; } @@ -357,8 +359,8 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla, return 0; if (!err) { - ret = tcf_idr_create(tn, index, est, a, - &act_gate_ops, bind, false, flags); + ret = tcf_idr_create_from_flags(tn, index, est, a, + &act_gate_ops, bind, flags); if (ret) { tcf_idr_cleanup(tn, index); return ret; -- cgit v1.2.3 From 2d2e75d2d4a2245175d77899764b56e19c5769b4 Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Tue, 14 Feb 2023 18:15:34 -0300 Subject: net/sched: act_pedit: use percpu overlimit counter when available Since act_pedit now has access to percpu counters, use the tcf_action_inc_overlimit_qstats wrapper that will use the percpu counter whenever they are available. Reviewed-by: Jamal Hadi Salim Signed-off-by: Pedro Tammela Signed-off-by: Paolo Abeni --- net/sched/act_pedit.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'net') diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 35ebe5d5c261..77d288d384ae 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -443,9 +443,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, goto done; bad: - spin_lock(&p->tcf_lock); - p->tcf_qstats.overlimits++; - spin_unlock(&p->tcf_lock); + tcf_action_inc_overlimit_qstats(&p->common); done: return p->tcf_action; } -- cgit v1.2.3 From 525c65ff5696f7ccd8335edde1d9964420707910 Mon Sep 17 00:00:00 2001 From: Andrea Mayer Date: Wed, 15 Feb 2023 14:46:57 +0100 Subject: seg6: factor out End lookup nexthop processing to a dedicated function The End nexthop lookup/input operations are moved into a new helper function named input_action_end_finish(). This avoids duplicating the code needed to compute the nexthop in the different flavors of the End behavior. Signed-off-by: Andrea Mayer Reviewed-by: David Ahern Signed-off-by: Paolo Abeni --- net/ipv6/seg6_local.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 487f8e98deaa..765e89a24bc2 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -364,6 +364,14 @@ static void seg6_next_csid_advance_arg(struct in6_addr *addr, memset(&addr->s6_addr[16 - fnc_octects], 0x00, fnc_octects); } +static int input_action_end_finish(struct sk_buff *skb, + struct seg6_local_lwt *slwt) +{ + seg6_lookup_nexthop(skb, NULL, 0); + + return dst_input(skb); +} + static int input_action_end_core(struct sk_buff *skb, struct seg6_local_lwt *slwt) { @@ -375,9 +383,7 @@ static int input_action_end_core(struct sk_buff *skb, advance_nextseg(srh, &ipv6_hdr(skb)->daddr); - seg6_lookup_nexthop(skb, NULL, 0); - - return dst_input(skb); + return input_action_end_finish(skb, slwt); drop: kfree_skb(skb); @@ -395,9 +401,7 @@ static int end_next_csid_core(struct sk_buff *skb, struct seg6_local_lwt *slwt) /* update DA */ seg6_next_csid_advance_arg(daddr, finfo); - seg6_lookup_nexthop(skb, NULL, 0); - - return dst_input(skb); + return input_action_end_finish(skb, slwt); } static bool seg6_next_csid_enabled(__u32 fops) -- cgit v1.2.3 From bdf3c0b9c10b44c9355dd68f359bad25489445b6 Mon Sep 17 00:00:00 2001 From: Andrea Mayer Date: Wed, 15 Feb 2023 14:46:58 +0100 Subject: seg6: add PSP flavor support for SRv6 End behavior The "flavors" framework defined in RFC8986 [1] represents additional operations that can modify or extend a subset of existing behaviors such as SRv6 End, End.X and End.T. We report these flavors hereafter: - Penultimate Segment Pop (PSP); - Ultimate Segment Pop (USP); - Ultimate Segment Decapsulation (USD). Depending on how the Segment Routing Header (SRH) has to be handled, an SRv6 End* behavior can support these flavors either individually or in combinations. In this patch, we only consider the PSP flavor for the SRv6 End behavior. A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node (i.e., the one applying the SRv6 Policy) when it needs to instruct the penultimate SR Endpoint node listed in the SID List (carried by the SRH) to remove the SRH from the IPv6 header. Specifically, a PSP enabled SRv6 End behavior processes the SRH by: i) decreasing the Segment Left (SL) from 1 to 0; ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination Address (DA); iii) removing (i.e., popping) the outer SRH from the extension headers following the IPv6 header. It is important to note that PSP operation (steps i, ii, iii) takes place only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the PSP operation is not performed because it would not be possible to decrease the SL from 1 to 0. SL=2 SL=1 SL=0 | | | For example, given the SRv6 policy (SID List := < X, Y, Z >): - a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP operation as Segment Left (SL) is 1, corresponding to the Penultimate Segment of the SID List; - a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the PSP operation as the Segment Left is 2. This behavior instance will apply the "standard" End packet processing, ignoring the configured PSP flavor at all. [1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986 Signed-off-by: Andrea Mayer Reviewed-by: David Ahern Signed-off-by: Paolo Abeni --- net/ipv6/seg6_local.c | 336 +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 333 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 765e89a24bc2..dd433cc265c8 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -109,8 +109,15 @@ struct bpf_lwt_prog { #define next_csid_chk_lcnode_fn_bits(flen) \ next_csid_chk_lcblock_bits(flen) +#define SEG6_F_LOCAL_FLV_OP(flvname) BIT(SEG6_LOCAL_FLV_OP_##flvname) +#define SEG6_F_LOCAL_FLV_PSP SEG6_F_LOCAL_FLV_OP(PSP) + +/* Supported RFC8986 Flavor operations are reported in this bitmask */ +#define SEG6_LOCAL_FLV8986_SUPP_OPS SEG6_F_LOCAL_FLV_PSP + /* Supported Flavor operations are reported in this bitmask */ -#define SEG6_LOCAL_FLV_SUPP_OPS (BIT(SEG6_LOCAL_FLV_OP_NEXT_CSID)) +#define SEG6_LOCAL_FLV_SUPP_OPS (SEG6_F_LOCAL_FLV_OP(NEXT_CSID) | \ + SEG6_LOCAL_FLV8986_SUPP_OPS) struct seg6_flavors_info { /* Flavor operations */ @@ -409,15 +416,331 @@ static bool seg6_next_csid_enabled(__u32 fops) return fops & BIT(SEG6_LOCAL_FLV_OP_NEXT_CSID); } +/* We describe the packet state in relation to the absence/presence of the SRH + * and the Segment Left (SL) field. + * For our purposes, it is not necessary to record the exact value of the SL + * when the SID List consists of two or more segments. + */ +enum seg6_local_pktinfo { + /* the order really matters! */ + SEG6_LOCAL_PKTINFO_NOHDR = 0, + SEG6_LOCAL_PKTINFO_SL_ZERO, + SEG6_LOCAL_PKTINFO_SL_ONE, + SEG6_LOCAL_PKTINFO_SL_MORE, + __SEG6_LOCAL_PKTINFO_MAX, +}; + +#define SEG6_LOCAL_PKTINFO_MAX (__SEG6_LOCAL_PKTINFO_MAX - 1) + +static enum seg6_local_pktinfo seg6_get_srh_pktinfo(struct ipv6_sr_hdr *srh) +{ + __u8 sgl; + + if (!srh) + return SEG6_LOCAL_PKTINFO_NOHDR; + + sgl = srh->segments_left; + if (sgl < 2) + return SEG6_LOCAL_PKTINFO_SL_ZERO + sgl; + + return SEG6_LOCAL_PKTINFO_SL_MORE; +} + +enum seg6_local_flv_action { + SEG6_LOCAL_FLV_ACT_UNSPEC = 0, + SEG6_LOCAL_FLV_ACT_END, + SEG6_LOCAL_FLV_ACT_PSP, + SEG6_LOCAL_FLV_ACT_USP, + SEG6_LOCAL_FLV_ACT_USD, + __SEG6_LOCAL_FLV_ACT_MAX +}; + +#define SEG6_LOCAL_FLV_ACT_MAX (__SEG6_LOCAL_FLV_ACT_MAX - 1) + +/* The action table for RFC8986 flavors (see the flv8986_act_tbl below) + * contains the actions (i.e. processing operations) to be applied on packets + * when flavors are configured for an End* behavior. + * By combining the pkinfo data and from the flavors mask, the macro + * computes the index used to access the elements (actions) stored in the + * action table. The index is structured as follows: + * + * index + * _______________/\________________ + * / \ + * +----------------+----------------+ + * | pf | afm | + * +----------------+----------------+ + * ph-1 ... p1 p0 fk-1 ... f1 f0 + * MSB LSB + * + * where: + * - 'afm' (adjusted flavor mask) is the mask containing a combination of the + * RFC8986 flavors currently supported. 'afm' corresponds to the @fm + * argument of the macro whose value is righ-shifted by 1 bit. By doing so, + * we discard the SEG6_LOCAL_FLV_OP_UNSPEC flag (bit 0 in @fm) which is + * never used here; + * - 'pf' encodes the packet info (pktinfo) regarding the presence/absence of + * the SRH, SL = 0, etc. 'pf' is set with the value of @pf provided as + * argument to the macro. + */ +#define flv8986_act_tbl_idx(pf, fm) \ + ((((pf) << bits_per(SEG6_LOCAL_FLV8986_SUPP_OPS)) | \ + ((fm) & SEG6_LOCAL_FLV8986_SUPP_OPS)) >> SEG6_LOCAL_FLV_OP_PSP) + +/* We compute the size of the action table by considering the RFC8986 flavors + * actually supported by the kernel. In this way, the size is automatically + * adjusted when new flavors are supported. + */ +#define FLV8986_ACT_TBL_SIZE \ + roundup_pow_of_two(flv8986_act_tbl_idx(SEG6_LOCAL_PKTINFO_MAX, \ + SEG6_LOCAL_FLV8986_SUPP_OPS)) + +/* tbl_cfg(act, pf, fm) macro is used to easily configure the action + * table; it accepts 3 arguments: + * i) @act, the suffix from SEG6_LOCAL_FLV_ACT_{act} representing + * the action that should be applied on the packet; + * ii) @pf, the suffix from SEG6_LOCAL_PKTINFO_{pf} reporting the packet + * info about the lack/presence of SRH, SRH with SL = 0, etc; + * iii) @fm, the mask of flavors. + */ +#define tbl_cfg(act, pf, fm) \ + [flv8986_act_tbl_idx(SEG6_LOCAL_PKTINFO_##pf, \ + (fm))] = SEG6_LOCAL_FLV_ACT_##act + +/* shorthand for improving readability */ +#define F_PSP SEG6_F_LOCAL_FLV_PSP + +/* The table contains, for each combination of the pktinfo data and + * flavors, the action that should be taken on a packet (e.g. + * "standard" Endpoint processing, Penultimate Segment Pop, etc). + * + * By default, table entries not explicitly configured are initialized with the + * SEG6_LOCAL_FLV_ACT_UNSPEC action, which generally has the effect of + * discarding the processed packet. + */ +static const u8 flv8986_act_tbl[FLV8986_ACT_TBL_SIZE] = { + /* PSP variant for packet where SRH with SL = 1 */ + tbl_cfg(PSP, SL_ONE, F_PSP), + /* End for packet where the SRH with SL > 1*/ + tbl_cfg(END, SL_MORE, F_PSP), +}; + +#undef F_PSP +#undef tbl_cfg + +/* For each flavor defined in RFC8986 (or a combination of them) an action is + * performed on the packet. The specific action depends on: + * - info extracted from the packet (i.e. pktinfo data) regarding the + * lack/presence of the SRH, and if the SRH is available, on the value of + * Segment Left field; + * - the mask of flavors configured for the specific SRv6 End* behavior. + * + * The function combines both the pkinfo and the flavors mask to evaluate the + * corresponding action to be taken on the packet. + */ +static enum seg6_local_flv_action +seg6_local_flv8986_act_lookup(enum seg6_local_pktinfo pinfo, __u32 flvmask) +{ + unsigned long index; + + /* check if the provided mask of flavors is supported */ + if (unlikely(flvmask & ~SEG6_LOCAL_FLV8986_SUPP_OPS)) + return SEG6_LOCAL_FLV_ACT_UNSPEC; + + index = flv8986_act_tbl_idx(pinfo, flvmask); + if (unlikely(index >= FLV8986_ACT_TBL_SIZE)) + return SEG6_LOCAL_FLV_ACT_UNSPEC; + + return flv8986_act_tbl[index]; +} + +/* skb->data must be aligned with skb->network_header */ +static bool seg6_pop_srh(struct sk_buff *skb, int srhoff) +{ + struct ipv6_sr_hdr *srh; + struct ipv6hdr *iph; + __u8 srh_nexthdr; + int thoff = -1; + int srhlen; + int nhlen; + + if (unlikely(srhoff < sizeof(*iph) || + !pskb_may_pull(skb, srhoff + sizeof(*srh)))) + return false; + + srh = (struct ipv6_sr_hdr *)(skb->data + srhoff); + srhlen = ipv6_optlen(srh); + + /* we are about to mangle the pkt, let's check if we can write on it */ + if (unlikely(skb_ensure_writable(skb, srhoff + srhlen))) + return false; + + /* skb_ensure_writable() may change skb pointers; evaluate srh again */ + srh = (struct ipv6_sr_hdr *)(skb->data + srhoff); + srh_nexthdr = srh->nexthdr; + + if (unlikely(!skb_transport_header_was_set(skb))) + goto pull; + + nhlen = skb_network_header_len(skb); + /* we have to deal with the transport header: it could be set before + * the SRH, after the SRH, or within it (which is considered wrong, + * however). + */ + if (likely(nhlen <= srhoff)) + thoff = nhlen; + else if (nhlen >= srhoff + srhlen) + /* transport_header is set after the SRH */ + thoff = nhlen - srhlen; + else + /* transport_header falls inside the SRH; hence, we can't + * restore the transport_header pointer properly after + * SRH removing operation. + */ + return false; +pull: + /* we need to pop the SRH: + * 1) first of all, we pull out everything from IPv6 header up to SRH + * (included) evaluating also the rcsum; + * 2) we overwrite (and then remove) the SRH by properly moving the + * IPv6 along with any extension header that precedes the SRH; + * 3) At the end, we push back the pulled headers (except for SRH, + * obviously). + */ + skb_pull_rcsum(skb, srhoff + srhlen); + memmove(skb_network_header(skb) + srhlen, skb_network_header(skb), + srhoff); + skb_push(skb, srhoff); + + skb_reset_network_header(skb); + skb_mac_header_rebuild(skb); + if (likely(thoff >= 0)) + skb_set_transport_header(skb, thoff); + + iph = ipv6_hdr(skb); + if (iph->nexthdr == NEXTHDR_ROUTING) { + iph->nexthdr = srh_nexthdr; + } else { + /* we must look for the extension header (EXTH, for short) that + * immediately precedes the SRH we have just removed. + * Then, we update the value of the EXTH nexthdr with the one + * contained in the SRH nexthdr. + */ + unsigned int off = sizeof(*iph); + struct ipv6_opt_hdr *hp, _hdr; + __u8 nexthdr = iph->nexthdr; + + for (;;) { + if (unlikely(!ipv6_ext_hdr(nexthdr) || + nexthdr == NEXTHDR_NONE)) + return false; + + hp = skb_header_pointer(skb, off, sizeof(_hdr), &_hdr); + if (unlikely(!hp)) + return false; + + if (hp->nexthdr == NEXTHDR_ROUTING) { + hp->nexthdr = srh_nexthdr; + break; + } + + switch (nexthdr) { + case NEXTHDR_FRAGMENT: + fallthrough; + case NEXTHDR_AUTH: + /* we expect SRH before FRAG and AUTH */ + return false; + default: + off += ipv6_optlen(hp); + break; + } + + nexthdr = hp->nexthdr; + } + } + + iph->payload_len = htons(skb->len - sizeof(struct ipv6hdr)); + + skb_postpush_rcsum(skb, iph, srhoff); + + return true; +} + +/* process the packet on the basis of the RFC8986 flavors set for the given + * SRv6 End behavior instance. + */ +static int end_flv8986_core(struct sk_buff *skb, struct seg6_local_lwt *slwt) +{ + const struct seg6_flavors_info *finfo = &slwt->flv_info; + enum seg6_local_flv_action action; + enum seg6_local_pktinfo pinfo; + struct ipv6_sr_hdr *srh; + __u32 flvmask; + int srhoff; + + srh = seg6_get_srh(skb, 0); + srhoff = srh ? ((unsigned char *)srh - skb->data) : 0; + pinfo = seg6_get_srh_pktinfo(srh); +#ifdef CONFIG_IPV6_SEG6_HMAC + if (srh && !seg6_hmac_validate_skb(skb)) + goto drop; +#endif + flvmask = finfo->flv_ops; + if (unlikely(flvmask & ~SEG6_LOCAL_FLV8986_SUPP_OPS)) { + pr_warn_once("seg6local: invalid RFC8986 flavors\n"); + goto drop; + } + + /* retrieve the action triggered by the combination of pktinfo data and + * the flavors mask. + */ + action = seg6_local_flv8986_act_lookup(pinfo, flvmask); + switch (action) { + case SEG6_LOCAL_FLV_ACT_END: + /* process the packet as the "standard" End behavior */ + advance_nextseg(srh, &ipv6_hdr(skb)->daddr); + break; + case SEG6_LOCAL_FLV_ACT_PSP: + advance_nextseg(srh, &ipv6_hdr(skb)->daddr); + + if (unlikely(!seg6_pop_srh(skb, srhoff))) + goto drop; + break; + case SEG6_LOCAL_FLV_ACT_UNSPEC: + fallthrough; + default: + /* by default, we drop the packet since we could not find a + * suitable action. + */ + goto drop; + } + + return input_action_end_finish(skb, slwt); + +drop: + kfree_skb(skb); + return -EINVAL; +} + /* regular endpoint function */ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt) { const struct seg6_flavors_info *finfo = &slwt->flv_info; + __u32 fops = finfo->flv_ops; - if (seg6_next_csid_enabled(finfo->flv_ops)) + if (!fops) + return input_action_end_core(skb, slwt); + + /* check for the presence of NEXT-C-SID since it applies first */ + if (seg6_next_csid_enabled(fops)) return end_next_csid_core(skb, slwt); - return input_action_end_core(skb, slwt); + /* the specific processing function to be performed on the packet + * depends on the combination of flavors defined in RFC8986 and some + * information extracted from the packet, e.g. presence/absence of SRH, + * Segment Left = 0, etc. + */ + return end_flv8986_core(skb, slwt); } /* regular endpoint, and forward to specified nexthop */ @@ -2304,6 +2627,13 @@ int __init seg6_local_init(void) BUILD_BUG_ON(next_csid_chk_lcblock_bits(SEG6_LOCAL_LCBLOCK_DBITS)); BUILD_BUG_ON(next_csid_chk_lcnode_fn_bits(SEG6_LOCAL_LCNODE_FN_DBITS)); + /* To be memory efficient, we use 'u8' to represent the different + * actions related to RFC8986 flavors. If the kernel build stops here, + * it means that it is not possible to correctly encode these actions + * with the data type chosen for the action table. + */ + BUILD_BUG_ON(SEG6_LOCAL_FLV_ACT_MAX > (typeof(flv8986_act_tbl[0]))~0U); + return lwtunnel_encap_add_ops(&seg6_local_ops, LWTUNNEL_ENCAP_SEG6_LOCAL); } -- cgit v1.2.3 From 2954fe60e33da0f4de4d81a4c95c7dddb517d00c Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 1 Feb 2023 14:45:22 +0100 Subject: netfilter: let reset rules clean out conntrack entries iptables/nftables support responding to tcp packets with tcp resets. The generated tcp reset packet passes through both output and postrouting netfilter hooks, but conntrack will never see them because the generated skb has its ->nfct pointer copied over from the packet that triggered the reset rule. If the reset rule is used for established connections, this may result in the conntrack entry to be around for a very long time (default timeout is 5 days). One way to avoid this would be to not copy the nf_conn pointer so that the rest packet passes through conntrack too. Problem is that output rules might not have the same conntrack zone setup as the prerouting ones, so its possible that the reset skb won't find the correct entry. Generating a template entry for the skb seems error prone as well. Add an explicit "closing" function that switches a confirmed conntrack entry to closed state and wire this up for tcp. If the entry isn't confirmed, no action is needed because the conntrack entry will never be committed to the table. Reported-by: Russel King Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter.h | 3 +++ include/net/netfilter/nf_conntrack.h | 8 ++++++++ net/ipv4/netfilter/nf_reject_ipv4.c | 1 + net/ipv6/netfilter/nf_reject_ipv6.c | 1 + net/netfilter/core.c | 16 ++++++++++++++++ net/netfilter/nf_conntrack_core.c | 12 ++++++++++++ net/netfilter/nf_conntrack_proto_tcp.c | 35 ++++++++++++++++++++++++++++++++++ 7 files changed, 76 insertions(+) (limited to 'net') diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index d8817d381c14..6863e271a9de 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -437,11 +437,13 @@ nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, u_int8_t family) #include void nf_ct_attach(struct sk_buff *, const struct sk_buff *); +void nf_ct_set_closing(struct nf_conntrack *nfct); struct nf_conntrack_tuple; bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb); #else static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {} +static inline void nf_ct_set_closing(struct nf_conntrack *nfct) {} struct nf_conntrack_tuple; static inline bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb) @@ -459,6 +461,7 @@ struct nf_ct_hook { bool (*get_tuple_skb)(struct nf_conntrack_tuple *, const struct sk_buff *); void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb); + void (*set_closing)(struct nf_conntrack *nfct); }; extern const struct nf_ct_hook __rcu *nf_ct_hook; diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 6a2019aaa464..3dbf947285be 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -125,6 +125,12 @@ struct nf_conn { union nf_conntrack_proto proto; }; +static inline struct nf_conn * +nf_ct_to_nf_conn(const struct nf_conntrack *nfct) +{ + return container_of(nfct, struct nf_conn, ct_general); +} + static inline struct nf_conn * nf_ct_tuplehash_to_ctrack(const struct nf_conntrack_tuple_hash *hash) { @@ -175,6 +181,8 @@ nf_ct_get(const struct sk_buff *skb, enum ip_conntrack_info *ctinfo) void nf_ct_destroy(struct nf_conntrack *nfct); +void nf_conntrack_tcp_set_closing(struct nf_conn *ct); + /* decrement reference count on a conntrack */ static inline void nf_ct_put(struct nf_conn *ct) { diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c index d640adcaf1b1..f33aeab9424f 100644 --- a/net/ipv4/netfilter/nf_reject_ipv4.c +++ b/net/ipv4/netfilter/nf_reject_ipv4.c @@ -280,6 +280,7 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb, goto free_nskb; nf_ct_attach(nskb, oldskb); + nf_ct_set_closing(skb_nfct(oldskb)); #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) /* If we use ip_local_out for bridged traffic, the MAC source on diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c index f61d4f18e1cf..58ccdb08c0fd 100644 --- a/net/ipv6/netfilter/nf_reject_ipv6.c +++ b/net/ipv6/netfilter/nf_reject_ipv6.c @@ -345,6 +345,7 @@ void nf_send_reset6(struct net *net, struct sock *sk, struct sk_buff *oldskb, nf_reject_ip6_tcphdr_put(nskb, oldskb, otcph, otcplen); nf_ct_attach(nskb, oldskb); + nf_ct_set_closing(skb_nfct(oldskb)); #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) /* If we use ip6_local_out for bridged traffic, the MAC source on diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 5a6705a0e4ec..b2fdbbed2b4b 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -702,6 +702,22 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct) } EXPORT_SYMBOL(nf_conntrack_destroy); +void nf_ct_set_closing(struct nf_conntrack *nfct) +{ + const struct nf_ct_hook *ct_hook; + + if (!nfct) + return; + + rcu_read_lock(); + ct_hook = rcu_dereference(nf_ct_hook); + if (ct_hook) + ct_hook->set_closing(nfct); + + rcu_read_unlock(); +} +EXPORT_SYMBOL_GPL(nf_ct_set_closing); + bool nf_ct_get_tuple_skb(struct nf_conntrack_tuple *dst_tuple, const struct sk_buff *skb) { diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index c00858344f02..430bb52b6454 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2747,11 +2747,23 @@ err_cachep: return ret; } +static void nf_conntrack_set_closing(struct nf_conntrack *nfct) +{ + struct nf_conn *ct = nf_ct_to_nf_conn(nfct); + + switch (nf_ct_protonum(ct)) { + case IPPROTO_TCP: + nf_conntrack_tcp_set_closing(ct); + break; + } +} + static const struct nf_ct_hook nf_conntrack_hook = { .update = nf_conntrack_update, .destroy = nf_ct_destroy, .get_tuple_skb = nf_conntrack_get_tuple_skb, .attach = nf_conntrack_attach, + .set_closing = nf_conntrack_set_closing, }; void nf_conntrack_init_end(void) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 16ee5ebe1ce1..4018acb1d674 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -911,6 +911,41 @@ static bool tcp_can_early_drop(const struct nf_conn *ct) return false; } +void nf_conntrack_tcp_set_closing(struct nf_conn *ct) +{ + enum tcp_conntrack old_state; + const unsigned int *timeouts; + u32 timeout; + + if (!nf_ct_is_confirmed(ct)) + return; + + spin_lock_bh(&ct->lock); + old_state = ct->proto.tcp.state; + ct->proto.tcp.state = TCP_CONNTRACK_CLOSE; + + if (old_state == TCP_CONNTRACK_CLOSE || + test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) { + spin_unlock_bh(&ct->lock); + return; + } + + timeouts = nf_ct_timeout_lookup(ct); + if (!timeouts) { + const struct nf_tcp_net *tn; + + tn = nf_tcp_pernet(nf_ct_net(ct)); + timeouts = tn->timeouts; + } + + timeout = timeouts[TCP_CONNTRACK_CLOSE]; + WRITE_ONCE(ct->timeout, timeout + nfct_time_stamp); + + spin_unlock_bh(&ct->lock); + + nf_conntrack_event_cache(IPCT_PROTOINFO, ct); +} + static void nf_ct_tcp_state_reset(struct ip_ct_tcp_state *state) { state->td_end = 0; -- cgit v1.2.3 From 1596dae2f17ec5c6e8c8f0e3fec78c5ae55c1e0b Mon Sep 17 00:00:00 2001 From: Maciej Fijalkowski Date: Wed, 15 Feb 2023 15:33:09 +0100 Subject: xsk: check IFF_UP earlier in Tx path Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These two paths share a call to common function xsk_xmit() which has two sanity checks within. A pseudo code example to show the two paths: __xsk_sendmsg() : xsk_poll(): if (unlikely(!xsk_is_bound(xs))) if (unlikely(!xsk_is_bound(xs))) return -ENXIO; return mask; if (unlikely(need_wait)) (...) return -EOPNOTSUPP; xsk_xmit() mark napi id (...) xsk_xmit() xsk_xmit(): if (unlikely(!(xs->dev->flags & IFF_UP))) return -ENETDOWN; if (unlikely(!xs->tx)) return -ENOBUFS; As it can be observed above, in sendmsg() napi id can be marked on interface that was not brought up and this causes a NULL ptr dereference: [31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018 [31757.512710] #PF: supervisor read access in kernel mode [31757.517936] #PF: error_code(0x0000) - not-present page [31757.523149] PGD 0 P4D 0 [31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI [31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40 [31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180 [31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f [31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246 [31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000 [31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100 [31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000 [31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa [31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000 [31757.612364] FS: 00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000 [31757.620571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0 [31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [31757.648139] PKRU: 55555554 [31757.650894] Call Trace: [31757.653385] [31757.655524] sock_sendmsg+0x8f/0xa0 [31757.659077] ? sockfd_lookup_light+0x12/0x70 [31757.663416] __sys_sendto+0xfc/0x170 [31757.667051] ? do_sched_setscheduler+0xdb/0x1b0 [31757.671658] __x64_sys_sendto+0x20/0x30 [31757.675557] do_syscall_64+0x38/0x90 [31757.679197] entry_SYSCALL_64_after_hwframe+0x72/0xdc [31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48 [31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16 [31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 [31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000 [31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000 [31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [31757.754940] To fix this, let's make xsk_xmit a function that will be responsible for generic Tx, where RCU is handled accordingly and pull out sanity checks and xs->zc handling. Populate sanity checks to __xsk_sendmsg() and xsk_poll(). Fixes: ca2e1a627035 ("xsk: Mark napi_id on sendmsg()") Fixes: 18b1ab7aa76b ("xsk: Fix race at socket teardown") Signed-off-by: Maciej Fijalkowski Reviewed-by: Alexander Lobakin Link: https://lore.kernel.org/r/20230215143309.13145-1-maciej.fijalkowski@intel.com Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann --- net/xdp/xsk.c | 59 +++++++++++++++++++++++++++++++++-------------------------- 1 file changed, 33 insertions(+), 26 deletions(-) (limited to 'net') diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 9f0561b67c12..13f62d2402e7 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -511,7 +511,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, return skb; } -static int xsk_generic_xmit(struct sock *sk) +static int __xsk_generic_xmit(struct sock *sk) { struct xdp_sock *xs = xdp_sk(sk); u32 max_batch = TX_BATCH_SIZE; @@ -594,22 +594,13 @@ out: return err; } -static int xsk_xmit(struct sock *sk) +static int xsk_generic_xmit(struct sock *sk) { - struct xdp_sock *xs = xdp_sk(sk); int ret; - if (unlikely(!(xs->dev->flags & IFF_UP))) - return -ENETDOWN; - if (unlikely(!xs->tx)) - return -ENOBUFS; - - if (xs->zc) - return xsk_wakeup(xs, XDP_WAKEUP_TX); - /* Drop the RCU lock since the SKB path might sleep. */ rcu_read_unlock(); - ret = xsk_generic_xmit(sk); + ret = __xsk_generic_xmit(sk); /* Reaquire RCU lock before going into common code. */ rcu_read_lock(); @@ -627,17 +618,31 @@ static bool xsk_no_wakeup(struct sock *sk) #endif } +static int xsk_check_common(struct xdp_sock *xs) +{ + if (unlikely(!xsk_is_bound(xs))) + return -ENXIO; + if (unlikely(!(xs->dev->flags & IFF_UP))) + return -ENETDOWN; + + return 0; +} + static int __xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { bool need_wait = !(m->msg_flags & MSG_DONTWAIT); struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); struct xsk_buff_pool *pool; + int err; - if (unlikely(!xsk_is_bound(xs))) - return -ENXIO; + err = xsk_check_common(xs); + if (err) + return err; if (unlikely(need_wait)) return -EOPNOTSUPP; + if (unlikely(!xs->tx)) + return -ENOBUFS; if (sk_can_busy_loop(sk)) { if (xs->zc) @@ -649,8 +654,11 @@ static int __xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len return 0; pool = xs->pool; - if (pool->cached_need_wakeup & XDP_WAKEUP_TX) - return xsk_xmit(sk); + if (pool->cached_need_wakeup & XDP_WAKEUP_TX) { + if (xs->zc) + return xsk_wakeup(xs, XDP_WAKEUP_TX); + return xsk_generic_xmit(sk); + } return 0; } @@ -670,11 +678,11 @@ static int __xsk_recvmsg(struct socket *sock, struct msghdr *m, size_t len, int bool need_wait = !(flags & MSG_DONTWAIT); struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); + int err; - if (unlikely(!xsk_is_bound(xs))) - return -ENXIO; - if (unlikely(!(xs->dev->flags & IFF_UP))) - return -ENETDOWN; + err = xsk_check_common(xs); + if (err) + return err; if (unlikely(!xs->rx)) return -ENOBUFS; if (unlikely(need_wait)) @@ -713,21 +721,20 @@ static __poll_t xsk_poll(struct file *file, struct socket *sock, sock_poll_wait(file, sock, wait); rcu_read_lock(); - if (unlikely(!xsk_is_bound(xs))) { - rcu_read_unlock(); - return mask; - } + if (xsk_check_common(xs)) + goto skip_tx; pool = xs->pool; if (pool->cached_need_wakeup) { if (xs->zc) xsk_wakeup(xs, pool->cached_need_wakeup); - else + else if (xs->tx) /* Poll needs to drive Tx also in copy mode */ - xsk_xmit(sk); + xsk_generic_xmit(sk); } +skip_tx: if (xs->rx && !xskq_prod_is_empty(xs->rx)) mask |= EPOLLIN | EPOLLRDNORM; if (xs->tx && xsk_tx_writeable(xs)) -- cgit v1.2.3 From af2d0d09eabe98b01bf02b236e381edae4209778 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 16 Feb 2023 16:41:47 -0800 Subject: bpf: Disable bh in bpf_test_run for xdp and tc prog Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup helper that will be used in a latter selftest. In particular, it calls ___neigh_lookup_noref that expects the bh disabled. This patch disables bh before calling bpf_prog_run[_xdp], so the testing prog can also use those helpers. Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230217004150.2980689-2-martin.lau@linux.dev --- net/bpf/test_run.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 1ab396a2b87f..982e81bba6cf 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -413,10 +413,12 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); do { run_ctx.prog_item = &item; + local_bh_disable(); if (xdp) *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); + local_bh_enable(); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); bpf_test_timer_leave(&t); -- cgit v1.2.3 From 1fe4850b34ab512ff911e2c035c75fb6438f7307 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 16 Feb 2023 16:41:48 -0800 Subject: bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state The bpf_fib_lookup() helper does not only look up the fib (ie. route) but it also looks up the neigh. Before returning the neigh, the helper does not check for NUD_VALID. When a neigh state (neigh->nud_state) is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper still returns SUCCESS instead of NO_NEIGH in this case. Because of the SUCCESS return value, the bpf prog directly uses the returned dmac and ends up filling all zero in the eth header. This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is not valid. Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev --- net/core/filter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/net/core/filter.c b/net/core/filter.c index 2ce06a72a5ba..8daaaf76ab15 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5849,7 +5849,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); } - if (!neigh) + if (!neigh || !(neigh->nud_state & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; return bpf_fib_set_fwd_params(params, neigh, dev, mtu); @@ -5964,7 +5964,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, * not needed here. */ neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); - if (!neigh) + if (!neigh || !(neigh->nud_state & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; return bpf_fib_set_fwd_params(params, neigh, dev, mtu); -- cgit v1.2.3 From 181127fb76e62d06ab17a75fd610129688612343 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 17 Feb 2023 12:13:09 -0800 Subject: Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" This reverts commit 6c20822fada1b8adb77fa450d03a0d449686a4a9. build bot failed on arch with different cache line size: https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/ Signed-off-by: Martin KaFai Lau --- net/bpf/test_run.c | 29 +++++----------------- .../selftests/bpf/prog_tests/xdp_do_redirect.c | 7 +++--- 2 files changed, 9 insertions(+), 27 deletions(-) (limited to 'net') diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 982e81bba6cf..6f3d654b3339 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -97,11 +97,8 @@ reset: struct xdp_page_head { struct xdp_buff orig_ctx; struct xdp_buff ctx; - union { - /* ::data_hard_start starts here */ - DECLARE_FLEX_ARRAY(struct xdp_frame, frame); - DECLARE_FLEX_ARRAY(u8, data); - }; + struct xdp_frame frm; + u8 data[]; }; struct xdp_test_data { @@ -119,20 +116,6 @@ struct xdp_test_data { #define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)) #define TEST_XDP_MAX_BATCH 256 -#if BITS_PER_LONG == 64 && PAGE_SIZE == SZ_4K -/* tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c:%MAX_PKT_SIZE - * must be updated accordingly when any of these changes, otherwise BPF - * selftests will fail. - */ -#ifdef __s390x__ -#define TEST_MAX_PKT_SIZE 3216 -#else -#define TEST_MAX_PKT_SIZE 3408 -#endif -static_assert(SKB_WITH_OVERHEAD(TEST_XDP_FRAME_SIZE - XDP_PACKET_HEADROOM) == - TEST_MAX_PKT_SIZE); -#endif - static void xdp_test_run_init_page(struct page *page, void *arg) { struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); @@ -149,8 +132,8 @@ static void xdp_test_run_init_page(struct page *page, void *arg) headroom -= meta_len; new_ctx = &head->ctx; - frm = head->frame; - data = head->data; + frm = &head->frm; + data = &head->data; memcpy(data + headroom, orig_ctx->data_meta, frm_len); xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq); @@ -240,7 +223,7 @@ static void reset_ctx(struct xdp_page_head *head) head->ctx.data = head->orig_ctx.data; head->ctx.data_meta = head->orig_ctx.data_meta; head->ctx.data_end = head->orig_ctx.data_end; - xdp_update_frame_from_buff(&head->ctx, head->frame); + xdp_update_frame_from_buff(&head->ctx, &head->frm); } static int xdp_recv_frames(struct xdp_frame **frames, int nframes, @@ -302,7 +285,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, head = phys_to_virt(page_to_phys(page)); reset_ctx(head); ctx = &head->ctx; - frm = head->frame; + frm = &head->frm; xdp->frame_cnt++; act = bpf_prog_run_xdp(prog, ctx); diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c index 7271a18ab3e2..2666c84dbd01 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c @@ -65,13 +65,12 @@ static int attach_tc_prog(struct bpf_tc_hook *hook, int fd) } /* The maximum permissible size is: PAGE_SIZE - sizeof(struct xdp_page_head) - - * SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) - XDP_PACKET_HEADROOM = - * 3408 bytes for 64-byte cacheline and 3216 for 256-byte one. + * sizeof(struct skb_shared_info) - XDP_PACKET_HEADROOM = 3368 bytes */ #if defined(__s390x__) -#define MAX_PKT_SIZE 3216 +#define MAX_PKT_SIZE 3176 #else -#define MAX_PKT_SIZE 3408 +#define MAX_PKT_SIZE 3368 #endif static void test_max_pkt_size(int fd) { -- cgit v1.2.3 From 31de4105f00d64570139bc5494a201b0bd57349f Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Fri, 17 Feb 2023 12:55:14 -0800 Subject: bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev --- include/uapi/linux/bpf.h | 6 ++++++ net/core/filter.c | 39 ++++++++++++++++++++++++++------------- tools/include/uapi/linux/bpf.h | 6 ++++++ 3 files changed, 38 insertions(+), 13 deletions(-) (limited to 'net') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 1503f61336b6..62ce1f5d1b1d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3134,6 +3134,11 @@ union bpf_attr { * **BPF_FIB_LOOKUP_OUTPUT** * Perform lookup from an egress perspective (default is * ingress). + * **BPF_FIB_LOOKUP_SKIP_NEIGH** + * Skip the neighbour table lookup. *params*->dmac + * and *params*->smac will not be set as output. A common + * use case is to call **bpf_redirect_neigh**\ () after + * doing **bpf_fib_lookup**\ (). * * *ctx* is either **struct xdp_md** for XDP programs or * **struct sk_buff** tc cls_act programs. @@ -6750,6 +6755,7 @@ struct bpf_raw_tracepoint_args { enum { BPF_FIB_LOOKUP_DIRECT = (1U << 0), BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), }; enum { diff --git a/net/core/filter.c b/net/core/filter.c index 8daaaf76ab15..1d6f165923bf 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5722,12 +5722,8 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = { #endif #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6) -static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, - const struct neighbour *neigh, - const struct net_device *dev, u32 mtu) +static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu) { - memcpy(params->dmac, neigh->ha, ETH_ALEN); - memcpy(params->smac, dev->dev_addr, ETH_ALEN); params->h_vlan_TCI = 0; params->h_vlan_proto = 0; if (mtu) @@ -5838,21 +5834,29 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, if (likely(nhc->nhc_gw_family != AF_INET6)) { if (nhc->nhc_gw_family) params->ipv4_dst = nhc->nhc_gw.ipv4; - - neigh = __ipv4_neigh_lookup_noref(dev, - (__force u32)params->ipv4_dst); } else { struct in6_addr *dst = (struct in6_addr *)params->ipv6_dst; params->family = AF_INET6; *dst = nhc->nhc_gw.ipv6; - neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); } + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + goto set_fwd_params; + + if (likely(nhc->nhc_gw_family != AF_INET6)) + neigh = __ipv4_neigh_lookup_noref(dev, + (__force u32)params->ipv4_dst); + else + neigh = __ipv6_neigh_lookup_noref_stub(dev, params->ipv6_dst); + if (!neigh || !(neigh->nud_state & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; + memcpy(params->dmac, neigh->ha, ETH_ALEN); + memcpy(params->smac, dev->dev_addr, ETH_ALEN); - return bpf_fib_set_fwd_params(params, neigh, dev, mtu); +set_fwd_params: + return bpf_fib_set_fwd_params(params, mtu); } #endif @@ -5960,24 +5964,33 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, params->rt_metric = res.f6i->fib6_metric; params->ifindex = dev->ifindex; + if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH) + goto set_fwd_params; + /* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is * not needed here. */ neigh = __ipv6_neigh_lookup_noref_stub(dev, dst); if (!neigh || !(neigh->nud_state & NUD_VALID)) return BPF_FIB_LKUP_RET_NO_NEIGH; + memcpy(params->dmac, neigh->ha, ETH_ALEN); + memcpy(params->smac, dev->dev_addr, ETH_ALEN); - return bpf_fib_set_fwd_params(params, neigh, dev, mtu); +set_fwd_params: + return bpf_fib_set_fwd_params(params, mtu); } #endif +#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \ + BPF_FIB_LOOKUP_SKIP_NEIGH) + BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx, struct bpf_fib_lookup *, params, int, plen, u32, flags) { if (plen < sizeof(*params)) return -EINVAL; - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~BPF_FIB_LOOKUP_MASK) return -EINVAL; switch (params->family) { @@ -6015,7 +6028,7 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb, if (plen < sizeof(*params)) return -EINVAL; - if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) + if (flags & ~BPF_FIB_LOOKUP_MASK) return -EINVAL; if (params->tot_len) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 1503f61336b6..62ce1f5d1b1d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3134,6 +3134,11 @@ union bpf_attr { * **BPF_FIB_LOOKUP_OUTPUT** * Perform lookup from an egress perspective (default is * ingress). + * **BPF_FIB_LOOKUP_SKIP_NEIGH** + * Skip the neighbour table lookup. *params*->dmac + * and *params*->smac will not be set as output. A common + * use case is to call **bpf_redirect_neigh**\ () after + * doing **bpf_fib_lookup**\ (). * * *ctx* is either **struct xdp_md** for XDP programs or * **struct sk_buff** tc cls_act programs. @@ -6750,6 +6755,7 @@ struct bpf_raw_tracepoint_args { enum { BPF_FIB_LOOKUP_DIRECT = (1U << 0), BPF_FIB_LOOKUP_OUTPUT = (1U << 1), + BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2), }; enum { -- cgit v1.2.3 From 648324c9b690bcda0ac6f6499370effc0336ee27 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 14 Feb 2023 14:50:30 +0100 Subject: ieee802154: Use netlink policies when relevant on scan parameters Instead of open-coding scan parameters (page, channels, duration, etc), let's use the existing NLA_POLICY* macros. This help greatly reducing the error handling and clarifying the overall logic. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Suggested-by: Jakub Kicinski Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230214135035.1202471-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/ieee802154/nl802154.c | 84 ++++++++++++++++------------------------------- 1 file changed, 28 insertions(+), 56 deletions(-) (limited to 'net') diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index 8661907599e1..3808a8e62a7b 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -187,8 +187,8 @@ static const struct nla_policy nl802154_policy[NL802154_ATTR_MAX+1] = { [NL802154_ATTR_WPAN_DEV] = { .type = NLA_U64 }, - [NL802154_ATTR_PAGE] = { .type = NLA_U8, }, - [NL802154_ATTR_CHANNEL] = { .type = NLA_U8, }, + [NL802154_ATTR_PAGE] = NLA_POLICY_MAX(NLA_U8, IEEE802154_MAX_PAGE), + [NL802154_ATTR_CHANNEL] = NLA_POLICY_MAX(NLA_U8, IEEE802154_MAX_CHANNEL), [NL802154_ATTR_TX_POWER] = { .type = NLA_S32, }, @@ -221,13 +221,19 @@ static const struct nla_policy nl802154_policy[NL802154_ATTR_MAX+1] = { [NL802154_ATTR_COORDINATOR] = { .type = NLA_NESTED }, - [NL802154_ATTR_SCAN_TYPE] = { .type = NLA_U8 }, - [NL802154_ATTR_SCAN_CHANNELS] = { .type = NLA_U32 }, - [NL802154_ATTR_SCAN_PREAMBLE_CODES] = { .type = NLA_U64 }, - [NL802154_ATTR_SCAN_MEAN_PRF] = { .type = NLA_U8 }, - [NL802154_ATTR_SCAN_DURATION] = { .type = NLA_U8 }, - [NL802154_ATTR_SCAN_DONE_REASON] = { .type = NLA_U8 }, - [NL802154_ATTR_BEACON_INTERVAL] = { .type = NLA_U8 }, + [NL802154_ATTR_SCAN_TYPE] = + NLA_POLICY_RANGE(NLA_U8, NL802154_SCAN_ED, NL802154_SCAN_RIT_PASSIVE), + [NL802154_ATTR_SCAN_CHANNELS] = + NLA_POLICY_MASK(NLA_U32, GENMASK(IEEE802154_MAX_CHANNEL, 0)), + [NL802154_ATTR_SCAN_PREAMBLE_CODES] = { .type = NLA_REJECT }, + [NL802154_ATTR_SCAN_MEAN_PRF] = { .type = NLA_REJECT }, + [NL802154_ATTR_SCAN_DURATION] = + NLA_POLICY_MAX(NLA_U8, IEEE802154_MAX_SCAN_DURATION), + [NL802154_ATTR_SCAN_DONE_REASON] = + NLA_POLICY_RANGE(NLA_U8, NL802154_SCAN_DONE_REASON_FINISHED, + NL802154_SCAN_DONE_REASON_ABORTED), + [NL802154_ATTR_BEACON_INTERVAL] = + NLA_POLICY_MAX(NLA_U8, IEEE802154_MAX_SCAN_DURATION), #ifdef CONFIG_IEEE802154_NL802154_EXPERIMENTAL [NL802154_ATTR_SEC_ENABLED] = { .type = NLA_U8, }, @@ -1423,51 +1429,23 @@ static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info) goto free_request; } - if (info->attrs[NL802154_ATTR_PAGE]) { + /* Use current page by default */ + if (info->attrs[NL802154_ATTR_PAGE]) request->page = nla_get_u8(info->attrs[NL802154_ATTR_PAGE]); - if (request->page > IEEE802154_MAX_PAGE) { - pr_err("Invalid page %d > %d\n", - request->page, IEEE802154_MAX_PAGE); - err = -EINVAL; - goto free_request; - } - } else { - /* Use current page by default */ + else request->page = wpan_phy->current_page; - } - if (info->attrs[NL802154_ATTR_SCAN_CHANNELS]) { + /* Scan all supported channels by default */ + if (info->attrs[NL802154_ATTR_SCAN_CHANNELS]) request->channels = nla_get_u32(info->attrs[NL802154_ATTR_SCAN_CHANNELS]); - if (request->channels >= BIT(IEEE802154_MAX_CHANNEL + 1)) { - pr_err("Invalid channels bitfield %x ≥ %lx\n", - request->channels, - BIT(IEEE802154_MAX_CHANNEL + 1)); - err = -EINVAL; - goto free_request; - } - } else { - /* Scan all supported channels by default */ + else request->channels = wpan_phy->supported.channels[request->page]; - } - - if (info->attrs[NL802154_ATTR_SCAN_PREAMBLE_CODES] || - info->attrs[NL802154_ATTR_SCAN_MEAN_PRF]) { - pr_err("Preamble codes and mean PRF not supported yet\n"); - err = -EINVAL; - goto free_request; - } - if (info->attrs[NL802154_ATTR_SCAN_DURATION]) { + /* Use maximum duration order by default */ + if (info->attrs[NL802154_ATTR_SCAN_DURATION]) request->duration = nla_get_u8(info->attrs[NL802154_ATTR_SCAN_DURATION]); - if (request->duration > IEEE802154_MAX_SCAN_DURATION) { - pr_err("Duration is out of range\n"); - err = -EINVAL; - goto free_request; - } - } else { - /* Use maximum duration order by default */ + else request->duration = IEEE802154_MAX_SCAN_DURATION; - } if (wpan_dev->netdev) dev_hold(wpan_dev->netdev); @@ -1614,17 +1592,11 @@ nl802154_send_beacons(struct sk_buff *skb, struct genl_info *info) request->wpan_dev = wpan_dev; request->wpan_phy = wpan_phy; - if (info->attrs[NL802154_ATTR_BEACON_INTERVAL]) { + /* Use maximum duration order by default */ + if (info->attrs[NL802154_ATTR_BEACON_INTERVAL]) request->interval = nla_get_u8(info->attrs[NL802154_ATTR_BEACON_INTERVAL]); - if (request->interval > IEEE802154_MAX_SCAN_DURATION) { - pr_err("Interval is out of range\n"); - err = -EINVAL; - goto free_request; - } - } else { - /* Use maximum duration order by default */ + else request->interval = IEEE802154_MAX_SCAN_DURATION; - } if (wpan_dev->netdev) dev_hold(wpan_dev->netdev); @@ -1640,7 +1612,7 @@ nl802154_send_beacons(struct sk_buff *skb, struct genl_info *info) free_device: if (wpan_dev->netdev) dev_put(wpan_dev->netdev); -free_request: + kfree(request); return err; -- cgit v1.2.3 From a0b6106672b53158bf141d1a713aef2c891d74b1 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 14 Feb 2023 14:50:31 +0100 Subject: ieee802154: Convert scan error messages to extack Instead of printing error messages in the kernel log, let's use extack. When there is a netlink error returned that could be further specified with a string, use extack as well. Apply this logic to the very recent scan/beacon infrastructure. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests") Suggested-by: Jakub Kicinski Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230214135035.1202471-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/ieee802154/nl802154.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) (limited to 'net') diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index 3808a8e62a7b..12ca84e1724d 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -1407,9 +1407,15 @@ static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info) u8 type; int err; - /* Monitors are not allowed to perform scans */ - if (wpan_dev->iftype == NL802154_IFTYPE_MONITOR) + if (wpan_dev->iftype == NL802154_IFTYPE_MONITOR) { + NL_SET_ERR_MSG(info->extack, "Monitors are not allowed to perform scans"); return -EPERM; + } + + if (!nla_get_u8(info->attrs[NL802154_ATTR_SCAN_TYPE])) { + NL_SET_ERR_MSG(info->extack, "Malformed request, missing scan type"); + return -EINVAL; + } request = kzalloc(sizeof(*request), GFP_KERNEL); if (!request) @@ -1424,7 +1430,7 @@ static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info) request->type = type; break; default: - pr_err("Unsupported scan type: %d\n", type); + NL_SET_ERR_MSG_FMT(info->extack, "Unsupported scan type: %d", type); err = -EINVAL; goto free_request; } @@ -1576,12 +1582,13 @@ nl802154_send_beacons(struct sk_buff *skb, struct genl_info *info) struct cfg802154_beacon_request *request; int err; - /* Only coordinators can send beacons */ - if (wpan_dev->iftype != NL802154_IFTYPE_COORD) + if (wpan_dev->iftype != NL802154_IFTYPE_COORD) { + NL_SET_ERR_MSG(info->extack, "Only coordinators can send beacons"); return -EOPNOTSUPP; + } if (wpan_dev->pan_id == cpu_to_le16(IEEE802154_PANID_BROADCAST)) { - pr_err("Device is not part of any PAN\n"); + NL_SET_ERR_MSG(info->extack, "Device is not part of any PAN"); return -EPERM; } -- cgit v1.2.3 From 1edecbd0bd45c9c899e0f82b123342f28423468c Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 14 Feb 2023 14:50:32 +0100 Subject: ieee802154: Change error code on monitor scan netlink request Returning EPERM gives the impression that "right now" it is not possible, but "later" it could be, while what we want to express is the fact that this is not currently supported at all (might change in the future). So let's return EOPNOTSUPP instead. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Suggested-by: Alexander Aring Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230214135035.1202471-4-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/ieee802154/nl802154.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index 12ca84e1724d..f4a5070a9faf 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -1409,7 +1409,7 @@ static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info) if (wpan_dev->iftype == NL802154_IFTYPE_MONITOR) { NL_SET_ERR_MSG(info->extack, "Monitors are not allowed to perform scans"); - return -EPERM; + return -EOPNOTSUPP; } if (!nla_get_u8(info->attrs[NL802154_ATTR_SCAN_TYPE])) { -- cgit v1.2.3 From 1375e3ba9d773f2dbac96ebddfdd0d160276ca40 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 14 Feb 2023 14:50:33 +0100 Subject: mac802154: Send beacons using the MLME Tx path Using ieee802154_subif_start_xmit() to bypass the net queue when sending beacons is broken because it does not acquire the HARD_TX_LOCK(), hence not preventing datagram buffers to be smashed by beacons upon contention situation. Using the mlme_tx helper is not the best fit either but at least it is not buggy and has little-to-no performance hit. More details are given in the comment explaining this choice in the code. Fixes: 3accf4762734 ("mac802154: Handle basic beaconing") Reported-by: Alexander Aring Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230214135035.1202471-5-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/mac802154/scan.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c index 8f98efec7753..fff41e59099e 100644 --- a/net/mac802154/scan.c +++ b/net/mac802154/scan.c @@ -326,7 +326,25 @@ static int mac802154_transmit_beacon(struct ieee802154_local *local, return ret; } - return ieee802154_subif_start_xmit(skb, sdata->dev); + /* Using the MLME transmission helper for sending beacons is a bit + * overkill because we do not really care about the final outcome. + * + * Even though, going through the whole net stack with a regular + * dev_queue_xmit() is not relevant either because we want beacons to be + * sent "now" rather than go through the whole net stack scheduling + * (qdisc & co). + * + * Finally, using ieee802154_subif_start_xmit() would only be an option + * if we had a generic transmit helper which would acquire the + * HARD_TX_LOCK() to prevent buffer handling conflicts with regular + * packets. + * + * So for now we keep it simple and send beacons with our MLME helper, + * even if it stops the ieee802154 queue entirely during these + * transmissions, wich anyway does not have a huge impact on the + * performances given the current design of the stack. + */ + return ieee802154_mlme_tx(local, sdata, skb); } void mac802154_beacon_worker(struct work_struct *work) -- cgit v1.2.3 From 61d7dddf46caa1c6dd869385a275e4d2931e7090 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 14 Feb 2023 14:50:34 +0100 Subject: mac802154: Fix an always true condition At this stage we simply do not care about the delayed work value, because active scan is not yet supported, so we can blindly queue another work once a beacon has been sent. It fixes a smatch warning: mac802154_beacon_worker() warn: always true condition '(local->beacon_interval >= 0) => (0-u32max >= 0)' Fixes: 3accf4762734 ("mac802154: Handle basic beaconing") Reported-by: kernel test robot Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230214135035.1202471-6-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/mac802154/scan.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'net') diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c index fff41e59099e..9b0933a185eb 100644 --- a/net/mac802154/scan.c +++ b/net/mac802154/scan.c @@ -383,9 +383,8 @@ void mac802154_beacon_worker(struct work_struct *work) dev_err(&sdata->dev->dev, "Beacon could not be transmitted (%d)\n", ret); - if (local->beacon_interval >= 0) - queue_delayed_work(local->mac_wq, &local->beacon_work, - local->beacon_interval); + queue_delayed_work(local->mac_wq, &local->beacon_work, + local->beacon_interval); } int mac802154_stop_beacons_locked(struct ieee802154_local *local, -- cgit v1.2.3 From ed9a8ad7d8a1a0eb7d4e1414d0a04ece7c2265df Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Tue, 14 Feb 2023 14:50:35 +0100 Subject: ieee802154: Drop device trackers In order to prevent a device from disappearing when a background job was started, dev_hold() and dev_put() calls were made. During the stabilization phase of the scan/beacon features, it was later decided that removing the device while a background job was ongoing was a valid use case, and we should instead stop the background job and then remove the device, rather than prevent the device from being removed. This is what is currently done, which means manually reference counting the device during background jobs is no longer needed. Fixes: ed3557c947e1 ("ieee802154: Add support for user scanning requests") Fixes: 9bc114504b07 ("ieee802154: Add support for user beaconing requests") Reported-by: Jakub Kicinski Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20230214135035.1202471-7-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- net/ieee802154/nl802154.c | 24 ++++-------------------- 1 file changed, 4 insertions(+), 20 deletions(-) (limited to 'net') diff --git a/net/ieee802154/nl802154.c b/net/ieee802154/nl802154.c index f4a5070a9faf..2215f576ee37 100644 --- a/net/ieee802154/nl802154.c +++ b/net/ieee802154/nl802154.c @@ -1453,20 +1453,14 @@ static int nl802154_trigger_scan(struct sk_buff *skb, struct genl_info *info) else request->duration = IEEE802154_MAX_SCAN_DURATION; - if (wpan_dev->netdev) - dev_hold(wpan_dev->netdev); - err = rdev_trigger_scan(rdev, request); if (err) { pr_err("Failure starting scanning (%d)\n", err); - goto free_device; + goto free_request; } return 0; -free_device: - if (wpan_dev->netdev) - dev_put(wpan_dev->netdev); free_request: kfree(request); @@ -1555,9 +1549,6 @@ int nl802154_scan_done(struct wpan_phy *wpan_phy, struct wpan_dev *wpan_dev, if (err == -ESRCH) err = 0; - if (wpan_dev->netdev) - dev_put(wpan_dev->netdev); - return err; } EXPORT_SYMBOL_GPL(nl802154_scan_done); @@ -1605,21 +1596,15 @@ nl802154_send_beacons(struct sk_buff *skb, struct genl_info *info) else request->interval = IEEE802154_MAX_SCAN_DURATION; - if (wpan_dev->netdev) - dev_hold(wpan_dev->netdev); - err = rdev_send_beacons(rdev, request); if (err) { pr_err("Failure starting sending beacons (%d)\n", err); - goto free_device; + goto free_request; } return 0; -free_device: - if (wpan_dev->netdev) - dev_put(wpan_dev->netdev); - +free_request: kfree(request); return err; @@ -1627,8 +1612,7 @@ free_device: void nl802154_beaconing_done(struct wpan_dev *wpan_dev) { - if (wpan_dev->netdev) - dev_put(wpan_dev->netdev); + /* NOP */ } EXPORT_SYMBOL_GPL(nl802154_beaconing_done); -- cgit v1.2.3 From c078381856230f1e8e13738661d83c2b4b433819 Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 15 Feb 2023 21:48:05 +0000 Subject: rxrpc: Fix overproduction of wakeups to recvmsg() Fix three cases of overproduction of wakeups: (1) rxrpc_input_split_jumbo() conditionally notifies the app that there's data for recvmsg() to collect if it queues some data - and then its only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway. Fix the rxrpc_input_data() to only do the wakeup in failure cases. (2) If a DATA packet is received for a call by the I/O thread whilst recvmsg() is busy draining the call's rx queue in the app thread, the call will left on the recvmsg() queue for recvmsg() to pick up, even though there isn't any data on it. This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR set after the reply has been posted to a service call. Fix this by discarding pending calls from the recvmsg() queue that don't need servicing yet. (3) Not-yet-completed calls get requeued after having data read from them, even if they have no data to read. Fix this by only requeuing them if they have data waiting on them; if they don't, the I/O thread will requeue them when data arrives or they fail. Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk Signed-off-by: Paolo Abeni --- include/trace/events/rxrpc.h | 1 + net/rxrpc/input.c | 2 +- net/rxrpc/recvmsg.c | 16 +++++++++++++++- 3 files changed, 17 insertions(+), 2 deletions(-) (limited to 'net') diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index c3c0b0aa8381..4c53a5ef6257 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -318,6 +318,7 @@ EM(rxrpc_recvmsg_return, "RETN") \ EM(rxrpc_recvmsg_terminal, "TERM") \ EM(rxrpc_recvmsg_to_be_accepted, "TBAC") \ + EM(rxrpc_recvmsg_unqueue, "UNQU") \ E_(rxrpc_recvmsg_wait, "WAIT") #define rxrpc_rtt_tx_traces \ diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index d68848fce51f..030d64f282f3 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -606,7 +606,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb) rxrpc_proto_abort(call, sp->hdr.seq, rxrpc_badmsg_bad_jumbo); goto out_notify; } - skb = NULL; + return; out_notify: trace_rxrpc_notify_socket(call->debug_id, serial); diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index 76eb2b9cd936..a482f88c5fc5 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -334,10 +334,23 @@ try_again: /* Find the next call and dequeue it if we're not just peeking. If we * do dequeue it, that comes with a ref that we will need to release. + * We also want to weed out calls that got requeued whilst we were + * shovelling data out. */ spin_lock(&rx->recvmsg_lock); l = rx->recvmsg_q.next; call = list_entry(l, struct rxrpc_call, recvmsg_link); + + if (!rxrpc_call_is_complete(call) && + skb_queue_empty(&call->recvmsg_queue)) { + list_del_init(&call->recvmsg_link); + spin_unlock(&rx->recvmsg_lock); + release_sock(&rx->sk); + trace_rxrpc_recvmsg(call->debug_id, rxrpc_recvmsg_unqueue, 0); + rxrpc_put_call(call, rxrpc_call_put_recvmsg); + goto try_again; + } + if (!(flags & MSG_PEEK)) list_del_init(&call->recvmsg_link); else @@ -402,7 +415,8 @@ try_again: if (rxrpc_call_has_failed(call)) goto call_failed; - rxrpc_notify_socket(call); + if (!skb_queue_empty(&call->recvmsg_queue)) + rxrpc_notify_socket(call); goto not_yet_complete; call_failed: -- cgit v1.2.3 From 09dbdf28f9f9fa27e16895c67fbd94ff36124766 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 16 Feb 2023 00:46:30 +0200 Subject: net/sched: taprio: fix calculation of maximum gate durations taprio_calculate_gate_durations() depends on netdev_get_num_tc() and this returns 0. So it calculates the maximum gate durations for no traffic class. I had tested the blamed commit only with another patch in my tree, one which in the end I decided isn't valuable enough to submit ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"). The problem is that having this patch threw off my testing. By moving the netdev_set_num_tc() call earlier, we implicitly gave to taprio_calculate_gate_durations() the information it needed. Extract only the portion from the unsubmitted change which applies the mqprio configuration to the netdev earlier. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/ Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations") Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: Paolo Abeni --- net/sched/sch_taprio.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 9781b47962bb..556e72ec0f38 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1833,23 +1833,6 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, goto free_sched; } - err = parse_taprio_schedule(q, tb, new_admin, extack); - if (err < 0) - goto free_sched; - - if (new_admin->num_entries == 0) { - NL_SET_ERR_MSG(extack, "There should be at least one entry in the schedule"); - err = -EINVAL; - goto free_sched; - } - - err = taprio_parse_clockid(sch, tb, extack); - if (err < 0) - goto free_sched; - - taprio_set_picos_per_byte(dev, q); - taprio_update_queue_max_sdu(q, new_admin, stab); - if (mqprio) { err = netdev_set_num_tc(dev, mqprio->num_tc); if (err) @@ -1867,6 +1850,23 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt, mqprio->prio_tc_map[i]); } + err = parse_taprio_schedule(q, tb, new_admin, extack); + if (err < 0) + goto free_sched; + + if (new_admin->num_entries == 0) { + NL_SET_ERR_MSG(extack, "There should be at least one entry in the schedule"); + err = -EINVAL; + goto free_sched; + } + + err = taprio_parse_clockid(sch, tb, extack); + if (err < 0) + goto free_sched; + + taprio_set_picos_per_byte(dev, q); + taprio_update_queue_max_sdu(q, new_admin, stab); + if (FULL_OFFLOAD_IS_ENABLED(q->flags)) err = taprio_enable_offload(dev, q, new_admin, extack); else -- cgit v1.2.3 From bdf366bd867c4565b535a5825df7ddcb4773fc28 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 16 Feb 2023 00:46:31 +0200 Subject: net/sched: taprio: don't allow dynamic max_sdu to go negative after stab adjustment The overhead specified in the size table comes from the user. With small time intervals (or gates always closed), the overhead can be larger than the max interval for that traffic class, and their difference is negative. What we want to happen is for max_sdu_dynamic to have the smallest non-zero value possible (1) which means that all packets on that traffic class are dropped on enqueue. However, since max_sdu_dynamic is u32, a negative is represented as a large value and oversized dropping never happens. Use max_t with int to force a truncation of max_frm_len to no smaller than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no smaller than 1. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: Paolo Abeni --- net/sched/sch_taprio.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 556e72ec0f38..53ba4d6b0218 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -279,8 +279,14 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q, u32 max_frm_len; max_frm_len = duration_to_length(q, sched->max_open_gate_duration[tc]); - if (stab) + /* Compensate for L1 overhead from size table, + * but don't let the frame size go negative + */ + if (stab) { max_frm_len -= stab->szopts.overhead; + max_frm_len = max_t(int, max_frm_len, + dev->hard_header_len + 1); + } max_sdu_dynamic = max_frm_len - dev->hard_header_len; } -- cgit v1.2.3 From 64cb6aad12328015202af5b2a9623c6bcc021855 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 16 Feb 2023 00:46:32 +0200 Subject: net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimited It makes no sense to keep randomly large max_sdu values, especially if larger than the device's max_mtu. These are visible in "tc qdisc show". Such a max_sdu is practically unlimited and will cause no packets for that traffic class to be dropped on enqueue. Just set max_sdu_dynamic to U32_MAX, which in the logic below causes taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user space of 0 (unlimited). Signed-off-by: Vladimir Oltean Reviewed-by: Kurt Kanzenbach Signed-off-by: Paolo Abeni --- net/sched/sch_taprio.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 53ba4d6b0218..1f469861eae3 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -288,6 +288,8 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q, dev->hard_header_len + 1); } max_sdu_dynamic = max_frm_len - dev->hard_header_len; + if (max_sdu_dynamic > dev->max_mtu) + max_sdu_dynamic = U32_MAX; } max_sdu = min(max_sdu_dynamic, max_sdu_from_user); -- cgit v1.2.3 From e40b801b3603a8f90b46acbacdea3505c27f01c0 Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 16 Feb 2023 14:37:36 +0800 Subject: net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link() There is a certain chance to trigger the following panic: PID: 5900 TASK: ffff88c1c8af4100 CPU: 1 COMMAND: "kworker/1:48" #0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7 #1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a #2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60 #3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7 #4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715 #5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654 #6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62 [exception RIP: ib_alloc_mr+19] RIP: ffffffffc0c9cce3 RSP: ffff9456c1cc7c38 RFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000004 RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88c1ea281d00 R8: 000000020a34ffff R9: ffff88c1350bbb20 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000010 R14: ffff88c1ab040a50 R15: ffff88c1ea281d00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc] #8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc] #9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc] The reason here is that when the server tries to create a second link, smc_llc_srv_add_link() has no protection and may add a new link to link group. This breaks the security environment protected by llc_conf_mutex. Fixes: 2d2209f20189 ("net/smc: first part of add link processing as SMC server") Signed-off-by: D. Wythe Reviewed-by: Larysa Zaremba Reviewed-by: Wenjia Zhang Signed-off-by: David S. Miller --- net/smc/af_smc.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index e12d4fa5aece..d9413d43b104 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -1826,8 +1826,10 @@ static int smcr_serv_conf_first_link(struct smc_sock *smc) smc_llc_link_active(link); smcr_lgr_set_type(link->lgr, SMC_LGR_SINGLE); + mutex_lock(&link->lgr->llc_conf_mutex); /* initial contact - try to establish second link */ smc_llc_srv_add_link(link, NULL); + mutex_unlock(&link->lgr->llc_conf_mutex); return 0; } -- cgit v1.2.3 From 475f9ff63ee8c296aa46c6e9e9ad9bdd301c6bdf Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 16 Feb 2023 14:39:05 +0800 Subject: net/smc: fix application data exception There is a certain probability that following exceptions will occur in the wrk benchmark test: Running 10s test @ http://11.213.45.6:80 8 threads and 64 connections Thread Stats Avg Stdev Max +/- Stdev Latency 3.72ms 13.94ms 245.33ms 94.17% Req/Sec 1.96k 713.67 5.41k 75.16% 155262 requests in 10.10s, 23.10MB read Non-2xx or 3xx responses: 3 We will find that the error is HTTP 400 error, which is a serious exception in our test, which means the application data was corrupted. Consider the following scenarios: CPU0 CPU1 buf_desc->used = 0; cmpxchg(buf_desc->used, 0, 1) deal_with(buf_desc) memset(buf_desc->cpu_addr,0); This will cause the data received by a victim connection to be cleared, thus triggering an HTTP 400 error in the server. This patch exchange the order between clear used and memset, add barrier to ensure memory consistency. Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer") Signed-off-by: D. Wythe Reviewed-by: Wenjia Zhang Signed-off-by: David S. Miller --- net/smc/smc_core.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index c305d8dd23f8..c19d4b7c1f28 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1120,8 +1120,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb, smc_buf_free(lgr, is_rmb, buf_desc); } else { - buf_desc->used = 0; - memset(buf_desc->cpu_addr, 0, buf_desc->len); + /* memzero_explicit provides potential memory barrier semantics */ + memzero_explicit(buf_desc->cpu_addr, buf_desc->len); + WRITE_ONCE(buf_desc->used, 0); } } @@ -1132,19 +1133,17 @@ static void smc_buf_unuse(struct smc_connection *conn, if (!lgr->is_smcd && conn->sndbuf_desc->is_vm) { smcr_buf_unuse(conn->sndbuf_desc, false, lgr); } else { - conn->sndbuf_desc->used = 0; - memset(conn->sndbuf_desc->cpu_addr, 0, - conn->sndbuf_desc->len); + memzero_explicit(conn->sndbuf_desc->cpu_addr, conn->sndbuf_desc->len); + WRITE_ONCE(conn->sndbuf_desc->used, 0); } } if (conn->rmb_desc) { if (!lgr->is_smcd) { smcr_buf_unuse(conn->rmb_desc, true, lgr); } else { - conn->rmb_desc->used = 0; - memset(conn->rmb_desc->cpu_addr, 0, - conn->rmb_desc->len + - sizeof(struct smcd_cdc_msg)); + memzero_explicit(conn->rmb_desc->cpu_addr, + conn->rmb_desc->len + sizeof(struct smcd_cdc_msg)); + WRITE_ONCE(conn->rmb_desc->used, 0); } } } -- cgit v1.2.3 From 9f78bf330a66cd400b3e00f370f597e9fa939207 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Thu, 16 Feb 2023 16:30:47 +0800 Subject: xsk: support use vaddr as ring When we try to start AF_XDP on some machines with long running time, due to the machine's memory fragmentation problem, there is no sufficient contiguous physical memory that will cause the start failure. If the size of the queue is 8 * 1024, then the size of the desc[] is 8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is 16page+. This is necessary to apply for a 4-order memory. If there are a lot of queues, it is difficult to these machine with long running time. Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but we only use 17 pages. This patch replaces __get_free_pages() by vmalloc() to allocate memory to solve these problems. Signed-off-by: Xuan Zhuo Acked-by: Magnus Karlsson Reviewed-by: Alexander Lobakin Signed-off-by: David S. Miller --- net/xdp/xsk.c | 9 ++------- net/xdp/xsk_queue.c | 11 +++++------ net/xdp/xsk_queue.h | 1 + 3 files changed, 8 insertions(+), 13 deletions(-) (limited to 'net') diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index a245c1b4a21b..63c82e8bcd8e 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -1294,8 +1294,6 @@ static int xsk_mmap(struct file *file, struct socket *sock, unsigned long size = vma->vm_end - vma->vm_start; struct xdp_sock *xs = xdp_sk(sock->sk); struct xsk_queue *q = NULL; - unsigned long pfn; - struct page *qpg; if (READ_ONCE(xs->state) != XSK_READY) return -EBUSY; @@ -1318,13 +1316,10 @@ static int xsk_mmap(struct file *file, struct socket *sock, /* Matches the smp_wmb() in xsk_init_queue */ smp_rmb(); - qpg = virt_to_head_page(q->ring); - if (size > page_size(qpg)) + if (size > q->ring_vmalloc_size) return -EINVAL; - pfn = virt_to_phys(q->ring) >> PAGE_SHIFT; - return remap_pfn_range(vma, vma->vm_start, pfn, - size, vma->vm_page_prot); + return remap_vmalloc_range(vma, q->ring, 0); } static int xsk_notifier(struct notifier_block *this, diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c index 6cf9586e5027..f8905400ee07 100644 --- a/net/xdp/xsk_queue.c +++ b/net/xdp/xsk_queue.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include "xsk_queue.h" @@ -23,7 +24,6 @@ static size_t xskq_get_ring_size(struct xsk_queue *q, bool umem_queue) struct xsk_queue *xskq_create(u32 nentries, bool umem_queue) { struct xsk_queue *q; - gfp_t gfp_flags; size_t size; q = kzalloc(sizeof(*q), GFP_KERNEL); @@ -33,17 +33,16 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue) q->nentries = nentries; q->ring_mask = nentries - 1; - gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | - __GFP_COMP | __GFP_NORETRY; size = xskq_get_ring_size(q, umem_queue); + size = PAGE_ALIGN(size); - q->ring = (struct xdp_ring *)__get_free_pages(gfp_flags, - get_order(size)); + q->ring = vmalloc_user(size); if (!q->ring) { kfree(q); return NULL; } + q->ring_vmalloc_size = size; return q; } @@ -52,6 +51,6 @@ void xskq_destroy(struct xsk_queue *q) if (!q) return; - page_frag_free(q->ring); + vfree(q->ring); kfree(q); } diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index c6fb6b763658..bfb2a7e50c26 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -45,6 +45,7 @@ struct xsk_queue { struct xdp_ring *ring; u64 invalid_descs; u64 queue_empty_descs; + size_t ring_vmalloc_size; }; /* The structure of the shared state of the rings are a simple -- cgit v1.2.3 From dd1b527831a3ed659afa01b672d8e1f7e6ca95a5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 15:47:18 +0000 Subject: net: add location to trace_consume_skb() kfree_skb() includes the location, it makes sense to add it to consume_skb() as well. After patch: taskd_EventMana 8602 [004] 420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic swapper 0 [011] 422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc discipline 9141 [043] 423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp swapper 0 [010] 423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv borglet 8672 [014] 425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump swapper 0 [028] 426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action wget 14339 [009] 426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/trace/events/skb.h | 10 ++++++---- net/core/dev.c | 2 +- net/core/skbuff.c | 8 ++++---- 3 files changed, 11 insertions(+), 9 deletions(-) (limited to 'net') diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 25ab1ff9423d..07e0715628ec 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -53,19 +53,21 @@ TRACE_EVENT(kfree_skb, TRACE_EVENT(consume_skb, - TP_PROTO(struct sk_buff *skb), + TP_PROTO(struct sk_buff *skb, void *location), - TP_ARGS(skb), + TP_ARGS(skb, location), TP_STRUCT__entry( - __field( void *, skbaddr ) + __field( void *, skbaddr) + __field( void *, location) ), TP_fast_assign( __entry->skbaddr = skb; + __entry->location = location; ), - TP_printk("skbaddr=%p", __entry->skbaddr) + TP_printk("skbaddr=%p location=%pS", __entry->skbaddr, __entry->location) ); TRACE_EVENT(skb_copy_datagram_iovec, diff --git a/net/core/dev.c b/net/core/dev.c index 5687b528d4c1..18dc8d75ead9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5019,7 +5019,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) WARN_ON(refcount_read(&skb->users)); if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED)) - trace_consume_skb(skb); + trace_consume_skb(skb, net_tx_action); else trace_kfree_skb(skb, net_tx_action, SKB_DROP_REASON_NOT_SPECIFIED); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 98ebce9f6a51..eb7d33b41e71 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -991,7 +991,7 @@ bool __kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) DEBUG_NET_WARN_ON_ONCE(reason <= 0 || reason >= SKB_DROP_REASON_MAX); if (reason == SKB_CONSUMED) - trace_consume_skb(skb); + trace_consume_skb(skb, __builtin_return_address(0)); else trace_kfree_skb(skb, __builtin_return_address(0), reason); return true; @@ -1189,7 +1189,7 @@ void consume_skb(struct sk_buff *skb) if (!skb_unref(skb)) return; - trace_consume_skb(skb); + trace_consume_skb(skb, __builtin_return_address(0)); __kfree_skb(skb); } EXPORT_SYMBOL(consume_skb); @@ -1204,7 +1204,7 @@ EXPORT_SYMBOL(consume_skb); */ void __consume_stateless_skb(struct sk_buff *skb) { - trace_consume_skb(skb); + trace_consume_skb(skb, __builtin_return_address(0)); skb_release_data(skb, SKB_CONSUMED); kfree_skbmem(skb); } @@ -1260,7 +1260,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget) return; /* if reaching here SKB is ready to free */ - trace_consume_skb(skb); + trace_consume_skb(skb, __builtin_return_address(0)); /* if SKB is a clone, don't handle this case */ if (skb->fclone != SKB_FCLONE_UNAVAILABLE) { -- cgit v1.2.3 From 7c9c8913f4523cda30a710736844d74bfee060a4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:35 +0000 Subject: ipv6: icmp6: add drop reason support to ndisc_recv_ns() Change ndisc_recv_ns() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/ndisc.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) (limited to 'net') diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 9548b5a44714..039722f83c66 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -783,7 +783,7 @@ void ndisc_update(const struct net_device *dev, struct neighbour *neigh, ndisc_ops_update(dev, neigh, flags, icmp6_type, ndopts); } -static void ndisc_recv_ns(struct sk_buff *skb) +static enum skb_drop_reason ndisc_recv_ns(struct sk_buff *skb) { struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb); const struct in6_addr *saddr = &ipv6_hdr(skb)->saddr; @@ -797,18 +797,17 @@ static void ndisc_recv_ns(struct sk_buff *skb) struct inet6_dev *idev = NULL; struct neighbour *neigh; int dad = ipv6_addr_any(saddr); - bool inc; int is_router = -1; + SKB_DR(reason); u64 nonce = 0; + bool inc; - if (skb->len < sizeof(struct nd_msg)) { - ND_PRINTK(2, warn, "NS: packet too short\n"); - return; - } + if (skb->len < sizeof(struct nd_msg)) + return SKB_DROP_REASON_PKT_TOO_SMALL; if (ipv6_addr_is_multicast(&msg->target)) { ND_PRINTK(2, warn, "NS: multicast target address\n"); - return; + return reason; } /* @@ -817,12 +816,12 @@ static void ndisc_recv_ns(struct sk_buff *skb) */ if (dad && !ipv6_addr_is_solict_mult(daddr)) { ND_PRINTK(2, warn, "NS: bad DAD packet (wrong destination)\n"); - return; + return reason; } if (!ndisc_parse_options(dev, msg->opt, ndoptlen, &ndopts)) { ND_PRINTK(2, warn, "NS: invalid ND options\n"); - return; + return reason; } if (ndopts.nd_opts_src_lladdr) { @@ -830,7 +829,7 @@ static void ndisc_recv_ns(struct sk_buff *skb) if (!lladdr) { ND_PRINTK(2, warn, "NS: invalid link-layer address length\n"); - return; + return reason; } /* RFC2461 7.1.1: @@ -841,7 +840,7 @@ static void ndisc_recv_ns(struct sk_buff *skb) if (dad) { ND_PRINTK(2, warn, "NS: bad DAD packet (link-layer address option)\n"); - return; + return reason; } } if (ndopts.nd_opts_nonce && ndopts.nd_opts_nonce->nd_opt_len == 1) @@ -869,7 +868,7 @@ have_ifp: * so fail our DAD process */ addrconf_dad_failure(skb, ifp); - return; + return reason; } else { /* * This is not a dad solicitation. @@ -901,7 +900,7 @@ have_ifp: idev = in6_dev_get(dev); if (!idev) { /* XXX: count this drop? */ - return; + return reason; } if (ipv6_chk_acast_addr(net, dev, &msg->target) || @@ -958,6 +957,7 @@ have_ifp: true, (ifp != NULL && inc), inc); if (neigh) neigh_release(neigh); + reason = SKB_CONSUMED; } out: @@ -965,6 +965,7 @@ out: in6_ifa_put(ifp); else in6_dev_put(idev); + return reason; } static int accept_untracked_na(struct net_device *dev, struct in6_addr *saddr) @@ -1781,8 +1782,9 @@ release: static void pndisc_redo(struct sk_buff *skb) { - ndisc_recv_ns(skb); - kfree_skb(skb); + enum skb_drop_reason reason = ndisc_recv_ns(skb); + + kfree_skb_reason(skb, reason); } static int ndisc_is_multicast(const void *pkey) @@ -1834,7 +1836,7 @@ enum skb_drop_reason ndisc_rcv(struct sk_buff *skb) switch (msg->icmph.icmp6_type) { case NDISC_NEIGHBOUR_SOLICITATION: memset(NEIGH_CB(skb), 0, sizeof(struct neighbour_cb)); - ndisc_recv_ns(skb); + reason = ndisc_recv_ns(skb); break; case NDISC_NEIGHBOUR_ADVERTISEMENT: -- cgit v1.2.3 From 3009f9ae21ec539985977dcaa0cebb628afeb181 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:36 +0000 Subject: ipv6: icmp6: add drop reason support to ndisc_recv_na() Change ndisc_recv_na() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/ndisc.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) (limited to 'net') diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 039722f83c66..9354cb3669c8 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -987,7 +987,7 @@ static int accept_untracked_na(struct net_device *dev, struct in6_addr *saddr) } } -static void ndisc_recv_na(struct sk_buff *skb) +static enum skb_drop_reason ndisc_recv_na(struct sk_buff *skb) { struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb); struct in6_addr *saddr = &ipv6_hdr(skb)->saddr; @@ -1000,22 +1000,21 @@ static void ndisc_recv_na(struct sk_buff *skb) struct inet6_dev *idev = __in6_dev_get(dev); struct inet6_ifaddr *ifp; struct neighbour *neigh; + SKB_DR(reason); u8 new_state; - if (skb->len < sizeof(struct nd_msg)) { - ND_PRINTK(2, warn, "NA: packet too short\n"); - return; - } + if (skb->len < sizeof(struct nd_msg)) + return SKB_DROP_REASON_PKT_TOO_SMALL; if (ipv6_addr_is_multicast(&msg->target)) { ND_PRINTK(2, warn, "NA: target address is multicast\n"); - return; + return reason; } if (ipv6_addr_is_multicast(daddr) && msg->icmph.icmp6_solicited) { ND_PRINTK(2, warn, "NA: solicited NA is multicasted\n"); - return; + return reason; } /* For some 802.11 wireless deployments (and possibly other networks), @@ -1025,18 +1024,18 @@ static void ndisc_recv_na(struct sk_buff *skb) */ if (!msg->icmph.icmp6_solicited && idev && idev->cnf.drop_unsolicited_na) - return; + return reason; if (!ndisc_parse_options(dev, msg->opt, ndoptlen, &ndopts)) { ND_PRINTK(2, warn, "NS: invalid ND option\n"); - return; + return reason; } if (ndopts.nd_opts_tgt_lladdr) { lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr, dev); if (!lladdr) { ND_PRINTK(2, warn, "NA: invalid link-layer address length\n"); - return; + return reason; } } ifp = ipv6_get_ifaddr(dev_net(dev), &msg->target, dev, 1); @@ -1044,7 +1043,7 @@ static void ndisc_recv_na(struct sk_buff *skb) if (skb->pkt_type != PACKET_LOOPBACK && (ifp->flags & IFA_F_TENTATIVE)) { addrconf_dad_failure(skb, ifp); - return; + return reason; } /* What should we make now? The advertisement is invalid, but ndisc specs say nothing @@ -1060,7 +1059,7 @@ static void ndisc_recv_na(struct sk_buff *skb) "NA: %pM advertised our address %pI6c on %s!\n", eth_hdr(skb)->h_source, &ifp->addr, ifp->idev->dev->name); in6_ifa_put(ifp); - return; + return reason; } neigh = neigh_lookup(&nd_tbl, &msg->target, dev); @@ -1121,10 +1120,11 @@ static void ndisc_recv_na(struct sk_buff *skb) */ rt6_clean_tohost(dev_net(dev), saddr); } - + reason = SKB_CONSUMED; out: neigh_release(neigh); } + return reason; } static void ndisc_recv_rs(struct sk_buff *skb) @@ -1840,7 +1840,7 @@ enum skb_drop_reason ndisc_rcv(struct sk_buff *skb) break; case NDISC_NEIGHBOUR_ADVERTISEMENT: - ndisc_recv_na(skb); + reason = ndisc_recv_na(skb); break; case NDISC_ROUTER_SOLICITATION: -- cgit v1.2.3 From 243e37c642ac1d52858bc5f3559e380babf21169 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:37 +0000 Subject: ipv6: icmp6: add drop reason support to ndisc_recv_rs() Change ndisc_recv_rs() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED or SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/ndisc.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 9354cb3669c8..514eb8cc7879 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1127,7 +1127,7 @@ out: return reason; } -static void ndisc_recv_rs(struct sk_buff *skb) +static enum skb_drop_reason ndisc_recv_rs(struct sk_buff *skb) { struct rs_msg *rs_msg = (struct rs_msg *)skb_transport_header(skb); unsigned long ndoptlen = skb->len - sizeof(*rs_msg); @@ -1136,14 +1136,15 @@ static void ndisc_recv_rs(struct sk_buff *skb) const struct in6_addr *saddr = &ipv6_hdr(skb)->saddr; struct ndisc_options ndopts; u8 *lladdr = NULL; + SKB_DR(reason); if (skb->len < sizeof(*rs_msg)) - return; + return SKB_DROP_REASON_PKT_TOO_SMALL; idev = __in6_dev_get(skb->dev); if (!idev) { ND_PRINTK(1, err, "RS: can't find in6 device\n"); - return; + return reason; } /* Don't accept RS if we're not in router mode */ @@ -1178,9 +1179,10 @@ static void ndisc_recv_rs(struct sk_buff *skb) NEIGH_UPDATE_F_OVERRIDE_ISROUTER, NDISC_ROUTER_SOLICITATION, &ndopts); neigh_release(neigh); + reason = SKB_CONSUMED; } out: - return; + return reason; } static void ndisc_ra_useropt(struct sk_buff *ra, struct nd_opt_hdr *opt) @@ -1844,7 +1846,7 @@ enum skb_drop_reason ndisc_rcv(struct sk_buff *skb) break; case NDISC_ROUTER_SOLICITATION: - ndisc_recv_rs(skb); + reason = ndisc_recv_rs(skb); break; case NDISC_ROUTER_ADVERTISEMENT: -- cgit v1.2.3 From 2f326d9d9ff46fb2e45fb3b6ae77eff04332dde6 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:38 +0000 Subject: ipv6: icmp6: add drop reason support to ndisc_router_discovery() Change ndisc_router_discovery() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED and SKB_CONSUMED. More reasons are added later. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/ndisc.c | 37 +++++++++++++++++++------------------ 1 file changed, 19 insertions(+), 18 deletions(-) (limited to 'net') diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 514eb8cc7879..7c8ba308ea49 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1231,20 +1231,21 @@ errout: rtnl_set_sk_err(net, RTNLGRP_ND_USEROPT, err); } -static void ndisc_router_discovery(struct sk_buff *skb) +static enum skb_drop_reason ndisc_router_discovery(struct sk_buff *skb) { struct ra_msg *ra_msg = (struct ra_msg *)skb_transport_header(skb); + bool send_ifinfo_notify = false; struct neighbour *neigh = NULL; - struct inet6_dev *in6_dev; + struct ndisc_options ndopts; struct fib6_info *rt = NULL; + struct inet6_dev *in6_dev; u32 defrtr_usr_metric; + unsigned int pref = 0; + __u32 old_if_flags; struct net *net; + SKB_DR(reason); int lifetime; - struct ndisc_options ndopts; int optlen; - unsigned int pref = 0; - __u32 old_if_flags; - bool send_ifinfo_notify = false; __u8 *opt = (__u8 *)(ra_msg + 1); @@ -1256,17 +1257,15 @@ static void ndisc_router_discovery(struct sk_buff *skb) __func__, skb->dev->name); if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) { ND_PRINTK(2, warn, "RA: source address is not link-local\n"); - return; - } - if (optlen < 0) { - ND_PRINTK(2, warn, "RA: packet too short\n"); - return; + return reason; } + if (optlen < 0) + return SKB_DROP_REASON_PKT_TOO_SMALL; #ifdef CONFIG_IPV6_NDISC_NODETYPE if (skb->ndisc_nodetype == NDISC_NODETYPE_HOST) { ND_PRINTK(2, warn, "RA: from host or unauthorized router\n"); - return; + return reason; } #endif @@ -1278,12 +1277,12 @@ static void ndisc_router_discovery(struct sk_buff *skb) if (!in6_dev) { ND_PRINTK(0, err, "RA: can't find inet6 device for %s\n", skb->dev->name); - return; + return reason; } if (!ndisc_parse_options(skb->dev, opt, optlen, &ndopts)) { ND_PRINTK(2, warn, "RA: invalid ND options\n"); - return; + return reason; } if (!ipv6_accept_ra(in6_dev)) { @@ -1365,7 +1364,7 @@ static void ndisc_router_discovery(struct sk_buff *skb) "RA: %s got default router without neighbour\n", __func__); fib6_info_release(rt); - return; + return reason; } } /* Set default route metric as specified by user */ @@ -1390,7 +1389,7 @@ static void ndisc_router_discovery(struct sk_buff *skb) ND_PRINTK(0, err, "RA: %s failed to add default route\n", __func__); - return; + return reason; } neigh = ip6_neigh_lookup(&rt->fib6_nh->fib_nh_gw6, @@ -1401,7 +1400,7 @@ static void ndisc_router_discovery(struct sk_buff *skb) "RA: %s got default router without neighbour\n", __func__); fib6_info_release(rt); - return; + return reason; } neigh->flags |= NTF_ROUTER; } else if (rt && IPV6_EXTRACT_PREF(rt->fib6_flags) != pref) { @@ -1488,6 +1487,7 @@ skip_linkparms: NEIGH_UPDATE_F_OVERRIDE_ISROUTER| NEIGH_UPDATE_F_ISROUTER, NDISC_ROUTER_ADVERTISEMENT, &ndopts); + reason = SKB_CONSUMED; } if (!ipv6_accept_ra(in6_dev)) { @@ -1598,6 +1598,7 @@ out: fib6_info_release(rt); if (neigh) neigh_release(neigh); + return reason; } static void ndisc_redirect_rcv(struct sk_buff *skb) @@ -1850,7 +1851,7 @@ enum skb_drop_reason ndisc_rcv(struct sk_buff *skb) break; case NDISC_ROUTER_ADVERTISEMENT: - ndisc_router_discovery(skb); + reason = ndisc_router_discovery(skb); break; case NDISC_REDIRECT: -- cgit v1.2.3 From ec993edf05ca4eee5878edf51fdb5ffa2b1decc3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:39 +0000 Subject: ipv6: icmp6: add drop reason support to ndisc_redirect_rcv() Change ndisc_redirect_rcv() to return a drop reason. For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED and values from icmpv6_notify(). More reasons are added later. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/ndisc.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) (limited to 'net') diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index 7c8ba308ea49..e9776aa6f167 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -1601,13 +1601,14 @@ out: return reason; } -static void ndisc_redirect_rcv(struct sk_buff *skb) +static enum skb_drop_reason ndisc_redirect_rcv(struct sk_buff *skb) { - u8 *hdr; - struct ndisc_options ndopts; struct rd_msg *msg = (struct rd_msg *)skb_transport_header(skb); u32 ndoptlen = skb_tail_pointer(skb) - (skb_transport_header(skb) + offsetof(struct rd_msg, opt)); + struct ndisc_options ndopts; + SKB_DR(reason); + u8 *hdr; #ifdef CONFIG_IPV6_NDISC_NODETYPE switch (skb->ndisc_nodetype) { @@ -1615,31 +1616,31 @@ static void ndisc_redirect_rcv(struct sk_buff *skb) case NDISC_NODETYPE_NODEFAULT: ND_PRINTK(2, warn, "Redirect: from host or unauthorized router\n"); - return; + return reason; } #endif if (!(ipv6_addr_type(&ipv6_hdr(skb)->saddr) & IPV6_ADDR_LINKLOCAL)) { ND_PRINTK(2, warn, "Redirect: source address is not link-local\n"); - return; + return reason; } if (!ndisc_parse_options(skb->dev, msg->opt, ndoptlen, &ndopts)) - return; + return reason; if (!ndopts.nd_opts_rh) { ip6_redirect_no_header(skb, dev_net(skb->dev), skb->dev->ifindex); - return; + return reason; } hdr = (u8 *)ndopts.nd_opts_rh; hdr += 8; if (!pskb_pull(skb, hdr - skb_transport_header(skb))) - return; + return SKB_DROP_REASON_PKT_TOO_SMALL; - icmpv6_notify(skb, NDISC_REDIRECT, 0, 0); + return icmpv6_notify(skb, NDISC_REDIRECT, 0, 0); } static void ndisc_fill_redirect_hdr_option(struct sk_buff *skb, @@ -1855,7 +1856,7 @@ enum skb_drop_reason ndisc_rcv(struct sk_buff *skb) break; case NDISC_REDIRECT: - ndisc_redirect_rcv(skb); + reason = ndisc_redirect_rcv(skb); break; } -- cgit v1.2.3 From 784d4477f07b930df73bc77e842e03f1dacb83aa Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:40 +0000 Subject: ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS This is a generic drop reason for any error detected in ndisc_parse_options(). Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/dropreason.h | 3 +++ net/ipv6/ndisc.c | 27 ++++++++++----------------- 2 files changed, 13 insertions(+), 17 deletions(-) (limited to 'net') diff --git a/include/net/dropreason.h b/include/net/dropreason.h index ef3f65d135d3..239a5c0ea83e 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -76,6 +76,7 @@ FN(IPV6_NDISC_FRAG) \ FN(IPV6_NDISC_HOP_LIMIT) \ FN(IPV6_NDISC_BAD_CODE) \ + FN(IPV6_NDISC_BAD_OPTIONS) \ FNe(MAX) /** @@ -330,6 +331,8 @@ enum skb_drop_reason { SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT, /** @SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code. */ SKB_DROP_REASON_IPV6_NDISC_BAD_CODE, + /** @SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS: invalid NDISC options. */ + SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS, /** * @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be * used as a real 'reason' diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index e9776aa6f167..b47e845d66eb 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -819,10 +819,8 @@ static enum skb_drop_reason ndisc_recv_ns(struct sk_buff *skb) return reason; } - if (!ndisc_parse_options(dev, msg->opt, ndoptlen, &ndopts)) { - ND_PRINTK(2, warn, "NS: invalid ND options\n"); - return reason; - } + if (!ndisc_parse_options(dev, msg->opt, ndoptlen, &ndopts)) + return SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS; if (ndopts.nd_opts_src_lladdr) { lladdr = ndisc_opt_addr_data(ndopts.nd_opts_src_lladdr, dev); @@ -1026,10 +1024,9 @@ static enum skb_drop_reason ndisc_recv_na(struct sk_buff *skb) idev->cnf.drop_unsolicited_na) return reason; - if (!ndisc_parse_options(dev, msg->opt, ndoptlen, &ndopts)) { - ND_PRINTK(2, warn, "NS: invalid ND option\n"); - return reason; - } + if (!ndisc_parse_options(dev, msg->opt, ndoptlen, &ndopts)) + return SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS; + if (ndopts.nd_opts_tgt_lladdr) { lladdr = ndisc_opt_addr_data(ndopts.nd_opts_tgt_lladdr, dev); if (!lladdr) { @@ -1159,10 +1156,8 @@ static enum skb_drop_reason ndisc_recv_rs(struct sk_buff *skb) goto out; /* Parse ND options */ - if (!ndisc_parse_options(skb->dev, rs_msg->opt, ndoptlen, &ndopts)) { - ND_PRINTK(2, notice, "NS: invalid ND option, ignored\n"); - goto out; - } + if (!ndisc_parse_options(skb->dev, rs_msg->opt, ndoptlen, &ndopts)) + return SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS; if (ndopts.nd_opts_src_lladdr) { lladdr = ndisc_opt_addr_data(ndopts.nd_opts_src_lladdr, @@ -1280,10 +1275,8 @@ static enum skb_drop_reason ndisc_router_discovery(struct sk_buff *skb) return reason; } - if (!ndisc_parse_options(skb->dev, opt, optlen, &ndopts)) { - ND_PRINTK(2, warn, "RA: invalid ND options\n"); - return reason; - } + if (!ndisc_parse_options(skb->dev, opt, optlen, &ndopts)) + return SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS; if (!ipv6_accept_ra(in6_dev)) { ND_PRINTK(2, info, @@ -1627,7 +1620,7 @@ static enum skb_drop_reason ndisc_redirect_rcv(struct sk_buff *skb) } if (!ndisc_parse_options(skb->dev, msg->opt, ndoptlen, &ndopts)) - return reason; + return SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS; if (!ndopts.nd_opts_rh) { ip6_redirect_no_header(skb, dev_net(skb->dev), -- cgit v1.2.3 From c34b8bb11ebc135e970653bd6fc8e3f863fb6a81 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:41 +0000 Subject: ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST Hosts can often receive neighbour discovery messages that are not for them. Use a dedicated drop reason to make clear the packet is dropped for this normal case. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/dropreason.h | 5 +++++ net/ipv6/ndisc.c | 4 +++- 2 files changed, 8 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 239a5c0ea83e..c0a3ea806cd5 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -77,6 +77,7 @@ FN(IPV6_NDISC_HOP_LIMIT) \ FN(IPV6_NDISC_BAD_CODE) \ FN(IPV6_NDISC_BAD_OPTIONS) \ + FN(IPV6_NDISC_NS_OTHERHOST) \ FNe(MAX) /** @@ -333,6 +334,10 @@ enum skb_drop_reason { SKB_DROP_REASON_IPV6_NDISC_BAD_CODE, /** @SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS: invalid NDISC options. */ SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS, + /** @SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST: NEIGHBOUR SOLICITATION + * for another host. + */ + SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST, /** * @SKB_DROP_REASON_MAX: the maximum of drop reason, which shouldn't be * used as a real 'reason' diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c index b47e845d66eb..c4be62c99f73 100644 --- a/net/ipv6/ndisc.c +++ b/net/ipv6/ndisc.c @@ -921,8 +921,10 @@ have_ifp: pneigh_enqueue(&nd_tbl, idev->nd_parms, n); goto out; } - } else + } else { + SKB_DR_SET(reason, IPV6_NDISC_NS_OTHERHOST); goto out; + } } if (is_router < 0) -- cgit v1.2.3 From ac03694bc009703022cf45a9c90675d5505c584c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Feb 2023 16:28:42 +0000 Subject: ipv6: icmp6: add drop reason support to icmpv6_echo_reply() Change icmpv6_echo_reply() to return a drop reason. For the moment, return NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/icmp.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) (limited to 'net') diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index f32bc98155bf..1f53f2a74480 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -705,7 +705,7 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type, } EXPORT_SYMBOL(ip6_err_gen_icmpv6_unreach); -static void icmpv6_echo_reply(struct sk_buff *skb) +static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); struct sock *sk; @@ -719,18 +719,19 @@ static void icmpv6_echo_reply(struct sk_buff *skb) struct dst_entry *dst; struct ipcm6_cookie ipc6; u32 mark = IP6_REPLY_MARK(net, skb->mark); + SKB_DR(reason); bool acast; u8 type; if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr) && net->ipv6.sysctl.icmpv6_echo_ignore_multicast) - return; + return reason; saddr = &ipv6_hdr(skb)->daddr; acast = ipv6_anycast_destination(skb_dst(skb), saddr); if (acast && net->ipv6.sysctl.icmpv6_echo_ignore_anycast) - return; + return reason; if (!ipv6_unicast_destination(skb) && !(net->ipv6.sysctl.anycast_src_echo_reply && acast)) @@ -804,6 +805,7 @@ static void icmpv6_echo_reply(struct sk_buff *skb) } else { icmpv6_push_pending_frames(sk, &fl6, &tmp_hdr, skb->len + sizeof(struct icmp6hdr)); + reason = SKB_CONSUMED; } out_dst_release: dst_release(dst); @@ -811,6 +813,7 @@ out: icmpv6_xmit_unlock(sk); out_bh_enable: local_bh_enable(); + return reason; } enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type, @@ -929,12 +932,12 @@ static int icmpv6_rcv(struct sk_buff *skb) switch (type) { case ICMPV6_ECHO_REQUEST: if (!net->ipv6.sysctl.icmpv6_echo_ignore_all) - icmpv6_echo_reply(skb); + reason = icmpv6_echo_reply(skb); break; case ICMPV6_EXT_ECHO_REQUEST: if (!net->ipv6.sysctl.icmpv6_echo_ignore_all && READ_ONCE(net->ipv4.sysctl_icmp_echo_enable_probe)) - icmpv6_echo_reply(skb); + reason = icmpv6_echo_reply(skb); break; case ICMPV6_ECHO_REPLY: -- cgit v1.2.3 From 9ca5e7ecab064f1f47da07f7c1ddf40e4bc0e5ac Mon Sep 17 00:00:00 2001 From: Shigeru Yoshida Date: Fri, 17 Feb 2023 01:37:10 +0900 Subject: l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register() When a file descriptor of pppol2tp socket is passed as file descriptor of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register(). This situation is reproduced by the following program: int main(void) { int sock; struct sockaddr_pppol2tp addr; sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); if (sock < 0) { perror("socket"); return 1; } addr.sa_family = AF_PPPOX; addr.sa_protocol = PX_PROTO_OL2TP; addr.pppol2tp.pid = 0; addr.pppol2tp.fd = sock; addr.pppol2tp.addr.sin_family = PF_INET; addr.pppol2tp.addr.sin_port = htons(0); addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1"); addr.pppol2tp.s_tunnel = 1; addr.pppol2tp.s_session = 0; addr.pppol2tp.d_tunnel = 0; addr.pppol2tp.d_session = 0; if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("connect"); return 1; } return 0; } This program causes the following lockdep warning: ============================================ WARNING: possible recursive locking detected 6.2.0-rc5-00205-gc96618275234 #56 Not tainted -------------------------------------------- repro/8607 is trying to acquire lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0 but task is already holding lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_PPPOX); lock(sk_lock-AF_PPPOX); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by repro/8607: #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 stack backtrace: CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: dump_stack_lvl+0x100/0x178 __lock_acquire.cold+0x119/0x3b9 ? lockdep_hardirqs_on_prepare+0x410/0x410 lock_acquire+0x1e0/0x610 ? l2tp_tunnel_register+0x2b7/0x11c0 ? lock_downgrade+0x710/0x710 ? __fget_files+0x283/0x3e0 lock_sock_nested+0x3a/0xf0 ? l2tp_tunnel_register+0x2b7/0x11c0 l2tp_tunnel_register+0x2b7/0x11c0 ? sprintf+0xc4/0x100 ? l2tp_tunnel_del_work+0x6b0/0x6b0 ? debug_object_deactivate+0x320/0x320 ? lockdep_init_map_type+0x16d/0x7a0 ? lockdep_init_map_type+0x16d/0x7a0 ? l2tp_tunnel_create+0x2bf/0x4b0 ? l2tp_tunnel_create+0x3c6/0x4b0 pppol2tp_connect+0x14e1/0x1a30 ? pppol2tp_put_sk+0xd0/0xd0 ? aa_sk_perm+0x2b7/0xa80 ? aa_af_perm+0x260/0x260 ? bpf_lsm_socket_connect+0x9/0x10 ? pppol2tp_put_sk+0xd0/0xd0 __sys_connect_file+0x14f/0x190 __sys_connect+0x133/0x160 ? __sys_connect_file+0x190/0x190 ? lockdep_hardirqs_on+0x7d/0x100 ? ktime_get_coarse_real_ts64+0x1b7/0x200 ? ktime_get_coarse_real_ts64+0x147/0x200 ? __audit_syscall_entry+0x396/0x500 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x38/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd This patch fixes the issue by getting/creating the tunnel before locking the pppol2tp socket. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Cc: Cong Wang Signed-off-by: Shigeru Yoshida Reviewed-by: Guillaume Nault Signed-off-by: David S. Miller --- net/l2tp/l2tp_ppp.c | 125 ++++++++++++++++++++++++++++------------------------ 1 file changed, 67 insertions(+), 58 deletions(-) (limited to 'net') diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index db2e584c625e..f011af6601c9 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -650,54 +650,22 @@ static int pppol2tp_tunnel_mtu(const struct l2tp_tunnel *tunnel) return mtu - PPPOL2TP_HEADER_OVERHEAD; } -/* connect() handler. Attach a PPPoX socket to a tunnel UDP socket - */ -static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, - int sockaddr_len, int flags) +static struct l2tp_tunnel *pppol2tp_tunnel_get(struct net *net, + const struct l2tp_connect_info *info, + bool *new_tunnel) { - struct sock *sk = sock->sk; - struct pppox_sock *po = pppox_sk(sk); - struct l2tp_session *session = NULL; - struct l2tp_connect_info info; struct l2tp_tunnel *tunnel; - struct pppol2tp_session *ps; - struct l2tp_session_cfg cfg = { 0, }; - bool drop_refcnt = false; - bool drop_tunnel = false; - bool new_session = false; - bool new_tunnel = false; int error; - error = pppol2tp_sockaddr_get_info(uservaddr, sockaddr_len, &info); - if (error < 0) - return error; + *new_tunnel = false; - lock_sock(sk); - - /* Check for already bound sockets */ - error = -EBUSY; - if (sk->sk_state & PPPOX_CONNECTED) - goto end; - - /* We don't supporting rebinding anyway */ - error = -EALREADY; - if (sk->sk_user_data) - goto end; /* socket is already attached */ - - /* Don't bind if tunnel_id is 0 */ - error = -EINVAL; - if (!info.tunnel_id) - goto end; - - tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id); - if (tunnel) - drop_tunnel = true; + tunnel = l2tp_tunnel_get(net, info->tunnel_id); /* Special case: create tunnel context if session_id and * peer_session_id is 0. Otherwise look up tunnel using supplied * tunnel id. */ - if (!info.session_id && !info.peer_session_id) { + if (!info->session_id && !info->peer_session_id) { if (!tunnel) { struct l2tp_tunnel_cfg tcfg = { .encap = L2TP_ENCAPTYPE_UDP, @@ -706,40 +674,82 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, /* Prevent l2tp_tunnel_register() from trying to set up * a kernel socket. */ - if (info.fd < 0) { - error = -EBADF; - goto end; - } + if (info->fd < 0) + return ERR_PTR(-EBADF); - error = l2tp_tunnel_create(info.fd, - info.version, - info.tunnel_id, - info.peer_tunnel_id, &tcfg, + error = l2tp_tunnel_create(info->fd, + info->version, + info->tunnel_id, + info->peer_tunnel_id, &tcfg, &tunnel); if (error < 0) - goto end; + return ERR_PTR(error); l2tp_tunnel_inc_refcount(tunnel); - error = l2tp_tunnel_register(tunnel, sock_net(sk), - &tcfg); + error = l2tp_tunnel_register(tunnel, net, &tcfg); if (error < 0) { kfree(tunnel); - goto end; + return ERR_PTR(error); } - drop_tunnel = true; - new_tunnel = true; + + *new_tunnel = true; } } else { /* Error if we can't find the tunnel */ - error = -ENOENT; if (!tunnel) - goto end; + return ERR_PTR(-ENOENT); /* Error if socket is not prepped */ - if (!tunnel->sock) - goto end; + if (!tunnel->sock) { + l2tp_tunnel_dec_refcount(tunnel); + return ERR_PTR(-ENOENT); + } } + return tunnel; +} + +/* connect() handler. Attach a PPPoX socket to a tunnel UDP socket + */ +static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr, + int sockaddr_len, int flags) +{ + struct sock *sk = sock->sk; + struct pppox_sock *po = pppox_sk(sk); + struct l2tp_session *session = NULL; + struct l2tp_connect_info info; + struct l2tp_tunnel *tunnel; + struct pppol2tp_session *ps; + struct l2tp_session_cfg cfg = { 0, }; + bool drop_refcnt = false; + bool new_session = false; + bool new_tunnel = false; + int error; + + error = pppol2tp_sockaddr_get_info(uservaddr, sockaddr_len, &info); + if (error < 0) + return error; + + /* Don't bind if tunnel_id is 0 */ + if (!info.tunnel_id) + return -EINVAL; + + tunnel = pppol2tp_tunnel_get(sock_net(sk), &info, &new_tunnel); + if (IS_ERR(tunnel)) + return PTR_ERR(tunnel); + + lock_sock(sk); + + /* Check for already bound sockets */ + error = -EBUSY; + if (sk->sk_state & PPPOX_CONNECTED) + goto end; + + /* We don't supporting rebinding anyway */ + error = -EALREADY; + if (sk->sk_user_data) + goto end; /* socket is already attached */ + if (tunnel->peer_tunnel_id == 0) tunnel->peer_tunnel_id = info.peer_tunnel_id; @@ -840,8 +850,7 @@ end: } if (drop_refcnt) l2tp_session_dec_refcount(session); - if (drop_tunnel) - l2tp_tunnel_dec_refcount(tunnel); + l2tp_tunnel_dec_refcount(tunnel); release_sock(sk); return error; -- cgit v1.2.3 From 50bcfe8df7c73ce51762f65d218b4ef0cc5da3ee Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Fri, 17 Feb 2023 13:28:49 +0100 Subject: net: make default_rps_mask a per netns attribute That really was meant to be a per netns attribute from the beginning. The idea is that once proper isolation is in place in the main namespace, additional demux in the child namespaces will be redundant. Let's make child netns default rps mask empty by default. To avoid bloating the netns with a possibly large cpumask, allocate it on-demand during the first write operation. Signed-off-by: Paolo Abeni Signed-off-by: David S. Miller --- include/linux/netdevice.h | 1 - include/net/netns/core.h | 5 +++++ net/core/net-sysfs.c | 23 ++++++++++++++------- net/core/sysctl_net_core.c | 51 ++++++++++++++++++++++++++++++++++------------ 4 files changed, 59 insertions(+), 21 deletions(-) (limited to 'net') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index efbee940bb03..6a14b7b11766 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -224,7 +224,6 @@ struct net_device_core_stats { #include extern struct static_key_false rps_needed; extern struct static_key_false rfs_needed; -extern struct cpumask rps_default_mask; #endif struct neighbour; diff --git a/include/net/netns/core.h b/include/net/netns/core.h index 8249060cf5d0..a91ef9f8de60 100644 --- a/include/net/netns/core.h +++ b/include/net/netns/core.h @@ -6,6 +6,7 @@ struct ctl_table_header; struct prot_inuse; +struct cpumask; struct netns_core { /* core sysctls */ @@ -17,6 +18,10 @@ struct netns_core { #ifdef CONFIG_PROC_FS struct prot_inuse __percpu *prot_inuse; #endif + +#if IS_ENABLED(CONFIG_RPS) && IS_ENABLED(CONFIG_SYSCTL) + struct cpumask *rps_default_mask; +#endif }; #endif diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index e20784b6f873..15e3f4606b5f 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1060,6 +1060,18 @@ static const struct kobj_type rx_queue_ktype = { .get_ownership = rx_queue_get_ownership, }; +static int rx_queue_default_mask(struct net_device *dev, + struct netdev_rx_queue *queue) +{ +#if IS_ENABLED(CONFIG_RPS) && IS_ENABLED(CONFIG_SYSCTL) + struct cpumask *rps_default_mask = READ_ONCE(dev_net(dev)->core.rps_default_mask); + + if (rps_default_mask && !cpumask_empty(rps_default_mask)) + return netdev_rx_queue_set_rps_mask(queue, rps_default_mask); +#endif + return 0; +} + static int rx_queue_add_kobject(struct net_device *dev, int index) { struct netdev_rx_queue *queue = dev->_rx + index; @@ -1083,13 +1095,10 @@ static int rx_queue_add_kobject(struct net_device *dev, int index) goto err; } -#if IS_ENABLED(CONFIG_RPS) && IS_ENABLED(CONFIG_SYSCTL) - if (!cpumask_empty(&rps_default_mask)) { - error = netdev_rx_queue_set_rps_mask(queue, &rps_default_mask); - if (error) - goto err; - } -#endif + error = rx_queue_default_mask(dev, queue); + if (error) + goto err; + kobject_uevent(kobj, KOBJ_ADD); return error; diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 7130e6d9e263..74842b453407 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -74,24 +74,47 @@ static void dump_cpumask(void *buffer, size_t *lenp, loff_t *ppos, #endif #ifdef CONFIG_RPS -struct cpumask rps_default_mask; + +static struct cpumask *rps_default_mask_cow_alloc(struct net *net) +{ + struct cpumask *rps_default_mask; + + if (net->core.rps_default_mask) + return net->core.rps_default_mask; + + rps_default_mask = kzalloc(cpumask_size(), GFP_KERNEL); + if (!rps_default_mask) + return NULL; + + /* pairs with READ_ONCE in rx_queue_default_mask() */ + WRITE_ONCE(net->core.rps_default_mask, rps_default_mask); + return rps_default_mask; +} static int rps_default_mask_sysctl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { + struct net *net = (struct net *)table->data; int err = 0; rtnl_lock(); if (write) { - err = cpumask_parse(buffer, &rps_default_mask); + struct cpumask *rps_default_mask = rps_default_mask_cow_alloc(net); + + err = -ENOMEM; + if (!rps_default_mask) + goto done; + + err = cpumask_parse(buffer, rps_default_mask); if (err) goto done; - err = rps_cpumask_housekeeping(&rps_default_mask); + err = rps_cpumask_housekeeping(rps_default_mask); if (err) goto done; } else { - dump_cpumask(buffer, lenp, ppos, &rps_default_mask); + dump_cpumask(buffer, lenp, ppos, + net->core.rps_default_mask ? : cpu_none_mask); } done: @@ -508,11 +531,6 @@ static struct ctl_table net_core_table[] = { .mode = 0644, .proc_handler = rps_sock_flow_sysctl }, - { - .procname = "rps_default_mask", - .mode = 0644, - .proc_handler = rps_default_mask_sysctl - }, #endif #ifdef CONFIG_NET_FLOW_LIMIT { @@ -639,6 +657,14 @@ static struct ctl_table net_core_table[] = { }; static struct ctl_table netns_core_table[] = { +#if IS_ENABLED(CONFIG_RPS) + { + .procname = "rps_default_mask", + .data = &init_net, + .mode = 0644, + .proc_handler = rps_default_mask_sysctl + }, +#endif { .procname = "somaxconn", .data = &init_net.core.sysctl_somaxconn, @@ -706,6 +732,9 @@ static __net_exit void sysctl_core_net_exit(struct net *net) tbl = net->core.sysctl_hdr->ctl_table_arg; unregister_net_sysctl_table(net->core.sysctl_hdr); BUG_ON(tbl == netns_core_table); +#if IS_ENABLED(CONFIG_RPS) + kfree(net->core.rps_default_mask); +#endif kfree(tbl); } @@ -716,10 +745,6 @@ static __net_initdata struct pernet_operations sysctl_core_ops = { static __init int sysctl_core_init(void) { -#if IS_ENABLED(CONFIG_RPS) - cpumask_copy(&rps_default_mask, cpu_none_mask); -#endif - register_net_sysctl(&init_net, "net/core", net_core_table); return register_pernet_subsys(&sysctl_core_ops); } -- cgit v1.2.3 From fce10282a03db59bdb1cba6333d0564461d47bd6 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Fri, 17 Feb 2023 19:09:20 +0100 Subject: devlink: drop leftover duplicate/unused code The recent merge from net left-over some unused code in leftover.c - nomen omen. Just drop the unused bits. Signed-off-by: Paolo Abeni Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller --- net/devlink/leftover.c | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'net') diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 8fc85ba3f6e0..dffca2f9bfa7 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -3837,19 +3837,6 @@ int devlink_resources_validate(struct devlink *devlink, return err; } -static void devlink_param_notify(struct devlink *devlink, - unsigned int port_index, - struct devlink_param_item *param_item, - enum devlink_command cmd); - -struct devlink_info_req { - struct sk_buff *msg; - void (*version_cb)(const char *version_name, - enum devlink_info_version_type version_type, - void *version_cb_priv); - void *version_cb_priv; -}; - static const struct devlink_param devlink_param_generic[] = { { .id = DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET, -- cgit v1.2.3 From 5f1eb1ff58ea122e24adf0bc940f268ed2227462 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 17 Feb 2023 18:24:54 +0000 Subject: scm: add user copy checks to put_cmsg() This is a followup of commit 2558b8039d05 ("net: use a bounce buffer for copying skb->mark") x86 and powerpc define user_access_begin, meaning that they are not able to perform user copy checks when using user_write_access_begin() / unsafe_copy_to_user() and friends [1] Instead of waiting bugs to trigger on other arches, add a check_object_size() in put_cmsg() to make sure that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y will perform more security checks. [1] We can not generically call check_object_size() from unsafe_copy_to_user() because UACCESS is enabled at this point. Signed-off-by: Eric Dumazet Cc: Kees Cook Acked-by: Kees Cook Signed-off-by: David S. Miller --- net/core/scm.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'net') diff --git a/net/core/scm.c b/net/core/scm.c index 5c356f0dee30..acb7d776fa6e 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -229,6 +229,8 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data) if (msg->msg_control_is_user) { struct cmsghdr __user *cm = msg->msg_control_user; + check_object_size(data, cmlen - sizeof(*cm), true); + if (!user_write_access_begin(cm, cmlen)) goto efault; -- cgit v1.2.3 From be9832c2e9cc4c15906a77baddcd906fb4bb864b Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 17 Feb 2023 12:09:20 -0800 Subject: net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). Commit 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status") guarantees that all ULP listeners have clone() op, so we no longer need to test it in inet_clone_ulp(). Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski --- net/ipv4/inet_connection_sock.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'net') diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index eedcf4146d29..65ad4251f6fd 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1122,8 +1122,7 @@ static void inet_clone_ulp(const struct request_sock *req, struct sock *newsk, if (!icsk->icsk_ulp_ops) return; - if (icsk->icsk_ulp_ops->clone) - icsk->icsk_ulp_ops->clone(req, newsk, priority); + icsk->icsk_ulp_ops->clone(req, newsk, priority); } /** -- cgit v1.2.3 From db4b49025c0c7116f1d2dfe8d5bbfc983ac054de Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Sat, 18 Feb 2023 00:36:13 +0200 Subject: net/sched: Rename user cookie and act cookie struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by: Paul Blakey Reviewed-by: Marcelo Ricardo Leitner Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +- .../net/ethernet/mellanox/mlxsw/spectrum_flower.c | 2 +- include/net/act_api.h | 2 +- include/net/flow_offload.h | 4 ++-- net/sched/act_api.c | 26 ++++++++++---------- net/sched/cls_api.c | 28 +++++++++++----------- 6 files changed, 32 insertions(+), 32 deletions(-) (limited to 'net') diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 9bbd31e304be..b401cf291782 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -4176,7 +4176,7 @@ parse_tc_actions(struct mlx5e_tc_act_parse_state *parse_state, parse_state->actions |= attr->action; if (!tc_act->stats_action) - attr->tc_act_cookies[attr->tc_act_cookies_count++] = act->act_cookie; + attr->tc_act_cookies[attr->tc_act_cookies_count++] = act->cookie; /* Split attr for multi table act if not the last act. */ if (jump_state.jump_target || diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c index e91fb205e0b4..594cdcb90b3d 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c @@ -103,7 +103,7 @@ static int mlxsw_sp_flower_parse_actions(struct mlxsw_sp *mlxsw_sp, } ingress = mlxsw_sp_flow_block_is_ingress_bound(block); err = mlxsw_sp_acl_rulei_act_drop(rulei, ingress, - act->cookie, extack); + act->user_cookie, extack); if (err) { NL_SET_ERR_MSG_MOD(extack, "Cannot append drop action"); return err; diff --git a/include/net/act_api.h b/include/net/act_api.h index 2a6f443f0ef6..4ae0580b63ca 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -39,7 +39,7 @@ struct tc_action { struct gnet_stats_basic_sync __percpu *cpu_bstats; struct gnet_stats_basic_sync __percpu *cpu_bstats_hw; struct gnet_stats_queue __percpu *cpu_qstats; - struct tc_cookie __rcu *act_cookie; + struct tc_cookie __rcu *user_cookie; struct tcf_chain __rcu *goto_chain; u32 tcfa_flags; u8 hw_stats; diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 8c05455b1e34..9c5cb12f8a90 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -228,7 +228,7 @@ void flow_action_cookie_destroy(struct flow_action_cookie *cookie); struct flow_action_entry { enum flow_action_id id; u32 hw_index; - unsigned long act_cookie; + unsigned long cookie; enum flow_action_hw_stats hw_stats; action_destr destructor; void *destructor_priv; @@ -321,7 +321,7 @@ struct flow_action_entry { u16 sid; } pppoe; }; - struct flow_action_cookie *cookie; /* user defined action cookie */ + struct flow_action_cookie *user_cookie; /* user defined action cookie */ }; struct flow_action { diff --git a/net/sched/act_api.c b/net/sched/act_api.c index eda58b78da13..e67ebc939901 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -125,7 +125,7 @@ static void free_tcf(struct tc_action *p) free_percpu(p->cpu_bstats_hw); free_percpu(p->cpu_qstats); - tcf_set_action_cookie(&p->act_cookie, NULL); + tcf_set_action_cookie(&p->user_cookie, NULL); if (chain) tcf_chain_put_by_act(chain); @@ -431,14 +431,14 @@ EXPORT_SYMBOL(tcf_idr_release); static size_t tcf_action_shared_attrs_size(const struct tc_action *act) { - struct tc_cookie *act_cookie; + struct tc_cookie *user_cookie; u32 cookie_len = 0; rcu_read_lock(); - act_cookie = rcu_dereference(act->act_cookie); + user_cookie = rcu_dereference(act->user_cookie); - if (act_cookie) - cookie_len = nla_total_size(act_cookie->len); + if (user_cookie) + cookie_len = nla_total_size(user_cookie->len); rcu_read_unlock(); return nla_total_size(0) /* action number nested */ @@ -488,7 +488,7 @@ tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a, bool from_act) goto nla_put_failure; rcu_read_lock(); - cookie = rcu_dereference(a->act_cookie); + cookie = rcu_dereference(a->user_cookie); if (cookie) { if (nla_put(skb, TCA_ACT_COOKIE, cookie->len, cookie->data)) { rcu_read_unlock(); @@ -1362,9 +1362,9 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, { bool police = flags & TCA_ACT_FLAGS_POLICE; struct nla_bitfield32 userflags = { 0, 0 }; + struct tc_cookie *user_cookie = NULL; u8 hw_stats = TCA_ACT_HW_STATS_ANY; struct nlattr *tb[TCA_ACT_MAX + 1]; - struct tc_cookie *cookie = NULL; struct tc_action *a; int err; @@ -1375,8 +1375,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, if (err < 0) return ERR_PTR(err); if (tb[TCA_ACT_COOKIE]) { - cookie = nla_memdup_cookie(tb); - if (!cookie) { + user_cookie = nla_memdup_cookie(tb); + if (!user_cookie) { NL_SET_ERR_MSG(extack, "No memory to generate TC cookie"); err = -ENOMEM; goto err_out; @@ -1402,7 +1402,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, *init_res = err; if (!police && tb[TCA_ACT_COOKIE]) - tcf_set_action_cookie(&a->act_cookie, cookie); + tcf_set_action_cookie(&a->user_cookie, user_cookie); if (!police) a->hw_stats = hw_stats; @@ -1410,9 +1410,9 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp, return a; err_out: - if (cookie) { - kfree(cookie->data); - kfree(cookie); + if (user_cookie) { + kfree(user_cookie->data); + kfree(user_cookie); } return ERR_PTR(err); } diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index bfabc9c95fa9..656049ead8bb 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -3490,28 +3490,28 @@ int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp, } EXPORT_SYMBOL(tc_setup_cb_reoffload); -static int tcf_act_get_cookie(struct flow_action_entry *entry, - const struct tc_action *act) +static int tcf_act_get_user_cookie(struct flow_action_entry *entry, + const struct tc_action *act) { - struct tc_cookie *cookie; + struct tc_cookie *user_cookie; int err = 0; rcu_read_lock(); - cookie = rcu_dereference(act->act_cookie); - if (cookie) { - entry->cookie = flow_action_cookie_create(cookie->data, - cookie->len, - GFP_ATOMIC); - if (!entry->cookie) + user_cookie = rcu_dereference(act->user_cookie); + if (user_cookie) { + entry->user_cookie = flow_action_cookie_create(user_cookie->data, + user_cookie->len, + GFP_ATOMIC); + if (!entry->user_cookie) err = -ENOMEM; } rcu_read_unlock(); return err; } -static void tcf_act_put_cookie(struct flow_action_entry *entry) +static void tcf_act_put_user_cookie(struct flow_action_entry *entry) { - flow_action_cookie_destroy(entry->cookie); + flow_action_cookie_destroy(entry->user_cookie); } void tc_cleanup_offload_action(struct flow_action *flow_action) @@ -3520,7 +3520,7 @@ void tc_cleanup_offload_action(struct flow_action *flow_action) int i; flow_action_for_each(i, entry, flow_action) { - tcf_act_put_cookie(entry); + tcf_act_put_user_cookie(entry); if (entry->destructor) entry->destructor(entry->destructor_priv); } @@ -3565,7 +3565,7 @@ int tc_setup_action(struct flow_action *flow_action, entry = &flow_action->entries[j]; spin_lock_bh(&act->tcfa_lock); - err = tcf_act_get_cookie(entry, act); + err = tcf_act_get_user_cookie(entry, act); if (err) goto err_out_locked; @@ -3577,7 +3577,7 @@ int tc_setup_action(struct flow_action *flow_action, for (k = 0; k < index ; k++) { entry[k].hw_stats = tc_act_hw_stats(act->hw_stats); entry[k].hw_index = act->tcfa_index; - entry[k].act_cookie = (unsigned long)act; + entry[k].cookie = (unsigned long)act; } j += index; -- cgit v1.2.3 From 80cd22c35c9001fe72bf614d29439de41933deca Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Sat, 18 Feb 2023 00:36:14 +0200 Subject: net/sched: cls_api: Support hardware miss to tc action For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by: Paul Blakey Reviewed-by: Jiri Pirko Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Acked-by: Jamal Hadi Salim Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 6 +- include/net/flow_offload.h | 1 + include/net/pkt_cls.h | 34 ++++--- include/net/sch_generic.h | 2 + net/openvswitch/flow.c | 3 +- net/sched/act_api.c | 2 +- net/sched/cls_api.c | 215 ++++++++++++++++++++++++++++++++++++++++++--- 7 files changed, 236 insertions(+), 27 deletions(-) (limited to 'net') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d5602b15c714..ff7ad331fb82 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -319,12 +319,16 @@ struct nf_bridge_info { * and read by ovs to recirc_id. */ struct tc_skb_ext { - __u32 chain; + union { + u64 act_miss_cookie; + __u32 chain; + }; __u16 mru; __u16 zone; u8 post_ct:1; u8 post_ct_snat:1; u8 post_ct_dnat:1; + u8 act_miss:1; /* Set if act_miss_cookie is used */ }; #endif diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h index 9c5cb12f8a90..118082eae48c 100644 --- a/include/net/flow_offload.h +++ b/include/net/flow_offload.h @@ -229,6 +229,7 @@ struct flow_action_entry { enum flow_action_id id; u32 hw_index; unsigned long cookie; + u64 miss_cookie; enum flow_action_hw_stats hw_stats; action_destr destructor; void *destructor_priv; diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index ace437c6754b..b3b5b0b62f16 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -59,6 +59,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, void tcf_block_put(struct tcf_block *block); void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, struct tcf_block_ext_info *ei); +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action, + int police, struct tcf_proto *tp, u32 handle, bool used_action_miss); static inline bool tcf_block_shared(struct tcf_block *block) { @@ -229,6 +231,7 @@ struct tcf_exts { struct tc_action **actions; struct net *net; netns_tracker ns_tracker; + struct tcf_exts_miss_cookie_node *miss_cookie_node; #endif /* Map to export classifier specific extension TLV types to the * generic extensions API. Unsupported extensions must be set to 0. @@ -240,21 +243,11 @@ struct tcf_exts { static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net, int action, int police) { -#ifdef CONFIG_NET_CLS_ACT - exts->type = 0; - exts->nr_actions = 0; - /* Note: we do not own yet a reference on net. - * This reference might be taken later from tcf_exts_get_net(). - */ - exts->net = net; - exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *), - GFP_KERNEL); - if (!exts->actions) - return -ENOMEM; +#ifdef CONFIG_NET_CLS + return tcf_exts_init_ex(exts, net, action, police, NULL, 0, false); +#else + return -EOPNOTSUPP; #endif - exts->action = action; - exts->police = police; - return 0; } /* Return false if the netns is being destroyed in cleanup_net(). Callers @@ -360,6 +353,18 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts, return TC_ACT_OK; } +static inline int +tcf_exts_exec_ex(struct sk_buff *skb, struct tcf_exts *exts, int act_index, + struct tcf_result *res) +{ +#ifdef CONFIG_NET_CLS_ACT + return tcf_action_exec(skb, exts->actions + act_index, + exts->nr_actions - act_index, res); +#else + return TC_ACT_OK; +#endif +} + int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb, struct nlattr *rate_tlv, struct tcf_exts *exts, u32 flags, @@ -584,6 +589,7 @@ int tc_setup_offload_action(struct flow_action *flow_action, void tc_cleanup_offload_action(struct flow_action *flow_action); int tc_setup_action(struct flow_action *flow_action, struct tc_action *actions[], + u32 miss_cookie_base, struct netlink_ext_ack *extack); int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index af4aa66aaa4e..fab5ba3e61b7 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -369,6 +369,8 @@ struct tcf_proto_ops { struct nlattr **tca, struct netlink_ext_ack *extack); void (*tmplt_destroy)(void *tmplt_priv); + struct tcf_exts * (*get_exts)(const struct tcf_proto *tp, + u32 handle); /* rtnetlink specific */ int (*dump)(struct net*, struct tcf_proto*, void *, diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 416976f70322..33b21a0c0548 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -1041,7 +1041,8 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) if (tc_skb_ext_tc_enabled()) { tc_ext = skb_ext_find(skb, TC_SKB_EXT); - key->recirc_id = tc_ext ? tc_ext->chain : 0; + key->recirc_id = tc_ext && !tc_ext->act_miss ? + tc_ext->chain : 0; OVS_CB(skb)->mru = tc_ext ? tc_ext->mru : 0; post_ct = tc_ext ? tc_ext->post_ct : false; post_ct_snat = post_ct ? tc_ext->post_ct_snat : false; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index e67ebc939901..fce522886099 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -268,7 +268,7 @@ static int tcf_action_offload_add_ex(struct tc_action *action, if (err) goto fl_err; - err = tc_setup_action(&fl_action->action, actions, extack); + err = tc_setup_action(&fl_action->action, actions, 0, extack); if (err) { NL_SET_ERR_MSG_MOD(extack, "Failed to setup tc actions for offload"); diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 656049ead8bb..3569e2c3660c 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -50,6 +51,109 @@ static LIST_HEAD(tcf_proto_base); /* Protects list of registered TC modules. It is pure SMP lock. */ static DEFINE_RWLOCK(cls_mod_lock); +static struct xarray tcf_exts_miss_cookies_xa; +struct tcf_exts_miss_cookie_node { + const struct tcf_chain *chain; + const struct tcf_proto *tp; + const struct tcf_exts *exts; + u32 chain_index; + u32 tp_prio; + u32 handle; + u32 miss_cookie_base; + struct rcu_head rcu; +}; + +/* Each tc action entry cookie will be comprised of 32bit miss_cookie_base + + * action index in the exts tc actions array. + */ +union tcf_exts_miss_cookie { + struct { + u32 miss_cookie_base; + u32 act_index; + }; + u64 miss_cookie; +}; + +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) +static int +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp, + u32 handle) +{ + struct tcf_exts_miss_cookie_node *n; + static u32 next; + int err; + + if (WARN_ON(!handle || !tp->ops->get_exts)) + return -EINVAL; + + n = kzalloc(sizeof(*n), GFP_KERNEL); + if (!n) + return -ENOMEM; + + n->chain_index = tp->chain->index; + n->chain = tp->chain; + n->tp_prio = tp->prio; + n->tp = tp; + n->exts = exts; + n->handle = handle; + + err = xa_alloc_cyclic(&tcf_exts_miss_cookies_xa, &n->miss_cookie_base, + n, xa_limit_32b, &next, GFP_KERNEL); + if (err) + goto err_xa_alloc; + + exts->miss_cookie_node = n; + return 0; + +err_xa_alloc: + kfree(n); + return err; +} + +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts) +{ + struct tcf_exts_miss_cookie_node *n; + + if (!exts->miss_cookie_node) + return; + + n = exts->miss_cookie_node; + xa_erase(&tcf_exts_miss_cookies_xa, n->miss_cookie_base); + kfree_rcu(n, rcu); +} + +static struct tcf_exts_miss_cookie_node * +tcf_exts_miss_cookie_lookup(u64 miss_cookie, int *act_index) +{ + union tcf_exts_miss_cookie mc = { .miss_cookie = miss_cookie, }; + + *act_index = mc.act_index; + return xa_load(&tcf_exts_miss_cookies_xa, mc.miss_cookie_base); +} +#else /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */ +static int +tcf_exts_miss_cookie_base_alloc(struct tcf_exts *exts, struct tcf_proto *tp, + u32 handle) +{ + return 0; +} + +static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts) +{ +} +#endif /* IS_ENABLED(CONFIG_NET_TC_SKB_EXT) */ + +static u64 tcf_exts_miss_cookie_get(u32 miss_cookie_base, int act_index) +{ + union tcf_exts_miss_cookie mc = { .act_index = act_index, }; + + if (!miss_cookie_base) + return 0; + + mc.miss_cookie_base = miss_cookie_base; + return mc.miss_cookie; +} + #ifdef CONFIG_NET_CLS_ACT DEFINE_STATIC_KEY_FALSE(tc_skb_ext_tc); EXPORT_SYMBOL(tc_skb_ext_tc); @@ -1549,6 +1653,8 @@ static inline int __tcf_classify(struct sk_buff *skb, const struct tcf_proto *orig_tp, struct tcf_result *res, bool compat_mode, + struct tcf_exts_miss_cookie_node *n, + int act_index, u32 *last_executed_chain) { #ifdef CONFIG_NET_CLS_ACT @@ -1560,13 +1666,36 @@ reclassify: #endif for (; tp; tp = rcu_dereference_bh(tp->next)) { __be16 protocol = skb_protocol(skb, false); - int err; + int err = 0; - if (tp->protocol != protocol && - tp->protocol != htons(ETH_P_ALL)) - continue; + if (n) { + struct tcf_exts *exts; + + if (n->tp_prio != tp->prio) + continue; + + /* We re-lookup the tp and chain based on index instead + * of having hard refs and locks to them, so do a sanity + * check if any of tp,chain,exts was replaced by the + * time we got here with a cookie from hardware. + */ + if (unlikely(n->tp != tp || n->tp->chain != n->chain || + !tp->ops->get_exts)) + return TC_ACT_SHOT; + + exts = tp->ops->get_exts(tp, n->handle); + if (unlikely(!exts || n->exts != exts)) + return TC_ACT_SHOT; - err = tc_classify(skb, tp, res); + n = NULL; + err = tcf_exts_exec_ex(skb, exts, act_index, res); + } else { + if (tp->protocol != protocol && + tp->protocol != htons(ETH_P_ALL)) + continue; + + err = tc_classify(skb, tp, res); + } #ifdef CONFIG_NET_CLS_ACT if (unlikely(err == TC_ACT_RECLASSIFY && !compat_mode)) { first_tp = orig_tp; @@ -1582,6 +1711,9 @@ reclassify: return err; } + if (unlikely(n)) + return TC_ACT_SHOT; + return TC_ACT_UNSPEC; /* signal: continue lookup */ #ifdef CONFIG_NET_CLS_ACT reset: @@ -1606,21 +1738,35 @@ int tcf_classify(struct sk_buff *skb, #if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT) u32 last_executed_chain = 0; - return __tcf_classify(skb, tp, tp, res, compat_mode, + return __tcf_classify(skb, tp, tp, res, compat_mode, NULL, 0, &last_executed_chain); #else u32 last_executed_chain = tp ? tp->chain->index : 0; + struct tcf_exts_miss_cookie_node *n = NULL; const struct tcf_proto *orig_tp = tp; struct tc_skb_ext *ext; + int act_index = 0; int ret; if (block) { ext = skb_ext_find(skb, TC_SKB_EXT); - if (ext && ext->chain) { + if (ext && (ext->chain || ext->act_miss)) { struct tcf_chain *fchain; + u32 chain; + + if (ext->act_miss) { + n = tcf_exts_miss_cookie_lookup(ext->act_miss_cookie, + &act_index); + if (!n) + return TC_ACT_SHOT; - fchain = tcf_chain_lookup_rcu(block, ext->chain); + chain = n->chain_index; + } else { + chain = ext->chain; + } + + fchain = tcf_chain_lookup_rcu(block, chain); if (!fchain) return TC_ACT_SHOT; @@ -1632,7 +1778,7 @@ int tcf_classify(struct sk_buff *skb, } } - ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, + ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode, n, act_index, &last_executed_chain); if (tc_skb_ext_tc_enabled()) { @@ -3056,9 +3202,48 @@ out: return skb->len; } +int tcf_exts_init_ex(struct tcf_exts *exts, struct net *net, int action, + int police, struct tcf_proto *tp, u32 handle, + bool use_action_miss) +{ + int err = 0; + +#ifdef CONFIG_NET_CLS_ACT + exts->type = 0; + exts->nr_actions = 0; + /* Note: we do not own yet a reference on net. + * This reference might be taken later from tcf_exts_get_net(). + */ + exts->net = net; + exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *), + GFP_KERNEL); + if (!exts->actions) + return -ENOMEM; +#endif + + exts->action = action; + exts->police = police; + + if (!use_action_miss) + return 0; + + err = tcf_exts_miss_cookie_base_alloc(exts, tp, handle); + if (err) + goto err_miss_alloc; + + return 0; + +err_miss_alloc: + tcf_exts_destroy(exts); + return err; +} +EXPORT_SYMBOL(tcf_exts_init_ex); + void tcf_exts_destroy(struct tcf_exts *exts) { #ifdef CONFIG_NET_CLS_ACT + tcf_exts_miss_cookie_base_destroy(exts); + if (exts->actions) { tcf_action_destroy(exts->actions, TCA_ACT_UNBIND); kfree(exts->actions); @@ -3547,6 +3732,7 @@ static int tc_setup_offload_act(struct tc_action *act, int tc_setup_action(struct flow_action *flow_action, struct tc_action *actions[], + u32 miss_cookie_base, struct netlink_ext_ack *extack) { int i, j, k, index, err = 0; @@ -3578,6 +3764,8 @@ int tc_setup_action(struct flow_action *flow_action, entry[k].hw_stats = tc_act_hw_stats(act->hw_stats); entry[k].hw_index = act->tcfa_index; entry[k].cookie = (unsigned long)act; + entry[k].miss_cookie = + tcf_exts_miss_cookie_get(miss_cookie_base, i); } j += index; @@ -3600,10 +3788,15 @@ int tc_setup_offload_action(struct flow_action *flow_action, struct netlink_ext_ack *extack) { #ifdef CONFIG_NET_CLS_ACT + u32 miss_cookie_base; + if (!exts) return 0; - return tc_setup_action(flow_action, exts->actions, extack); + miss_cookie_base = exts->miss_cookie_node ? + exts->miss_cookie_node->miss_cookie_base : 0; + return tc_setup_action(flow_action, exts->actions, miss_cookie_base, + extack); #else return 0; #endif @@ -3771,6 +3964,8 @@ static int __init tc_filter_init(void) if (err) goto err_register_pernet_subsys; + xa_init_flags(&tcf_exts_miss_cookies_xa, XA_FLAGS_ALLOC1); + rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, RTNL_FLAG_DOIT_UNLOCKED); rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, -- cgit v1.2.3 From 08a0063df3aed8d76a4034279117db12dbc1050f Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Sat, 18 Feb 2023 00:36:15 +0200 Subject: net/sched: flower: Move filter handle initialization earlier To support miss to action during hardware offload the filter's handle is needed when setting up the actions (tcf_exts_init()), and before offloading. Move filter handle initialization earlier. Signed-off-by: Paul Blakey Reviewed-by: Jiri Pirko Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Signed-off-by: Jakub Kicinski --- net/sched/cls_flower.c | 62 ++++++++++++++++++++++++++++---------------------- 1 file changed, 35 insertions(+), 27 deletions(-) (limited to 'net') diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 885c95191ccf..be01d39dd7b9 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -2187,10 +2187,6 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, INIT_LIST_HEAD(&fnew->hw_list); refcount_set(&fnew->refcnt, 1); - err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0); - if (err < 0) - goto errout; - if (tb[TCA_FLOWER_FLAGS]) { fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]); @@ -2200,15 +2196,45 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, } } + if (!fold) { + spin_lock(&tp->lock); + if (!handle) { + handle = 1; + err = idr_alloc_u32(&head->handle_idr, fnew, &handle, + INT_MAX, GFP_ATOMIC); + } else { + err = idr_alloc_u32(&head->handle_idr, fnew, &handle, + handle, GFP_ATOMIC); + + /* Filter with specified handle was concurrently + * inserted after initial check in cls_api. This is not + * necessarily an error if NLM_F_EXCL is not set in + * message flags. Returning EAGAIN will cause cls_api to + * try to update concurrently inserted rule. + */ + if (err == -ENOSPC) + err = -EAGAIN; + } + spin_unlock(&tp->lock); + + if (err) + goto errout; + } + fnew->handle = handle; + + err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0); + if (err < 0) + goto errout_idr; + err = fl_set_parms(net, tp, fnew, mask, base, tb, tca[TCA_RATE], tp->chain->tmplt_priv, flags, fnew->flags, extack); if (err) - goto errout; + goto errout_idr; err = fl_check_assign_mask(head, fnew, fold, mask); if (err) - goto errout; + goto errout_idr; err = fl_ht_insert_unique(fnew, fold, &in_ht); if (err) @@ -2274,29 +2300,9 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, refcount_dec(&fold->refcnt); __fl_put(fold); } else { - if (handle) { - /* user specifies a handle and it doesn't exist */ - err = idr_alloc_u32(&head->handle_idr, fnew, &handle, - handle, GFP_ATOMIC); - - /* Filter with specified handle was concurrently - * inserted after initial check in cls_api. This is not - * necessarily an error if NLM_F_EXCL is not set in - * message flags. Returning EAGAIN will cause cls_api to - * try to update concurrently inserted rule. - */ - if (err == -ENOSPC) - err = -EAGAIN; - } else { - handle = 1; - err = idr_alloc_u32(&head->handle_idr, fnew, &handle, - INT_MAX, GFP_ATOMIC); - } - if (err) - goto errout_hw; + idr_replace(&head->handle_idr, fnew, fnew->handle); refcount_inc(&fnew->refcnt); - fnew->handle = handle; list_add_tail_rcu(&fnew->list, &fnew->mask->filters); spin_unlock(&tp->lock); } @@ -2319,6 +2325,8 @@ errout_hw: fnew->mask->filter_ht_params); errout_mask: fl_mask_put(head, fnew->mask); +errout_idr: + idr_remove(&head->handle_idr, fnew->handle); errout: __fl_put(fnew); errout_tb: -- cgit v1.2.3 From 606c7c43d08c4daf954ec8d58ae6dd39292fb457 Mon Sep 17 00:00:00 2001 From: Paul Blakey Date: Sat, 18 Feb 2023 00:36:16 +0200 Subject: net/sched: flower: Support hardware miss to tc action To support hardware miss to tc action in actions on the flower classifier, implement the required getting of filter actions, and setup filter exts (actions) miss by giving it the filter's handle and actions. Signed-off-by: Paul Blakey Reviewed-by: Jiri Pirko Reviewed-by: Simon Horman Reviewed-by: Marcelo Ricardo Leitner Signed-off-by: Jakub Kicinski --- net/sched/cls_flower.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) (limited to 'net') diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index be01d39dd7b9..e960a46b0520 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -529,6 +529,15 @@ static struct cls_fl_filter *__fl_get(struct cls_fl_head *head, u32 handle) return f; } +static struct tcf_exts *fl_get_exts(const struct tcf_proto *tp, u32 handle) +{ + struct cls_fl_head *head = rcu_dereference_bh(tp->root); + struct cls_fl_filter *f; + + f = idr_find(&head->handle_idr, handle); + return f ? &f->exts : NULL; +} + static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f, bool *last, bool rtnl_held, struct netlink_ext_ack *extack) @@ -2222,7 +2231,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, } fnew->handle = handle; - err = tcf_exts_init(&fnew->exts, net, TCA_FLOWER_ACT, 0); + err = tcf_exts_init_ex(&fnew->exts, net, TCA_FLOWER_ACT, 0, tp, handle, + !tc_skip_hw(fnew->flags)); if (err < 0) goto errout_idr; @@ -3444,6 +3454,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = { .tmplt_create = fl_tmplt_create, .tmplt_destroy = fl_tmplt_destroy, .tmplt_dump = fl_tmplt_dump, + .get_exts = fl_get_exts, .owner = THIS_MODULE, .flags = TCF_PROTO_OPS_DOIT_UNLOCKED, }; -- cgit v1.2.3 From 951bce29c8988209cc359e1fa35a4aaa35542fd5 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Tue, 21 Feb 2023 15:51:40 +0800 Subject: xsk: add linux/vmalloc.h to xsk.c Fix the failure of the compilation under the sh4. Because we introduced remap_vmalloc_range() earlier, this has caused the compilation failure on the sh4 platform. So this introduction of the header file of linux/vmalloc.h. config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20230221/202302210041.kpPQLlNQ-lkp@intel.com/config) compiler: sh4-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=9f78bf330a66cd400b3e00f370f597e9fa939207 git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git git fetch --no-tags net-next master git checkout 9f78bf330a66cd400b3e00f370f597e9fa939207 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sh SHELL=/bin/bash net/ Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring") Signed-off-by: Xuan Zhuo Reported-by: kernel test robot Link: https://lore.kernel.org/oe-kbuild-all/202302210041.kpPQLlNQ-lkp@intel.com/ Link: https://lore.kernel.org/r/20230221075140.46988-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski --- net/xdp/xsk.c | 1 + 1 file changed, 1 insertion(+) (limited to 'net') diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 45eef5af0a51..2ac58b282b5e 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include -- cgit v1.2.3 From 7ec077744aade63729d58c2d26861c3147a4d8aa Mon Sep 17 00:00:00 2001 From: Bo Liu Date: Tue, 21 Feb 2023 03:30:36 -0500 Subject: ethtool: pse-pd: Fix double word in comments Remove the repeated word "for" in comments. Signed-off-by: Bo Liu Reviewed-by: Oleksij Rempel Link: https://lore.kernel.org/r/20230221083036.2414-1-liubo03@inspur.com Signed-off-by: Jakub Kicinski --- net/ethtool/pse-pd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ethtool/pse-pd.c b/net/ethtool/pse-pd.c index a5b607b0a652..530b8b99e6df 100644 --- a/net/ethtool/pse-pd.c +++ b/net/ethtool/pse-pd.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only // -// ethtool interface for for Ethernet PSE (Power Sourcing Equipment) +// ethtool interface for Ethernet PSE (Power Sourcing Equipment) // and PD (Powered Device) // // Copyright (c) 2022 Pengutronix, Oleksij Rempel -- cgit v1.2.3 From a00da30c052f07d67da56efd6a4f1fc85956c979 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Mon, 20 Feb 2023 14:23:31 +0200 Subject: net: ethtool: fix __ethtool_dev_mm_supported() implementation The MAC Merge layer is supported when ops->get_mm() returns 0. The implementation was changed during review, and in this process, a bug was introduced. Link: https://lore.kernel.org/netdev/20230111161706.1465242-5-vladimir.oltean@nxp.com/ Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Signed-off-by: Vladimir Oltean Reviewed-by: Ferenc Fejes Link: https://lore.kernel.org/all/20230220122343.1156614-2-vladimir.oltean@nxp.com/ Signed-off-by: Jakub Kicinski --- net/ethtool/mm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net') diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c index e612856eed8c..fce3cc2734f9 100644 --- a/net/ethtool/mm.c +++ b/net/ethtool/mm.c @@ -247,5 +247,5 @@ bool __ethtool_dev_mm_supported(struct net_device *dev) if (ops && ops->get_mm) ret = ops->get_mm(dev, &state); - return !!ret; + return !ret; } -- cgit v1.2.3