summaryrefslogtreecommitdiff
path: root/net/bridge/br_private.h
AgeCommit message (Collapse)AuthorFilesLines
2023-05-31skbuff: bridge: Add layer 2 miss indicationIdo Schimmel1-0/+27
For EVPN non-DF (Designated Forwarder) filtering we need to be able to prevent decapsulated traffic from being flooded to a multi-homed host. Filtering of multicast and broadcast traffic can be achieved using the following flower filter: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop Unlike broadcast and multicast traffic, it is not currently possible to filter unknown unicast traffic. The classification into unknown unicast is performed by the bridge driver, but is not visible to other layers such as tc. Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear the bit whenever a packet enters the bridge (received from a bridge port or transmitted via the bridge) and set it if the packet did not match an FDB or MDB entry. If there is no skb extension and the bit needs to be cleared, then do not allocate one as no extension is equivalent to the bit being cleared. The bit is not set for broadcast packets as they never perform a lookup and therefore never incur a miss. A bit that is set for every flooded packet would also work for the current use case, but it does not allow us to differentiate between registered and unregistered multicast traffic, which might be useful in the future. To keep the performance impact to a minimum, the marking of packets is guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not touched and an skb extension is not allocated. Instead, only a 5 bytes nop is executed, as demonstrated below for the call site in br_handle_frame(). Before the patch: ``` memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37b09: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12) c37b10: 00 00 p = br_port_get_rcu(skb->dev); c37b12: 49 8b 44 24 10 mov 0x10(%r12),%rax memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37b17: 49 c7 44 24 30 00 00 movq $0x0,0x30(%r12) c37b1e: 00 00 c37b20: 49 c7 44 24 38 00 00 movq $0x0,0x38(%r12) c37b27: 00 00 ``` After the patch (when static key is disabled): ``` memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37c29: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12) c37c30: 00 00 c37c32: 49 8d 44 24 28 lea 0x28(%r12),%rax c37c37: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax) c37c3e: 00 c37c3f: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax) c37c46: 00 #ifdef CONFIG_HAVE_JUMP_LABEL_HACK static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { asm_volatile_goto("1:" c37c47: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) br_tc_skb_miss_set(skb, false); p = br_port_get_rcu(skb->dev); c37c4c: 49 8b 44 24 10 mov 0x10(%r12),%rax ``` Subsequent patches will extend the flower classifier to be able to match on the new 'l2_miss' bit and enable / disable the static key when filters that match on it are added / deleted. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21bridge: Encapsulate data path neighbor suppression logicIdo Schimmel1-0/+1
Currently, there are various places in the bridge data path that check whether neighbor suppression is enabled on a given bridge port. As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate this logic in a function and pass the VLAN ID of the packet as an argument. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Add internal flags for per-{Port, VLAN} neighbor suppressionIdo Schimmel1-0/+1
Add two internal flags that will be used to enable / disable per-{Port, VLAN} neighbor suppression: 1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that per-{Port, VLAN} neighbor suppression is enabled on the bridge port. When set, 'BR_NEIGH_SUPPRESS' has no effect. 2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate that neighbor suppression is enabled on the given VLAN. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Pass VLAN ID to br_flood()Ido Schimmel1-1/+2
Subsequent patches are going to add per-{Port, VLAN} neighbor suppression, which will require br_flood() to potentially suppress ARP / NS packets on a per-{Port, VLAN} basis. As a preparation, pass the VLAN ID of the packet as another argument to br_flood(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17rtnetlink: bridge: mcast: Move MDB handlers out of bridge driverIdo Schimmel1-23/+12
Currently, the bridge driver registers handlers for MDB netlink messages, making it impossible for other drivers to implement MDB support. As a preparation for VXLAN MDB support, move the MDB handlers out of the bridge driver to the core rtnetlink code. The rtnetlink code will call into individual drivers by invoking their previously added MDB net device operations. Note that while the diffstat is large, the change is mechanical. It moves code out of the bridge driver to rtnetlink code. Also note that a similar change was made in 2012 with commit 77162022ab26 ("net: add generic PF_BRIDGE:RTM_ FDB hooks") that moved FDB handlers out of the bridge driver to the core rtnetlink code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17bridge: mcast: Implement MDB net device operationsIdo Schimmel1-0/+25
Implement the previously added MDB net device operations in the bridge driver so that they could be invoked by core rtnetlink code in the next patch. The operations are identical to the existing br_mdb_{dump,add,del} functions. The '_new' suffix will be removed in the next patch. The functions are re-implemented in this patch to make the conversion in the next patch easier to review. Add dummy implementations when 'CONFIG_BRIDGE_IGMP_SNOOPING' is disabled, so that an error will be returned to user space when it is trying to add or delete an MDB entry. This is consistent with existing behavior where the bridge driver does not even register rtnetlink handlers for RTM_{NEW,DEL,GET}MDB messages when this Kconfig option is disabled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06net: bridge: Add netlink knobs for number / maximum MDB entriesPetr Machata1-1/+5
The previous patch added accounting for number of MDB entries per port and per port-VLAN, and the logic to verify that these values stay within configured bounds. However it didn't provide means to actually configure those bounds or read the occupancy. This patch does that. Two new netlink attributes are added for the MDB occupancy: IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy. And another two for the maximum number of MDB entries: IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one. Note that the two new IFLA_BRPORT_ attributes prompt bumping of RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough. The new attributes are used like this: # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \ mcast_vlan_snooping 1 mcast_querier 1 # ip link set dev v1 master br # bridge vlan add dev v1 vid 2 # bridge vlan set dev v1 vid 1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1 # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1 Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1. # bridge link set dev v1 mcast_max_groups 1 # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2 Error: bridge: Port is already in 1 groups, and mcast_max_groups=1. # bridge -d link show 5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...] [...] mcast_n_groups 1 mcast_max_groups 1 # bridge -d vlan show port vlan-id br 1 PVID Egress Untagged state forwarding mcast_router 1 v1 1 PVID Egress Untagged [...] mcast_n_groups 1 mcast_max_groups 1 2 [...] mcast_n_groups 0 mcast_max_groups 0 Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06net: bridge: Maintain number of MDB entries in net_bridge_mcast_portPetr Machata1-0/+2
The MDB maintained by the bridge is limited. When the bridge is configured for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its capacity. In SW datapath, the capacity is configurable through the IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a similar limit exists in the HW datapath for purposes of offloading. In order to prevent the issue of unilateral exhaustion of MDB resources, introduce two parameters in each of two contexts: - Per-port and per-port-VLAN number of MDB entries that the port is member in. - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled) per-port-VLAN maximum permitted number of MDB entries, or 0 for no limit. The per-port multicast context is used for tracking of MDB entries for the port as a whole. This is available for all bridges. The per-port-VLAN multicast context is then only available on VLAN-filtering bridges on VLANs that have multicast snooping on. With these changes in place, it will be possible to configure MDB limit for bridge as a whole, or any one port as a whole, or any single port-VLAN. Note that unlike the global limit, exhaustion of the per-port and per-port-VLAN maximums does not cause disablement of multicast snooping. It is also permitted to configure the local limit larger than hash_max, even though that is not useful. In this patch, introduce only the accounting for number of entries, and the max field itself, but not the means to toggle the max. The next patch introduces the netlink APIs to toggle and read the values. Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06net: bridge: Add br_multicast_del_port_group()Petr Machata1-0/+1
Since cleaning up the effects of br_multicast_new_port_group() just consists of delisting and freeing the memory, the function br_mdb_add_group_star_g() inlines the corresponding code. In the following patches, number of per-port and per-port-VLAN MDB entries is going to be maintained, and that counter will have to be updated. Because that logic is going to be hidden in the br_multicast module, introduce a new hook intended to again remove a newly-created group. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06net: bridge: Add extack to br_multicast_new_port_group()Petr Machata1-1/+2
Make it possible to set an extack in br_multicast_new_port_group(). Eventually, this function will check for per-port and per-port-vlan MDB maximums, and will use the extack to communicate the reason for the bounce. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-13bridge: mcast: Support replacement of MDB port group entriesIdo Schimmel1-0/+1
Now that user space can specify additional attributes of port group entries such as filter mode and source list, it makes sense to allow user space to atomically modify these attributes by replacing entries instead of forcing user space to delete the entries and add them back. Replace MDB port group entries when the 'NLM_F_REPLACE' flag is specified in the netlink message header. When a (*, G) entry is replaced, update the following attributes: Source list, state, filter mode, protocol and flags. If the entry is temporary and in EXCLUDE mode, reset the group timer to the group membership interval. If the entry is temporary and in INCLUDE mode, reset the source timers of associated sources to the group membership interval. Examples: # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.2 filter_mode include # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.2 permanent filter_mode include proto static 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto static 0.00 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode include source_list 192.0.2.2/0.00,192.0.2.1/0.00 proto static 0.00 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra blocked 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra blocked 0.00 dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra 0.00 # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 temp source_list 192.0.2.4,192.0.2.3 filter_mode include proto bgp # bridge -d -s mdb show dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.4 temp filter_mode include proto bgp 0.00 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 temp filter_mode include proto bgp 0.00 dev br0 port dummy10 grp 239.1.1.1 temp filter_mode include source_list 192.0.2.4/259.44,192.0.2.3/259.44 proto bgp 0.00 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13bridge: mcast: Allow user space to specify MDB entry routing protocolIdo Schimmel1-0/+1
Add the 'MDBE_ATTR_RTPORT' attribute to allow user space to specify the routing protocol of the MDB port group entry. Enforce a minimum value of 'RTPROT_STATIC' to prevent user space from using protocol values that should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain backward compatibility by defaulting to 'RTPROT_STATIC'. The protocol is already visible to user space in RTM_NEWMDB responses and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute. The routing protocol allows a routing daemon to distinguish between entries configured by it and those configured by the administrator. Once MDB flush is supported, the protocol can be used as a criterion according to which the flush is performed. Examples: # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel Error: integer out of range. # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra # bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250 # bridge -d mdb show dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250 dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250 dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250 dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13bridge: mcast: Add support for (*, G) with a source list and filter modeIdo Schimmel1-0/+7
In preparation for allowing user space to add (*, G) entries with a source list and associated filter mode, add the necessary plumbing to handle such requests. Extend the MDB configuration structure with a currently empty source array and filter mode that is currently hard coded to EXCLUDE. Add the source entries and the corresponding (S, G) entries before making the new (*, G) port group entry visible to the data path. Handle the creation of each source entry in a similar fashion to how it is created from the data path in response to received Membership Reports: Create the source entry, arm the source timer (if needed), add a corresponding (S, G) forwarding entry and finally mark the source entry as installed (by user space). Add the (S, G) entry by populating an MDB configuration structure and calling br_mdb_add_group_sg() as if a new entry is created by user space, with the sole difference that the 'src_entry' field is set to make sure that the group timer of such entries is never armed. Note that it is not currently possible to add more than 32 source entries to a port group entry. If this proves to be a problem we can either increase 'PG_SRC_ENT_LIMIT' or avoid forcing a limit on entries created by user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13bridge: mcast: Avoid arming group timer when (S, G) corresponds to a sourceIdo Schimmel1-0/+1
User space will soon be able to install a (*, G) with a source list, prompting the creation of a (S, G) entry for each source. In this case, the group timer of the (S, G) entry should never be set. Solve this by adding a new field to the MDB configuration structure that denotes whether the (S, G) corresponds to a source or not. The field will be set in a subsequent patch where br_mdb_add_group_sg() is called in order to create a (S, G) entry for each user provided source. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13bridge: mcast: Add a flag for user installed source entriesIdo Schimmel1-0/+1
There are a few places where the bridge driver differentiates between (S, G) entries installed by the kernel (in response to Membership Reports) and those installed by user space. One of them is when deleting an (S, G) entry corresponding to a source entry that is being deleted. While user space cannot currently add a source entry to a (*, G), it can add an (S, G) entry that later corresponds to a source entry created by the reception of a Membership Report. If this source entry is later deleted because its source timer expired or because the (*, G) entry is being deleted, the bridge driver will not delete the corresponding (S, G) entry if it was added by user space as permanent. This is going to be a problem when the ability to install a (*, G) with a source list is exposed to user space. In this case, when user space installs the (*, G) as permanent, then all the (S, G) entries corresponding to its source list will also be installed as permanent. When user space deletes the (*, G), all the source entries will be deleted and the expectation is that the corresponding (S, G) entries will be deleted as well. Solve this by introducing a new source entry flag denoting that the entry was installed by user space. When the entry is deleted, delete the corresponding (S, G) entry even if it was installed by user space as permanent, as the flag tells us that it was installed in response to the source entry being created. The flag will be set in a subsequent patch where source entries are created in response to user requests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13bridge: mcast: Expose __br_multicast_del_group_src()Ido Schimmel1-0/+1
Expose __br_multicast_del_group_src() which is symmetric to br_multicast_new_group_src() and does not remove the installed {S, G} forwarding entry, unlike br_multicast_del_group_src(). The function will be used in the error path when user space was able to add a new source entry, but failed to install a corresponding forwarding entry. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13bridge: mcast: Expose br_multicast_new_group_src()Ido Schimmel1-0/+3
Currently, new group source entries are only created in response to received Membership Reports. Subsequent patches are going to allow user space to install (*, G) entries with a source list. As a preparatory step, expose br_multicast_new_group_src() so that it could later be invoked from the MDB code (i.e., br_mdb.c) that handles RTM_NEWMDB messages. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08bridge: mcast: Constify 'group' argument in br_multicast_new_port_group()Ido Schimmel1-1/+2
The 'group' argument is not modified, so mark it as 'const'. It will allow us to constify arguments of the callers of this function in future patches. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08bridge: mcast: Centralize netlink attribute parsingIdo Schimmel1-0/+7
Netlink attributes are currently passed deep in the MDB creation call chain, making it difficult to add new attributes. In addition, some validity checks are performed under the multicast lock although they can be performed before it is ever acquired. As a first step towards solving these issues, parse the RTM_{NEW,DEL}MDB messages into a configuration structure, relieving other functions from the need to handle raw netlink attributes. Subsequent patches will convert the MDB code to use this configuration structure. This is consistent with how other rtnetlink objects are handled, such as routes and nexthops. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10bridge: switchdev: Allow device drivers to install locked FDB entriesHans J. Schultz1-1/+1
When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize the hardware and software FDBs by generating switchdev events towards the bridge. When a port is locked, the hardware must not learn autonomously, as otherwise any host will blindly gain authorization. Instead, the hardware should generate events regarding hosts that are trying to gain authorization and their MAC addresses should be notified by the device driver as locked FDB entries towards the bridge driver. Allow device drivers to notify the bridge driver about such entries by extending the 'switchdev_notifier_fdb_info' structure with the 'locked' bit. The bit can only be set by device drivers and not by the bridge driver. Prevent a locked entry from being installed if MAB is not enabled on the bridge port. If an entry already exists in the bridge driver, reject the locked entry if the current entry does not have the "locked" flag set or if it points to a different port. The same semantics are implemented in the software data path. Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-04bridge: Add MAC Authentication Bypass (MAB) supportHans J. Schultz1-1/+2
Hosts that support 802.1X authentication are able to authenticate themselves by exchanging EAPOL frames with an authenticator (Ethernet bridge, in this case) and an authentication server. Access to the network is only granted by the authenticator to successfully authenticated hosts. The above is implemented in the bridge using the "locked" bridge port option. When enabled, link-local frames (e.g., EAPOL) can be locally received by the bridge, but all other frames are dropped unless the host is authenticated. That is, unless the user space control plane installed an FDB entry according to which the source address of the frame is located behind the locked ingress port. The entry can be dynamic, in which case learning needs to be enabled so that the entry will be refreshed by incoming traffic. There are deployments in which not all the devices connected to the authenticator (the bridge) support 802.1X. Such devices can include printers and cameras. One option to support such deployments is to unlock the bridge ports connecting these devices, but a slightly more secure option is to use MAB. When MAB is enabled, the MAC address of the connected device is used as the user name and password for the authentication. For MAB to work, the user space control plane needs to be notified about MAC addresses that are trying to gain access so that they will be compared against an allow list. This can be implemented via the regular learning process with the sole difference that learned FDB entries are installed with a new "locked" flag indicating that the entry cannot be used to authenticate the device. The flag cannot be set by user space, but user space can clear the flag by replacing the entry, thereby authenticating the device. Locked FDB entries implement the following semantics with regards to roaming, aging and forwarding: 1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports, in which case the "locked" flag is cleared. FDB entries cannot roam to locked ports regardless of MAB being enabled or not. Therefore, locked FDB entries are only created if an FDB entry with the given {MAC, VID} does not already exist. This behavior prevents unauthenticated devices from disrupting traffic destined to already authenticated devices. 2. Aging: Locked FDB entries age and refresh by incoming traffic like regular entries. 3. Forwarding: Locked FDB entries forward traffic like regular entries. If user space detects an unauthorized MAC behind a locked port and wishes to prevent traffic with this MAC DA from reaching the host, it can do so using tc or a different mechanism. Enable the above behavior using a new bridge port option called "mab". It can only be enabled on a bridge port that is both locked and has learning enabled. Locked FDB entries are flushed from the port once MAB is disabled. A new option is added because there are pure 802.1X deployments that are not interested in notifications about locked FDB entries. Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09rtnetlink: add extack support in fdb del handlersAlaa Mohamed1-1/+2
Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for flush filtering based on ndm flags and stateNikolay Aleksandrov1-0/+5
Add support for fdb flush filtering based on ndm flags and state. NDM state and flags are mapped to bridge-specific flags and matched according to the specified masks. NTF_USE is used to represent added_by_user flag since it sets it on fdb add and we don't have a 1:1 mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are ignored. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add support for fine-grained flushingNikolay Aleksandrov1-1/+9
Add the ability to specify exactly which fdbs to be flushed. They are described by a new structure - net_bridge_fdb_flush_desc. Currently it can match on port/bridge ifindex, vlan id and fdb flags. It is used to describe the existing dynamic fdb flush operation. Note that this flush operation doesn't treat permanent entries in a special way (fdb_delete vs fdb_delete_local), it will delete them regardless if any port is using them, so currently it can't directly replace deletes which need to handle that case, although we can extend it later for that too. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13net: bridge: fdb: add ndo_fdb_del_bulkNikolay Aleksandrov1-0/+3
Add a minimal ndo_fdb_del_bulk implementation which flushes all entries. Support for more fine-grained filtering will be added in the following patches. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18net: bridge: mst: Support setting and reporting MST port statesTobias Waldekranz1-0/+23
Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Allow changing a VLAN's MSTITobias Waldekranz1-0/+1
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-18net: bridge: mst: Multiple Spanning Tree (MST) modeTobias Waldekranz1-0/+37
Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16net: bridge: switchdev: differentiate new VLANs from changed onesVladimir Oltean1-3/+3
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases: - a struct net_bridge_vlan got created - an existing struct net_bridge_vlan was modified This makes it impossible for switchdev drivers to properly balance PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to happen, we must provide a way for drivers to distinguish between a VLAN with changed flags and a new one. Annotate struct switchdev_obj_port_vlan with a "bool changed" that distinguishes the 2 cases above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+9
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c commit 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port") commit 31108d142f36 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'") commit 4390c6edc0fb ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'") https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/ net/smc/smc_wr.c commit 49dc9013e34b ("net/smc: Use the bitmap API when applicable") commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") bitmap_zero()/memset() is removed by the fix Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-30net: bridge: mcast: fix br_multicast_ctx_vlan_global_disabled helperNikolay Aleksandrov1-3/+3
We need to first check if the context is a vlan one, then we need to check the global bridge multicast vlan snooping flag, and finally the vlan's multicast flag, otherwise we will unnecessarily enable vlan mcast processing (e.g. querier timers). Fixes: 7b54aaaf53cb ("net: bridge: multicast: add vlan state initialization and control") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20211228153142.536969-1-nikolay@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29net: bridge: mcast: add and enforce startup query interval minimumNikolay Aleksandrov1-0/+3
As reported[1] if startup query interval is set too low in combination with large number of startup queries and we have multiple bridges or even a single bridge with multiple querier vlans configured we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if set lower (i.e. without returning an error) to avoid breaking user-space. If that happens a log message is emitted to let the admin know that the startup interval has been set to the minimum. It doesn't make sense to make the startup interval lower than the normal query interval so use the same value of 1 second. The issue has been present since these intervals could be user-controlled. [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-29net: bridge: mcast: add and enforce query interval minimumNikolay Aleksandrov1-0/+3
As reported[1] if query interval is set too low and we have multiple bridges or even a single bridge with multiple querier vlans configured we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if set lower (i.e. without returning an error) to avoid breaking user-space. If that happens a log message is emitted to let the administrator know that the interval has been set to the minimum. The issue has been present since these intervals could be user-controlled. [1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.com/ Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: bridge: add net device refcount trackerEric Dumazet1-0/+1
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
Merge in the fixes we had queued in case there was another -rc. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-30net: bridge: switchdev: fix shim definition for br_switchdev_mdb_notifyVladimir Oltean1-12/+11
br_switchdev_mdb_notify() is conditionally compiled only when CONFIG_NET_SWITCHDEV=y and CONFIG_BRIDGE_IGMP_SNOOPING=y. It is called from br_mdb.c, which is conditionally compiled only when CONFIG_BRIDGE_IGMP_SNOOPING=y. The shim definition of br_switchdev_mdb_notify() is therefore needed for the case where CONFIG_NET_SWITCHDEV=n, however we mistakenly put it there for the case where CONFIG_BRIDGE_IGMP_SNOOPING=n. This results in build failures when CONFIG_BRIDGE_IGMP_SNOOPING=y and CONFIG_NET_SWITCHDEV=n. To fix this, put the shim definition right next to br_switchdev_fdb_notify(), which is properly guarded by NET_SWITCHDEV=n. Since this is called only from br_mdb.c, we need not take any extra safety precautions, when NET_SWITCHDEV=n and BRIDGE_IGMP_SNOOPING=n, this shim definition will be absent but nobody will be needing it. Fixes: 9776457c784f ("net: bridge: mdb: move all switchdev logic to br_switchdev.c") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211029223606.3450523-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29net: bridge: fix uninitialized variables when BRIDGE_CFM is disabledIvan Vecera1-0/+2
Function br_get_link_af_size_filtered() calls br_cfm_{,peer}_mep_count() that return a count. When BRIDGE_CFM is not enabled these functions simply return -EOPNOTSUPP but do not modify count parameter and calling function then works with uninitialized variables. Modify these inline functions to return zero in count parameter. Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.") Cc: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29net: bridge: mdb: move all switchdev logic to br_switchdev.cVladimir Oltean1-8/+9
The following functions: br_mdb_complete br_switchdev_mdb_populate br_mdb_replay_one br_mdb_queue_one br_mdb_replay br_mdb_switchdev_host_port br_mdb_switchdev_host br_switchdev_mdb_notify are only accessible from code paths where CONFIG_NET_SWITCHDEV is enabled. So move them to br_switchdev.c, in order for that code to be compiled out if that config option is disabled. Note that br_switchdev.c gets build regardless of whether CONFIG_BRIDGE_IGMP_SNOOPING is enabled or not, whereas br_mdb.c only got built when CONFIG_BRIDGE_IGMP_SNOOPING was enabled. So to preserve correct compilation with CONFIG_BRIDGE_IGMP_SNOOPING being disabled, we must now place an #ifdef around these functions in br_switchdev.c. The offending bridge data structures that need this are br->multicast_lock and br->mdb_list, these are also compiled out of struct net_bridge when CONFIG_BRIDGE_IGMP_SNOOPING is turned off. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29net: bridge: move br_vlan_replay to br_switchdev.cVladimir Oltean1-10/+0
br_vlan_replay() is relevant only if CONFIG_NET_SWITCHDEV is enabled, so move it to br_switchdev.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-29net: bridge: provide shim definition for br_vlan_flagsVladimir Oltean1-0/+5
br_vlan_replay() needs this, and we're preparing to move it to br_switchdev.c, which will be compiled regardless of whether or not CONFIG_BRIDGE_VLAN_FILTERING is enabled. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27net: bridge: move br_fdb_replay inside br_switchdev.cVladimir Oltean1-2/+0
br_fdb_replay is only called from switchdev code paths, so it makes sense to be disabled if switchdev is not enabled in the first place. As opposed to br_mdb_replay and br_vlan_replay which might be turned off depending on bridge support for multicast and VLANs, FDB support is always on. So moving br_mdb_replay and br_vlan_replay inside br_switchdev.c would mean adding some #ifdef's in br_switchdev.c, so we keep those where they are. The reason for the movement is that in future changes there will be some code reuse between br_switchdev_fdb_notify and br_fdb_replay. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: bridge: rename br_fdb_insert to br_fdb_add_localVladimir Oltean1-2/+2
br_fdb_insert() is a wrapper over fdb_insert() that also takes the bridge hash_lock. With fdb_insert() being renamed to fdb_add_local(), rename br_fdb_insert() to br_fdb_add_local(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net: bridge: mcast: use multicast_membership_interval for IGMPv3Nikolay Aleksandrov1-3/+1
When I added IGMPv3 support I decided to follow the RFC for computing the GMI dynamically: " 8.4. Group Membership Interval The Group Membership Interval is the amount of time that must pass before a multicast router decides there are no more members of a group or a particular source on a network. This value MUST be ((the Robustness Variable) times (the Query Interval)) plus (one Query Response Interval)." But that actually is inconsistent with how the bridge used to compute it for IGMPv2, where it was user-configurable that has a correct default value but it is up to user-space to maintain it. This would make it consistent with the other timer values which are also maintained correct by the user instead of being dynamically computed. It also changes back to the previous user-expected GMI behaviour for IGMPv3 queries which were supported before IGMPv3 was added. Note that to properly compute it dynamically we would need to add support for "Robustness Variable" which is currently missing. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29net: bridge: mcast: Associate the seqcount with its protecting lock.Thomas Gleixner1-1/+1
The sequence count bridge_mcast_querier::seq is protected by net_bridge::multicast_lock but seqcount_init() does not associate the seqcount with the lock. This leads to a warning on PREEMPT_RT because preemption is still enabled. Let seqcount_init() associate the seqcount with lock that protects the write section. Remove lockdep_assert_held_once() because lockdep already checks whether the associated lock is held. Fixes: 67b746f94ff39 ("net: bridge: mcast: make sure querier port/address updates are consistent") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-20net: bridge: vlan: convert mcast router global option to per-vlan entryNikolay Aleksandrov1-0/+15
The per-vlan router option controls the port/vlan and host vlan entries' mcast router config. The global option controlled only the host vlan config, but that is unnecessary and incosistent as it's not really a global vlan option, but rather bridge option to control host router config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host vlan and port vlan mcast router config. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: bridge: mcast: br_multicast_set_port_router takes multicast context as ↵Nikolay Aleksandrov1-1/+2
argument Change br_multicast_set_port_router to take port multicast context as its first argument so we can later use it to control port/vlan mcast router option. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17net: bridge: mcast: toggle also host vlan state in br_multicast_toggle_vlanNikolay Aleksandrov1-6/+0
When changing vlan mcast state by br_multicast_toggle_vlan it iterates over all ports and enables/disables the port mcast ctx based on the new state, but I forgot to update the host vlan (bridge master vlan entry) with the new state so it will be left out. Also that function is not used outside of br_multicast.c, so make it static. Fixes: f4b7002a7076 ("net: bridge: add vlan mcast snooping knob") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17net: bridge: vlan: account for router port lists when notifyingNikolay Aleksandrov1-0/+1
When sending a global vlan notification we should account for the number of router ports when allocating the skb, otherwise we might end up losing notifications. Fixes: dc002875c22b ("net: bridge: vlan: use br_rports_fill_info() to export mcast router ports") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14net: bridge: mcast: dump ipv4 querier stateNikolay Aleksandrov1-0/+4
Add support for dumping global IPv4 querier state, we dump the state only if our own querier is enabled or there has been another external querier which has won the election. For the bridge global state we use a new attribute IFLA_BR_MCAST_QUERIER_STATE and embed the state inside. The structure is: [IFLA_BR_MCAST_QUERIER_STATE] `[BRIDGE_QUERIER_IP_ADDRESS] - ip address of the querier `[BRIDGE_QUERIER_IP_PORT] - bridge port ifindex where the querier was seen (set only if external querier) `[BRIDGE_QUERIER_IP_OTHER_TIMER] - other querier timeout Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-14net: bridge: mcast: make sure querier port/address updates are consistentNikolay Aleksandrov1-0/+1
Use a sequence counter to make sure port/address updates can be read consistently without requiring the bridge multicast_lock. We need to zero out the port and address when the other querier has expired and we're about to select ourselves as querier. br_multicast_read_querier will be used later when dumping querier state. Updates are done only with the multicast spinlock and softirqs disabled, while reads are done from process context and from softirqs (due to notifications). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>