Age | Commit message | Author | Files | Lines
2022-11-11 | genetlink: fix single op policy dump when do is present | Jakub Kicinski | 1 | -9/+21
Jonathan reports crashes when running net-next in Meta's fleet. Stats collection uses ethtool -I which does a per-op policy dump to check if stats are supported. We don't initialize the dumpit information if doit succeeds due to evaluation short-circuiting.

The crash may look like this:

  BUG: kernel NULL pointer dereference, address: 0000000000000cc0
  RIP: 0010:netlink_policy_dump_add_policy+0x174/0x2a0
   ctrl_dumppolicy_start+0x19f/0x2f0
   genl_start+0xe7/0x140

Or we may trigger a warning:

  WARNING: CPU: 1 PID: 785 at net/netlink/policy.c:87 netlink_policy_dump_get_policy_idx+0x79/0x80
  RIP: 0010:netlink_policy_dump_get_policy_idx+0x79/0x80
   ctrl_dumppolicy_put_op+0x214/0x360

depending on what garbage we pick up from the stack.

Reported-by: Jonathan Lemon <bsd@meta.com>
Fixes: 26588edbef60 ("genetlink: support split policies in ctrl_dumppolicy_put_op()")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221109183254.554051-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
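The short-circuit problem described above can be illustrated with a small user-space C sketch. `struct op_info`, `get_op()` and the two lookup helpers are hypothetical stand-ins, not the genetlink API; the point is only that `a() || b()` never calls `b()` when `a()` succeeds, so whatever `b()` was meant to initialize is left as stack garbage.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a per-command op description. */
struct op_info {
	bool valid;
	int policy_id;
};

/* Hypothetical lookup: fills *out only when the command has that op.
 * Command 1 pretends to have both a do and a dump handler. */
static bool get_op(int cmd, bool want_dump, struct op_info *out)
{
	if (cmd != 1)
		return false;
	out->valid = true;
	out->policy_id = want_dump ? 20 : 10;
	return true;
}

/* Buggy pattern: '||' skips the dump lookup once the do lookup succeeds,
 * so *dump is never written. */
static bool lookup_buggy(int cmd, struct op_info *doit, struct op_info *dump)
{
	return get_op(cmd, false, doit) || get_op(cmd, true, dump);
}

/* Fixed pattern: clear both outputs, always run both lookups,
 * then combine the results. */
static bool lookup_fixed(int cmd, struct op_info *doit, struct op_info *dump)
{
	bool have_do, have_dump;

	memset(doit, 0, sizeof(*doit));
	memset(dump, 0, sizeof(*dump));
	have_do = get_op(cmd, false, doit);
	have_dump = get_op(cmd, true, dump);
	return have_do || have_dump;
}
```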
2022-11-11 | devlink: Fix warning when unregistering a port | Ido Schimmel | 1 | -2/+2
When a devlink port is unregistered, its type is expected to be unset; otherwise a WARNING is generated [1]. This was supposed to be handled by the cited commit by clearing the type upon 'NETDEV_PRE_UNINIT'.

The assumption was that no other events can be generated for the netdev after this event, but this proved to be wrong. After the event is generated, netdev_wait_allrefs_any() will rebroadcast a 'NETDEV_UNREGISTER' until the netdev's reference count drops to 1. This causes devlink to set the port type back to Ethernet.

Fix by only setting and clearing the port type upon 'NETDEV_POST_INIT' and 'NETDEV_PRE_UNINIT', respectively. For all other events, preserve the port type.

[1]
WARNING: CPU: 0 PID: 11 at net/core/devlink.c:9998 devl_port_unregister+0x2f6/0x390 net/core/devlink.c:9998
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u4:1 Not tainted 6.1.0-rc3-next-20221107-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: netns cleanup_net
RIP: 0010:devl_port_unregister+0x2f6/0x390 net/core/devlink.c:9998
[...]
Call Trace:
 <TASK>
 __nsim_dev_port_del+0x1bb/0x240 drivers/net/netdevsim/dev.c:1433
 nsim_dev_port_del_all drivers/net/netdevsim/dev.c:1443 [inline]
 nsim_dev_reload_destroy+0x171/0x510 drivers/net/netdevsim/dev.c:1660
 nsim_dev_reload_down+0x6b/0xd0 drivers/net/netdevsim/dev.c:968
 devlink_reload+0x1c2/0x6b0 net/core/devlink.c:4501
 devlink_pernet_pre_exit+0x104/0x1c0 net/core/devlink.c:12609
 ops_pre_exit_list net/core/net_namespace.c:159 [inline]
 cleanup_net+0x451/0xb10 net/core/net_namespace.c:594
 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e4/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
 </TASK>

Fixes: 02a68a47eade ("net: devlink: track netdev with devlink_port assigned")
Reported-by: syzbot+85e47e1a08b3e159b159@syzkaller.appspotmail.com
Reported-by: syzbot+c2ca18f0fccdd1f09c66@syzkaller.appspotmail.com
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20221110085150.520800-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
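A minimal sketch of the fixed event handling, assuming a simplified port structure and enum values invented for illustration (only the event names come from the commit text): the type is set on NETDEV_POST_INIT, cleared on NETDEV_PRE_UNINIT, and left alone for every other event, so a rebroadcast NETDEV_UNREGISTER can no longer bring it back.

```c
#include <stdbool.h>

/* Simplified events; names follow the commit text, values are illustrative. */
enum netdev_event {
	NETDEV_POST_INIT,
	NETDEV_REGISTER,
	NETDEV_UNREGISTER,
	NETDEV_PRE_UNINIT,
};

enum port_type {
	PORT_TYPE_NOTSET,
	PORT_TYPE_ETH,
};

/* Hypothetical, trimmed-down devlink port. */
struct port {
	enum port_type type;
};

/* Only POST_INIT sets the type and only PRE_UNINIT clears it. Any other
 * event, including a rebroadcast NETDEV_UNREGISTER, preserves it. */
static void port_netdev_event(struct port *p, enum netdev_event ev)
{
	switch (ev) {
	case NETDEV_POST_INIT:
		p->type = PORT_TYPE_ETH;
		break;
	case NETDEV_PRE_UNINIT:
		p->type = PORT_TYPE_NOTSET;
		break;
	default:
		break;
	}
}

/* Mirrors the WARN condition: unregistering expects an unset type. */
static bool port_can_unregister(const struct port *p)
{
	return p->type == PORT_TYPE_NOTSET;
}
```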
2022-11-10 | Merge branch 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Jakub Kicinski | 13 | -45/+372
Long Li says:

====================
Introduce Microsoft Azure Network Adapter (MANA) RDMA driver [netdev prep]

The first 11 patches modify the MANA Ethernet driver to support the RDMA driver.

* 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net: mana: Define data structures for protection domain and memory registration
  net: mana: Define data structures for allocating doorbell page from GDMA
  net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
  net: mana: Define max values for SGL entries
  net: mana: Move header files to a common location
  net: mana: Record port number in netdev
  net: mana: Export Work Queue functions for use by RDMA driver
  net: mana: Set the DMA device max segment size
  net: mana: Handle vport sharing between devices
  net: mana: Record the physical address for doorbell page region
  net: mana: Add support for auxiliary device
====================

Link: https://lore.kernel.org/all/1667502990-2559-1-git-send-email-longli@linuxonhyperv.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | ethtool: ethtool_get_drvinfo: populate drvinfo fields even if callback exits | Vincent Mailhol | 1 | -3/+10
If the ethtool_ops::get_drvinfo() callback isn't set, ethtool_get_drvinfo() will fill the ethtool_drvinfo::name and ethtool_drvinfo::bus_info fields. However, if the driver provides the callback function, those two fields are not touched. This means that the driver has to fill them itself. Allow the driver to leave those two fields empty and populate them in that case. This way, the driver can rely on the default values for the name and the bus_info. If the driver provides values, do nothing. Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/r/20221108035754.2143-1-mailhol.vincent@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
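The fallback described above might look roughly like this in a user-space sketch. The trimmed `struct drvinfo`, the example callback and the default strings are assumptions; the kernel derives the real defaults for `struct ethtool_drvinfo` from the device's parent driver and bus, which this sketch only imitates through the `default_*` arguments.

```c
#include <stdio.h>
#include <string.h>

/* Trimmed-down drvinfo; the two 32-byte fields are assumptions here. */
struct drvinfo {
	char driver[32];
	char bus_info[32];
};

/* Hypothetical driver callback that fills only bus_info. */
static void drv_fill_bus_only(struct drvinfo *info)
{
	snprintf(info->bus_info, sizeof(info->bus_info), "%s", "0000:03:00.0");
}

/* Run the optional callback, then fill any field it left empty with a
 * default. Fields the driver did provide are left untouched. */
static void get_drvinfo(struct drvinfo *info, void (*cb)(struct drvinfo *),
			const char *default_driver, const char *default_bus)
{
	memset(info, 0, sizeof(*info));
	if (cb)
		cb(info);
	if (info->driver[0] == '\0')
		snprintf(info->driver, sizeof(info->driver), "%s", default_driver);
	if (info->bus_info[0] == '\0')
		snprintf(info->bus_info, sizeof(info->bus_info), "%s", default_bus);
}
```

With `drv_fill_bus_only` as the callback, `driver` receives the default while the driver-supplied `bus_info` survives.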
2022-11-10 | net: mana: Fix return type of mana_start_xmit() | Nathan Huckleberry | 2 | -2/+2
The ndo_start_xmit field in net_device_ops is expected to be of type netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev). The mismatched return type breaks forward edge kCFI since the underlying function definition does not match the function hook definition. A new warning in clang will catch this at compile time: drivers/net/ethernet/microsoft/mana/mana_en.c:382:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict] .ndo_start_xmit = mana_start_xmit, ^~~~~~~~~~~~~~~ 1 error generated. The return type of mana_start_xmit should be changed from int to netdev_tx_t. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1703 Link: https://github.com/ClangBuiltLinux/linux/issues/1750 Signed-off-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> [nathan: Rebase on net-next and resolve conflicts Add note about new clang warning] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20221109002629.1446680-1-nathan@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
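The following sketch mirrors the kernel's `enum netdev_tx` values but otherwise uses made-up types (`example_start_xmit`, a one-member `net_device_ops`). It shows a transmit hook declared with exactly the type the ops table expects. Because C silently converts an `int` to an enum in a `return` statement, only a comparison of function-pointer types (clang's `-Wincompatible-function-pointer-types-strict` at build time, or kCFI at run time) notices when a handler is declared as returning `int` instead.

```c
#include <stddef.h>

/* Mirrors the kernel's enum netdev_tx values; other members omitted. */
typedef enum netdev_tx {
	NETDEV_TX_OK = 0x00,
	NETDEV_TX_BUSY = 0x10,
} netdev_tx_t;

struct sk_buff;    /* opaque in this sketch */
struct net_device; /* opaque in this sketch */

/* One-member stand-in for struct net_device_ops. */
struct net_device_ops {
	netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev);
};

/* Declared with the exact hook type, so the initializer below is an exact
 * match. Declaring it as 'static int ...' is what the new warning flags. */
static netdev_tx_t example_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	(void)dev;
	/* Toy logic so both return values are observable. */
	return skb != NULL ? NETDEV_TX_OK : NETDEV_TX_BUSY;
}

static const struct net_device_ops example_ops = {
	.ndo_start_xmit = example_start_xmit,
};
```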
2022-11-10 | gro: avoid checking for a failed search | Richard Gobert | 1 | -35/+35
After searching for a protocol handler in dev_gro_receive, checking for failure is redundant. Skip the failure code after finding the corresponding handler. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221108123320.GA59373@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-10 | net: mana: Define data structures for protection domain and memory registration | Ajay Sharma | 3 | -23/+143
The MANA hardware supports protection domains and memory registration for use in an RDMA environment. Add those definitions and expose them for use by the RDMA driver. Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-12-git-send-email-longli@linuxonhyperv.com Reviewed-by: Dexuan Cui <decui@microsoft.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Define data structures for allocating doorbell page from GDMA | Long Li | 1 | -0/+24
The RDMA device needs to allocate doorbell pages for each user context. Define the GDMA data structures for use by the RDMA driver. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-11-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES | Ajay Sharma | 2 | -1/+3
When doing memory registration, the PF may respond with GDMA_STATUS_MORE_ENTRIES to indicate a follow-up request is needed. This is not an error and should be processed as expected. Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-10-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
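One way to read "not an error": a sketch that classifies GDMA responses and keeps issuing follow-up requests while the PF answers with GDMA_STATUS_MORE_ENTRIES. The numeric status values, the `resp_action` enum and the simulated PF inside `register_memory()` are all hypothetical, chosen only to make the control flow concrete.

```c
#include <stdint.h>

/* Hypothetical status values for this sketch only. */
#define GDMA_STATUS_SUCCESS      0x0u
#define GDMA_STATUS_MORE_ENTRIES 0x105u

enum resp_action { RESP_DONE, RESP_SEND_FOLLOWUP, RESP_ERROR };

static enum resp_action classify_response(uint32_t status)
{
	switch (status) {
	case GDMA_STATUS_SUCCESS:
		return RESP_DONE;
	case GDMA_STATUS_MORE_ENTRIES:
		/* Not an error: the PF is asking for another request. */
		return RESP_SEND_FOLLOWUP;
	default:
		return RESP_ERROR;
	}
}

/* Simulated exchange with a hypothetical PF that answers MORE_ENTRIES
 * until it has seen 'needed' requests. Returns requests sent, or -1. */
static int register_memory(int needed)
{
	int requests = 0;
	enum resp_action act;

	do {
		uint32_t status;

		requests++;
		status = requests < needed ? GDMA_STATUS_MORE_ENTRIES
					   : GDMA_STATUS_SUCCESS;
		act = classify_response(status);
	} while (act == RESP_SEND_FOLLOWUP);

	return act == RESP_DONE ? requests : -1;
}
```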
2022-11-10 | net: mana: Define max values for SGL entries | Long Li | 3 | -4/+9
The maximum number of SGL entries should be computed from the maximum WQE size for the intended queue type and the corresponding OOB data size. This guarantees the hardware queue can successfully queue requests up to the queue depth exposed to the upper layer. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-9-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
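As a worked illustration of the sizing rule, the sketch below subtracts a per-WQE header and the queue's OOB area from the maximum WQE size and divides by the size of one SGL entry. Every constant here (512-byte WQE, 8-byte header, 16-byte SGE) is an assumption for illustration, not a value taken from the MANA headers. With these numbers, an 8-byte OOB leaves room for 31 entries and a 24-byte OOB for 30.

```c
#include <stddef.h>

/* Assumed sizes for illustration only. */
#define MAX_WQE_SIZE 512u /* hypothetical max WQE bytes for the queue type */
#define WQE_HDR_SIZE 8u   /* hypothetical per-WQE header */
#define SGE_SIZE     16u  /* hypothetical size of one SGL entry */

/* Number of SGL entries that still fit once the header and the
 * queue-type-specific OOB area have been reserved. */
static size_t max_sgl_entries(size_t max_wqe, size_t oob_size)
{
	if (max_wqe < WQE_HDR_SIZE + oob_size)
		return 0;
	return (max_wqe - WQE_HDR_SIZE - oob_size) / SGE_SIZE;
}
```

Deriving the limit per queue type this way means a request with the advertised maximum number of entries always fits in one WQE.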
2022-11-10 | net: mana: Move header files to a common location | Long Li | 12 | -8/+9
In preparation for adding the MANA RDMA driver, move all the required header files to a common location for use by both Ethernet and RDMA drivers. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-8-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Record port number in netdev | Long Li | 1 | -0/+1
The port number is useful for user-mode applications to identify this net device based on port index. Set to the correct value in ndev. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-7-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Export Work Queue functions for use by RDMA driver | Long Li | 3 | -7/+19
The RDMA device may need to create Ethernet device queues for use by Queue Pairs of type RAW. This allows a user-mode context to access Ethernet hardware queues. Export the supporting functions for use by the RDMA driver. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-6-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Set the DMA device max segment size | Ajay Sharma | 1 | -0/+6
MANA hardware doesn't have any restrictions on the DMA segment size; set it to the max allowed value. Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-5-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Handle vport sharing between devices | Long Li | 2 | -2/+58
For outgoing packets, the PF requires the VF to configure the vport with corresponding protection domain and doorbell ID for the kernel or user context. The vport can't be shared between different contexts. Implement the logic to exclusively take over the vport by either the Ethernet device or RDMA device. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-4-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
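Exclusive takeover can be sketched as a compare-and-swap on an owner field: whichever of the Ethernet or RDMA side claims a free vport first wins, and the other side's claim fails until the vport is released. The `vport` struct and owner tags are hypothetical; this is a C11-atomics illustration of the ownership rule, not the MANA driver's actual bookkeeping.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical owner tags. */
enum vport_owner { VPORT_FREE = 0, VPORT_ETH = 1, VPORT_RDMA = 2 };

struct vport {
	atomic_int owner;
};

/* Claim the vport for 'who'; fails if anyone already owns it. */
static bool vport_acquire(struct vport *vp, enum vport_owner who)
{
	int expected = VPORT_FREE;

	return atomic_compare_exchange_strong(&vp->owner, &expected, (int)who);
}

/* Release the vport, but only if 'who' is the current owner. */
static bool vport_release(struct vport *vp, enum vport_owner who)
{
	int expected = (int)who;

	return atomic_compare_exchange_strong(&vp->owner, &expected, VPORT_FREE);
}
```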
2022-11-10 | net: mana: Record the physical address for doorbell page region | Long Li | 2 | -0/+6
To support an RDMA device with multiple user contexts, each with its own doorbell page, record the start address of the doorbell page region for use by the RDMA driver to allocate user context doorbell IDs. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-3-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | net: mana: Add support for auxiliary device | Long Li | 4 | -1/+95
In preparation for supporting MANA RDMA driver, add support for auxiliary device in the Ethernet driver. The RDMA device is modeled as an auxiliary device to the Ethernet device. Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1667502990-2559-2-git-send-email-longli@linuxonhyperv.com Acked-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-11-10 | Merge branch 'clean-up-pcs-xpcs-accessors' | Jakub Kicinski | 2 | -8/+15
Russell King says: ==================== Clean up pcs-xpcs accessors This series cleans up the pcs-xpcs code to use mdiodev accessors for read/write just like xpcs_modify_changed() does. In order to do this, we need to introduce the mdiodev clause 45 accessors. ==================== Link: https://lore.kernel.org/r/Y2pm13+SDg6N/IVx@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | net: pcs: xpcs: use mdiodev accessors | Russell King (Oracle) | 1 | -8/+2
Use mdiodev accessors rather than accessing the bus and address in the mdio_device structure and using the mdiobus accessors. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | net: mdio: add mdiodev_c45_(read|write) | Russell King (Oracle) | 1 | -0/+13
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mac_pton: Don't access memory over expected length | Andy Shevchenko | 1 | -1/+2
The strlen() may go too far when estimating the length of the given string. In some cases it may go over the boundary and crash the system which is the case according to the commit 13a55372b64e ("ARM: orion5x: Revert commit 4904dbda41c8."). Rectify this by switching to strnlen() for the expected maximum length of the string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221108141108.62974-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
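A user-space sketch of the bounded length check, assuming the usual "xx:xx:xx:xx:xx:xx" form (17 characters for a 6-byte address). `mac_pton_sketch()` and `hexval()` are illustrative, not the kernel function. The key line is `strnlen(s, maclen)`, which never reads more than `maclen` bytes, so even an unterminated 17-byte buffer is handled safely.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Parse "xx:xx:xx:xx:xx:xx" into mac[]. The strnlen() probe stops after
 * maclen bytes instead of scanning for a terminator that may not exist. */
static bool mac_pton_sketch(const char *s, unsigned char *mac)
{
	const size_t maclen = 3 * ETH_ALEN - 1;
	size_t i;

	if (strnlen(s, maclen) < maclen)
		return false;

	for (i = 0; i < ETH_ALEN; i++) {
		int hi = hexval(s[i * 3]);
		int lo = hexval(s[i * 3 + 1]);

		if (hi < 0 || lo < 0)
			return false;
		/* Every byte except the last must be followed by ':'. */
		if (i != ETH_ALEN - 1 && s[i * 3 + 2] != ':')
			return false;
		mac[i] = (unsigned char)((hi << 4) | lo);
	}
	return true;
}
```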
2022-11-10 | net: phy: dp83867: add TI PHY loopback | Tan Tee Min | 1 | -0/+7
The existing genphy_loopback() does not work for the TI DP83867 PHY because it disables autoneg while the other side still has autoneg enabled. As a result, the link is not established and genphy_loopback() fails with a timeout error. Thus, based on the TI PHY datasheet, introduce a TI PHY loopback function that only configures BMCR_LOOPBACK (bit 14) in the MII_BMCR register (0x0). Tested working on TI DP83867 PHY for all speeds (10/100/1000Mbps). Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221108101527.612723-1-michael.wei.hong.sit@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
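The approach can be sketched as a single read-modify-write of BMCR that changes only BMCR_LOOPBACK (0x4000 in `<linux/mii.h>`) and leaves the autonegotiation bits alone. The register array and `phy_modify_sim()` simulate MDIO access in user space. The kernel offers `phy_modify()` for the same kind of masked update, but the driver's actual code is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MII_BMCR      0x00   /* Basic Mode Control Register */
#define BMCR_LOOPBACK 0x4000 /* bit 14, as in <linux/mii.h> */

/* Simulated PHY register file standing in for MDIO reads and writes. */
static uint16_t phy_regs[32];

/* Masked read-modify-write: only bits in 'mask' change. */
static void phy_modify_sim(uint8_t reg, uint16_t mask, uint16_t set)
{
	phy_regs[reg] = (uint16_t)((phy_regs[reg] & ~mask) | (set & mask));
}

/* Toggle loopback without touching autoneg, speed or duplex bits. */
static void ti_phy_loopback_sketch(bool enable)
{
	phy_modify_sim(MII_BMCR, BMCR_LOOPBACK, enable ? BMCR_LOOPBACK : 0);
}
```

Starting from 0x1140 (autoneg enabled, 1000 Mb/s, full duplex), enabling loopback yields 0x5140, and disabling it restores 0x1140.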
2022-11-10 | Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements' | Jakub Kicinski | 4 | -8/+179
Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series continues with the addition of supported features for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. ==================== Link: https://lore.kernel.org/r/20221107085650.991470-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | net: lan743x: Add support to SGMII register dump for PCI11010/PCI11414 chips | Raju Lakkaraju | 4 | -4/+177
Add support for SGMII register dump. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | net: lan743x: Remove unused argument in lan743x_common_regs( ) | Raju Lakkaraju | 1 | -4/+2
Remove the unused argument (i.e. struct ethtool_regs *regs) in lan743x_common_regs( ) function arguments. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | Merge branch 'mlxsw-add-802-1x-and-mab-offload-support' | Jakub Kicinski | 18 | -25/+366
Petr Machata says: ==================== mlxsw: Add 802.1X and MAB offload support This patchset adds 802.1X [1] and MAB [2] offload support in mlxsw. Patches #1-#3 add the required switchdev interfaces. Patches #4-#5 add the required packet traps for 802.1X. Patches #6-#10 are small preparations in mlxsw. Patch #11 adds locked bridge port support in mlxsw. Patches #12-#15 add mlxsw selftests. The patchset was also tested with the generic forwarding selftest ('bridge_locked_port.sh'). [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a21d9a670d81103db7f788de1a4a4a6e4b891a0b [2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a35ec8e38cdd1766f29924ca391a01de20163931 ==================== Link: https://lore.kernel.org/r/cover.1667902754.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | selftests: mlxsw: Add a test for invalid locked bridge port configurations | Ido Schimmel | 1 | -0/+31
Test that locked bridge port configurations that are not supported by mlxsw are rejected. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | selftests: mlxsw: Add a test for locked port trap | Ido Schimmel | 1 | -0/+105
Test that packets received via a locked bridge port whose {SMAC, VID} does not appear in the bridge's FDB or appears with a different port, trigger the "locked_port" packet trap. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | selftests: mlxsw: Add a test for EAPOL trap | Ido Schimmel | 1 | -0/+22
Test that packets with a destination MAC of 01:80:C2:00:00:03 trigger the "eapol" packet trap. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
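For reference, a sketch of the match the test relies on: compare the first six bytes of an Ethernet frame (its destination MAC) against 01:80:C2:00:00:03. The helper name is illustrative, and the device's actual trap classification is not shown in this log.

```c
#include <stdbool.h>
#include <string.h>

/* Destination MAC named in the commit text. */
static const unsigned char eapol_dmac[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x03 };

/* The destination MAC occupies the first six bytes of an Ethernet frame. */
static bool frame_hits_eapol_dmac(const unsigned char *frame)
{
	return memcmp(frame, eapol_dmac, sizeof(eapol_dmac)) == 0;
}
```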
2022-11-10 | selftests: devlink_lib: Split out helper | Ido Schimmel | 1 | -7/+12
Merely checking whether a trap counter incremented or not without logging a test result is useful on its own. Split this functionality to a helper which will be used by subsequent patches. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mlxsw: spectrum_switchdev: Add locked bridge port support | Ido Schimmel | 2 | -1/+26
Add locked bridge port support by reacting to changes in the 'BR_PORT_LOCKED' flag. When set, enable security checks on the local port via the previously added SPFSR register. When security checks are enabled, an incoming packet will trigger an FDB lookup with the packet's source MAC and the FID it was classified to. If an FDB entry was not found or was found to be pointing to a different port, the packet will be dropped. Such packets increment the "discard_ingress_general" ethtool counter. For added visibility, user space can trap such packets to the CPU by enabling the "locked_port" trap. Example: # devlink trap set pci/0000:06:00.0 trap locked_port action trap Unlike other configurations done via bridge port flags (e.g., learning, flooding), security checks are enabled in the device on a per-port basis and not on a per-{port, VLAN} basis. As such, scenarios where user space can configure different locking settings for different VLANs configured on a port need to be vetoed. To that end, veto the following scenarios: 1. Locking is set on a bridge port that is a VLAN upper 2. Locking is set on a bridge port that has VLAN uppers 3. VLAN upper is configured on a locked bridge port Examples: # bridge link set dev swp1.10 locked on Error: mlxsw_spectrum: Locked flag cannot be set on a VLAN upper. # ip link add link swp1 name swp1.10 type vlan id 10 # bridge link set dev swp1 locked on Error: mlxsw_spectrum: Locked flag cannot be set on a bridge port that has VLAN uppers. # bridge link set dev swp1 locked on # ip link add link swp1 name swp1.10 type vlan id 10 Error: mlxsw_spectrum: VLAN uppers are not supported on a locked port. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
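The per-packet security check described above, reduced to a user-space sketch: look up {SMAC, FID} in a tiny FDB and accept the packet only if an entry exists and points back at the ingress port. The FDB layout, its size and the helper names are assumptions made for illustration. In mlxsw this lookup is done by the device once SPFSR enables security checks on the port.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Minimal software FDB; layout and size are illustrative. */
struct fdb_entry {
	unsigned char mac[6];
	uint16_t fid;
	int port;
	bool used;
};

#define FDB_SIZE 16
static struct fdb_entry fdb[FDB_SIZE];

/* Linear lookup keyed by {MAC, FID}; NULL if there is no entry. */
static const struct fdb_entry *fdb_lookup(const unsigned char *mac, uint16_t fid)
{
	for (int i = 0; i < FDB_SIZE; i++)
		if (fdb[i].used && fdb[i].fid == fid &&
		    memcmp(fdb[i].mac, mac, 6) == 0)
			return &fdb[i];
	return NULL;
}

/* Locked-port check: a missing entry or one pointing at a different port
 * means the packet is dropped (or trapped, if "locked_port" is set to trap). */
static bool locked_port_accept(int ingress_port, const unsigned char *smac,
			       uint16_t fid)
{
	const struct fdb_entry *e = fdb_lookup(smac, fid);

	return e != NULL && e->port == ingress_port;
}
```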
2022-11-10 | mlxsw: spectrum_switchdev: Use extack in bridge port flag validation | Ido Schimmel | 1 | -3/+7
Propagate extack to mlxsw_sp_port_attr_br_pre_flags_set() in order to communicate error messages related to bridge port flag validation. Example: # bridge link set dev swp1 locked on Error: mlxsw_spectrum: Unsupported bridge port flag. More error messages will be added in subsequent patches. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mlxsw: spectrum_switchdev: Add support for locked FDB notifications | Ido Schimmel | 1 | -0/+12
In Spectrum, learning happens in parallel to the security checks. Therefore, regardless of the result of the security checks, a learning notification will be generated by the device and polled later on by the driver. Currently, the driver reacts to learning notifications by programming corresponding FDB entries to the device. When a port is locked (i.e., has security checks enabled), this can no longer happen, as otherwise any host will blindly gain authorization. Instead, notify the learned entry as a locked entry to the bridge driver that will in turn notify it to user space, in case MAB is enabled. User space can then decide to authorize the host by clearing the "locked" flag, which will cause the entry to be programmed to the device. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mlxsw: spectrum_switchdev: Prepare for locked FDB notifications | Ido Schimmel | 1 | -8/+13
Subsequent patches will need to report locked FDB entries to the bridge driver. Prepare for that by adding a 'locked' argument to mlxsw_sp_fdb_call_notifiers() according to which the 'locked' bit is set in the FDB notification info. For now, always pass 'false'. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mlxsw: spectrum: Add an API to configure security checks | Ido Schimmel | 2 | -1/+22
Add an API to enable or disable security checks on a local port. It will be used by subsequent patches when the 'BR_PORT_LOCKED' flag is toggled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mlxsw: reg: Add Switch Port FDB Security Register | Ido Schimmel | 1 | -0/+34
Add the Switch Port FDB Security Register (SPFSR) that allows enabling and disabling security checks on a given local port. In Linux terms, it allows locking / unlocking a port. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | mlxsw: spectrum_trap: Register 802.1X packet traps with devlink | Ido Schimmel | 3 | -0/+28
Register the previously added packet traps with devlink. This allows user space to tune their policers and in the case of the locked port trap, user space can set its action to "trap" in order to gain visibility into packets that were discarded by the device due to the locked port check failure. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | devlink: Add packet traps for 802.1X operation | Ido Schimmel | 3 | -0/+25
Add packet traps for 802.1X operation. The "eapol" control trap is used to trap EAPOL packets and is required for the correct operation of the control plane. The "locked_port" drop trap can be enabled to gain visibility into packets that were dropped by the device due to the locked bridge port check. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | bridge: switchdev: Reflect MAB bridge port flag to device drivers | Ido Schimmel | 1 | -1/+1
Reflect the 'BR_PORT_MAB' flag to device drivers so that: * Drivers that support MAB could act upon the flag being toggled. * Drivers that do not support MAB will prevent MAB from being enabled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | bridge: switchdev: Allow device drivers to install locked FDB entries | Hans J. Schultz | 5 | -4/+28
When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize the hardware and software FDBs by generating switchdev events towards the bridge. When a port is locked, the hardware must not learn autonomously, as otherwise any host will blindly gain authorization. Instead, the hardware should generate events regarding hosts that are trying to gain authorization and their MAC addresses should be notified by the device driver as locked FDB entries towards the bridge driver. Allow device drivers to notify the bridge driver about such entries by extending the 'switchdev_notifier_fdb_info' structure with the 'locked' bit. The bit can only be set by device drivers and not by the bridge driver. Prevent a locked entry from being installed if MAB is not enabled on the bridge port. If an entry already exists in the bridge driver, reject the locked entry if the current entry does not have the "locked" flag set or if it points to a different port. The same semantics are implemented in the software data path. Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
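The acceptance rules for a driver-notified locked entry can be condensed into one predicate. This sketch uses a hypothetical `struct br_fdb` to stand for an existing bridge FDB entry (or NULL if there is none); it encodes only the rules stated above, not the bridge driver's real code path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of an entry already present in the bridge FDB. */
struct br_fdb {
	int port;
	bool locked;
};

/* May a locked entry for 'port', notified by a device driver, be
 * installed, given the existing entry for the same address (or NULL)? */
static bool accept_locked_entry(bool mab_enabled, const struct br_fdb *existing,
				int port)
{
	/* Locked entries are rejected unless MAB is enabled on the port. */
	if (!mab_enabled)
		return false;
	/* No existing entry: the locked entry can be created. */
	if (existing == NULL)
		return true;
	/* Never override an unlocked entry or one on a different port. */
	return existing->locked && existing->port == port;
}
```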
2022-11-10 | bridge: switchdev: Let device drivers determine FDB offload indication | Ido Schimmel | 1 | -1/+1
Currently, FDB entries that are notified to the bridge via 'SWITCHDEV_FDB_ADD_TO_BRIDGE' are always marked as offloaded. With MAB enabled, this will no longer be universally true. Device drivers will report locked FDB entries to the bridge to let it know that the corresponding hosts require authorization, but this does not mean that these entries are necessarily programmed in the underlying hardware. Solve this by determining the offload indication based on the 'offloaded' bit in the FDB notification. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | Merge branch 'net-devlink-move-netdev-notifier-block-to-dest-namespace-during-reload' | Jakub Kicinski | 3 | -5/+24
Jiri Pirko says:

====================
net: devlink: move netdev notifier block to dest namespace during reload

Patch #1 is just a dependency of patch #2, which is the actual fix.
====================

Link: https://lore.kernel.org/r/20221108132208.938676-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | net: devlink: move netdev notifier block to dest namespace during reload | Jiri Pirko | 1 | -1/+4
The notifier block tracking netdev changes in devlink is registered per-net during devlink_alloc() and unregistered in devlink_free(). When devlink moves from one net namespace to another, the notifier block needs to move along. Fix this by adding the forgotten call to move the block. Reported-by: Ido Schimmel <idosch@idosch.org> Fixes: 02a68a47eade ("net: devlink: track netdev with devlink_port assigned") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10 | net: introduce a helper to move notifier block to different namespace | Jiri Pirko | 2 | -4/+20
Currently, the dev_net netdev notifier variant makes a per-net notifier follow its netdev from namespace to namespace. This is implemented by the move_netdevice_notifiers_dev_net() helper. Devlink needs to re-register its per-net notifier during devlink reload. Introduce a new helper called move_netdevice_notifier_net() and share the unregister/register code with the existing move_netdevice_notifiers_dev_net() helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-09 | genetlink: correctly begin the iteration over policies | Jakub Kicinski | 1 | -1/+3
The return value from genl_op_iter_init() only tells us if there are any policies, but to begin the iteration (and therefore load the first entry) we need to call genl_op_iter_next(). Note that it's safe to call genl_op_iter_next() on a family with no ops; it will just return false. Skipping that call may lead to various crashes, a warning in netlink_policy_dump_get_policy_idx() when the policy is not found, or... no problem at all if the kmalloc'ed memory happens to be zeroed. Fixes: b502b3185cd6 ("genetlink: use iterator in the op to policy map dumping") Link: https://lore.kernel.org/r/20221108204128.330287-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
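The init/next contract can be illustrated with a small iterator sketch. The types and names are hypothetical, loosely modeled on the description above: `op_iter_init()` only reports whether anything exists, and `op_iter_next()` must run before the current entry is valid. Reading `cur` right after init is exactly the mistake the fix addresses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator mirroring the init/next split. */
struct op_iter {
	const int *ops;
	size_t n, idx;
	int cur; /* valid only after op_iter_next() returned true */
};

/* Reports whether there is anything to iterate; loads no entry. */
static bool op_iter_init(struct op_iter *it, const int *ops, size_t n)
{
	it->ops = ops;
	it->n = n;
	it->idx = 0;
	it->cur = -1;
	return n > 0;
}

/* Loads the next entry into it->cur; safe to call when n == 0. */
static bool op_iter_next(struct op_iter *it)
{
	if (it->idx >= it->n)
		return false;
	it->cur = it->ops[it->idx++];
	return true;
}

/* Correct usage: init, then next() before touching the first entry. */
static int first_op(const int *ops, size_t n)
{
	struct op_iter it;

	if (!op_iter_init(&it, ops, n) || !op_iter_next(&it))
		return -1;
	return it.cur;
}
```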
2022-11-09 | Merge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs | David S. Miller | 33 | -1797/+1844
rxrpc changes David Howells says: ==================== rxrpc: Increasing SACK size and moving away from softirq, part 1 AF_RXRPC has some issues that need addressing: (1) The SACK table has a maximum capacity of 255, but for modern networks that isn't sufficient. This is hard to increase in the upstream code because of the way the application thread is coupled to the softirq and retransmission side through a ring buffer. Adjustments to the rx protocol allow a capacity of up to 8192, and having a ring sufficiently large to accommodate that would use an excessive amount of memory as this is per-call. (2) Processing ACKs in softirq mode causes the ACKs to get conflated, with only the most recent being considered. Whilst this has the upside that the retransmission algorithm only needs to deal with the most recent ACK, it causes DATA transmission for a call to be very bursty because DATA packets cannot be transmitted in softirq mode. Rather, transmission must be delegated to either the application thread or a workqueue, so there tend to be sudden bursts of traffic for any particular call due to scheduling delays. (3) All crypto in a single call is done in series; however, each DATA packet is individually encrypted so encryption and decryption of large calls could be parallelised if spare CPU resources are available. This is the first of a number of sets of patches that try and address them. The overall aims of these changes include: (1) To get rid of the TxRx ring and instead pass the packets round in queues (e.g. sk_buff_head). On the Tx side, each ACK packet comes with a SACK table that can be parsed as-is, so there's no particular need to maintain our own; we just have to refer to the ACK. On the Rx side, we do need to maintain a SACK table with one bit per entry - but only if packets go missing - and we don't want to have to perform a complex transformation to get the information into an ACK packet.
 (2) To try to move almost all processing of received packets out of the softirq handler and into a high-priority kernel I/O thread. Only the transferral of packets would be left there. I would still use the encap_rcv hook to receive packets, as there's a noticeable performance drop from letting the UDP socket put the packets into its own queue and then getting them out of there.

 (3) To make the I/O thread also do all the transmission. The app thread would be responsible for packaging the data into packets and then buffering them for the I/O thread to transmit. This would make it easier for the app thread to run ahead of the I/O thread, and would mean the I/O thread is less likely to have to wait around for a new packet to become available for transmission.

 (4) To logically partition the socket/UAPI/KAPI side of things from the I/O side of things. The local endpoint, connection, peer and call objects would belong to the I/O side. The socket side would then not touch the private internals of calls and suchlike, and would not change their states. It would only look at the send queue, the receive queue and a way to pass a message to cause an abort.

 (5) To remove as much locking, synchronisation, barriering and atomic ops as possible from the I/O side. Exclusion would be achieved by limiting modification of state to the I/O thread only. Locks would still need to be used in communication with the UDP socket and the AF_RXRPC socket API.

 (6) To provide crypto offload kernel threads that, when there's slack in the system, can pick up packets that need encrypting or decrypting and process them in parallel.

 (7) To remove the use of system timers. Since each timer would then send a poke to the I/O thread, which would then deal with it when it had the opportunity, there seems no point in using system timers if, instead, a list of timeouts can be sensibly consulted. An I/O thread then only needs to schedule with a timeout when it is idle.

 (8) To use zero-copy sendmsg to send packets.
This would make use of the I/O thread being the sole transmitter on the socket to manage the dead-reckoning sequencing of the completion notifications. There is a problem with zero-copy, though: the UDP socket doesn't handle running out of option memory very gracefully.

With regard to this first patchset, the changes made include:

 (1) Some fixes, including a fallback for proc_create_net_single_write(), setting ack.bufferSize to 0 in ACK packets, and a fix for rxrpc congestion management, which shouldn't be saving the cwnd value between calls.

 (2) Improvements in rxrpc tracepoints, including splitting the timer tracepoint into a set-timer and a timer-expired trace.

 (3) Addition of a new proc file to display some stats.

 (4) Some code cleanups, including removing some unused bits and unnecessary header inclusions.

 (5) A change to the recently added UDP encap_err_rcv hook so that it has the same signature as {ip,ipv6}_icmp_error(), allowing rxrpc to point its UDP socket's hook directly at those.

 (6) Definition of a new struct, rxrpc_txbuf, that is used to hold transmissible packets of DATA and ACK type in a single 2KiB block rather than using an sk_buff. This makes it easier for a buffer to be on a number of queues simultaneously, guarantees that the entire block is a single unit for zerocopy purposes, and keeps the data payload aligned for in-place crypto.

 (7) ACK txbufs are allocated at proposal and queued for later transmission, rather than being stored in a single slot in the rxrpc_call struct (which meant only a single ACK could be pending transmission at a time). The queue is then drained at various points. This allows the ACK generation code to be simplified.

 (8) The Rx ring buffer is removed.
When a jumbo packet is received (which comprises a number of ordinary DATA packets glued together), it used to be pointed to by the ring multiple times, with an annotation in a side ring indicating which subpacket was in that slot, but this is no longer possible. Instead, the packet is cloned once for each subpacket, barring the last, and the range of data is set in the skb private area. This makes it easier for the subpackets in a jumbo packet to be decrypted in parallel.

 (9) The Tx ring buffer is removed. The side annotation ring that held the SACK information is also removed. Instead, in the event of packet loss, the SACK data attached to an ACK packet is parsed.

 (10) Allocate an skcipher request when needed in the rxkad security class rather than caching one in the rxrpc_call struct. This deals with a race between externally-driven call disconnection getting rid of the skcipher request and sendmsg/recvmsg trying to use it because they haven't seen the completion yet. This is also needed to support parallelisation, as the skcipher request cannot be used by two or more threads simultaneously.

 (11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going through kernel_sendmsg() so that we can provide our own iterator (zerocopy explicitly doesn't work with a KVEC iterator). This also lets us avoid the overhead of the security hook.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
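On the Tx side, the aims above and change (9) rely on parsing the SACK table carried by each ACK as-is instead of keeping a private copy. The sketch below is a userspace illustration, not the rxrpc code: it assumes the rx convention that an ACK carries a first sequence number plus one soft-ACK byte per following packet, each either ACK or NACK, and the names `sack_view` and `sack_collect_lost()` are hypothetical. It walks the table and reports the sequence numbers the peer says are missing, which is what a retransmission pass would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed soft-ACK byte values, following the rx convention. */
enum { SACK_NACK = 0, SACK_ACK = 1 };

/* Hypothetical view of the SACK part of a received ACK packet. */
struct sack_view {
	uint32_t first_seq;   /* sequence number described by acks[0] */
	const uint8_t *acks;  /* one byte per packet: SACK_ACK or SACK_NACK */
	size_t n_acks;        /* number of entries in acks[] */
};

/*
 * Write the sequence numbers that the peer reports as missing into
 * out[] (at most max entries) and return how many were written.
 * The ACK's table is read in place; no separate SACK state is kept.
 */
size_t sack_collect_lost(const struct sack_view *v, uint32_t *out, size_t max)
{
	size_t n = 0;

	assert(v->acks != NULL || v->n_acks == 0);
	for (size_t i = 0; i < v->n_acks && n < max; i++)
		if (v->acks[i] == SACK_NACK)
			out[n++] = v->first_seq + (uint32_t)i;
	return n;
}
```

Reading the peer's table directly is the point of the design: the sender never has to mirror the receiver's view in its own ring.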
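Change (8) replaces ring annotations with one clone per subpacket plus a data range stored in the skb private area. As an illustration only, the helper below computes such per-subpacket ranges from the length of a jumbo body. The layout is an assumption: each non-final subpacket is taken to carry a fixed JUMBO_DATALEN payload (1412 bytes here) followed by a 4-byte secondary header. As we understand it, real rx jumbo packets are delimited by a flag in each subpacket's header rather than by length alone, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout constants; check the rxrpc headers before relying on them. */
#define JUMBO_DATALEN   1412u  /* payload bytes in each non-final subpacket */
#define JUMBO_HDRLEN    4u     /* secondary header following each such payload */
#define JUMBO_SUBPKTLEN (JUMBO_DATALEN + JUMBO_HDRLEN)

/* Hypothetical per-subpacket range, like what would go in skb private data. */
struct subpkt_range {
	size_t offset;  /* start of this subpacket's payload within the body */
	size_t len;     /* payload length */
};

/*
 * Split a jumbo body of body_len bytes into subpacket ranges.
 * Returns the number of ranges written, or 0 if out[] is too small.
 */
size_t jumbo_split(size_t body_len, struct subpkt_range *out, size_t max)
{
	size_t n = 0, off = 0;

	assert(out != NULL || max == 0);
	/* Every subpacket but the last is a full payload plus a header. */
	while (body_len - off > JUMBO_SUBPKTLEN) {
		if (n == max)
			return 0;
		out[n].offset = off;
		out[n].len = JUMBO_DATALEN;
		n++;
		off += JUMBO_SUBPKTLEN;
	}
	if (n == max)
		return 0;
	/* The final subpacket takes whatever remains. */
	out[n].offset = off;
	out[n].len = body_len - off;
	return n + 1;
}
```

In terms of the description above, each range but the last would presumably be attached to a clone, with the original skb keeping the last one, so that every subpacket can be decrypted independently.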
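Aim (7) replaces system timers with a list of timeouts that the I/O thread consults, so the thread only needs to sleep with a timeout when it is idle. A minimal sketch of that calculation, assuming deadlines are plain integers on a single monotonic clock; `io_thread_sleep_for()` and `NO_TIMEOUT` are hypothetical names, not rxrpc's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TIMEOUT UINT64_MAX  /* hypothetical "sleep until poked" value */

/*
 * Given the current time and an array of pending deadlines on the same
 * clock, return how long an idle I/O thread may sleep: 0 if something
 * has already expired, NO_TIMEOUT if nothing is pending.
 */
uint64_t io_thread_sleep_for(uint64_t now, const uint64_t *deadlines, size_t n)
{
	uint64_t earliest = NO_TIMEOUT;

	assert(deadlines != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (deadlines[i] < earliest)
			earliest = deadlines[i];
	if (earliest == NO_TIMEOUT)
		return NO_TIMEOUT;
	return earliest <= now ? 0 : earliest - now;
}
```

A linear scan keeps the sketch short; with many calls, a sorted list or a heap would avoid rescanning every timeout on each idle pass.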
2022-11-09 | net/core: Allow live renaming when an interface is up | Andy Ren | 3 files | -23/+6
Allow a network interface to be renamed when the interface is up.

As described in the netconsole documentation [1], when netconsole is used as a built-in, it will bring up the specified interface as soon as possible. As a result, user space will not be able to rename the interface, since the kernel disallows renaming of interfaces that are administratively up unless the 'IFF_LIVE_RENAME_OK' private flag was set by the kernel.

The original solution [2] to this problem was to add a new parameter to the netconsole configuration parameters that allows renaming of the interface used by netconsole while it is administratively up. However, during the discussion that followed, it became apparent that we have no reason to keep the current restriction and instead we should allow user space to rename interfaces regardless of their administrative state:

1. The restriction was put in place over 20 years ago, when renaming was only possible via IOCTL and before rtnetlink started notifying user space about such changes like it does today.

2. The 'IFF_LIVE_RENAME_OK' flag was added over 3 years ago in version 5.2 and no regressions were reported.

3. In-kernel listeners to 'NETDEV_CHANGENAME' do not seem to care about the administrative state of the interface.

Therefore, allow user space to rename running interfaces by removing the restriction and the associated 'IFF_LIVE_RENAME_OK' flag. To help with possible triage, emit a message to the kernel log when an interface is renamed while UP.

[1] https://www.kernel.org/doc/Documentation/networking/netconsole.rst
[2] https://lore.kernel.org/netdev/20221102002420.2613004-1-andy.ren@getcruise.com/

Signed-off-by: Andy Ren <andy.ren@getcruise.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
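The behavioural change can be modelled in a few lines of userspace C. This is a simulation under stated assumptions, not the kernel's rename path: `struct fake_netdev`, `rename_old()` and `rename_new()` are hypothetical, the old path is assumed to fail with -EBUSY, and the log wording is illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct net_device involved. */
struct fake_netdev {
	char name[16];        /* IFNAMSIZ is 16 on Linux */
	bool admin_up;        /* models IFF_UP */
	bool live_rename_ok;  /* models the removed IFF_LIVE_RENAME_OK flag */
};

/* Old rule: refuse to rename an UP interface unless the kernel opted in. */
int rename_old(struct fake_netdev *dev, const char *newname)
{
	assert(newname != NULL);
	if (dev->admin_up && !dev->live_rename_ok)
		return -EBUSY;
	snprintf(dev->name, sizeof(dev->name), "%s", newname);
	return 0;
}

/* New rule: always allow, but log a line when renaming while UP. */
int rename_new(struct fake_netdev *dev, const char *newname)
{
	char old[sizeof(dev->name)];

	assert(newname != NULL);
	memcpy(old, dev->name, sizeof(old));
	snprintf(dev->name, sizeof(dev->name), "%s", newname);
	if (dev->admin_up)
		fprintf(stderr, "%s: renamed from %s (while UP)\n", dev->name, old);
	return 0;
}
```

From user space, this means a command such as `ip link set dev eth0 name wan0` should no longer be refused merely because eth0 is administratively up.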
2022-11-09 | Merge branch 'dsa-microchip-checking' | David S. Miller | 7 files | -21/+56
Rakesh Sankaranarayanan says:

====================
net: dsa: microchip: ksz_pwrite status check for lan937x and irq and error checking updates for ksz series

This patch series includes the following changes:

- Add KSZ9563 to ksz_switch_chips. In the current structure, the KSZ9893 entry is reused for KSZ9563, but since the two differ in the number of IRQs, a new member is added for KSZ9563 and the SKU is detected from the Global Chip ID 4 Register. The compatible string from the device tree is mapped to KSZ9563 for SPI and I2C mode probes.

- Assign the device interrupt during the I2C probe operation.

- Add error checking for ksz_pwrite inside lan937x_change_mtu. Since v6.0, ksz_pwrite returns int instead of void, but lan937x_change_mtu still called it without checking the status.

- Set port_nirq to 3 for the KSZ8563 switch family.

- Use dev_err_probe() instead of dev_err() for more standardized error formatting and logging.

v1 -> v2:
- Removed the regmap validation patch from the series; planning to take it up in the future after checking for a better approach and studying the actual need for this change.
- Resolved an error reported in the ksz8863_smi.c file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
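The ksz_pwrite change follows the usual kernel pattern of propagating a register-write error instead of discarding it. A userspace sketch of that pattern; `fake_pwrite16()`, `fake_change_mtu()` and `FAKE_REG_MAX_FRAME` are hypothetical stand-ins, not the driver's functions or register map:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical register offset; not the real lan937x value. */
#define FAKE_REG_MAX_FRAME 0x0a00

/*
 * Hypothetical register-write stub standing in for a ksz_pwrite*()
 * helper; it fails with -EIO on a port we pretend is unreachable.
 */
static int fake_pwrite16(int port, uint16_t reg, uint16_t val)
{
	(void)reg;
	(void)val;
	return port == 7 ? -EIO : 0;
}

/* MTU change that checks the write status instead of ignoring it. */
int fake_change_mtu(int port, int mtu)
{
	int ret;

	assert(mtu > 0 && mtu <= UINT16_MAX);
	ret = fake_pwrite16(port, FAKE_REG_MAX_FRAME, (uint16_t)mtu);
	if (ret < 0)
		return ret;  /* propagate the failure to the caller */
	return 0;
}
```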
2022-11-09 | net: dsa: microchip: add dev_err_probe in probe functions | Rakesh Sankaranarayanan | 3 files | -15/+10
Probe functions use plain dev_err() to print messages on error conditions. Replace dev_err() with dev_err_probe() for more standardized formatting and error logging.

Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
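dev_err_probe() returns the error code it is given, so an error path collapses to a single `return dev_err_probe(dev, ret, ...)`; it also stays quiet at error level for -EPROBE_DEFER, recording the message as the deferral reason instead. The model below imitates that behaviour in userspace; `model_dev_err_probe()`, `model_probe()` and `errors_printed` are hypothetical, and the printed format is simplified (the kernel prints the symbolic error name).

```c
#include <assert.h>
#include <stdio.h>

#define EPROBE_DEFER 517  /* kernel-internal errno value */

int errors_printed;  /* counts messages emitted at error level */

/*
 * Userspace model of dev_err_probe(): log unless the error is
 * -EPROBE_DEFER, and return err so callers can write `return ...;`.
 */
static int model_dev_err_probe(const char *dev, int err, const char *msg)
{
	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: %s\n", dev, err, msg);
		errors_printed++;
	}
	return err;
}

/* Hypothetical probe step showing the one-line error-return idiom. */
int model_probe(const char *dev, int irq_ret)
{
	assert(dev != NULL);
	if (irq_ret < 0)
		return model_dev_err_probe(dev, irq_ret, "failed to get irq");
	return 0;
}
```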
2022-11-09 | net: dsa: microchip: ksz8563: Add number of port irq | Rakesh Sankaranarayanan | 1 file | -0/+1
KSZ8563 has three port interrupts: PTP, PHY and ACL. Set port_nirq to 3 for KSZ8563 in ksz_chip_data.

Signed-off-by: Rakesh Sankaranarayanan <rakesh.sankaranarayanan@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>