summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
AgeCommit message (Collapse)AuthorFilesLines
2022-03-02net/mlx5: Fix wrong limitation of metadata match on ecpfAriel Levkovich1-4/+0
commit 07666c75ad17d7389b18ac0235c8cf41e1504ea8 upstream. Match metadata support check returns false for ecpf device. However, this support does exist for ecpf and therefore this limitation should be removed to allow feature such as stacked devices and internal port offloaded to be supported. Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08net/mlx5: E-Switch, fix single FDB creation on BlueFieldMark Bloch1-0/+1
[ Upstream commit 43a0696f11567278b9412f947e43dd7906c831a8 ] Always use MLX5_FLOW_TABLE_OTHER_VPORT flag when creating egress ACL table for single FDB. Not doing so on BlueField will make firmware fail the command. On BlueField the E-Switch manager is the ECPF (vport 0xFFFE) which is filled in the flow table creation command but as the other_vport field wasn't set the firmware complains about a bad parameter. This is different from a regular HCA where the E-Switch manager vport is the PF (vport 0x0). Passing MLX5_FLOW_TABLE_OTHER_VPORT will make the firmware happy both on BlueField and on regular HCAs without special condition for each. This fixes the bellow firmware syndrome: mlx5_cmd_check:819:(pid 571): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x754a4) Fixes: db202995f503 ("net/mlx5: E-Switch, add logic to enable shared FDB") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25net/mlx5: E-Switch, return error if encap isn't supportedRaed Salem1-1/+1
[ Upstream commit c4c3176739dfa6efcc5b1d1de4b3fd2b51b048c7 ] On regular ConnectX HCAs getting encap mode isn't supported when the E-Switch is in NONE mode. Current code would return no error code when trying to get encap mode in such case which is wrong. Fix by returning error value to indicate failure to caller in such case. Fixes: 8e0aa4bc959c ("net/mlx5: E-switch, Protect eswitch mode changes") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdevPaul Blakey1-7/+0
[ Upstream commit d7751d6476185ff754b9dad2cba0c0a6e43ecadc ] E-Switch encap mode is relevant only when in switchdev mode. The RDMA driver can query the encap configuration via mlx5_eswitch_get_encap_mode(). Make sure it returns the currently used mode and not the set one. This reverts the cited commit which reset the encap mode on entering switchdev and fixes the original issue properly. Fixes: 9a64144d683a ("net/mlx5: E-Switch, Fix default encap mode") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-20net/mlx5e: TC, Support sample offload action for tunneled trafficChris Mi1-4/+1
Currently the sample offload actions send the encapsulated packet to software. This commit decapsulates the packet before performing the sampling and set the tunnel properties on the skb metadata fields to make the behavior consistent with OVS sFlow. If decapsulating first, we can't use the same match like before in default table. So instantiate a post action instance to continue processing the action list. If HW can preserve reg_c, also use the post action instance. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-20net/mlx5e: Move sample attribute to flow attributeChris Mi1-6/+6
Currently it is in eswitch attribute. Move it to flow attribute to reflect the change in previous patch. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+11
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10Merge branch 'mlx5-next' of ↵Jakub Kicinski1-11/+372
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== pull-request: mlx5-next 2020-08-9 This pulls mlx5-next branch which includes patches already reviewed on net-next and rdma mailing lists. 1) mlx5 single E-Switch FDB for lag 2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq 3) Add DCS caps & fields support [1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/ [2] https://patchwork.kernel.org/project/netdevbpf/patch/0e3364dab7e0e4eea5423878b01aa42470be8d36.1626609184.git.leonro@nvidia.com/ [3] https://patchwork.kernel.org/project/netdevbpf/patch/55e1d69bef1fbfa5cf195c0bfcbe35c8019de35e.1624258894.git.leonro@nvidia.com/ * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Lag, Create shared FDB when in switchdev mode net/mlx5: E-Switch, add logic to enable shared FDB net/mlx5: Lag, move lag destruction to a workqueue net/mlx5: Lag, properly lock eswitch if needed net/mlx5: Add send to vport rules on paired device net/mlx5: E-Switch, Add event callback for representors net/mlx5e: Use shared mappings for restoring from metadata net/mlx5e: Add an option to create a shared mapping net/mlx5: E-Switch, set flow source for send to uplink rule RDMA/mlx5: Add shared FDB support {net, RDMA}/mlx5: Extend send to vport rules RDMA/mlx5: Fill port info based on the relevant eswitch net/mlx5: Lag, add initial logic for shared FDB net/mlx5: Return mdev from eswitch IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq net/mlx5: Add DCS caps & fields support ==================== Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10net/mlx5: Block switchdev mode while devlink traps are activeAya Levin1-3/+11
Since switchdev mode can't support devlink traps, verify there are no active devlink traps before moving eswitch to switchdev mode. If there are active traps, prevent the switchdev mode configuration. Fixes: eb3862a0525d ("net/mlx5e: Enable traps according to link state") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: E-Switch, add logic to enable shared FDBMark Bloch1-0/+293
Shared FDB allows to direct traffic from all the vports in the HCA to a single eswitch. In order to do that three things are needed. 1) Point the ingress ACL of the slave uplink to that of the master. With this, wire traffic from both uplinks will reach the same eswitch with the same metadata where a single steering rule can catch traffic from both ports. 2) Set the FDB root flow table of the slave's eswitch to that of the master. As this flow table can change dynamically make sure to sync it on any set root flow table FDB command. This will make sure traffic from SFs, VFs, ECPFs and PFs reach the master eswitch. 3) Split wire traffic at the eswitch manager egress ACL so that it's directed to the native eswitch manager. We only treat wire traffic from both ports the same at the eswitch level. If such traffic wasn't handled in the eswitch it needs to reach the right representor to be processed by software. For example LACP packets should *always* reach the right uplink representor for correct operation. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Lag, properly lock eswitch if neededMark Bloch1-1/+4
Currently when doing hardware lag we check the eswitch mode but as this isn't done under a lock the check isn't valid. As the code needs to sync between two different devices an extra care is needed. - When going to change eswitch mode, if hardware lag is active destroy it. - While changing eswitch modes block any hardware bond creation. - Delay handling bonding events until there are no mode changes in progress. - When attaching a new mdev to lag, block until there is no mode change in progress. In order for the mode change to finish the interface lock will have to be taken. Release the lock and sleep for 100ms to allow forward progress. As this is a very rare condition (can happen if the user unbinds and binds a PCI function while also changing eswitch mode of the other PCI function) it has no real world impact. As taking multiple eswitch mode locks is now required lockdep will complain about a possible deadlock. Register a key per eswitch to make lockdep happy. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: Add send to vport rules on paired deviceMark Bloch1-2/+14
When two mlx5 devices are paired in switchdev mode, always offload the send-to-vport rule to the peer E-Switch. This allows to abstract the logic when this is really necessary (single FDB) and combine the logic of both cases into one. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: E-Switch, Add event callback for representorsMark Bloch1-3/+47
This callback will allow to notify representors about relevant events when in OFFLOADS mode. In downstream patches, this will be used to notify about PAIR/UNPAIR devcom events. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5e: Use shared mappings for restoring from metadataRoi Dayan1-3/+8
FTEs are added with mapped metadata which is saved per eswitch. When uplink reps are bonded and we are in a single FDB mode, we could fail to find metadata which was stored on one eswitch mapping but not the other or with a different id. To resolve this issue use shared mapping between eswitch ports. We do not have any conflict using a single mapping, for a type, between the ports. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05net/mlx5: E-Switch, set flow source for send to uplink ruleAriel Levkovich1-0/+3
Set the flow source param to local vport for the uplink rep send-to-vport rule. This will comply with the recent changes in SW steering that use the flow source as an indication for the rule type - rx or tx. Since the uplink send-to-vport rule is forwarding traffic to the wire it has to indicate that it is an sx rule and can't use the any port value in the flow source. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05{net, RDMA}/mlx5: Extend send to vport rulesMark Bloch1-2/+3
In shared FDB there is only one eswitch which is active and it receives traffic from all representors and all vports in the HCA. While the Ethernet representor will always reside on its native PF the IB representor will not. Extend send to vport rule creation to support such flows. Need to account for source vport that sends the traffic (on which the representors resides) and the target eswitch the traffic which reach. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-03net/mlx5: Fix missing return value in mlx5_devlink_eswitch_inline_mode_set()Jiapeng Chong1-1/+4
The return value is missing in this code scenario, add the return value '0' to the return value 'err'. Eliminate the follow smatch warning: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:3083 mlx5_devlink_eswitch_inline_mode_set() warn: missing error code 'err'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 8e0aa4bc959c ("net/mlx5: E-switch, Protect eswitch mode changes") Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-28net/mlx5: E-Switch, handle devcom events only for ports on the same deviceRoi Dayan1-0/+3
This is the same check as LAG mode checks if to enable lag. This will fix adding peer miss rules if lag is not supported and even an incorrect rules in socket direct mode. Also fix the incorrect comment on mlx5_get_next_phys_dev() as flow #1 doesn't exists. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-28net/mlx5: E-Switch, Set destination vport vhca id only when merged eswitch ↵Maor Dickman1-3/+4
is supported Destination vport vhca id is valid flag is set only merged eswitch isn't supported. Change destination vport vhca id value to be set also only when merged eswitch is supported. Fixes: e4ad91f23f10 ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-10net/mlx5: Create TC-miss priority and tableVlad Buslov1-1/+18
In order to adhere to kernel software datapath model bridge offloads must come after TC and NF FDBs. Following patches in this series add new FDB priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload is implemented with unmanaged tables, its miss path is not automatically connected to next priority and requires the code to manually connect with slow table. To keep bridge offloads encapsulated and not mix it with eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD and FDB_SLOW_PATH: + | +---------v----------+ | | | FDB_TC_OFFLOAD | | | +---------+----------+ | | | +---------v----------+ | | | FDB_FT_OFFLOAD | | | +---------+----------+ | | | +---------v----------+ | | | FDB_TC_MISS | | | +---------+----------+ | | | +---------v----------+ | | | FDB_SLOW_PATH | | | +---------+----------+ | v Initialize the new priority with single default empty managed table and use the table as TC/NF miss patch instead of slow table. This approach allows bridge offloads to be created as new FDB namespace priority between FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any other modules since miss path of managed TC-miss table is automatically wired to next priority. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-02net/mlx5e: Fix adding encap rules to slow pathRoi Dayan1-1/+2
On some devices the ignore flow level cap is not supported and we shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft() already gives the correct end ft if ignore flow level cap is supported or not. Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: SF, Extend SF table for additional SF id rangeParav Pandit1-2/+15
Extended the SF table to cover additioanl SF id range of external controller. A user optionallly provides the external controller number when user wants to create SF on the external controller. An example on eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Use xarray for vport number to vport and rep mappingParav Pandit1-95/+181
Currently vport number to vport and its representor are mapped using an array and an index. Vport numbers of different types of functions are not contiguous. Adding new such discontiguous range using index and number mapping is increasingly complex and hard to maintain. Hence, maintain an xarray of vport and rep whose lookup is done based on the vport number. Each VF and SF entry is marked with a xarray mark to identify the function type. Additionally PF and VF needs special handling for legacy inline mode. They are additionally marked as host function using additional HOST_FN mark. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14net/mlx5: E-Switch, Make vport number u16Parav Pandit1-1/+2
Vport number is 16-bit field in hardware. Make it u16. Move location of vport in the structure so that it reduces a hole in the structure. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14net/mlx5: E-Switch, let user to enable disable metadataParav Pandit1-7/+23
Currently each packet inserted in eswitch is tagged with a internal metadata to indicate source vport. Metadata tagging is not always needed. Metadata insertion is needed for multi-port RoCE, failover between representors and stacked devices. In many other cases, metadata enablement is not needed. Metadata insertion slows down the packet processing rate of the E-switch when it is in switchdev mode. Below table show performance gain with metadata disabled for VXLAN offload rules in both SMFS and DMFS steering mode on ConnectX-5 device. ---------------------------------------------- | steering | metadata | pkt size | rx pps | | mode | | | (million) | ---------------------------------------------- | smfs | disabled | 128Bytes | 42 | ---------------------------------------------- | smfs | enabled | 128Bytes | 36 | ---------------------------------------------- | dmfs | disabled | 128Bytes | 42 | ---------------------------------------------- | dmfs | enabled | 128Bytes | 36 | ---------------------------------------------- Hence, allow user to disable metadata using driver specific devlink parameter. Metadata setting of the eswitch is applicable only for the switchdev mode. Example to show and disable metadata before changing eswitch mode: $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata pci/0000:06:00.0: name esw_port_metadata type driver-specific values: cmode runtime value true $ devlink dev param set pci/0000:06:00.0 \ name esw_port_metadata value false cmode runtime $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> --- changelog: v1->v2: - added performance numbers in commit log - updated commit log and documentation for switchdev mode - added explicit note on when user can disable metadata in documentation
2021-04-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-29/+39
Conflicts: MAINTAINERS - keep Chandrasekar drivers/net/ethernet/mellanox/mlx5/core/en_main.c - simple fix + trust the code re-added to param.c in -next is fine include/linux/bpf.h - trivial include/linux/ethtool.h - trivial, fix kdoc while at it include/linux/skmsg.h - move to relevant place in tcp.c, comment re-wrapped net/core/skmsg.c - add the sk = sk // sk = NULL around calls net/tipc/crypto.c - trivial Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-10Merge branch 'mlx5-next' of ↵Jakub Kicinski1-11/+23
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next 2021-04-09 This pr contains changes from mlx5-next branch, already reviewed on netdev and rdma mailing lists, links below. 1) From Leon, Dynamically assign MSI-X vectors count Already Acked by Bjorn Helgaas. https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/ 2) Cleanup series: https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/ From Mark, E-Switch cleanups and refactoring, and the addition of single FDB mode needed HW bits. From Mikhael, Remove unused struct field From Saeed, Cleanup W=1 prototype warning From Zheng, Esw related cleanup From Tariq, User order-0 page allocation for EQs * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks net/mlx5: Dynamically assign MSI-X vectors count net/mlx5: Add dynamic MSI-X capabilities bits PCI/IOV: Add sysfs MSI-X vector assignment interface net/mlx5: Use order-0 allocations for EQs net/mlx5: Add IFC bits needed for single FDB mode net/mlx5: E-Switch, Refactor send to vport to be more generic RDMA/mlx5: Use representor E-Switch when getting netdev and metadata net/mlx5: E-Switch, Add eswitch pointer to each representor net/mlx5: E-Switch, Add match on vhca id to default send rules net/mlx5: Remove unused mlx5_core_health member recover_work net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() net/mlx5: Cleanup prototype warning ==================== Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-07net/mlx5e: TC, Add support to offload sample actionChris Mi1-2/+41
The following diagram illustrates the hardware model for tc sample action: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ | v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id | default table id + +------------------------------------------------+ | | v v +-----------------------------+ +----------------------------------------+ + sample table + + default table per <vport, chain, prio> + +-----------------------------+ +----------------------------------------+ + forward to management vport + + original match + +-----------------------------+ +----------------------------------------+ + other actions + +----------------------------------------+ The sample action is translated to a goto flow table object destination which samples packets according to the provided sample ratio. Sampled packets are duplicated. One copy is processed by a termination table, named the sample table, which sends the packet to the eswitch manager port (that will be processed by software). The second copy is processed by the default table which executes the subsequent actions. The default table is created per <vport, chain, prio> tuple as rules with different prios and chains may overlap. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-07net/mlx5: Instantiate separate mapping objects for FDB and NIC tablesChris Mi1-1/+15
Currently, the u32 chain id is mapped to u16 value which is stored on the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The mapping is internally maintained by the chains object. However, with the introduction of reg_c0 objects the fdb may store more than just the chain id on reg_c0. This is not relevant for NIC tables. Separate the chains mapping instantiation for FDB and NIC tables. Remove the mapping from the chains object. For FDB tables, create the mapping per eswitch. For NIC tables, create the mapping per tc table. Pass the corresponding mapping pointer when creating the chains object. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-07net/mlx5: Map register values to restore objectsChris Mi1-10/+4
Currently reg_c0 lower 16 bits and reg_b are used to store the chain id that missed in FDB and NIC tables accordingly. However, the registers' values may index a restore object, rather than a single u32 value. Different object types can be used to restore mutually exclusive contexts such as chain id and sample group id. Use the mapping object to associate an index with a restore object as a prestep for supporting additional restore types. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-07net/mlx5: E-switch, Generalize per vport table APIChris Mi1-0/+14
Currently, per vport table was used only for port mirroring actions. However, sample action will also require a per vport table instance. Generalize the vport table API to work with multiple namespaces where each namespace manages its own vport table instance. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-07net/mlx5: E-switch, Rename functions to follow naming convention.Chris Mi1-13/+13
Public api starts with mlx5 and remove mlx5 for non-public api. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-07net/mlx5: E-switch, Move vport table functions to a new fileChris Mi1-179/+36
Currently, the vport table functions are in common eswitch offload file. This file is too big. Move the vport table create, delete and lookup functions to a separate file. Put the file in esw directory. Pre-step for generalizing its functionality for serving both the mirroring and the sample features. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-31net/mlx5: E-switch, Create vport miss group only if src rewrite is supportedMaor Dickman1-29/+39
Create send to vport miss group was added in order to support traffic recirculation to root table with metadata source rewrite. This group is created also in case source rewrite isn't supported. Fixed by creating send to vport miss group only if source rewrite is supported by FW. Fixes: 8e404fefa58b ("net/mlx5e: Match recirculated packet miss in slow table using reg_c1") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-1/+2
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24net: ethernet: indir_table.h is included twiceWan Jiabing1-1/+0
indir_table.h has been included at line 41, so remove the duplicate one at line 43. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17net/mlx5: E-Switch, Protect changing mode while adding rulesRoi Dayan1-3/+11
We re-use the native NIC port net device instance for the Uplink representor, a driver currently cannot unbind TC setup callback actively, hence protect changing E-Switch mode while adding rules. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-17net/mlx5: E-Switch, Change mode lock from mutex to rw semaphoreRoi Dayan1-13/+13
E-Switch mode change routine will take the write lock to prevent any consumer to access the E-Switch resources while E-Switch is going through a mode change. In the next patch E-Switch consumers (e.g vport representors) will take read_lock prior to accessing E-Switch resources to prevent E-Switch mode changing in the middle of the operation. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-17net/mlx5e: Move devlink port register and unregister callsRoi Dayan1-5/+10
We will re-use the native NIC port net device instance for the Uplink representor. As such we also don't want to unregister/register the devlink port as part of the profile. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13net/mlx5: E-Switch, Refactor send to vport to be more genericMark Bloch1-8/+12
Now that each representor stores a pointer to the managing E-Switch use that information when creating the send-to-vport rules. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13net/mlx5: E-Switch, Add eswitch pointer to each representorMark Bloch1-0/+1
Store the managing E-Switch of each representor. This will be used when a representor is created on eswitch manager 0 but the vport belongs to eswitch manager 1. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13net/mlx5: E-Switch, Add match on vhca id to default send rulesMark Bloch1-0/+12
Match on the vhca id of the E-Switch owner when creating the send-to-vport representor rules. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()Zheng Yongjun1-6/+1
Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12net/mlx5e: Alloc flow spec using kvzalloc instead of kzallocRoi Dayan1-2/+2
flow spec is not small and we do allocate it using kvzalloc in most places of the driver. fix rest of the places to use kvzalloc to avoid failure in allocation when memory is too fragmented. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12net/mlx5: Don't skip vport checkSaeed Mahameed1-0/+6
Users of mlx5_eswitch_get_vport() are required to check return value prior to passing mlx5_vport further. Fix all the places to do not skip that check. Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-10net/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supportedMaor Dickman1-1/+2
VF tunnel TX traffic offload is adding flow which forward to flow tables with lower level, which isn't support on all FW versions and may cause firmware to fail with syndrome. Fixed by enabling VF tunnel TX offload only if flow table capability ignore_flow_level is enabled. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-06net/mlx5e: Match recirculated packet miss in slow table using reg_c1Vlad Buslov1-2/+117
Previous patch in series that implements stack devices RX path implements indirect table rules that match on tunnel VNI. After such rule is created all tunnel traffic is recirculated to root table. However, recirculated packet might not match on any rules installed in the table (for example, when IP traffic follows ARP traffic). In that case packets appear on representor of tunnel endpoint VF instead being redirected to the VF itself. Extend slow table with additional flow group that matches on reg_c0 (source port value set by indirect tables implemented by previous patch in series) and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install one rule per VF vport to match on recirculated miss packets and redirect them to appropriate VF vport. Modify indirect tables code to also rewrite reg_c1 with special 0xFFF mark. Implementation reuses reg_c1 tunnel id bits. This is safe to do because recirculated packets are always matched before decapsulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-06net/mlx5e: VF tunnel RX traffic offloadingVlad Buslov1-5/+114
When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the representor of the VF without any further processing of rules installed on the VF. Detect such case by checking if the device returned by route lookup in decap rule handling code is a mlx5 VF and handle it with new redirection tables API. Example TC rules for VF tunnel traffic: 1. Rule that encapsulates the tunneled flow and redirects packets from source VF rep to tunnel device: $ tc -s filter show dev enp8s0f0_1 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 0a:40:bd:30:89:99 src_mac ca:2e:a7:3f:f5:0f eth_type ipv4 ip_tos 0/0x3 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key set src_ip 7.7.7.5 dst_ip 7.7.7.1 key_id 98 dst_port 4789 nocsum ttl 64 pipe index 1 ref 1 bind 1 installed 411 sec used 411 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu used_hw_stats delayed action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen index 1 ref 1 bind 1 installed 411 sec used 0 sec Action statistics: Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 5615833 bytes 4028 pkt backlog 0b 0p requeues 0 cookie bb406d45d343bf7ade9690ae80c7cba4 no_percpu used_hw_stats delayed 2. Rule that redirects from tunnel device to UL rep: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-06net/mlx5: E-Switch, Indirect table infrastructureVlad Buslov1-0/+12
Indirect table infrastructure is used to allow fully processing VF tunnel traffic in hardware. Kernel software model uses two TC rules for such traffic: UL rep to tunnel device, then tunnel VF rep to destination VF rep. To implement such pipeline driver needs to program the hardware after matching on UL rule to overwrite source vport from UL to tunnel VF and recirculate the packet to the root table to allow matching on the rule installed on tunnel VF. For this indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. Indirect table API overview: - mlx5_esw_indir_table_{init|destroy}() - init and destroy opaque indirect table object. - mlx5_esw_indir_table_get() - get or create new table according to vport id and IP version. Table has following pre-created groups: recirculation group with match on ethertype and VNI (rules that match encapsulated packets are installed to this group) and forward group with default/miss rule that forwards to vport of tunnel endpoint VF (rule for regular non-encapsulated packets). - mlx5_esw_indir_table_put() - decrease reference to the indirect table and matching rule (for encapsulated traffic). - mlx5_esw_indir_table_needed() - check that in_port is an uplink port and out_port is VF on the same eswitch, verify that the rule is for IP traffic and source port rewrite functionality can be used. - mlx5_esw_indir_table_decap_vport() - function returns decap vport of flow attribute. Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-06net/mlx5e: VF tunnel TX traffic offloadingVlad Buslov1-11/+109
When tunnel endpoint is on VF, driver still assumes that endpoint is on uplink and incorrectly configures encap rule offload according to that assumption. As a result, traffic is sent directly to the uplink and rules installed on representor of tunnel endpoint VF are ignored. Implement following changes to allow offloading tx traffic with tunnel endpoint on VF: - For tunneling flows perform route lookup on route and out devices pair. If out device is uplink and route device is VF of same physical port, then modify packet reg_c_0 metadata register (source port) with the value of VF vport. Use eswitch vhca_id->vport mapping introduced in one of previous patches in the series to obtain vport from route netdevice. - Recirculate encapsulated packets to VF vport in order to apply any flow rules installed on VF representor that match on encapsulated traffic. Only enable support for this functionality when all following conditions are true: - Hardware advertises capability to preserve reg_c_0 value on packet recirculation. - Vport metadata matching is enabled. - Termination tables are to be used by the flow. Example TC rules for VF tunnel traffic: 1. Rule that redirects packets from UL to VF rep that has the tunnel endpoint IP address: $ tc -s filter show dev enp8s0f0 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 16:c9:a0:2d:69:2c src_mac 0c:42:a1:58:ab:e4 eth_type ipv4 ip_flags nofrag in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen index 3 ref 1 bind 1 installed 377 sec used 0 sec Action statistics: Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 114096 bytes 952 pkt backlog 0b 0p requeues 0 cookie 878fa48d8c423fc08c3b6ca599b50a97 no_percpu used_hw_stats delayed 2. Rule that decapsulates the tunneled flow and redirects to destination VF representor: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>