summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5
AgeCommit message (Collapse)AuthorFilesLines
2018-03-26net/mlx5: Add support for QUERY_VNIC_ENV commandMoshe Shemesh1-0/+2
Add support for new FW command QUERY_VNIC_ENV. The command is used by the driver to query vnic diagnostic statistics from FW. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-26net/mlx5e: PFC stall prevention supportInbar Karmy2-8/+113
Implement set/get functions to configure PFC stall prevention timeout by tunables api through ethtool. By default the stall prevention timeout is configured to 8 sec. Timeout range is: 80-8000 msec. Enabling stall prevention with the auto timeout will set the timeout to 100 msec. Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-26net/mlx5e: Expose PFC stall prevention countersInbar Karmy2-1/+21
Add the needed capability bit and counters to device spec description. Expose the following two counters in ethtool: tx_pause_storm_warning_events: when the device is stalled for a period longer than a pre-configured watermark, the counter increase, allowing the debug utility an insight into current device status. tx_pause_storm_error_events: when the device is stalled for a period longer than a pre-configured timeout, the pause transmission is disabled, and the counter increase. Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-23net/mlx5: Fix use-after-freeGustavo A. R. Silva1-1/+2
_rule_ is being freed and then dereferenced by accessing rule->ctx Fix this by copying the value returned by PTR_ERR(rule->ctx) into a local variable for its safe use after freeing _rule_ Addresses-Coverity-ID: 1466041 ("Read from pointer after free") Fixes: 05564d0ae075 ("net/mlx5: Add flow-steering commands for FPGA IPSec implementation") Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20mlx5: Remove call to ida_pre_getMatthew Wilcox1-1/+0
The mlx5 driver calls ida_pre_get() in a loop for no readily apparent reason. The driver uses ida_simple_get() which will call ida_pre_get() by itself and there's no need to use ida_pre_get() unless using ida_get_new(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+1
Pull rdma fixes from Doug Ledford: - Various driver bug fixes in mlx5, mlx4, bnxt_re and qedr, ranging from bugs under load to bad error case handling - There in one largish patch fixing the locking in bnxt_re to avoid a machine hard lock situation - A few core bugs on error paths - A patch to reduce stack usage in the new CQ API - One mlx5 regression introduced in this merge window - There were new syzkaller scripts written for the RDMA subsystem and we are fixing issues found by the bot - One of the commits (aa0de36a40f4 “RDMA/mlx5: Fix integer overflow while resizing CQ”) is missing part of the commit log message and one of the SOB lines. The original patch was from Leon Romanovsky, and a cut-n-paste separator in the commit message confused patchworks which then put the end of message separator in the wrong place in the downloaded patch, and I didn’t notice in time. The patch made it into the official branch, and the only way to fix it in-place was to rebase. Given the pain that a rebase causes, and the fact that the patch has relevant tags for stable and syzkaller, a revert of the munged patch and a reapplication of the original patch with the log message intact was done. * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits) RDMA/mlx5: Fix integer overflow while resizing CQ Revert "RDMA/mlx5: Fix integer overflow while resizing CQ" RDMA/ucma: Check that user doesn't overflow QP state RDMA/mlx5: Fix integer overflow while resizing CQ RDMA/ucma: Limit possible option size IB/core: Fix possible crash to access NULL netdev RDMA/bnxt_re: Avoid Hard lockup during error CQE processing RDMA/core: Reduce poll batch for direct cq polling IB/mlx5: Fix an error code in __mlx5_ib_modify_qp() IB/mlx5: When not in dual port RoCE mode, use provided port as native IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs RDMA/qedr: Fix iWARP write and send with immediate RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA RDMA/qedr: Fix iWARP connect with port mapper RDMA/qedr: Fix ipv6 destination address resolution IB/core : Add null pointer check in addr_resolve RDMA/bnxt_re: Fix the ib_reg failure cleanup RDMA/bnxt_re: Fix incorrect DB offset calculation RDMA/bnxt_re: Unconditionly fence non wire memory operations ...
2018-03-08Merge tag 'mlx5-updates-2018-02-28-2' of ↵David S. Miller11-278/+1620
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-2 (IPSec-2) This series follows our previous one to lay out the foundations for IPSec in user-space and extend current kernel netdev IPSec support. As noted in our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)", the IPSec mechanism will be supported through our flow steering mechanism. Therefore, we need to change the initialization order. Furthermore, IPsec is also supported in both egress and ingress. Since our current flow steering is egress only, we add an empty (only implemented through FPGA steering ops) egress namespace to handle that case. We also implement the required flow steering callbacks and logic in our FPGA driver. We extend the FPGA support for ESN and modifying a xfrm too. Therefore, we add support for some new FPGA command interface that supports them. The other required bits are added too. The new features and requirements are advertised via cap bits. Last but not least, we revise our driver's accel_esp API. This API will be shared between our netdev and IB driver, so we need to have all the required functionality from both worlds. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net/mlx5: Fix wrongly assigned CQ reference counterLeon Romanovsky1-2/+1
The kernel compiled with CONFIG_REFCOUNT_FULL produces the following error. The reason to it that initial value of refcount_t is supposed to be more than 0, change it. [ 3.106634] ------------[ cut here ]------------ [ 3.107756] refcount_t: increment on 0; use-after-free. [ 3.109130] WARNING: CPU: 0 PID: 1 at lib/refcount.c:153 refcount_inc+0x27/0x30 [ 3.110085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00028-gf683e04bdccc #137 [ 3.110085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 3.110085] RIP: 0010:refcount_inc+0x27/0x30 [ 3.110085] RSP: 0000:ffffaa620000fba0 EFLAGS: 00010286 [ 3.110085] RAX: 0000000000000000 RBX: ffff9a6d1a1821c8 RCX: ffffffff98a50f48 [ 3.110085] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000246 [ 3.110085] RBP: ffff9a6d1ac800a0 R08: 0000000000000289 R09: 000000000000000a [ 3.110085] R10: fffff03bc0682840 R11: ffffffff9949856d R12: ffff9a6d1b4a4000 [ 3.110085] R13: 0000000000000000 R14: ffff9a6d1a0a6c00 R15: ffffaa620000fc5c [ 3.110085] FS: 0000000000000000(0000) GS:ffff9a6d1fc00000(0000) knlGS:0000000000000000 [ 3.110085] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.110085] CR2: 0000000000000000 CR3: 000000000ba0a000 CR4: 00000000000006b0 [ 3.110085] Call Trace: [ 3.110085] mlx5_core_create_cq+0xde/0x250 [ 3.110085] ? __kmalloc+0x1ce/0x1e0 [ 3.110085] mlx5e_create_cq+0x15c/0x1e0 [ 3.110085] mlx5e_open_drop_rq+0xea/0x190 [ 3.110085] mlx5e_attach_netdev+0x53/0x140 [ 3.110085] mlx5e_attach+0x3d/0x60 [ 3.110085] mlx5e_add+0x11d/0x2f0 [ 3.110085] mlx5_add_device+0x77/0x170 [ 3.110085] mlx5_register_interface+0x74/0xc0 [ 3.110085] ? set_debug_rodata+0x11/0x11 [ 3.110085] init+0x67/0x72 [ 3.110085] ? mlx4_en_init_ptys2ethtool_map+0x346/0x346 [ 3.110085] do_one_initcall+0x98/0x147 [ 3.110085] ? set_debug_rodata+0x11/0x11 [ 3.110085] kernel_init_freeable+0x164/0x1e0 [ 3.110085] ? rest_init+0xb0/0xb0 [ 3.110085] kernel_init+0xa/0x100 [ 3.110085] ret_from_fork+0x35/0x40 [ 3.110085] Code: 00 00 00 00 e8 ab ff ff ff 84 c0 74 02 f3 c3 80 3d 3b c3 64 01 00 75 f5 48 c7 c7 68 0b 81 98 c6 05 2b c3 64 01 01 e8 79 d7 a3 ff <0f> ff c3 66 0f 1f 44 00 00 8b 06 83 f8 ff 74 39 31 c9 39 f8 89 [ 3.110085] ---[ end trace a0068e1c68438a74 ]--- Fixes: f105b45bf77c ("net/mlx5: CQ hold/put API") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: IPSec, Add support for ESNAviad Yehezkel5-12/+185
Currently ESN is not supported with IPSec device offload. This patch adds ESN support to IPsec device offload. Implementing new xfrm device operation to synchronize offloading device ESN with xfrm received SN. New QP command to update SA state at the following: ESN 1 ESN 2 ESN 3 |-----------*-----------|-----------*-----------|-----------* ^ ^ ^ ^ ^ ^ ^ - marks where QP command invoked to update the SA ESN state machine. | - marks the start of the ESN scope (0-2^32-1). At this point move SA ESN overlap bit to zero and increment ESN. * - marks the middle of the ESN scope (2^31). At this point move SA ESN overlap bit to one. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Yossef Efraim <yossefe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5e: Added common function for to_ipsec_sa_entryAviad Yehezkel1-10/+19
New function for getting driver internal sa entry from xfrm state. All checks are done in one function. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: Add flow-steering commands for FPGA IPSec implementationAviad Yehezkel5-0/+762
In order to add a context to the FPGA, we need to get both the software transform context (which includes the keys, etc) and the source/destination IPs (which are included in the steering rule). Therefore, we register new set of firmware like commands for the FPGA. Each time a rule is added, the steering core infrastructure calls the FPGA command layer. If the rule is intended for the FPGA, it combines the IPs information with the software transformation context and creates the respective hardware transform. Afterwards, it calls the standard steering command layer. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: Refactor accel IPSec codeAviad Yehezkel5-223/+528
The current code has one layer that executed FPGA commands and the Ethernet part directly used this code. Since downstream patches introduces support for IPSec in mlx5_ib, we need to provide some abstractions. This patch refactors the accel code into one layer that creates a software IPSec transformation and another one which creates the actual hardware context. The internal command implementation is now hidden in the FPGA core layer. The code also adds the ability to share FPGA hardware contexts. If two contexts are the same, only a reference count is taken. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: Added required metadata capability for ipsecAviad Yehezkel1-2/+4
Currently our device requires additional metadata in packet to perform ipsec crypto offload. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: Export ipsec capabilitiesAviad Yehezkel4-24/+16
We will need that for ipsec verbs. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: IPSec, Add command V2 supportAviad Yehezkel5-43/+63
This patch adds V2 command support. New fpga devices support extended features (udp encap, esn etc...), this features require new hardware sadb format therefore we have a new version of commands to manipulate it. Signed-off-by: Yossef Efraim <yossefe@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5e: IPSec, Add support for ESP trailer removal by hardwareYossi Kuperman5-1/+68
Current hardware decrypts and authenticates incoming ESP packets. Subsequently, the software extracts the nexthdr field, truncates the trailer and adjusts csum accordingly. With this patch and a capable device, the trailer is being removed by the hardware and the nexthdr field is conveyed via PET. This way we avoid both the need to access the trailer (cache miss) and to compute its relative checksum, which significantly improve the performance. Experiment shows that trailer removal improves the performance by 2Gbps, (netperf). Both forwarding and host-to-host configurations. Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: IPSec, Generalize sandbox QP commandsYossi Kuperman1-51/+65
The current code assume only SA QP commands. Refactor in order to pave the way for new QP commands: 1. Generic cmd response format. 2. SA cmd checks are in dedicated functions. 3. Aligned debug prints. Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-08net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec capsSaeed Mahameed1-2/+1
Fix build break of mlx5_accel_ipsec_device_caps is not defined when MLX5_ACCEL is not selected, use MLX5_IPSEC_DEV instead which handles such case. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Doug Ledford <dledford@redhat.com>
2018-03-07Merge tag 'mlx5-updates-2018-02-28-1' of ↵David S. Miller11-161/+339
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-1 (IPSec-1) This series consists of some fixes and refactors for the mlx5 drivers, especially around the FPGA and flow steering. Most of them are trivial fixes and are the foundation of allowing IPSec acceleration from user-space. We use flow steering abstraction in order to accelerate IPSec packets. When a user creates a steering rule, [s]he states that we'll carry an encrypt/decrypt flow action (using a specific configuration) for every packet which conforms to a certain match. Since currently offloading these packets is done via FPGA, we'll add another set of flow steering ops. These ops will execute the required FPGA commands and then call the standard steering ops. In order to achieve this, we need that the commands will get all the required information. Therefore, we pass the fte object and embed the flow_action struct inside the fte. In addition, we add the shim layer that will later be used for alternating between the standard and the FPGA steering commands. Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout" are very relevant for user-space applications, as these applications could be killed, but we still want to wait for the FPGA and update the kernel's database. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07net/mlx5: Flow steering cmd interface should get the fte when deletingAviad Yehezkel3-5/+5
Previously, deleting a flow steering entry only got the index. Since the FPGA implementation of FTE's deletion might need to dig inside the FTE itself, we would like to get the FTE's context. Changing the interface to pass the FTE context. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5: Embed mlx5_flow_act into fs_fteMatan Barak4-25/+21
fte objects contain the match value and action. Currently, extending the actions require in adding them both to the API and fs_fte. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5: Add empty egress namespace to flow steering coreAviad Yehezkel4-0/+34
Currently, we don't support egress flow steering namespace in mlx5 flow steering core implementation. However, when we want to encrypt a packet, we model it as a flow steering rule in the egress path. To overcome this, we add an empty egress namespace to flow steering. This namespace is initialized only when ipsec support exists. In the future, this will grow to a full blown full steering implementation, resembling the ingress path. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5: Add shim layer between fs and cmdMatan Barak4-100/+248
The shim layer allows each namespace to define possibly different functionality for add/delete/update commands. The shim layer introduced here, will be used to support flow steering with the FPGA. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07{net,IB}/mlx5: Add has_tag to mlx5_flow_actMatan Barak2-1/+2
The has_tag member will indicate whether a tag action was specified in flow specification. A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0 means that the user doesn't care about flow_tag. HW always provide a flow_tag = 0 if all flow tags requested on a specific flow are 0. So we need a way (in the driver) to differentiate between a user really requesting flow_tag = 0 and a user who does not care, in order to be able to report conflicting flow tags on a specific flow. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5: FPGA and IPSec initialization to be before flow steeringMatan Barak1-19/+20
Some flow steering namespace initialization (i.e. egress namespace) might depend on FPGA capabilities. Changing the initialization order such that the FPGA will be initialized before flow steering. Flow steering fs cmds initialization might depend on IPSec capabilities. Changing the initialization order such that the IPSec will be initialized before flow steering as well. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5e: Removed not need synchronize_rcuAviad Yehezkel1-2/+2
This is already done by xfrm layer between state_dev_del callback to state_dev_free callback. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5e: Fixed sleeping inside atomic contextAviad Yehezkel1-9/+4
We can't allocate with GFP_KERNEL inside spinlock. Actually ida_simple doesn't require spinlock so remove it. Fixes: 547eede070eb ("net/mlx5e: IPSec, Innova IPSec offload infrastructure") Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5e: Wait for FPGA command responses with a timeoutAviad Yehezkel1-3/+6
Generally, FPGA IPSec commands must always complete. We want to wait for one minute for them to complete gracefully also when killing a process. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07net/mlx5: Fixed compilation issue when CONFIG_MLX5_ACCEL is disabledAviad Yehezkel1-2/+2
IPSec init and cleanup functions also depends on linux/mlx5/driver.h. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-28{net, IB}/mlx5: Raise fatal IB event when sys error occursDaniel Jurgens1-1/+1
All other mlx5_events report the port number as 1 based, which is how FW reports it in the port event EQE. Reporting 0 for this event causes mlx5_ib to not raise a fatal event notification to registered clients due to a seemingly invalid port. All switch cases in mlx5_ib_event that go through the port check are supposed to set the port now, so just do it once at variable declaration. Fixes: 89d44f0a6c73("net/mlx5_core: Add pci error handlers to mlx5_core driver") Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28Merge tag 'mlx5-updates-2018-02-23' of ↵David S. Miller6-69/+126
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: mlx5-update-2018-02-23 (IB representors) From: Mark Bloch <markb@mellanox.com> ========= Add IB representor when in switchdev mode The following series adds support for an IB (RAW Ethernet only) device representor which is created when the user switches to switchdev mode. Today when switching to switchdev mode the only representors which are created are net devices. Each netdev is a representor of a virtual function and any data sent via the representor is received on the virtual function, and any data sent via the virtual function is received by the representor. For the mlx5 driver the main use of this functionality is to be able to use Open vSwitch on the hypervisor in order to manage/control traffic from/to the virtual functions. Open vSwitch can also work with DPDK devices and not just net devices, this series exposes an IB device, which Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK. An IB device representor exposes only RAW Ethernet QP capabilities and the ability to create flow rules to direct traffic to its RX queues. The state of the IB device (ACTIVE/DOWN etc..) is based on the state of the corresponding net device representor. No other RDMA/RoCE functionality is currently supported and no GID table is exposed. ========= Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller10-33/+70
2018-02-23net/mlx5: E-Switch, Reload IB interface when switching devlink modesMark Bloch4-17/+26
Up until this point it wasn't possible to activate IB representors when switching to switchdev mode, remove this limitation. We trigger reload of the PF IB interface in order to make sure that already allocated resources are invalid and new resources will be opened correctly with all the limitations of switchdev mode applied (only raw packet capabilities, without RoCE). We also move the remove/add to a place where the E-Switch mode is set/unset to better control when to trigger this action, this will allow the IB side to start in the correct mode. For better code reuse, create a function which reloads an interface and export it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Optimize HW steering tables in switchdev modeMark Bloch2-7/+44
Under switchdev mode we insert an eswitch miss rule causing any unmatched traffic to be sent towards the PF vport. This miss rule can be optimized if we break it to two, one case is for multicast traffic and the other for unicast. Breaking the miss rule into two (unicast and multicast) allows the firmware to program the hardware in a more efficient way. Using ConncetX-5 Ex with IXIA and testpmd (which use IB representors): IXIA -> NIC -> PF -> IB representor -> NIC -> VF: - Without this optimization: 9.2 MPPS. - With this optimization: 18 MPPS. VF -> NIC -> IB representor-> PF -> NIC -> IXIA: - Without this optimization: 17 MPPS. - With this optimization: 23.4 MPPS. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Increase number of FTEs in FDB in switchdev modeMark Bloch1-2/+3
The max FTE number should be the max number of SQs that can be opened. Ethernet representors open one SQ each. Once we add IB representor this will increase (depends on the user). For now lets start with 31 per IB representor and if needed increase in the future. This increase only affects the number of FTEs in the slow path FDB, offloaded rules (done via TC on the fast path portion of the FDB) aren't affected. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Move representors definition to a global scopeMark Bloch4-49/+19
In preparation for IB representors, move representors structs to a global scope, also expose functions needed for registration, unregistration, eswitch mode and creating a flow rule to direct traffic from SQs to the right VF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Add callback to get representor deviceMark Bloch3-0/+40
Add a callback interface to get a protocol device (per representor type). The Ethernet representors will expose their netdev via this interface. This functionality can be later used by IB representor in order to find the corresponding net device representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: Fix error handling when adding flow rulesVlad Buslov1-2/+8
If building match list or adding existing fg fails when node is locked, function returned without unlocking it. This happened if node version changed or adding existing fg returned with EAGAIN after jumping to search_again_locked label. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: E-Switch, Fix drop counters use before creationEugenia Emantayev1-4/+4
First use of drop counters happens in esw_apply_vport_conf function, while they are allocated later in the flow. Fix that by moving esw_vport_create_drop_counters function to be called before the first use. Fixes: b8a0dbe3a90b ("net/mlx5e: E-switch, Add steering drop counters") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: Add header re-write to the checks for conflicting actionsOr Gerlitz1-1/+2
We can't allow only some of the rules sharing an FTE to ask for header re-write, add it to the conflicting action checks. Fixes: 0d235c3fabb7 ('net/mlx5: Add hash table to search FTEs in a flow-group') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: Use 128B cacheline size for 128B or larger cachelinesDaniel Jurgens1-1/+1
The adapter uses the cache_line_128byte setting to set the bounds for end padding. On systems where the cacheline size is greater than 128B use 128B instead of the default of 64B. This results in fewer partial cacheline writes. There's a 50% chance it will pad to the end of a 256B cache line vs only 25% when using 64B. Fixes: f32f5bd2eb7e ("net/mlx5: Configure cache line size for start and end padding") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5e: Specify numa node when allocating drop rqGal Pressman1-2/+8
When allocating a drop rq, no numa node is explicitly set which means allocations are done on node zero. This is not necessarily the nearest numa node to the HCA, and even worse, might even be a memoryless numa node. Choose the numa_node given to us by the pci device in order to properly allocate the coherent dma memory instead of assuming zero is valid. Fixes: 556dd1b9c313 ("net/mlx5e: Set drop RQ's necessary parameters only") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5e: Return error if prio is specified when offloading eswitch vlan pushOr Gerlitz1-1/+2
This isn't supported when we emulate eswitch vlan push action which is the current state of things. Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5: Address static checker warnings on non-constant initializersOr Gerlitz1-4/+4
Address these sparse warnings on drivers/net/ethernet/mellanox/mlx5 [..]/core/diag/fs_tracepoint.c:99:53: warning: non-constant initializer for static object [..]/core/diag/fs_tracepoint.c:102:53: warning: non-constant initializer for static object etc Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5e: Eliminate build warnings on no previous prototypeOr Gerlitz2-2/+3
Fix these gcc warnings on drivers/net/ethernet/mellanox/mlx5: [..]/core/lib/clock.c:454:6: warning: no previous prototype for 'mlx5_init_clock' [-Wmissing-prototypes] [..]/core/lib/clock.c:510:6: warning: no previous prototype for 'mlx5_cleanup_clock' [-Wmissing-prototypes] [..]/core/en_main.c:3141:5: warning: no previous prototype for 'mlx5e_setup_tc' [-Wmissing-prototypes] Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5e: Verify inline header size do not exceed SKB linear sizeEran Ben Elisha1-1/+1
Driver tries to copy at least MLX5E_MIN_INLINE bytes into the control segment of the WQE. It assumes that the linear part contains at least MLX5E_MIN_INLINE bytes, which can be wrong. Cited commit verified that driver will not copy more bytes into the inline header part that the actual size of the packet. Re-factor this check to make sure we do not exceed the linear part as well. This fix is aligned with the current driver's assumption that the entire L2 will be present in the linear part of the SKB. Fixes: 6aace17e64f4 ("net/mlx5e: Fix inline header size for small packets") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5e: Fix loopback self test when GRO is offInbar Karmy1-1/+2
When GRO is off, the transport header pointer in sk_buff is initialized to network's header. To find the udp header, instead of using udp_hdr() which assumes skb_network_header was set, manually calculate the udp header offset. Fixes: 0952da791c97 ("net/mlx5e: Add support for loopback selftest") Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20net/mlx5e: Fix TCP checksum in LRO buffersGal Pressman1-14/+35
When receiving an LRO packet, the checksum field is set by the hardware to the checksum of the first coalesced packet. Obviously, this checksum is not valid for the merged LRO packet and should be fixed. We can use the CQE checksum which covers the checksum of the entire merged packet TCP payload to help us calculate the checksum incrementally. Tested by sending IPv4/6 traffic with LRO enabled, RX checksum disabled and watching nstat checksum error counters (in addition to the obvious bandwidth drop caused by checksum errors). This bug is usually "hidden" since LRO packets would go through the CHECKSUM_UNNECESSARY flow which does not validate the packet checksum. It's important to note that previous to this patch, LRO packets provided with CHECKSUM_UNNECESSARY are indeed packets with a correct validated checksum (even though the checksum inside the TCP header is incorrect), since the hardware LRO aggregation is terminated upon receiving a packet with bad checksum. Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15IB/mlx5: Implement fragmented completion queue (CQ)Yonatan Cohen4-43/+45
The current implementation of create CQ requires contiguous memory, such requirement is problematic once the memory is fragmented or the system is low in memory, it causes for failures in dma_zalloc_coherent(). This patch implements new scheme of fragmented CQ to overcome this issue by introducing new type: 'struct mlx5_frag_buf_ctrl' to allocate fragmented buffers, rather than contiguous ones. Base the Completion Queues (CQs) on this new fragmented buffer. It fixes following crashes: kworker/29:0: page allocation failure: order:6, mode:0x80d0 CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0 Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<>] dump_stack+0x19/0x1b [<>] warn_alloc_failed+0x110/0x180 [<>] __alloc_pages_slowpath+0x6b7/0x725 [<>] __alloc_pages_nodemask+0x405/0x420 [<>] dma_generic_alloc_coherent+0x8f/0x140 [<>] x86_swiotlb_alloc_coherent+0x21/0x50 [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core] [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core] [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core] [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core] [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib] [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib] Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>