path: root/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
Age | Commit message | Author | Files | Lines
2022-05-04 | net/mlx5: Merge various control path IPsec headers into one file | Leon Romanovsky | 1 | -1/+1
The mlx5 IPsec code has logical separation between code that operates with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering logic (ipsec_fs.c) and the data path (ipsec_rxtx.c). Such separation makes sense for C files, but isn't needed at all for H files, as they are all included together anyway. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-04-09 | net/mlx5: Move IPsec file to relevant directory | Leon Romanovsky | 1 | -1/+1
IPsec is part of the ethernet side of the mlx5 driver and needs to be placed in the en_accel folder. Link: https://lore.kernel.org/r/a0ca88f4d9c602c574106c0de0511803e7dcbdff.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 | net/mlx5: Unify device IPsec capabilities check | Leon Romanovsky | 1 | -2/+2
Merge two different functions into one in order to provide a coherent picture of whether the device is IPsec capable or not. Link: https://lore.kernel.org/r/8f10ea06ad19c6f651e9fb33921009658f01e1d5.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 | net/mlx5: Remove ipsec vs. ipsec offload file separation | Leon Romanovsky | 1 | -1/+1
IPsec won't be initialized at all if the device doesn't support IPsec offload, which means we can combine the ipsec.c and ipsec_offload.c files into one file. Such a change allows us to remove the ipsec_ops indirection. Link: https://lore.kernel.org/r/d0ac1fb7b14c10ae20a21ae17a393ee860c72ac3.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09 | net/mlx5_fpga: Drop INNOVA IPsec support | Leon Romanovsky | 1 | -7/+0
Mellanox INNOVA IPsec cards reached EOL in Nov 2019 [1]. As such, the code is unmaintained, untested and not in use by any upstream/distro-oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf Link: https://lore.kernel.org/r/2afe88ec5020a491079eacf6fe3c89b64d65195c.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06 | net/mlx5: Cleanup kTLS function names and their exposure | Leon Romanovsky | 1 | -2/+2
The _accel_ part of the function names is not relevant anymore, so rename the kTLS functions to drop it, together with a header cleanup that removes unused declarations. Link: https://lore.kernel.org/r/72319e6020fb2553d02b3bbc7476bda363f6d60c.1649073691.git.leonro@nvidia.com Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06 | net/mlx5: Remove tls vs. ktls separation as it is the same | Leon Romanovsky | 1 | -1/+1
After the removal of FPGA TLS, we can remove the tls->ktls indirection too, as it is the same thing. Link: https://lore.kernel.org/r/67e596599edcffb0de43f26551208dfd34ac777e.1649073691.git.leonro@nvidia.com Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-18 | net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode | Maxim Mikityanskiy | 1 | -1/+3
When MPWQE is disabled, mlx5e_open_xdpsq() prefills the common fields of WQEs in the XDP SQ to save time when sending packets. mlx5e_xmit_xdp_frame() runs on the prefilled fields; however, sending multi buffer XDP frames would require changing some of these fields on a per-packet basis. Besides that, mlx5e_xmit_xdp_frame() will be used as a fallback to send multi buffer XDP frames when MPWQE is enabled (MPWQE can only handle linear packets). In order to prepare for XDP multi buffer support, this commit introduces a mode for mlx5e_xmit_xdp_frame() that fills all the fields itself. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-18 | net/mlx5e: Use page-sized fragments with XDP multi buffer | Maxim Mikityanskiy | 1 | -1/+1
The implementation of XDP in mlx5e assumes that the frame size is equal to the page size. Force this limitation in the non-linear mode for XDP multi buffer. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-18 | net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP | Maxim Mikityanskiy | 1 | -9/+19
XDP multi buffer implementation in the kernel assumes that all fragments have the same size. bpf_xdp_frags_increase_tail uses this assumption to get the size of the last fragment, and __xdp_build_skb_from_frame uses it to calculate truesize as nr_frags * xdpf->frame_sz. The current implementation of mlx5e uses fragments of different size in non-linear legacy RQ. Specifically, the last fragment can be larger than the others. It's an optimization for packets smaller than MTU. This commit adapts mlx5e to the kernel limitations and makes it use fragments of the same size, in order to add support for XDP multi buffer. The change is applied only if XDP is active, otherwise the old optimization still applies. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17 | net/mlx5e: Build SKB in place over the first fragment in non-linear legacy RQ | Maxim Mikityanskiy | 1 | -11/+32
As a performance optimization and preparation to enabling XDP multi buffer on non-linear legacy RQ, build the linear part of the SKB over the first fragment, instead of allocating a new buffer and copying the first 256 bytes there. To achieve this, add headroom and tailroom to the first fragment. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17 | net/mlx5e: Validate MTU when building non-linear legacy RQ fragments info | Maxim Mikityanskiy | 1 | -7/+27
mlx5e_build_rq_frags_info() assumes that the MTU is not bigger than PAGE_SIZE * MLX5E_MAX_RX_FRAGS, which is 16K for 4K pages. Currently, the firmware limits the MTU to 10K, so the assumption doesn't lead to a bug. This commit adds an additional driver check for reliability, since the firmware boundary might be changed. The calculation is taken to a separate function with a comment explaining it. It's a preparation for the following patches that introduce XDP multi buffer support. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
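A rough illustration of the check described above (standalone C, not the driver's code; PAGE_SZ and MAX_RX_FRAGS stand in for the kernel's PAGE_SIZE and MLX5E_MAX_RX_FRAGS):

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_SZ      4096 /* illustrative 4K page */
    #define MAX_RX_FRAGS 4    /* stand-in for MLX5E_MAX_RX_FRAGS */

    /* The whole receive buffer for one packet (MTU plus any per-packet
     * overhead) must fit into MAX_RX_FRAGS page-sized fragments. */
    static bool mtu_fits_in_frags(unsigned int mtu, unsigned int overhead)
    {
        return mtu + overhead <= (unsigned int)PAGE_SZ * MAX_RX_FRAGS;
    }

    int main(void)
    {
        printf("%d\n", mtu_fits_in_frags(10000, 256)); /* 1: 10K fits in 16K */
        return 0;
    }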
2022-02-17 | net/mlx5e: RX, Restrict bulk size for small Striding RQs | Tariq Toukan | 1 | -0/+6
In RQs of type multi-packet WQE (Striding RQ), each WQE is relatively large (typically 256KB) but their number is relatively small (8 by default). Re-mapping the descriptors' buffers before re-posting them is done via UMR (User-Mode Memory Registration) operations. On the one hand, posting UMR WQEs in bulks reduces communication overhead with the HW and better utilizes its processing units. On the other hand, delaying the WQE repost operations for a small RQ (say, of 4 WQEs) might drastically hit its performance, causing packet drops due to no receive buffer under a high or bursty incoming packet rate. Here we restrict the bulk size for too-small RQs. Effectively, with the current constants, an RQ of size 4 (the minimum allowed) has no bulking, while larger RQs continue working with bulks of 2. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
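A hypothetical sketch of the described behavior; only the stated outcomes (no bulking at the minimum RQ size of 4, bulks of 2 for larger RQs) come from the text above, while the exact formula is an assumption:

    #include <stdio.h>

    /* Assumed formula: cap the bulk at 2 and shrink it for tiny RQs, so
     * that an RQ of 4 WQEs ends up with a bulk of 1 (i.e. no bulking). */
    static unsigned int umr_wqe_bulk_size(unsigned int wq_sz)
    {
        unsigned int bulk = wq_sz / 2 - 1;

        return bulk < 2 ? bulk : 2;
    }

    int main(void)
    {
        printf("%u %u\n", umr_wqe_bulk_size(4), umr_wqe_bulk_size(8)); /* 1 2 */
        return 0;
    }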
2022-02-17 | net/mlx5e: Default to Striding RQ when not conflicting with CQE compression | Tariq Toukan | 1 | -2/+3
CQE compression is turned on by default on slow PCI systems to help reduce the load on the PCI bus. In this case, Striding RQ was turned off, as CQEs of packets that span several strides were not compressed, significantly reducing the compression effectiveness. This issue does not exist when using the newer mini_cqe format "stride_index". Hence, allow defaulting to Striding RQ in this case. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-15 | net/mlx5e: Use FW limitation for max MPW WQEBBs | Aya Levin | 1 | -1/+1
Calculate the maximal count of MPW WQEBBs at SQ creation and store it there. Remove MLX5E_TX_MPW_MAX_NUM_DS and MLX5E_TX_MPW_MAX_WQEBBS. Update mlx5e_tx_mpwqe_is_full() and mlx5e_xdp_mpqwe_is_full(). Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-15 | net/mlx5e: Read max WQEBBs on the SQ from firmware | Aya Levin | 1 | -4/+4
Prior to this patch, the maximal value for max WQEBBs (WQE Basic Blocks, where a WQE is a Work Queue Element) on the TX side was assumed to be 16 (a fixed value). All firmware versions to date comply with this. In order to be more flexible and resilient, read the corresponding capability from FW: max_wqe_sz_sq. This value describes the maximum WQE size in bytes, so max WQEBBs is obtained by dividing it by the WQEBB byte size. The driver uses the smaller of 16 and the division result. This ensures synchronization between driver and firmware and avoids unexpected behavior. Store this value on the different SQs (Send Queues) for easy access. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
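A standalone sketch of the described calculation (not the driver's code; the 64-byte WQEBB size and the identifiers are assumptions):

    #include <stdio.h>

    #define WQEBB_SZ      64 /* assumed WQE basic block size in bytes */
    #define MAX_SQ_WQEBBS 16 /* the historical fixed maximum */

    /* Derive max WQEBBs from the FW-reported max_wqe_sz_sq (bytes) and
     * never exceed the historical limit of 16. */
    static unsigned int max_sq_wqebbs(unsigned int max_wqe_sz_sq)
    {
        unsigned int fw = max_wqe_sz_sq / WQEBB_SZ;

        return fw < MAX_SQ_WQEBBS ? fw : MAX_SQ_WQEBBS;
    }

    int main(void)
    {
        printf("%u %u\n", max_sq_wqebbs(1024), max_sq_wqebbs(512)); /* 16 8 */
        return 0;
    }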
2021-12-03 | net/mlx5e: SHAMPO, clean MLX5E_MAX_KLM_PER_WQE macro | Ben Ben-Ishay | 1 | -1/+1
This commit removes an unused variable from the MLX5E_MAX_KLM_PER_WQE macro that was introduced by commit d7b896acbdcb ("net/mlx5e: Add support to klm_umr_wqe"). Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27 | net/mlx5e: Add control path for SHAMPO feature | Ben Ben-Ishay | 1 | -11/+131
This commit introduces the control path infrastructure for the SHAMPO feature. SHAMPO enables packet stitching by splitting packets into header and payload: the header is placed in a dedicated buffer and the payload on the RX ring, which allows stitching the data part of a flow together continuously in the receive buffer. SHAMPO is implemented as a linked-list Striding RQ feature. To support packet splitting and payload stitching: - Enlarge the ICOSQ and the corresponding CQ to support the header buffer memory regions. - Add support to create a linked-list Striding RQ with the SHAMPO feature set in the open_rq function. - Add a deallocation function and corresponding calls for the SHAMPO header buffer. - Add mlx5e_create_umr_klm_mkey to support a KLM mkey for the header buffer. - Rename mlx5e_create_umr_mkey to mlx5e_create_umr_mtt_mkey. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27 | net/mlx5e: Rename TIR lro functions to TIR packet merge functions | Khalid Manaa | 1 | -15/+6
This series introduces a new packet merge type; therefore rename the lro functions to packet merge functions to support the new merge type: - Generalize + rename mlx5e_build_tir_ctx_lro to mlx5e_build_tir_ctx_packet_merge. - Rename mlx5e_modify_tirs_lro to mlx5e_modify_tirs_packet_merge. - Rename the lro bit in mlx5_ifc_modify_tir_bitmask_bits to packet_merge. - Replace lro_en in mlx5e_params with a packet_merge_type field and combine the packet_merge params into one struct mlx5e_packet_merge_param. Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-27 | net/mlx5e: Rename lro_timeout to packet_merge_timeout | Ben Ben-Ishay | 1 | -1/+1
TIR stands for transport interface receive; the TIR object is responsible for performing all transport-related operations on the receive side, like packet processing, demultiplexing the packets to different RQs, etc. lro_timeout is a field in the TIR that is used to set the timeout for an lro session. This series introduces a new packet merge type; therefore rename lro_timeout to packet_merge_timeout for all packet merge types. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-31 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 1 | -1/+10
Conflicting commits, all resolutions pretty trivial: drivers/bus/mhi/pci_generic.c 5c2c85315948 ("bus: mhi: pci-generic: configurable network interface MRU") 56f6f4c4eb2a ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean") drivers/nfc/s3fwrn5/firmware.c a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label") 46573e3ab08f ("nfc: s3fwrn5: fix undefined parameter values in dev_err()") 801e541c79bb ("nfc: s3fwrn5: fix undefined parameter values in dev_err()") MAINTAINERS 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") 8a7b46fa7902 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-28 | net/mlx5e: RX, Avoid possible data corruption when relaxed ordering and LRO combined | Tariq Toukan | 1 | -1/+10
When HW aggregates packets for an LRO session, it writes the payload of two consecutive packets of a flow contiguously, so that they usually share a cacheline. The first byte of a packet's payload is written immediately after the last byte of the preceding packet. In this flow, there are two consecutive write requests to the shared cacheline: 1. Regular write for the earlier packet. 2. Read-modify-write for the following packet. With relaxed ordering on, these two writes might be re-ordered. Using the end padding optimization (to avoid a partial write for the last cacheline of a packet) becomes problematic if the two writes occur out of order, as the padding would overwrite payload that belongs to the following packet, causing data corruption. Avoid this by disabling the end padding optimization when both LRO and relaxed ordering are enabled. Fixes: 17347d5430c4 ("net/mlx5e: Add support for PCI relaxed ordering") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
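A minimal sketch of the resulting policy (illustrative only, not the driver's actual flag handling):

    #include <stdbool.h>
    #include <stdio.h>

    /* End-of-packet padding is only safe when LRO and PCI relaxed ordering
     * are not enabled together; otherwise the out-of-order writes described
     * above could let the padding overwrite the next packet's payload. */
    static bool end_padding_allowed(bool lro_en, bool relaxed_ordering_en)
    {
        return !(lro_en && relaxed_ordering_en);
    }

    int main(void)
    {
        printf("%d %d\n", end_padding_allowed(true, true),
               end_padding_allowed(true, false)); /* 0 1 */
        return 0;
    }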
2021-07-26 | net/mlx5e: Remove mlx5e_priv usage from mlx5e_build_*tir_ctx*() | Maxim Mikityanskiy | 1 | -0/+12
The functions that build the TIR context for TIR create and modify commands used to depend on struct mlx5e_priv and fetch some values directly from different places. It increased code coupling and the chance of weird misbehavior due to hidden complex dependencies. As the first step, this commit removes the priv parameter from these functions. Instead, the necessary values are passed directly. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-03 | net/mlx5e: Disable TLS device offload in kdump mode | Alaa Hleihel | 1 | -2/+2
In a kdump environment we want to use the smallest possible amount of resources, which includes setting the SQ size to the minimum. However, when running on a device that supports TLS device offload, the SQ stop room becomes larger than with a non-capable device and requires increasing the SQ size. Since TLS device offload is not necessary in kdump mode, disable it to reduce the memory requirements for capable devices. With this change, the needed SQ stop room size drops by 33. Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-03 | net/mlx5e: Zero-init DIM structures | Lama Kayal | 1 | -2/+2
Initialize the structs to avoid unexpected behavior. There is no immediate issue in the current code, but since the structs are return values, it's safer to initialize them. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-20 | net/mlx5e: RX, Add checks for calculated Striding RQ attributes | Tariq Toukan | 1 | -30/+56
The Striding RQ attributes below are mutually dependent. An unaware change to one might take the others out of the valid range derived from the HW caps: - The MPWQE size in bytes - The number of strides in a MPWQE - The stride size Add checks to verify they are valid and comply with the HW spec and SW assumptions/requirements. This is not a fix; no particular issue exists today. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
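An illustrative consistency check along the lines described above (standalone C; the cap names and values are assumptions, not the HW spec):

    #include <stdbool.h>
    #include <stdio.h>

    /* The three attributes must agree with each other and stay within the
     * device caps: mpwqe_sz == num_strides * stride_sz. */
    static bool mpwqe_attrs_ok(unsigned long long mpwqe_sz,
                               unsigned int num_strides, unsigned int stride_sz,
                               unsigned long long cap_max_mpwqe_sz,
                               unsigned int cap_max_strides,
                               unsigned int cap_max_stride_sz)
    {
        if ((unsigned long long)num_strides * stride_sz != mpwqe_sz)
            return false;
        return mpwqe_sz <= cap_max_mpwqe_sz &&
               num_strides <= cap_max_strides &&
               stride_sz <= cap_max_stride_sz;
    }

    int main(void)
    {
        /* 256KB MPWQE built from 4096 strides of 64 bytes (illustrative). */
        printf("%d\n", mpwqe_attrs_ok(256ULL << 10, 4096, 64,
                                      256ULL << 10, 8192, 8192));
        return 0;
    }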
2021-04-20 | net/mlx5e: Fix lost changes during code movements | Tariq Toukan | 1 | -2/+3
The changes done in commit [1] were missed by the code movements done in [2], as they were developed in ~parallel. Here we re-apply them. [1] commit e4484d9df500 ("net/mlx5e: Enable striding RQ for Connect-X IPsec capable devices") [2] commit b3a131c2a160 ("net/mlx5e: Move params logic into its dedicated file") Fixes: b3a131c2a160 ("net/mlx5e: Move params logic into its dedicated file") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 | net/mlx5e: kTLS, Add resiliency to RX resync failures | Tariq Toukan | 1 | -0/+3
When the TLS logic finds a tcp seq match for a kTLS RX resync request, it calls the driver callback function mlx5e_ktls_resync() to handle it and communicate it to the device. Errors might occur during mlx5e_ktls_resync(); however, they are not reported to the stack. Moreover, there is no error handling in the stack for these errors. In this patch, the driver takes responsibility for error handling, adding queue and retry mechanisms to these resyncs. We maintain a linked list of resync matches and try posting them to the async ICOSQ in the NAPI context. The only possible failure that demands driver handling is the ICOSQ being full. By relying on the NAPI mechanism, we make sure that the entries in the list will be handled when ICOSQ completions arrive and make some room available. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-26 | net/mlx5e: Restrict usage of mlx5e_priv in params logic functions | Tariq Toukan | 1 | -48/+46
Do not use the generic struct mlx5e_priv as a parameter to param functions, as it is too broad. All calculations of a channel's params should be based mainly on struct mlx5_core_dev and struct mlx5e_params. Additional info can be explicitly passed. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-26 | net/mlx5e: Move params logic into its dedicated file | Tariq Toukan | 1 | -5/+477
Take params logic out of en_main.c, into the dedicated params.c. Some functions are now hidden and become static. No functional change here. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-02 | net/mlx5e: remove h from printk format specifier | Tom Rix | 1 | -1/+1
This change fixes the checkpatch warning described in commit cbacb5ab0aa0 ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]"): standard integer promotion is already done, and %hx and %hhx are useless, so do not encourage the use of %hh[xudi] or %h[xudi]. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
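A tiny userspace example of the point being made: the argument is promoted to int before it reaches the variadic function, so the plain specifier is sufficient.

    #include <stdio.h>

    int main(void)
    {
        unsigned char byte = 0xAB;

        printf("%x\n", byte);   /* preferred: promotion already happened */
        printf("%hhx\n", byte); /* discouraged by checkpatch, same output */
        return 0;
    }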
2020-11-05 | net/mlx5e: Validate stop_room size upon user input | Vladyslav Tarasiuk | 1 | -0/+34
Stop room is space that may be taken by WQEs in the SQ during a packet transmit. It is used to check whether the next packet has enough room in the SQ. Stop room guarantees this packet can be served, and if not, the queue is stopped, so no more packets are passed to the driver until it's ready. Currently, stop_room size is calculated and validated upon tx queue allocation. This makes it impossible to know whether the user provided valid input for certain parameters when the interface is down. Instead, store stop_room in mlx5e_sq_param and create mlx5e_validate_params() to validate its fields upon user input, even when the interface is down. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
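A minimal sketch of the kind of validation this enables (illustrative only; the real check lives in the driver's mlx5e_validate_params() and uses its own parameter structures):

    #include <stdbool.h>
    #include <stdio.h>

    /* The reserved stop room must still leave space for at least one
     * packet in an SQ of sq_size WQEBBs, otherwise the queue could stop
     * forever; reject such user input even while the interface is down. */
    static bool stop_room_valid(unsigned int stop_room, unsigned int sq_size)
    {
        return stop_room < sq_size;
    }

    int main(void)
    {
        printf("%d %d\n", stop_room_valid(33, 64), stop_room_valid(80, 64));
        return 0;
    }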
2020-05-22 | mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL | Björn Töpel | 1 | -6/+7
Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in mlx5e. It allows dropping a lot of code from the driver (which is now common in the AF_XDP core and was related to XSK RX frame allocation, DMA mapping, etc.) and slightly improves performance (RX +0.8 Mpps, TX +0.4 Mpps). rfc->v1: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim) v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim) v2->v3: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff API for DMA sync on TX, add performance numbers. (Maxim) v3->v4: Remove unused variable num_xsk_frames. (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-12-bjorn.topel@gmail.com
2019-08-31 | net/mlx5e: Allow XSK frames smaller than a page | Maxim Mikityanskiy | 1 | -4/+19
Relax the requirements to the XSK frame size to allow it to be smaller than a page and even not a power of two. The current implementation can work in this mode, both with Striding RQ and without it. The code that checks `mtu + headroom <= XSK frame size` is modified accordingly. Any frame size between 2048 and PAGE_SIZE is accepted. Functions that worked with pages only now work with XSK frames, even if their size is different from PAGE_SIZE. With XSK queues, regardless of the frame size, Striding RQ uses the stride size of PAGE_SIZE, and UMR MTTs are posted using starting addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no packet data will overlap with other frames. UMR MTT size is made equal to the stride size of the RQ, because UMEM frames may come in random order, and we need to handle them one by one. PAGE_SIZE is just a power of two that is bigger than any allowed XSK frame size, and also it doesn't require making additional changes to the code. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
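An illustrative version of the two constraints named above (standalone C; PAGE_SZ is a stand-in for PAGE_SIZE):

    #include <stdbool.h>
    #include <stdio.h>

    #define PAGE_SZ       4096 /* stand-in for PAGE_SIZE */
    #define XSK_MIN_FRAME 2048

    /* The frame size must lie between 2048 and PAGE_SIZE, and the packet
     * (MTU plus headroom) must fit inside one XSK frame. */
    static bool xsk_frame_ok(unsigned int mtu, unsigned int headroom,
                             unsigned int frame_sz)
    {
        if (frame_sz < XSK_MIN_FRAME || frame_sz > PAGE_SZ)
            return false;
        return mtu + headroom <= frame_sz;
    }

    int main(void)
    {
        printf("%d %d\n", xsk_frame_ok(1500, 256, 2048),
               xsk_frame_ok(3000, 256, 2048)); /* 1 0 */
        return 0;
    }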
2019-06-27 | net/mlx5e: Add XSK zero-copy support | Maxim Mikityanskiy | 1 | -21/+32
This commit adds support for AF_XDP zero-copy RX and TX. We create a dedicated XSK RQ inside the channel, which means that two RQs are running simultaneously: one for non-XSK traffic and the other for XSK traffic. The regular and XSK RQs use a single ID namespace split into two halves: the lower half is regular RQs, and the upper half is XSK RQs. When any zero-copy AF_XDP socket is active, changing the number of channels is not allowed, because it would break the mapping between XSK RQ IDs and channels. XSK requires different page allocation and release routines. Such functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are generic enough to be used for both regular and XSK RQs, and they use the mlx5e_page_{alloc,release} wrappers around the real allocation functions. Function pointers are not used to avoid losing performance with retpolines. Wherever it's certain that the regular (non-XSK) page release function should be used, it's called directly. Only the stats that could be meaningful for XSK are exposed to userspace. Those that don't take part in the XSK flow are not considered. Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ), because the newer xdpsock sample doesn't provide any Fill Ring entries at the setup stage. We create a dedicated XSK SQ in the channel. This separation has its advantages: 1. When the UMEM is closed, the XSK SQ can also be closed and stop receiving completions. If an existing SQ was used for XSK, it would continue receiving completions for the packets of the closed socket. If a new UMEM was opened at that point, it would start getting completions that don't belong to it. 2. Calculating statistics separately. When the userspace kicks the TX, the driver triggers a hardware interrupt by posting a NOP to a dedicated XSK ICO (internal control operations) SQ, in order to trigger NAPI on the right CPU core. This XSK ICO SQ is protected by a spinlock, as the userspace application may kick the TX from any core. Store the pointers to the UMEMs in the net device private context, independently of the kernel. This way the driver can distinguish between the zero-copy and non-zero-copy UMEMs. The kernel function xdp_get_umem_from_qid does not care about this difference, but the driver is only interested in zero-copy UMEMs; in particular, on cleanup it determines whether to close the XSK RQ and SQ by looking at the presence of the UMEM. Use state_lock to protect the access to this area of UMEM pointers. LRO isn't compatible with XDP, but there may be active UMEMs while XDP is off. If this is the case, don't allow LRO to ensure XDP can be re-enabled at any time. The validation of XSK parameters typically happens when XSK queues open. However, when the interface is down or the XDP program isn't set, it's still possible to have active AF_XDP sockets and even to open new ones, but the XSK queues will be closed. To cover these cases, perform the validation also in these flows: 1. A new UMEM is registered, but the XSK queues aren't going to be created due to a missing XDP program or the interface being down. 2. The MTU changes while there are UMEMs registered. Having this early check prevents mlx5e_open_channels from failing at a later stage, where recovery is impossible and the application has no chance to handle the error, because it got a successful return value for an MTU change or XSK open operation.
The performance testing was performed on a machine with the following configuration: - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz - Mellanox ConnectX-5 Ex with 100 Gbit/s link The results with retpoline disabled, single stream: txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU) rxdrop: 12.2 Mpps l2fwd: 9.4 Mpps The results with retpoline enabled, single stream: txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU) rxdrop: 9.9 Mpps l2fwd: 6.8 Mpps Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 | net/mlx5e: Consider XSK in XDP MTU limit calculation | Maxim Mikityanskiy | 1 | -2/+2
Use the existing mlx5e_get_linear_rq_headroom function to calculate the headroom for mlx5e_xdp_max_mtu. This function takes the XSK headroom into consideration, which will be used in the following patches. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-27 | net/mlx5e: Calculate linear RX frag size considering XSK | Maxim Mikityanskiy | 1 | -21/+44
Additional conditions introduced: - XSK implies XDP. - Headroom includes the XSK headroom if it exists. - No space is reserved for struct skb_shared_info in XSK mode. - A fragment size smaller than the XSK chunk size is not allowed. A new auxiliary function, mlx5e_get_linear_rq_headroom, with support for XSK is introduced. Use this function in the implementation of mlx5e_get_rq_headroom. Change headroom to u32 to match the headroom field in struct xdp_umem. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23 | net/mlx5e: Remove unused parameter | Maxim Mikityanskiy | 1 | -4/+3
mdev is unused in mlx5e_rx_is_linear_skb. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23 | net/mlx5e: Add an underflow warning comment | Maxim Mikityanskiy | 1 | -2/+5
mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on the requested number of frames in the RQ (F) and the number of packets per WQE (P). It ensures that N is not less than the minimum number of WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min should be true. This function deals with logarithms, so it should check that log(F) - log(P) >= log(N_min). However, if F < P, this expression will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min) instead. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
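A standalone illustration of the underflow described above (the constant and parameter names are made up, not the driver's):

    #include <stdio.h>

    #define MIN_LOG_RQ_SZ 1 /* made-up minimum, stands in for log(N_min) */

    /* Computing log(F) - log(P) first would underflow for F < P because
     * the operands are unsigned; compare before subtracting instead. */
    static unsigned int log_rq_size(unsigned int log_frames,
                                    unsigned int log_pkts_per_wqe)
    {
        if (log_frames >= log_pkts_per_wqe + MIN_LOG_RQ_SZ)
            return log_frames - log_pkts_per_wqe;
        return MIN_LOG_RQ_SZ;
    }

    int main(void)
    {
        /* F = 2 (log 1), P = 8 (log 3): F < P, yet no underflow occurs. */
        printf("%u\n", log_rq_size(1, 3)); /* 1 */
        return 0;
    }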
2019-04-23 | net/mlx5e: Move parameter calculation functions to en/params.c | Maxim Mikityanskiy | 1 | -0/+102
This commit moves the parameter calculation functions to a separate file for better modularity and code sharing with future features. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>