summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2024-03-11Merge branch '100GbE' of ↵David S. Miller1-0/+48
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-10Merge tag 'trace-ring-buffer-v6.8-rc7' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Do not allow large strings (> 4096) as single write to trace_marker The size of a string written into trace_marker was determined by the size of the sub-buffer in the ring buffer. That size is dependent on the PAGE_SIZE of the architecture as it can be mapped into user space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit of the string of writing into trace_marker 64K. One of the selftests looks at the size of the ring buffer sub-buffers and writes that plus more into the trace_marker. The write will take what it can and report back what it consumed so that the user space application (like echo) will write the rest of the string. The string is stored in the ring buffer and can be read via the "trace" or "trace_pipe" files. The reading of the ring buffer uses vsnprintf(), which uses a precision "%.*s" to make sure it only reads what is stored in the buffer, as a bug could cause the string to be non terminated. With the combination of the precision change and the PAGE_SIZE of 64K allowing huge strings to be added into the ring buffer, plus the test that would actually stress that limit, a bug was reported that the precision used was too big for "%.*s" as the string was close to 64K in size and the max precision of vsnprintf is 32K. Linus suggested not to have that precision as it could hide a bug if the string was again stored without a nul byte. Another issue that was brought up is that the trace_seq buffer is also based on PAGE_SIZE even though it is not tied to the architecture limit like the ring buffer sub-buffer is. Having it be 64K * 2 is simply just too big and wasting memory on systems with 64K page sizes. It is now hardcoded to 8K which is what all other architectures with 4K PAGE_SIZE has. Finally, the write to trace_marker is now limited to 4K as there is no reason to write larger strings into trace_marker. - ring_buffer_wait() should not loop. The ring_buffer_wait() does not have the full context (yet) on if it should loop or not. Just exit the loop as soon as its woken up and let the callers decide to loop or not (they already do, so it's a bit redundant). - Fix shortest_full field to be the smallest amount in the ring buffer that a waiter is waiting for. The "shortest_full" field is updated when a new waiter comes in and wants to wait for a smaller amount of data in the ring buffer than other waiters. But after all waiters are woken up, it's not reset, so if another waiter comes in wanting to wait for more data, it will be woken up when the ring buffer has a smaller amount from what the previous waiters were waiting for. - The wake up all waiters on close is incorrectly called frome .release() and not from .flush() so it will never wake up any waiters as the .release() will not get called until all .read() calls are finished. And the wakeup is for the waiters in those .read() calls. * tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use .flush() call to wake up readers ring-buffer: Fix resetting of shortest_full ring-buffer: Fix waking up ring buffer readers tracing: Limit trace_marker writes to just 4K tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE tracing: Remove precision vsnprintf() check from print event
2024-03-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+26
Pull kvm fixes from Paolo Bonzini: "KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable from userspace, so there would be no way to write to a read-only guest_memfd). - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely for development and testing. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes. x86 fixes: - Fix missing marking of a guest page as dirty when emulating an atomic access. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC in non-preemptible mode). - Disable AMD DebugSwap by default, it breaks VMSA signing and will be re-enabled with a better VM creation API in 6.10. - Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock, to avoid a race with unregistering of the same region and the consequent use-after-free issue" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: SEV: disable SEV-ES DebugSwap by default KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-09Merge tag 'mlx5-socket-direct-v3' of ↵Jakub Kicinski2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Support Multi-PF netdev (Socket Direct) This series adds support for combining multiple devices (PFs) of the same port under one netdev instance. Passing traffic through different devices belonging to different NUMA sockets saves cross-numa traffic and allows apps running on the same netdev from different numas to still feel a sense of proximity to the device and achieve improved performance. We achieve this by grouping PFs together, and creating the netdev only once all group members are probed. Symmetrically, we destroy the netdev once any of the PFs is removed. The channels are distributed between all devices, a proper configuration would utilize the correct close numa when working on a certain app/cpu. We pick one device to be a primary (leader), and it fills a special role. The other devices (secondaries) are disconnected from the network in the chip level (set to silent mode). All RX/TX traffic is steered through the primary to/from the secondaries. Currently, we limit the support to PFs only, and up to two devices (sockets). * tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: Documentation: networking: Add description for multi-pf netdev net/mlx5: Enable SD feature net/mlx5e: Block TLS device offload on combined SD netdev net/mlx5e: Support per-mdev queue counter net/mlx5e: Support cross-vhca RSS net/mlx5e: Let channels be SD-aware net/mlx5e: Create EN core HW resources for all secondary devices net/mlx5e: Create single netdev per SD group net/mlx5: SD, Add debugfs net/mlx5: SD, Add informative prints in kernel log net/mlx5: SD, Implement steering for primary and secondaries net/mlx5: SD, Implement devcom communication and primary election net/mlx5: SD, Implement basic query and instantiation net/mlx5: SD, Introduce SD lib net/mlx5: Add MPIR bit in mcam_access_reg ==================== Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-09Merge tag 'for-net-next-2024-03-08' of ↵Jakub Kicinski5-56/+68
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - hci_conn: Only do ACL connections sequentially - hci_core: Cancel request on command timeout - Remove CONFIG_BT_HS - btrtl: Add the support for RTL8852BT/RTL8852BE-VT - btusb: Add support Mediatek MT7920 - btusb: Add new VID/PID 13d3/3602 for MT7925 - Add new quirk for broken read key length on ATS2851 * tag 'for-net-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (52 commits) Bluetooth: hci_sync: Fix UAF in hci_acl_create_conn_sync Bluetooth: Fix eir name length Bluetooth: ISO: Align broadcast sync_timeout with connection timeout Bluetooth: Add new quirk for broken read key length on ATS2851 Bluetooth: mgmt: remove NULL check in add_ext_adv_params_complete() Bluetooth: mgmt: remove NULL check in mgmt_set_connectable_complete() Bluetooth: btusb: Add support Mediatek MT7920 Bluetooth: btmtk: Add MODULE_FIRMWARE() for MT7922 Bluetooth: btnxpuart: Fix btnxpuart_close Bluetooth: ISO: Clean up returns values in iso_connect_ind() Bluetooth: fix use-after-free in accessing skb after sending it Bluetooth: af_bluetooth: Fix deadlock Bluetooth: bnep: Fix out-of-bound access Bluetooth: btusb: Fix memory leak Bluetooth: msft: Fix memory leak Bluetooth: hci_core: Fix possible buffer overflow Bluetooth: btrtl: fix out of bounds memory access Bluetooth: hci_h5: Add ability to allocate memory for private data Bluetooth: hci_sync: Fix overwriting request callback Bluetooth: hci_sync: Use QoS to determine which PHY to scan ... ==================== Link: https://lore.kernel.org/r/20240308181056.120547-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-09Merge tag 'ieee802154-for-net-next-2024-03-07' of ↵Jakub Kicinski1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2024-03-07 Various cross tree patches for ieee802154v drivers and a resource leak fix for ieee802154 llsec. Andy Shevchenko changed GPIO header usage for at86rf230 and mcr20a to only include needed headers. Bo Liu converted the at86rf230, mcr20a and mrf24j40 driver regmap support to use the maple tree register cache. Fedor Pchelkin fixed a resource leak in the llsec key deletion path. Ricardo B. Marliere made wpan_phy_class const. Tejun Heo removed WQ_UNBOUND from a workqueue call in ca8210. * tag 'ieee802154-for-net-next-2024-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next: ieee802154: cfg802154: make wpan_phy_class constant ieee802154: mcr20a: Remove unused of_gpio.h ieee802154: at86rf230: Replace of_gpio.h by proper one mac802154: fix llsec key resources release in mac802154_llsec_key_del ieee802154: ca8210: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call net: ieee802154: mrf24j40: convert to use maple tree register cache net: ieee802154: mcr20a: convert to use maple tree register cache net: ieee802154: at86rf230: convert to use maple tree register cache ==================== Link: https://lore.kernel.org/r/20240307195105.292085-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: add skb_data_unref() helperEric Dumazet1-0/+18
Similar to skb_unref(), add skb_data_unref() to save an expensive atomic operation (and cache line dirtying) when last reference on shinfo->dataref is released. I saw this opportunity on hosts with RAW sockets accidentally bound to UDP protocol, forcing an skb_clone() on all received packets. These RAW sockets had their receive queue full, so all clone packets were immediately dropped. When UDP recvmsg() consumes later the original skb, skb_release_data() is hitting atomic_sub_return() quite badly, because skb->clone has been set permanently. Note that this patch helps TCP TX performance, because TCP stack also use (fast) clones. This means that at least one of the two packets (the main skb or its clone) will no longer have to perform this atomic operation in skb_release_data(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240307123446.2302230-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08Merge tag 'wireless-next-2024-03-08' of ↵Jakub Kicinski3-8/+67
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.9 The fourth "new features" pull request for v6.9 with changes both in stack and in drivers. The theme in this pull request is to fix sparse warnings but we still have some left in wireless subsystem. Otherwise quite normal. Major changes: rtw89 * NL80211_EXT_FEATURE_SCAN_RANDOM_SN support * NL80211_EXT_FEATURE_SET_SCAN_DWELL support rtw88 * support for more rtw8811cu and rtw8821cu devices mt76 * mt76x2u: add Netgear WNDA3100v3 USB * mt7915: newer ADIE version support * mt7925: radio temperature sensor support * mt7996: remove GCMP IGTK offload * tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (125 commits) wifi: rtw89: wow: move release offload packet earlier for WoWLAN mode wifi: rtw89: wow: set security engine options for 802.11ax chips only wifi: rtw89: update suspend/resume for different generation wifi: rtw89: wow: update config mac function with different generation wifi: rtw89: update DMA function with different generation wifi: rtw89: wow: update WoWLAN status register for different generation wifi: rtw89: wow: update WoWLAN reason register for different chips wifi: brcm80211: handle pmk_op allocation failure wifi: rtw89: coex: Add coexistence policy to decrease WiFi packet CRC-ERR wifi: rtw89: coex: When Bluetooth not available don't set power/gain wifi: rtw89: coex: add return value to ensure H2C command is success or not wifi: rtw89: coex: Reorder H2C command index to align with firmware wifi: rtw89: coex: add BTC ctrl_info version 7 and related logic wifi: rtw89: coex: add init_info H2C command format version 7 wifi: rtw89: 8922a: add coexistence helpers of SW grant wifi: rtw89: mac: add coexistence helpers {cfg/get}_plt wifi: cw1200: restore endian swapping wifi: wlcore: sdio: Rate limit wl12xx_sdio_raw_{read,write}() failures warns wifi: rtlwifi: Remove rtl_intf_ops.read_efuse_byte wifi: rtw88: 8821c: Fix false alarm count ... ==================== Link: https://lore.kernel.org/r/20240308100429.B8EA2C433F1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLEJens Axboe1-3/+3
We only use the flag for this purpose, so rename it accordingly. This further prevents various other use cases of it, keeping it clean and consistent. Then we can also check it in one spot, when it's being attempted recycled, and remove some dead code in io_kbuf_recycle_ring(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-08net: nexthop: Expose nexthop group HW stats to user spaceIdo Schimmel2-0/+27
Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add ability to enable / disable hardware statisticsIdo Schimmel2-0/+5
Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add hardware statistics notificationsIdo Schimmel1-0/+3
Add hw_stats field to several notifier structures to communicate to the drivers that HW statistics should be configured for nexthops within a given group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Expose nexthop group stats to user spaceIdo Schimmel1-0/+30
Add netlink support for reading NH group stats. This data is only for statistics of the traffic in the SW datapath. HW nexthop group statistics will be added in the following patches. Emission of the stats is keyed to a new op_stats flag to avoid cluttering the netlink message with stats if the user doesn't need them: NHA_OP_FLAG_DUMP_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add nexthop group entry statsIdo Schimmel1-0/+6
Add nexthop group entry stats to count the number of packets forwarded via each nexthop in the group. The stats will be exposed to user space for better data path observability in the next patch. The per-CPU stats pointer is placed at the beginning of 'struct nh_grp_entry', so that all the fields accessed for the data path reside on the same cache line: struct nh_grp_entry { struct nexthop * nh; /* 0 8 */ struct nh_grp_entry_stats * stats; /* 8 8 */ u8 weight; /* 16 1 */ /* XXX 7 bytes hole, try to pack */ union { struct { atomic_t upper_bound; /* 24 4 */ } hthr; /* 24 4 */ struct { struct list_head uw_nh_entry; /* 24 16 */ u16 count_buckets; /* 40 2 */ u16 wants_buckets; /* 42 2 */ } res; /* 24 24 */ }; /* 24 24 */ struct list_head nh_list; /* 48 16 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct nexthop * nh_parent; /* 64 8 */ /* size: 72, cachelines: 2, members: 6 */ /* sum members: 65, holes: 1, sum holes: 7 */ /* last cacheline: 8 bytes */ }; Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add NHA_OP_FLAGSPetr Machata1-0/+3
In order to add per-nexthop statistics, but still not increase netlink message size for consumers that do not care about them, there needs to be a toggle through which the user indicates their desire to get the statistics. To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to use the attribute for carrying of arbitrary operation-specific flags, i.e. not make it specific for get / dump. Add the new attribute to get and dump policies, but do not actually allow any flags yet -- those will come later as the flags themselves are defined. Add the necessary parsing code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08tcp: Add skb addr and sock addr to arguments of tracepoint tcp_probe.fuyuanli1-2/+8
It is useful to expose skb addr and sock addr to user in tracepoint tcp_probe, so that we can get more information while monitoring receiving of tcp data, by ebpf or other ways. For example, we need to identify a packet by seq and end_seq when calculate transmit latency between layer 2 and layer 4 by ebpf, but which is not available in tcp_probe, so we can only use kprobe hooking tcp_rcv_established to get them. But we can use tcp_probe directly if skb addr and sock addr are available, which is more efficient. Signed-off-by: fuyuanli <fuyuanli@didiglobal.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: dqs: add NIC stall detector based on BQLJakub Kicinski2-0/+78
softnet_data->time_squeeze is sometimes used as a proxy for host overload or indication of scheduling problems. In practice this statistic is very noisy and has hard to grasp units - e.g. is 10 squeezes a second to be expected, or high? Delaying network (NAPI) processing leads to drops on NIC queues but also RTT bloat, impacting pacing and CA decisions. Stalls are a little hard to detect on the Rx side, because there may simply have not been any packets received in given period of time. Packet timestamps help a little bit, but again we don't know if packets are stale because we're not keeping up or because someone (*cough* cgroups) disabled IRQs for a long time. We can, however, use Tx as a proxy for Rx stalls. Most drivers use combined Rx+Tx NAPIs so if Tx gets starved so will Rx. On the Tx side we know exactly when packets get queued, and completed, so there is no uncertainty. This patch adds stall checks to BQL. Why BQL? Because it's a convenient place to add such checks, already called by most drivers, and it has copious free space in its structures (this patch adds no extra cache references or dirtying to the fast path). The algorithm takes one parameter - max delay AKA stall threshold and increments a counter whenever NAPI got delayed for at least that amount of time. It also records the length of the longest stall. To be precise every time NAPI has not polled for at least stall thrs we check if there were any Tx packets queued between last NAPI run and now - stall_thrs/2. Unlike the classic Tx watchdog this mechanism does not ignore stalls caused by Tx being disabled, or loss of link. I don't think the check is worth the complexity, and stall is a stall, whether due to host overload, flow control, link down... doesn't matter much to the application. We have been running this detector in production at Meta for 2 years, with the threshold of 8ms. It's the lowest value where false positives become rare. There's still a constant stream of reported stalls (especially without the ksoftirqd deferral patches reverted), those who like their stall metrics to be 0 may prefer higher value. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08netdev: add queue stat for alloc failuresJakub Kicinski2-0/+3
Rx alloc failures are commonly counted by drivers. Support reporting those via netdev-genl queue stats. Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08netdev: add per-queue statisticsJakub Kicinski3-0/+76
The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move rps_sock_flow_table to net_hotdataEric Dumazet2-5/+7
rps_sock_flow_table and rps_cpu_mask are used in fast path. Move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-19-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: introduce include/net/rps.hEric Dumazet3-117/+127
Move RPS related structures and helpers from include/linux/netdevice.h and include/net/sock.h to a new include file. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08ipv6: move tcp_ipv6_hash_secret and udp_ipv6_hash_secret to net_hotdataEric Dumazet2-0/+3
Use a 32bit hole in "struct net_offload" to store the remaining 32bit secrets used by TCPv6 and UDPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-17-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08ipv6: move inet6_ehash_secret and udp6_ehash_secret into net_hotdataEric Dumazet2-0/+3
"struct inet6_protocol" has a 32bit hole in 32bit arches. Use it to store the 32bit secret used by UDP and TCP, to increase cache locality in rx path. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-16-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08inet: move inet_ehash_secret and udp_ehash_secret into net_hotdataEric Dumazet2-0/+4
"struct net_protocol" has a 32bit hole in 32bit arches. Use it to store the 32bit secret used by UDP and TCP, to increase cache locality in rx path. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-15-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08inet: move tcp_protocol and udp_protocol to net_hotdataEric Dumazet1-0/+2
These structures are read in rx path, move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-14-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08ipv6: move tcpv6_protocol and udpv6_protocol to net_hotdataEric Dumazet1-0/+4
These structures are read in rx path, move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-13-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08udp: move udpv4_offload and udpv6_offload to net_hotdataEric Dumazet1-0/+2
These structures are used in GRO and GSO paths. Move them to net_hodata for better cache locality. v2: udpv6_offload definition depends on CONFIG_INET=y Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move skbuff_cache(s) to net_hotdataEric Dumazet2-1/+3
skbuff_cache, skbuff_fclone_cache and skb_small_head_cache are used in rx/tx fast paths. Move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move dev_rx_weight to net_hotdataEric Dumazet2-2/+1
dev_rx_weight is read from process_backlog(). Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move dev_tx_weight to net_hotdataEric Dumazet2-1/+1
dev_tx_weight is used in tx fast path. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move tcpv4_offload and tcpv6_offload to net_hotdataEric Dumazet1-0/+3
These are used in TCP fast paths. Move them into net_hotdata for better cache locality. v2: tcpv6_offload definition depends on CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move ip_packet_offload and ipv6_packet_offload to net_hotdataEric Dumazet1-0/+5
These structures are used in GRO and GSO paths. v2: ipv6_packet_offload definition depends on CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move netdev_max_backlog to net_hotdataEric Dumazet2-1/+1
netdev_max_backlog is used in rx fat path. Move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move ptype_all into net_hotdataEric Dumazet2-1/+1
ptype_all is used in rx/tx fast paths. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move netdev_tstamp_prequeue into net_hotdataEric Dumazet1-0/+1
netdev_tstamp_prequeue is used in rx path. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: move netdev_budget and netdev_budget to net_hotdataEric Dumazet1-0/+2
netdev_budget and netdev_budget are used in rx path (net_rx_action()) Move them into net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: introduce struct net_hotdataEric Dumazet3-4/+17
Instead of spreading networking critical fields all over the places, add a custom net_hotdata structure so that we can precisely control its layout. In this first patch, move : - gro_normal_batch used in rx (GRO stack) - offload_base used in rx and tx (GRO and TSO stacks) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08Merge tag 'rxrpc-iothread-20240305' of ↵Jakub Kicinski1-91/+107
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== Here are some changes to AF_RXRPC: (1) Cache the transmission serial number of ACK and DATA packets in the rxrpc_txbuf struct and log this in the retransmit tracepoint. (2) Don't use atomics on rxrpc_txbuf::flags[*] and cache the intended wire header flags there too to avoid duplication. (3) Cache the wire checksum in rxrpc_txbuf to make it easier to create jumbo packets in future (which will require altering the wire header to a jumbo header and restoring it back again for retransmission). (4) Fix the protocol names in the wire ACK trailer struct. (5) Strip all the barriers and atomics out of the call timer tracking[*]. (6) Remove atomic handling from call->tx_transmitted and call->acks_prev_seq[*]. (7) Don't bother resetting the DF flag after UDP packet transmission. To change it, we now call directly into UDP code, so it's quick just to set it every time. (8) Merge together the DF/non-DF branches of the DATA transmission to reduce duplication in the code. (9) Add a kvec array into rxrpc_txbuf and start moving things over to it. This paves the way for using page frags. (10) Split (sub)packet preparation and timestamping out of the DATA transmission function. This helps pave the way for future jumbo packet generation. (11) In rxkad, don't pick values out of the wire header stored in rxrpc_txbuf, buf rather find them elsewhere so we can remove the wire header from there. (12) Move rxrpc_send_ACK() to output.c so that it can be merged with rxrpc_send_ack_packet(). (13) Use rxrpc_txbuf::kvec[0] to access the wire header for the packet rather than directly accessing the copy in rxrpc_txbuf. This will allow that to be removed to a page frag. (14) Switch from keeping the transmission buffers in rxrpc_txbuf allocated in the slab to allocating them using page fragment allocators. There are separate allocators for DATA packets (which persist for a while) and control packets (which are discarded immediately). We can then turn on MSG_SPLICE_PAGES when transmitting DATA and ACK packets. We can also get rid of the RCU cleanup on rxrpc_txbufs, preferring instead to release the page frags as soon as possible. (15) Parse received packets before handling timeouts as the former may reset the latter. (16) Make sure we don't retransmit DATA packets after all the packets have been ACK'd. (17) Differentiate traces for PING ACK transmission. (18) Switch to keeping timeouts as ktime_t rather than a number of jiffies as the latter is too coarse a granularity. Only set the call timer at the end of the call event function from the aggregate of all the timeouts, thereby reducing the number of timer calls made. In future, it might be possible to reduce the number of timers from one per call to one per I/O thread and to use a high-precision timer. (19) Record RTT probes after successful transmission rather than recording it before and then cancelling it after if unsuccessful[*]. This allows a number of calls to get the current time to be removed. (20) Clean up the resend algorithm as there's now no need to walk the transmission buffer under lock[*]. DATA packets can be retransmitted as soon as they're found rather than being queued up and transmitted when the locked is dropped. (21) When initially parsing a received ACK packet, extract some of the fields from the ack info to the skbuff private data. This makes it easier to do path MTU discovery in the future when the call to which a PING RESPONSE ACK refers has been deallocated. [*] Possible with the move of almost all code from softirq context to the I/O thread. Link: https://lore.kernel.org/r/20240301163807.385573-1-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20240304084322.705539-1-dhowells@redhat.com/ # v2 * tag 'rxrpc-iothread-20240305' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (21 commits) rxrpc: Extract useful fields from a received ACK to skb priv data rxrpc: Clean up the resend algorithm rxrpc: Record probes after transmission and reduce number of time-gets rxrpc: Use ktimes for call timeout tracking and set the timer lazily rxrpc: Differentiate PING ACK transmission traces. rxrpc: Don't permit resending after all Tx packets acked rxrpc: Parse received packets before dealing with timeouts rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet() rxrpc: Don't pick values out of the wire header when setting up security rxrpc: Split up the DATA packet transmission function rxrpc: Add a kvec[] to the rxrpc_txbuf struct rxrpc: Merge together DF/non-DF branches of data Tx function rxrpc: Do lazy DF flag resetting rxrpc: Remove atomic handling on some fields only used in I/O thread rxrpc: Strip barriers and atomics off of timer tracking rxrpc: Fix the names of the fields in the ACK trailer struct rxrpc: Note cksum in txbuf rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics ... ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08Merge tag 'mm-hotfixes-stable-2024-03-07-16-17' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "6 hotfixes. 4 are cc:stable and the remainder pertain to post-6.7 issues or aren't considered to be needed in earlier kernel versions" * tag 'mm-hotfixes-stable-2024-03-07-16-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: scripts/gdb/symbols: fix invalid escape sequence warning mailmap: fix Kishon's email init/Kconfig: lower GCC version check for -Warray-bounds mm, mmap: fix vma_merge() case 7 with vma_ops->close mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
2024-03-08bpf: Plumb get_unmapped_area() callback into bpf_map_opsAlexei Starovoitov1-0/+3
Subsequent patches introduce bpf_arena that imposes special alignment requirements on address selection. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20240307031228.42896-4-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski12-74/+79
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07Merge tag 'net-6.8-rc8' of ↵Linus Torvalds5-34/+37
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, ipsec and netfilter. No solution yet for the stmmac issue mentioned in the last PR, but it proved to be a lockdep false positive, not a blocker. Current release - regressions: - dpll: move all dpll<>netdev helpers to dpll code, fix build regression with old compilers Current release - new code bugs: - page_pool: fix netlink dump stop/resume Previous releases - regressions: - bpf: fix verifier to check bpf_func_state->callback_depth when pruning states as otherwise unsafe programs could get accepted - ipv6: avoid possible UAF in ip6_route_mpath_notify() - ice: reconfig host after changing MSI-X on VF - mlx5: - e-switch, change flow rule destination checking - add a memory barrier to prevent a possible null-ptr-deref - switch to using _bh variant of of spinlock where needed Previous releases - always broken: - netfilter: nf_conntrack_h323: add protection for bmp length out of range - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP program in CPU map which led to random xdp_md fields - xfrm: fix UDP encapsulation in TX packet offload - netrom: fix data-races around sysctls - ice: - fix potential NULL pointer dereference in ice_bridge_setlink() - fix uninitialized dplls mutex usage - igc: avoid returning frame twice in XDP_REDIRECT - i40e: disable NAPI right after disabling irqs when handling xsk_pool - geneve: make sure to pull inner header in geneve_rx() - sparx5: fix use after free inside sparx5_del_mact_entry - dsa: microchip: fix register write order in ksz8_ind_write8() Misc: - selftests: mptcp: fixes for diag.sh" * tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) net: pds_core: Fix possible double free in error handling path netrom: Fix data-races around sysctl_net_busy_read netrom: Fix a data-race around sysctl_netrom_link_fails_count netrom: Fix a data-race around sysctl_netrom_routing_control netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size netrom: Fix a data-race around sysctl_netrom_transport_busy_delay netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries netrom: Fix a data-race around sysctl_netrom_transport_timeout netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser netrom: Fix a data-race around sysctl_netrom_default_path_quality netfilter: nf_conntrack_h323: Add protection for bmp length out of range netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout netfilter: nft_ct: fix l3num expectations with inet pseudo family netfilter: nf_tables: reject constant set with timeout netfilter: nf_tables: disallow anonymous set with timeout flag net/rds: fix WARNING in rds_conn_connect_if_down net: dsa: microchip: fix register write order in ksz8_ind_write8() ...
2024-03-07Bluetooth: ISO: Align broadcast sync_timeout with connection timeoutLuiz Augusto von Dentz1-0/+2
This aligns broadcast sync_timeout with existing connection timeouts which are 20 seconds long. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-03-07Merge tag 'nvme-6.9-2024-03-07' of git://git.infradead.org/nvme into ↵Jens Axboe2-3/+5
for-6.9/block Pull NVMe updates from Keith: "nvme updates for Linux 6.9 - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith)" * tag 'nvme-6.9-2024-03-07' of git://git.infradead.org/nvme: (31 commits) nvme: clear caller pointer on identify failure nvme: host: fix double-free of struct nvme_id_ns in ns_update_nuse() nvme: fcloop: make fcloop_class constant nvme: fabrics: make nvmf_class constant nvme: core: constify struct class usage nvme-fabrics: typo in nvmf_parse_key() nvme-multipath: use atomic queue limits API for stacking limits nvme-multipath: pass queue_limits to blk_alloc_disk nvme: use the atomic queue limits update API nvme: cleanup nvme_configure_metadata nvme: don't query identify data in configure_metadata nvme: split out a nvme_identify_ns_nvm helper nvme: move common logic into nvme_update_ns_info nvme: move setting the write cache flags out of nvme_set_queue_limits nvme: move a few things out of nvme_update_disk_info nvme: don't use nvme_update_disk_info for the multipath disk nvme: move blk_integrity_unregister into nvme_init_integrity nvme: cleanup the nvme_init_integrity calling conventions nvme: move max_integrity_segments handling out of nvme_init_integrity nvme: remove nvme_revalidate_zones ...
2024-03-07tcp: add tracing of skbaddr in tcp_event_skb classJason Xing1-1/+2
Use the existing parameter and print the address of skbaddr as other trace functions do. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07tcp: add tracing of skb/skaddr in tcp_event_sk_skb classJason Xing1-1/+2
Printing the addresses can help us identify the exact skb/sk for those system in which it's not that easy to run BPF program. As we can see, it already fetches those, then use it directly and it will print like below: ...tcp_retransmit_skb: skbaddr=XXX skaddr=XXX family=AF_INET... Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07net: phylink: clean the pcs_get_state documentationMaxime Chevallier1-3/+0
commit 4d72c3bb60dd ("net: phylink: strip out pre-March 2020 legacy code") dropped the mac_pcs_get_state ops in phylink_mac_ops in favor of dedicated PCS operation pcs_get_state. However, the documentation for the pcs_get_state ops was incorrectly converted and now self-references. Drop the extra comment. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07Merge tag 'for-next-6.9' of ↵Christian Brauner2-8/+69
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc Merge case-insensitive updates from Gabriel Krisman Bertazi: - Patch case-insensitive lookup by trying the case-exact comparison first, before falling back to costly utf8 casefolded comparison. - Fix to forbid using a case-insensitive directory as part of an overlayfs mount. - Patchset to ensure d_op are set at d_alloc time for fscrypt and casefold volumes, ensuring filesystem dentries will all have the correct ops, whether they come from a lookup or not. * tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-07net/mlx5: Enable SD featureTariq Toukan1-0/+1
Have an actual mlx5_sd instance in the core device, and fix the getter accordingly. This allows SD stuff to flow, the feature becomes supported only here. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07net/mlx5: Add MPIR bit in mcam_access_regTariq Toukan1-1/+3
Add a cap bit in mcam_access_reg to check for MPIR support. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>