summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-08-11Bluetooth: Consolidate code around sk_alloc into a helper functionLuiz Augusto von Dentz1-0/+2
This consolidates code around sk_alloc into bt_sock_alloc which does take care of common initialization. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-11Bluetooth: ISO: do not emit new LE Create CIS if previous is pendingPauli Virtanen2-2/+4
LE Create CIS command shall not be sent before all CIS Established events from its previous invocation have been processed. Currently it is sent via hci_sync but that only waits for the first event, but there can be multiple. Make it wait for all events, and simplify the CIS creation as follows: Add new flag HCI_CONN_CREATE_CIS, which is set if Create CIS has been sent for the connection but it is not yet completed. Make BT_CONNECT state to mean the connection wants Create CIS. On events after which new Create CIS may need to be sent, send it if possible and some connections need it. These events are: hci_connect_cis, iso_connect_cfm, hci_cs_le_create_cis, hci_le_cis_estabilished_evt. The Create CIS status/completion events shall queue new Create CIS only if at least one of the connections transitions away from BT_CONNECT, so that we don't loop if controller is sending bogus events. This fixes sending multiple CIS Create for the same CIS in the "ISO AC 6(i) - Success" BlueZ test case: < HCI Command: LE Create Co.. (0x08|0x0064) plen 9 #129 [hci0] Number of CIS: 2 CIS Handle: 257 ACL Handle: 42 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #130 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 #131 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 257 ... < HCI Command: LE Setup Is.. (0x08|0x006e) plen 13 #132 [hci0] ... > HCI Event: Command Complete (0x0e) plen 6 #133 [hci0] LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 ... < HCI Command: LE Create Co.. (0x08|0x0064) plen 5 #134 [hci0] Number of CIS: 1 CIS Handle: 258 ACL Handle: 42 > HCI Event: Command Status (0x0f) plen 4 #135 [hci0] LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1 Status: ACL Connection Already Exists (0x0b) > HCI Event: LE Meta Event (0x3e) plen 29 #136 [hci0] LE Connected Isochronous Stream Established (0x19) Status: Success (0x00) Connection Handle: 258 ... Fixes: c09b80be6ffc ("Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHED") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-11Bluetooth: ISO: Add support for connecting multiple BISesIulia Tanasescu1-0/+30
It is required for some configurations to have multiple BISes as part of the same BIG. Similar to the flow implemented for unicast, DEFER_SETUP will also be used to bind multiple BISes for the same BIG, before starting Periodic Advertising and creating the BIG. The user will have to open a new socket for each BIS. By setting the BT_DEFER_SETUP socket option and calling connect, a new connection will be added for the BIG and advertising handle set by the socket QoS parameters. Since all BISes will be bound for the same BIG and advertising handle, the socket QoS options and base parameters should match for all connections. By calling connect on a socket that does not have the BT_DEFER_SETUP option set, periodic advertising will be started and the BIG will be created, with a BIS for each previously bound connection. Since a BIG cannot be reconfigured with additional BISes after creation, no more connections can be bound for the BIG after the start periodic advertising and create BIG commands have been queued. The bis_cleanup function has also been updated, so that the advertising set and the BIG will not be terminated unless there are no more bound or connected BISes. The HCI_CONN_BIG_CREATED connection flag has been added to indicate that the BIG has been successfully created. This flag is checked at bis_cleanup, so that the BIG is only terminated if the HCI_LE_Create_BIG_Complete has been received. This implementation has been tested on hardware, using the "isotest" tool with an additional command line option, to specify the number of BISes to create as part of the desired BIG: tools/isotest -i hci0 -s 00:00:00:00:00:00 -N 2 -G 1 -T 1 The btmon log shows that a BIG containing 2 BISes has been created: < HCI Command: LE Create Broadcast Isochronous Group (0x08|0x0068) plen 31 Handle: 0x01 Advertising Handle: 0x01 Number of BIS: 2 SDU Interval: 10000 us (0x002710) Maximum SDU size: 40 Maximum Latency: 10 ms (0x000a) RTN: 0x02 PHY: LE 2M (0x02) Packing: Sequential (0x00) Framing: Unframed (0x00) Encryption: 0x00 Broadcast Code: 00000000000000000000000000000000 > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 23 LE Broadcast Isochronous Group Complete (0x1b) Status: Success (0x00) Handle: 0x01 BIG Synchronization Delay: 1974 us (0x0007b6) Transport Latency: 1974 us (0x0007b6) PHY: LE 2M (0x02) NSE: 3 BN: 1 PTO: 1 IRC: 3 Maximum PDU: 40 ISO Interval: 10.00 msec (0x0008) Connection Handle #0: 10 Connection Handle #1: 11 < HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13 Handle: 10 Data Path Direction: Input (Host to Controller) (0x00) Data Path: HCI (0x00) Coding Format: Transparent (0x03) Company Codec ID: Ericsson Technology Licensing (0) Vendor Codec ID: 0 Controller Delay: 0 us (0x000000) Codec Configuration Length: 0 Codec Configuration: > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 Status: Success (0x00) Handle: 10 < HCI Command: LE Setup Isochronous Data Path (0x08|0x006e) plen 13 Handle: 11 Data Path Direction: Input (Host to Controller) (0x00) Data Path: HCI (0x00) Coding Format: Transparent (0x03) Company Codec ID: Ericsson Technology Licensing (0) Vendor Codec ID: 0 Controller Delay: 0 us (0x000000) Codec Configuration Length: 0 Codec Configuration: > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1 Status: Success (0x00) Handle: 11 < ISO Data TX: Handle 10 flags 0x02 dlen 44 < ISO Data TX: Handle 11 flags 0x02 dlen 44 > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 10 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Num handles: 1 Handle: 11 Count: 1 Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-11Bluetooth: Check for ISO support in controllerClaudia Draghicescu3-0/+4
This patch checks for ISO_BROADCASTER and ISO_SYNC_RECEIVER in controller. Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-08-11net: mana: Add gdma stats to ethtool output for manaShradha Gupta1-0/+87
Extended performance counter stats in 'ethtool -S <interface>' for MANA VF to include GDMA tx LSO packets and bytes count. Tested-on: Ubuntu22 Testcases: 1. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic 2. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV 3. Validated the GDMA stat packets and byte counters Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-11sctp: Remove unused declaration sctp_backlog_migrate()Yue Haibing1-2/+0
Commit 61c9fed41638 ("[SCTP]: A better solution to fix the race between sctp_peeloff() and sctp_rcv().") removed the implementation but left declaration in place. Remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230809142323.9428-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
Merge net again, after pulling in x86/bugs fixes to clang linking errors. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11net: caif: Remove unused declaration cfsrvl_ctrlcmd()Yue Haibing1-3/+0
Commit 43e369210108 ("caif: Move refcount from service layer to sock and dev.") declared but never implemented this. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809134943.37844-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11Merge branch 'x86/bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipJakub Kicinski1-0/+2
Cross merge x86 fixes to fix clang linking errors: ld.lld: error: ./arch/x86/kernel/vmlinux.lds:221: at least one side of the expression must be absolute These will hopefully be downstream by the time we ship the next batch of fixes. * 'x86/bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move gds_ucode_mitigated() declaration to header x86/speculation: Add cpu_show_gds() prototype driver core: cpu: Make cpu_show_not_affected() static x86/srso: Fix build breakage with the LLVM linker Documentation/srso: Document IBPB aspect and fix formatting driver core: cpu: Unify redundant silly stubs Documentation/hw-vuln: Unify filename specification in index Link: https://lore.kernel.org/all/CAHk-=wj_b+FGTnevQSBAtCWuhCk=0oQ_THvthBW2hzqpOTLFmg@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11net: phy: phy_device: Call into the PHY driver to set LED offloadAndrew Lunn1-0/+33
Linux LEDs can be requested to perform hardware accelerated blinking to indicate link, RX, TX etc. Pass the rules for blinking to the PHY driver, if it implements the ops needed to determine if a given pattern can be offloaded, to offload it, and what the current offload is. Additionally implement the op needed to get what device the LED is for. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/20230808210436.838995-3-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11net: stmmac: add new mode parameter for fix_mac_speedShenwei Wang1-1/+1
A mode parameter has been added to the callback function of fix_mac_speed to indicate the physical layer type. The mode can be one the following: MLO_AN_PHY - Conventional PHY MLO_AN_FIXED - Fixed-link mode MLO_AN_INBAND - In-band protocol Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Link: https://lore.kernel.org/r/20230807160716.259072-2-shenwei.wang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11Merge tag 'for-netdev' of ↵Jakub Kicinski4-8/+15
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2023-08-09 We've added 19 non-merge commits during the last 6 day(s) which contain a total of 25 files changed, 369 insertions(+), 141 deletions(-). The main changes are: 1) Fix array-index-out-of-bounds access when detaching from an already empty mprog entry from Daniel Borkmann. 2) Adjust bpf selftest because of a recent llvm change related to the cpu-v4 ISA from Eduard Zingerman. 3) Add uprobe support for the bpf_get_func_ip helper from Jiri Olsa. 4) Fix a KASAN splat due to the kernel incorrectly accepted an invalid program using the recent cpu-v4 instruction from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: bpf: btf: Remove two unused function declarations bpf: lru: Remove unused declaration bpf_lru_promote() selftests/bpf: relax expected log messages to allow emitting BPF_ST selftests/bpf: remove duplicated functions bpf, docs: Fix small typo and define semantics of sign extension selftests/bpf: Add bpf_get_func_ip test for uprobe inside function selftests/bpf: Add bpf_get_func_ip tests for uprobe on function entry bpf: Add support for bpf_get_func_ip helper for uprobe program selftests/bpf: Add a movsx selftest for sign-extension of R10 bpf: Fix an incorrect verification success with movsx insn bpf, docs: Formalize type notation and function semantics in ISA standard bpf: change bpf_alu_sign_string and bpf_movsx_string to static libbpf: Use local includes inside the library bpf: fix bpf_dynptr_slice() to stop return an ERR_PTR. bpf: fix inconsistent return types of bpf_xdp_copy_buf(). selftests/bpf: fix the incorrect verification of port numbers. selftests/bpf: Add test for detachment on empty mprog entry bpf: Fix mprog detachment for empty mprog entry bpf: bpf_struct_ops: Remove unnecessary initial values of variables ==================== Link: https://lore.kernel.org/r/20230810055123.109578-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski12-85/+79
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10Merge tag 'net-6.5-rc6' of ↵Linus Torvalds4-76/+53
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and bpf. Still trending up in size but the good news is that the "current" regressions are resolved, AFAIK. We're getting weirdly many fixes for Wake-on-LAN and suspend/resume handling on embedded this week (most not merged yet), not sure why. But those are all for older bugs. Current release - regressions: - tls: set MSG_SPLICE_PAGES consistently when handing encrypted data over to TCP Current release - new code bugs: - eth: mlx5: correct IDs on VFs internal to the device (IPU) Previous releases - regressions: - phy: at803x: fix WoL support / reporting on AR8032 - bonding: fix incorrect deletion of ETH_P_8021AD protocol VID from slaves, leading to BUG_ON() - tun: prevent tun_build_skb() from exceeding the packet size limit - wifi: rtw89: fix 8852AE disconnection caused by RX full flags - eth/PCI: enetc: fix probing after 6fffbc7ae137 ("PCI: Honor firmware's device disabled status"), keep PCI devices around even if they are disabled / not going to be probed to be able to apply quirks on them - eth: prestera: fix handling IPv4 routes with nexthop IDs Previous releases - always broken: - netfilter: re-work garbage collection to avoid races between user-facing API and timeouts - tunnels: fix generating ipv4 PMTU error on non-linear skbs - nexthop: fix infinite nexthop bucket dump when using maximum nexthop ID - wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() Misc: - unix: use consistent error code in SO_PEERPIDFD - ipv6: adjust ndisc_is_useropt() to include PREFIX_INFO, in prep for upcoming IETF RFC" * tag 'net-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) net: hns3: fix strscpy causing content truncation issue net: tls: set MSG_SPLICE_PAGES consistently ibmvnic: Ensure login failure recovery is safe from other resets ibmvnic: Do partial reset on login failure ibmvnic: Handle DMA unmapping of login buffs in release functions ibmvnic: Unmap DMA login rsp buffer on send login fail ibmvnic: Enforce stronger sanity checks on login response net: mana: Fix MANA VF unload when hardware is unresponsive netfilter: nf_tables: remove busy mark and gc batch API netfilter: nft_set_hash: mark set element as dead when deleting from packet path netfilter: nf_tables: adapt set backend to use GC transaction API netfilter: nf_tables: GC transaction API to avoid race with control plane selftests/bpf: Add sockmap test for redirecting partial skb data selftests/bpf: fix a CI failure caused by vsock sockmap test bpf, sockmap: Fix bug that strp_done cannot be called bpf, sockmap: Fix map type error in sock_map_del_link xsk: fix refcount underflow in error path ipv6: adjust ndisc_is_useropt() to also return true for PIO selftests: forwarding: bridge_mdb: Make test more robust selftests: forwarding: bridge_mdb_max: Fix failing test with old libnet ...
2023-08-10Merge tag 'nf-23-08-10' of ↵Jakub Kicinski1-75/+45
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The existing attempt to resolve races between control plane and GC work is error prone, as reported by Bien Pham <phamnnb@sea.com>, some places forgot to call nft_set_elem_mark_busy(), leading to double-deactivation of elements. This series contains the following patches: 1) Do not skip expired elements during walk otherwise elements might never decrement the reference counter on data, leading to memleak. 2) Add a GC transaction API to replace the former attempt to deal with races between control plane and GC. GC worker sets on NFT_SET_ELEM_DEAD_BIT on elements and it creates a GC transaction to remove the expired elements, GC transaction could abort in case of interference with control plane and retried later (GC async). Set backends such as rbtree and pipapo also perform GC from control plane (GC sync), in such case, element deactivation and removal is safe because mutex is held then collected elements are released via call_rcu(). 3) Adapt existing set backends to use the GC transaction API. 4) Update rhash set backend to set on _DEAD bit to report deleted elements from datapath for GC. 5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT. * tag 'nf-23-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: remove busy mark and gc batch API netfilter: nft_set_hash: mark set element as dead when deleting from packet path netfilter: nf_tables: adapt set backend to use GC transaction API netfilter: nf_tables: GC transaction API to avoid race with control plane netfilter: nf_tables: don't skip expired elements during walk ==================== Link: https://lore.kernel.org/r/20230810070830.24064-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10Merge tag 'for-netdev' of ↵Jakub Kicinski1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Martin KaFai Lau says: ==================== pull-request: bpf 2023-08-09 We've added 5 non-merge commits during the last 7 day(s) which contain a total of 6 files changed, 102 insertions(+), 8 deletions(-). The main changes are: 1) A bpf sockmap memleak fix and a fix in accessing the programs of a sockmap under the incorrect map type from Xu Kuohai. 2) A refcount underflow fix in xsk from Magnus Karlsson. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add sockmap test for redirecting partial skb data selftests/bpf: fix a CI failure caused by vsock sockmap test bpf, sockmap: Fix bug that strp_done cannot be called bpf, sockmap: Fix map type error in sock_map_del_link xsk: fix refcount underflow in error path ==================== Link: https://lore.kernel.org/r/20230810055303.120917-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10x86/speculation: Add cpu_show_gds() prototypeArnd Bergmann1-0/+2
The newly added function has two definitions but no prototypes: drivers/base/cpu.c:605:16: error: no previous prototype for 'cpu_show_gds' [-Werror=missing-prototypes] Add a declaration next to the other ones for this file to avoid the warning. Fixes: 8974eb588283b ("x86/speculation: Add Gather Data Sampling mitigation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: stable@kernel.org Link: https://lore.kernel.org/all/20230809130530.1913368-1-arnd%40kernel.org
2023-08-10netfilter: nf_tables: remove busy mark and gc batch APIPablo Neira Ayuso1-95/+3
Ditch it, it has been replace it by the GC transaction API and it has no clients anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10netfilter: nf_tables: GC transaction API to avoid race with control planePablo Neira Ayuso1-1/+63
The set types rhashtable and rbtree use a GC worker to reclaim memory. From system work queue, in periodic intervals, a scan of the table is done. The major caveat here is that the nft transaction mutex is not held. This causes a race between control plane and GC when they attempt to delete the same element. We cannot grab the netlink mutex from the work queue, because the control plane has to wait for the GC work queue in case the set is to be removed, so we get following deadlock: cpu 1 cpu2 GC work transaction comes in , lock nft mutex `acquire nft mutex // BLOCKS transaction asks to remove the set set destruction calls cancel_work_sync() cancel_work_sync will now block forever, because it is waiting for the mutex the caller already owns. This patch adds a new API that deals with garbage collection in two steps: 1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT so they are not visible via lookup. Annotate current GC sequence in the GC transaction. Enqueue GC transaction work as soon as it is full. If ruleset is updated, then GC transaction is aborted and retried later. 2) GC work grabs the mutex. If GC sequence has changed then this GC transaction lost race with control plane, abort it as it contains stale references to objects and let GC try again later. If the ruleset is intact, then this GC transaction deactivates and removes the elements and it uses call_rcu() to destroy elements. Note that no elements are removed from GC lockless path, the _DEAD bit is set and pointers are collected. GC catchall does not remove the elements anymore too. There is a new set->dead flag that is set on to abort the GC transaction to deal with set->ops->destroy() path which removes the remaining elements in the set from commit_release, where no mutex is held. To deal with GC when mutex is held, which allows safe deactivate and removal, add sync GC API which releases the set element object via call_rcu(). This is used by rbtree and pipapo backends which also perform garbage collection from control plane path. Since element removal from sets can happen from control plane and element garbage collection/timeout, it is necessary to keep the set structure alive until all elements have been deactivated and destroyed. We cannot do a cancel_work_sync or flush_work in nft_set_destroy because its called with the transaction mutex held, but the aforementioned async work queue might be blocked on the very mutex that nft_set_destroy() callchain is sitting on. This gives us the choice of ABBA deadlock or UaF. To avoid both, add set->refs refcount_t member. The GC API can then increment the set refcount and release it once the elements have been free'd. Set backends are adapted to use the GC transaction API in a follow up patch entitled: ("netfilter: nf_tables: use gc transaction API in set backends") This is joint work with Florian Westphal. Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10bpf, sockmap: Fix bug that strp_done cannot be calledXu Kuohai1-0/+1
strp_done is only called when psock->progs.stream_parser is not NULL, but stream_parser was set to NULL by sk_psock_stop_strp(), called by sk_psock_drop() earlier. So, strp_done can never be called. Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock. Change the condition for calling strp_done from judging whether stream_parser is set to judging whether this flag is set. This flag is only set once when strp_init() succeeds, and will never be cleared later. Fixes: c0d95d3380ee ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-10net: ptp: create a mock-up PTP Hardware Clock driverVladimir Oltean1-0/+38
There are several cases where virtual net devices may benefit from having a PTP clock, and these have to do with testing. I can see at least netdevsim and veth as potential users of a common mock-up PTP hardware clock driver. The proposed idea is to create an object which emulates PTP clock operations on top of the unadjustable CLOCK_MONOTONIC_RAW plus a software-controlled time domain via a timecounter/cyclecounter and then link that PHC to the netdevsim device. The driver is fully functional for its intended purpose, and it successfully passes the PTP selftests. $ cd tools/testing/selftests/ptp/ $ ./phc.sh /dev/ptp2 TEST: settime [ OK ] TEST: adjtime [ OK ] TEST: adjfreq [ OK ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230807193324.4128292-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10net/mlx5: Expose NIC temperature via hardware monitoring kernel APIAdham Faris2-2/+15
Expose NIC temperature by implementing hwmon kernel API, which turns current thermal zone kernel API to redundant. For each one of the supported and exposed thermal diode sensors, expose the following attributes: 1) Input temperature. 2) Highest temperature. 3) Temperature label: Depends on the firmware capability, if firmware doesn't support sensors naming, the fallback naming convention would be: "sensorX", where X is the HW spec (MTMP register) sensor index. 4) Temperature critical max value: refers to the high threshold of Warning Event. Will be exposed as `tempY_crit` hwmon attribute (RO attribute). For example for ConnectX5 HCA's this temperature value will be 105 Celsius, 10 degrees lower than the HW shutdown temperature). 5) Temperature reset history: resets highest temperature. For example, for dualport ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]". Listing one of the directories above (hwmonX/Y) generates the corresponding output below: $ grep -H -d skip . /sys/class/hwmon/hwmon0/* Output ======================================================================= /sys/class/hwmon/hwmon0/name:mlx5 /sys/class/hwmon/hwmon0/temp1_crit:105000 /sys/class/hwmon/hwmon0/temp1_highest:48000 /sys/class/hwmon/hwmon0/temp1_input:46000 /sys/class/hwmon/hwmon0/temp1_label:asic grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied In addition, displaying the sensors data via lm_sensors generates the corresponding output below: $ sensors Output ======================================================================= mlx5-pci-0800 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) mlx5-pci-0801 Adapter: PCI adapter asic: +46.0°C (crit = +105.0°C, highest = +48.0°C) CC: Jean Delvare <jdelvare@suse.com> Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10net: annotate data-races around sock->opsEric Dumazet1-1/+1
IPV6_ADDRFORM socket option is evil, because it can change sock->ops while other threads might read it. Same issue for sk->sk_family being set to AF_INET. Adding READ_ONCE() over sock->ops reads is needed for sockets that might be impacted by IPV6_ADDRFORM. Note that mptcp_is_tcpsk() can also overwrite sock->ops. Adding annotations for all sk->sk_family reads will require more patches :/ BUG: KCSAN: data-race in ____sys_sendmsg / do_ipv6_setsockopt write to 0xffff888109f24ca0 of 8 bytes by task 4470 on cpu 0: do_ipv6_setsockopt+0x2c5e/0x2ce0 net/ipv6/ipv6_sockglue.c:491 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 udpv6_setsockopt+0x95/0xa0 net/ipv6/udp.c:1690 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3663 __sys_setsockopt+0x1c3/0x230 net/socket.c:2273 __do_sys_setsockopt net/socket.c:2284 [inline] __se_sys_setsockopt net/socket.c:2281 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2281 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888109f24ca0 of 8 bytes by task 4469 on cpu 1: sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x349/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmmsg+0x263/0x500 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff850e32b8 -> 0xffffffff850da890 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4469 Comm: syz-executor.1 Not tainted 6.4.0-rc5-syzkaller-00313-g4c605260bc60 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230808135809.2300241-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10Merge tag 'wireless-2023-08-09' of ↵Jakub Kicinski1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few small updates: * fix an integer overflow in nl80211 * fix rtw89 8852AE disconnections * fix a buffer overflow in ath12k * fix AP_VLAN configuration lookups * fix allocation failure handling in brcm80211 * update MAINTAINERS for some drivers * tag 'wireless-2023-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: ath12k: Fix buffer overflow when scanning with extraie wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() wifi: cfg80211: fix sband iftype data lookup for AP_VLAN wifi: rtw89: fix 8852AE disconnection caused by RX full flags MAINTAINERS: Remove tree entry for rtl8180 MAINTAINERS: Update entry for rtl8187 wifi: brcm80211: handle params_v1 allocation failure ==================== Link: https://lore.kernel.org/r/20230809124818.167432-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09Merge tag 'nf-next-2023-08-08' of ↵Jakub Kicinski6-15/+0
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First 4 Patches, from Yue Haibing, remove unused prototypes in various netfilter headers. Last patch makes nfnetlink_log to always include a packet timestamp, up to now it was only included if the skb had assigned previously. From Maciej Żenczykowski. * tag 'nf-next-2023-08-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nfnetlink_log: always add a timestamp netfilter: h323: Remove unused function declarations netfilter: conntrack: Remove unused function declarations netfilter: helper: Remove unused function declarations netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush() ==================== Link: https://lore.kernel.org/r/20230808124159.19046-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09tcp: add missing family to tcp_set_ca_state() tracepointEric Dumazet1-1/+4
Before this code is copied, add the missing family, as we did in commit 3dd344ea84e1 ("net: tracepoint: exposing sk_family in all tcp:tracepoints") Fixes: 15fcdf6ae116 ("tcp: Add tracepoint for tcp_set_ca_state") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ping Gan <jacky_gam_2001@163.com> Cc: Manjusaka <me@manjusaka.me> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230808084923.2239142-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set()Yue Haibing1-4/+0
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: phy: Remove two unused function declarationsYue Haibing1-4/+0
Commit 1e2dc14509fd ("net: ethtool: Add helpers for reporting test results") declared but never implemented these function. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230808144610.19096-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09bpf: btf: Remove two unused function declarationsYue Haibing1-2/+0
Commit db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record") removed the implementations but leave declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808145741.33292-1-yuehaibing@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-09Merge tag 'mlx5-updates-2023-08-07' of ↵Jakub Kicinski1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-08-07 1) Few cleanups 2) Dynamic completion EQs The driver creates completion EQs for all vectors directly on driver load, even if those EQs will not be utilized later on. To allow more flexibility in managing completion EQs and to reduce the memory overhead of driver load, this series will adjust completion EQs creation to be dynamic. Meaning, completion EQs will be created only when needed. Patch #1 introduces a counter for tracking the current number of completion EQs. Patches #2-6 refactor the existing infrastructure of managing completion EQs and completion IRQs to be compatible with per-vector allocation/release requests. Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient in case the affinity is requested but completion IRQ is not allocated yet. Patch #9 function rename. Patch #10 handles the corner case of SF performing an IRQ request when no SF IRQ pool is found, and no PF IRQ exists for the same vector. Patch #11 modify driver to use dynamically allocate completion EQs. * tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Bridge, Only handle registered netdev bridge events net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl net/mlx5: Fix typo reminder -> remainder net/mlx5: remove many unnecessary NULL values net/mlx5: Allocate completion EQs dynamically net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max() net/mlx5: Add IRQ vector to CPU lookup function net/mlx5: Introduce mlx5_cpumask_default_spread net/mlx5: Implement single completion EQ create/destroy methods net/mlx5: Use xarray to store and manage completion EQs net/mlx5: Refactor completion IRQ request/release handlers in EQ layer net/mlx5: Use xarray to store and manage completion IRQs net/mlx5: Refactor completion IRQ request/release API net/mlx5: Track the current number of completion EQs ==================== Link: https://lore.kernel.org/r/20230807175642.20834-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09docs: net: page_pool: de-duplicate the intro commentJakub Kicinski1-12/+12
In commit 82e896d992fa ("docs: net: page_pool: use kdoc to avoid duplicating the information") I shied away from using the DOC: comments when moving to kdoc for documenting page_pool API, because I wasn't sure how familiar people are with it. Turns out there is already a DOC: comment for the intro, which is the same in both places, modulo what looks like minor rewording. Use the version from Documentation/ but keep the contents with the code. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09devlink: Remove unused devlink_dpipe_table_resource_set() declarationYue Haibing1-3/+0
Commit f655dacb59ac ("net: devlink: remove unused locked functions") removed this but leave the declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807143214.46648-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: fq: Remove unused typedef fq_flow_get_default_tYue Haibing1-5/+0
Commitbf9009bf21b5 ("net/fq_impl: drop get_default_func, move default flow to fq_tin") remove its last user, so can remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230807142111.33524-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09team: change the getter function in the team_option structure to voidZhengchao Shao1-1/+1
Because the getter function in the team_option structure always returns 0, so change the getter function to void and remove redundant code. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807012556.3146071-5-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09team: change the init function in the team_option structure to voidZhengchao Shao1-1/+1
Because the init function in the team_option structure always returns 0, so change the init function to void and remove redundant code. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230807012556.3146071-4-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: fs_enet: Remove linux/fs_enet_pd.hChristophe Leroy1-118/+0
linux/fs_enet_pd.h is not used anymore. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/5be102791c987792ad127b15543ee6715394cf67.1691155347.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: fs_enet: Move struct fs_platform_info into fs_enet.hChristophe Leroy1-16/+0
struct fs_platform_info is only used in fs_enet ethernet driver since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding."). Stale prototypes using fs_platform_info were left over in arch/powerpc/sysdev/fsl_soc.c but they are now removed by previous patch. Move struct fs_platform_info into fs_enet.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f882d6b0b7075d0d8435310634ceaa2cc8e9938f.1691155347.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: fs_enet: Remove has_phy field in fs_platform_info structChristophe Leroy1-1/+0
Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.") has_phy field is never set. Remove dead code and remove the field. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/bb5264e09e18f0ce8a0dbee399926a59f33cb248.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: fs_enet: Remove unused fields in fs_platform_info structChristophe Leroy1-19/+0
Since commit 3dd82a1ea724 ("[POWERPC] CPM: Always use new binding.") many fields of fs_platform_info structure are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/2e584fcd75e21a0f7e7d5f942eebdc067b2f82f9.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09net: fs_enet: Remove fs_get_id()Christophe Leroy1-11/+0
fs_get_id() hasn't been used since commit b219108cbace ("fs_enet: Remove !CONFIG_PPC_CPM_NEW_BINDING code") Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7a53b88cc40302fcbea59554f5e7067e3594ad53.1691155346.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08netfilter: h323: Remove unused function declarationsYue Haibing1-4/+0
Commit f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") declared but never implemented these. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-08netfilter: conntrack: Remove unused function declarationsYue Haibing3-7/+0
Commit 1015c3de23ee ("netfilter: conntrack: remove extension register api") leave nf_conntrack_acct_fini() and nf_conntrack_labels_init() unused, remove it. And commit a0ae2562c6c4 ("netfilter: conntrack: remove l3proto abstraction") leave behind nf_ct_l3proto_try_module_get() and nf_ct_l3proto_module_put(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-08netfilter: helper: Remove unused function declarationsYue Haibing1-3/+0
Commit b118509076b3 ("netfilter: remove nf_conntrack_helper sysctl and modparam toggles") leave these unused declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-08netfilter: gre: Remove unused function declaration nf_ct_gre_keymap_flush()Yue Haibing1-1/+0
Commit a23f89a99906 ("netfilter: conntrack: nf_ct_gre_keymap_flush() removal") leave this unused, remove it. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-08wifi: cfg80211: fix sband iftype data lookup for AP_VLANFelix Fietkau1-0/+3
AP_VLAN interfaces are virtual, so doesn't really exist as a type for capabilities. When passed in as a type, AP is the one that's really intended. Fixes: c4cbaf7973a7 ("cfg80211: Add support for HE") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230622165919.46841-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-08-08bpf: Add support for bpf_get_func_ip helper for uprobe programJiri Olsa2-3/+13
Adding support for bpf_get_func_ip helper for uprobe program to return probed address for both uprobe and return uprobe. We discussed this in [1] and agreed that uprobe can have special use of bpf_get_func_ip helper that differs from kprobe. The kprobe bpf_get_func_ip returns: - address of the function if probe is attach on function entry for both kprobe and return kprobe - 0 if the probe is not attach on function entry The uprobe bpf_get_func_ip returns: - address of the probe for both uprobe and return uprobe The reason for this semantic change is that kernel can't really tell if the probe user space address is function entry. The uprobe program is actually kprobe type program attached as uprobe. One of the consequences of this design is that uprobes do not have its own set of helpers, but share them with kprobes. As we need different functionality for bpf_get_func_ip helper for uprobe, I'm adding the bool value to the bpf_trace_run_ctx, so the helper can detect that it's executed in uprobe context and call specific code. The is_uprobe bool is set as true in bpf_prog_run_array_sleepable, which is currently used only for executing bpf programs in uprobe. Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe to address that it's only used for uprobes and that it sets the run_ctx.is_uprobe as suggested by Yafang Shao. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Viktor Malik <vmalik@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-08Merge tag 'x86_bugs_srso' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/srso fixes from Borislav Petkov: "Add a mitigation for the speculative RAS (Return Address Stack) overflow vulnerability on AMD processors. In short, this is yet another issue where userspace poisons a microarchitectural structure which can then be used to leak privileged information through a side channel" * tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/srso: Tie SBPB bit setting to microcode patch detection x86/srso: Add a forgotten NOENDBR annotation x86/srso: Fix return thunks in generated code x86/srso: Add IBPB on VMEXIT x86/srso: Add IBPB x86/srso: Add SRSO_NO support x86/srso: Add IBPB_BRTYPE support x86/srso: Add a Speculative RAS Overflow mitigation x86/bugs: Increase the x86 bugs vector size to two u32s
2023-08-07page_pool: add a lockdep check for recycling in hardirqJakub Kicinski1-0/+7
Page pool use in hardirq is prohibited, add debug checks to catch misuses. IIRC we previously discussed using DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns that people will have DEBUG_NET enabled in perf testing. I don't think anyone enables lockdep in perf testing, so use lockdep to avoid pushback and arguing :) Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230804180529.2483231-6-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07page_pool: place frag_* fields in one cachelineAlexander Lobakin1-5/+5
On x86_64, frag_* fields of struct page_pool are scattered across two cachelines despite the summary size of 24 bytes. All three fields are used in pretty much the same places, but the last field, ::frag_users, is pushed out to the next CL, provoking unwanted false-sharing on hotpath (frags allocation code). There are some holes and cold members to move around. Move frag_* one block up, placing them right after &page_pool_params perfectly at the beginning of CL2. This doesn't do any meaningful to the second block, as those are some destroy-path cold structures, and doesn't do anything to ::alloc_stats, which still starts at 200-byte offset, 8 bytes after CL3 (still fitting into 1 cacheline). On my setup, this yields 1-2% of Mpps when using PP frags actively. When it comes to 32-bit architectures with 32-byte CL: &page_pool_params plus ::pad is 44 bytes, the block taken care of is 16 bytes within one CL, so there should be at least no regressions from the actual change. ::pages_state_hold_cnt is not related directly to that triple, but is paired currently with ::frags_offset and decoupling them would mean either two 4-byte holes or more invasive layout changes. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230804180529.2483231-4-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>Alexander Lobakin2-4/+3
Currently, touching <net/page_pool/types.h> triggers a rebuild of more than half of the kernel. That's because it's included in <linux/skbuff.h>. And each new include to page_pool/types.h adds more [useless] data for the toolchain to process per each source file from that pile. In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling"), Matteo included it to be able to call a couple of functions defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as long as page pool owns the page") one of the calls was removed, so only one was left. It's the call to page_pool_return_skb_page() in napi_frag_unref(). The function is external and doesn't have any dependencies. Having very niche page_pool_types.h included only for that looks like an overkill. As %PP_SIGNATURE is not local to page_pool.c (was only in the early submissions), nothing holds this function there. Teleport page_pool_return_skb_page() to skbuff.c, just next to the main consumer, skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't work with skbs at all and the former name tells nothing. The #if guards here are only to not compile and have it in the vmlinux when not needed -- both call sites are already guarded. Now, touching page_pool_types.h only triggers rebuilding of the drivers using it and a couple of core networking files. Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>