summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-11Merge tag 'for-net-2024-04-10' of ↵Paolo Abeni9-99/+79
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - L2CAP: Don't double set the HCI_CONN_MGMT_CONNECTED bit - Fix memory leak in hci_req_sync_complete - hci_sync: Fix using the same interval and window for Coded PHY - Fix not validating setsockopt user input * tag 'for-net-2024-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit Bluetooth: hci_sock: Fix not validating setsockopt user input Bluetooth: ISO: Fix not validating setsockopt user input Bluetooth: L2CAP: Fix not validating setsockopt user input Bluetooth: RFCOMM: Fix not validating setsockopt user input Bluetooth: SCO: Fix not validating setsockopt user input Bluetooth: Fix memory leak in hci_req_sync_complete() Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset ==================== Link: https://lore.kernel.org/r/20240410191610.4156653-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11x86/bugs: Clarify that syscall hardening isn't a BHI mitigationJosh Poimboeuf3-11/+9
While syscall hardening helps prevent some BHI attacks, there's still other low-hanging fruit remaining. Don't classify it as a mitigation and make it clear that the system may still be vulnerable if it doesn't have a HW or SW mitigation enabled. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Fix BHI handling of RRSBAJosh Poimboeuf1-12/+18
The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been disabled by the Spectre v2 mitigation (or can otherwise be disabled by the BHI mitigation itself if needed). In that case retpolines are fine. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'Ingo Molnar3-42/+42
So we are using the 'ia32_cap' value in a number of places, which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register. But there's very little 'IA32' about it - this isn't 32-bit only code, nor does it originate from there, it's just a historic quirk that many Intel MSR names are prefixed with IA32_. This is already clear from the helper method around the MSR: x86_read_arch_cap_msr(), which doesn't have the IA32 prefix. So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with its role and with the naming of the helper function. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIESJosh Poimboeuf1-15/+7
There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and over. It's even read in the BHI sysfs function which is a big no-no. Just read it once and cache it. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
2024-04-11x86/bugs: Fix BHI documentationJosh Poimboeuf2-12/+15
Fix up some inaccuracies in the BHI documentation. Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
2024-04-11af_unix: Fix garbage collector racing against connect()Michal Luczaj1-1/+17
Garbage collector does not take into account the risk of embryo getting enqueued during the garbage collection. If such embryo has a peer that carries SCM_RIGHTS, two consecutive passes of scan_children() may see a different set of children. Leading to an incorrectly elevated inflight count, and then a dangling pointer within the gc_inflight_list. sockets are AF_UNIX/SOCK_STREAM S is an unconnected socket L is a listening in-flight socket bound to addr, not in fdtable V's fd will be passed via sendmsg(), gets inflight count bumped connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc() ---------------- ------------------------- ----------- NS = unix_create1() skb1 = sock_wmalloc(NS) L = unix_find_other(addr) unix_state_lock(L) unix_peer(S) = NS // V count=1 inflight=0 NS = unix_peer(S) skb2 = sock_alloc() skb_queue_tail(NS, skb2[V]) // V became in-flight // V count=2 inflight=1 close(V) // V count=1 inflight=1 // GC candidate condition met for u in gc_inflight_list: if (total_refs == inflight_refs) add u to gc_candidates // gc_candidates={L, V} for u in gc_candidates: scan_children(u, dec_inflight) // embryo (skb1) was not // reachable from L yet, so V's // inflight remains unchanged __skb_queue_tail(L, skb1) unix_state_unlock(L) for u in gc_candidates: if (u.inflight) scan_children(u, inc_inflight_move_tail) // V count=1 inflight=2 (!) If there is a GC-candidate listening socket, lock/unlock its state. This makes GC wait until the end of any ongoing connect() to that socket. After flipping the lock, a possibly SCM-laden embryo is already enqueued. And if there is another embryo coming, it can not possibly carry SCM_RIGHTS. At this point, unix_inflight() can not happen because unix_gc_lock is already taken. Inflight graph remains unaffected. Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.") Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11net: dsa: mt7530: trap link-local frames regardless of ST Port StateArınç ÜNAL2-34/+200
In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer (DLL) of the Open Systems Interconnection basic reference model (OSI/RM) are described; the medium access control (MAC) and logical link control (LLC) sublayers. The MAC sublayer is the one facing the physical layer. In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A Bridge component comprises a MAC Relay Entity for interconnecting the Ports of the Bridge, at least two Ports, and higher layer entities with at least a Spanning Tree Protocol Entity included. Each Bridge Port also functions as an end station and shall provide the MAC Service to an LLC Entity. Each instance of the MAC Service is provided to a distinct LLC Entity that supports protocol identification, multiplexing, and demultiplexing, for protocol data unit (PDU) transmission and reception by one or more higher layer entities. It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC Entity associated with each Bridge Port is modeled as being directly connected to the attached Local Area Network (LAN). On the switch with CPU port architecture, CPU port functions as Management Port, and the Management Port functionality is provided by software which functions as an end station. Software is connected to an IEEE 802 LAN that is wholly contained within the system that incorporates the Bridge. Software provides access to the LLC Entity associated with each Bridge Port by the value of the source port field on the special tag on the frame received by software. We call frames that carry control information to determine the active topology and current extent of each Virtual Local Area Network (VLAN), i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN Registration Protocol Data Units (MVRPDUs), and frames from other link constrained protocols, such as Extensible Authentication Protocol over LAN (EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They are not forwarded by a Bridge. Permanently configured entries in the filtering database (FDB) ensure that such frames are discarded by the Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in detail: Each of the reserved MAC addresses specified in Table 8-1 (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be permanently configured in the FDB in C-VLAN components and ERs. Each of the reserved MAC addresses specified in Table 8-2 (01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently configured in the FDB in S-VLAN components. Each of the reserved MAC addresses specified in Table 8-3 (01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB in TPMR components. The FDB entries for reserved MAC addresses shall specify filtering for all Bridge Ports and all VIDs. Management shall not provide the capability to modify or remove entries for reserved MAC addresses. The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of propagation of PDUs within a Bridged Network, as follows: The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN) component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward. PDUs transmitted using this destination address, or any other addresses that appear in Table 8-1, Table 8-2, and Table 8-3 (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can therefore travel no further than those stations that can be reached via a single individual LAN from the originating station. The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an address that no conformant S-VLAN component, C-VLAN component, or MAC Bridge can forward; however, this address is relayed by a TPMR component. PDUs using this destination address, or any of the other addresses that appear in both Table 8-1 and Table 8-2 but not in Table 8-3 (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed by any TPMRs but will propagate no further than the nearest S-VLAN component, C-VLAN component, or MAC Bridge. The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an address that no conformant C-VLAN component, MAC Bridge can forward; however, it is relayed by TPMR components and S-VLAN components. PDUs using this destination address, or any of the other addresses that appear in Table 8-1 but not in either Table 8-2 or Table 8-3 (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and S-VLAN components but will propagate no further than the nearest C-VLAN component or MAC Bridge. Because the LLC Entity associated with each Bridge Port is provided via CPU port, we must not filter these frames but forward them to CPU port. In a Bridge, the transmission Port is majorly decided by ingress and egress rules, FDB, and spanning tree Port State functions of the Forwarding Process. For link-local frames, only CPU port should be designated as destination port in the FDB, and the other functions of the Forwarding Process must not interfere with the decision of the transmission Port. We call this process trapping frames to CPU port. Therefore, on the switch with CPU port architecture, link-local frames must be trapped to CPU port, and certain link-local frames received by a Port of a Bridge comprising a TPMR component or an S-VLAN component must be excluded from it. A Bridge of the switch with CPU port architecture cannot comprise a Two-Port MAC Relay (TPMR) component as a TPMR component supports only a subset of the functionality of a MAC Bridge. A Bridge comprising two Ports (Management Port doesn't count) of this architecture will either function as a standard MAC Bridge or a standard VLAN Bridge. Therefore, a Bridge of this architecture can only comprise S-VLAN components, C-VLAN components, or MAC Bridge components. Since there's no TPMR component, we don't need to relay PDUs using the destination addresses specified on the Nearest non-TPMR section, and the proportion of the Nearest Customer Bridge section where they must be relayed by TPMR components. One option to trap link-local frames to CPU port is to add static FDB entries with CPU port designated as destination port. However, because that Independent VLAN Learning (IVL) is being used on every VID, each entry only applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC Bridge component or a C-VLAN component, there would have to be 16 times 4096 entries. This switch intellectual property can only hold a maximum of 2048 entries. Using this option, there also isn't a mechanism to prevent link-local frames from being discarded when the spanning tree Port State of the reception Port is discarding. The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4 registers. Whilst this applies to every VID, it doesn't contain all of the reserved MAC addresses without affecting the remaining Standard Group MAC Addresses. The REV_UN frame tag utilised using the RGAC4 register covers the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF destination addresses which may be relayed by MAC Bridges or VLAN Bridges. The latter option provides better but not complete conformance. This switch intellectual property also does not provide a mechanism to trap link-local frames with specific destination addresses to CPU port by Bridge, to conform to the filtering rules for the distinct Bridge components. Therefore, regardless of the type of the Bridge component, link-local frames with these destination addresses will be trapped to CPU port: 01-80-C2-00-00-[00,01,02,03,0E] In a Bridge comprising a MAC Bridge component or a C-VLAN component: Link-local frames with these destination addresses won't be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022: 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] In a Bridge comprising an S-VLAN component: Link-local frames with these destination addresses will be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022: 01-80-C2-00-00-00 Link-local frames with these destination addresses won't be trapped to CPU port which won't conform to IEEE Std 802.1Q-2022: 01-80-C2-00-00-[04,05,06,07,08,09,0A] Currently on this switch intellectual property, if the spanning tree Port State of the reception Port is discarding, link-local frames will be discarded. To trap link-local frames regardless of the spanning tree Port State, make the switch regard them as Bridge Protocol Data Units (BPDUs). This switch intellectual property only lets the frames regarded as BPDUs bypass the spanning tree Port State function of the Forwarding Process. With this change, the only remaining interference is the ingress rules. When the reception Port has no PVID assigned on software, VLAN-untagged frames won't be allowed in. There doesn't seem to be a mechanism on the switch intellectual property to have link-local frames bypass this function of the Forwarding Process. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Reviewed-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp-discarding-v2-1-07b1150164ac@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11Revert "s390/ism: fix receive message buffer allocation"Gerd Bayer1-29/+9
This reverts commit 58effa3476536215530c9ec4910ffc981613b413. Review was not finished on this patch. So it's not ready for upstreaming. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Link: https://lore.kernel.org/r/20240409113753.2181368-1-gbayer@linux.ibm.com Fixes: 58effa347653 ("s390/ism: fix receive message buffer allocation") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11net: sparx5: fix wrong config being used when reconfiguring PCSDaniel Machon1-2/+2
The wrong port config is being used if the PCS is reconfigured. Fix this by correctly using the new config instead of the old one. Fixes: 946e7fd5053a ("net: sparx5: add port module support") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409-link-mode-reconfiguration-fix-v2-1-db6a507f3627@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11Merge tag 'amd-drm-fixes-6.9-2024-04-10' of ↵Dave Airlie32-127/+652
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.9-2024-04-10: amdgpu: - GPU reset fixes - Fix some confusing logging - UMSCH fix - Aborted suspend fix - DCN 3.5 fixes - S4 fix - MES logging fixes - SMU 14 fixes - SDMA 4.4.2 fix - KASAN fix - SMU 13.0.10 fix - VCN partition fix - GFX11 fixes - DWB fixes - Plane handling fix - FAMS fix - DCN 3.1.6 fix - VSC SDP fixes - OLED panel fix - GFX 11.5 fix amdkfd: - GPU reset fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240411013425.6431-1-alexander.deucher@amd.com
2024-04-11Merge tag 'drm-intel-fixes-2024-04-10' of ↵Dave Airlie9-27/+79
https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes Display fixes: - Couple CDCLK programming fixes (Ville) - HDCP related fix (Suraj) - 4 Bigjoiner related fixes (Ville) Core fix: - Fix for a circular locking around GuC on reset+wedged case (John) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZhcJxlzc6zLMC1c-@intel.com
2024-04-11net/mlx5: fix possible stack overflowsArnd Bergmann1-41/+41
A couple of debug functions use a 512 byte temporary buffer and call another function that has another buffer of the same size, which in turn exceeds the usual warning limit for excessive stack usage: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:1073:1: error: stack frame size (1448) exceeds limit (1024) in 'dr_dump_start' [-Werror,-Wframe-larger-than] dr_dump_start(struct seq_file *file, loff_t *pos) drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:1009:1: error: stack frame size (1120) exceeds limit (1024) in 'dr_dump_domain' [-Werror,-Wframe-larger-than] dr_dump_domain(struct seq_file *file, struct mlx5dr_domain *dmn) drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:705:1: error: stack frame size (1104) exceeds limit (1024) in 'dr_dump_matcher_rx_tx' [-Werror,-Wframe-larger-than] dr_dump_matcher_rx_tx(struct seq_file *file, bool is_rx, Rework these so that each of the various code paths only ever has one of these buffers in it, and exactly the functions that declare one have the 'noinline_for_stack' annotation that prevents them from all being inlined into the same caller. Fixes: 917d1e799ddf ("net/mlx5: DR, Change SWS usage to debug fs seq_file interface") Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/all/20240219100506.648089-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240408074142.3007036-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11Merge branch 'mlx5-misc-fixes'Jakub Kicinski14-53/+131
Tariq Toukan says: ==================== mlx5 misc fixes This patchset provides bug fixes to mlx5 driver. This is V2 of the series previously submitted as PR by Saeed: https://lore.kernel.org/netdev/20240326144646.2078893-1-saeed@kernel.org/T/ Series generated against: commit 237f3cf13b20 ("xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING") ==================== Link: https://lore.kernel.org/r/20240409190820.227554-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdevTariq Toukan1-0/+7
Adaptations need to be made for the auxiliary device management in the core driver level. Block this combination for now. Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5e: RSS, Block XOR hash with over 128 channelsCarolina Jubran3-2/+34
When supporting more than 128 channels, the RQT size is calculated by multiplying the number of channels by 2 and rounding up to the nearest power of 2. The index of the RQT is derived from the RSS hash calculations. If XOR8 is used as the RSS hash function, there are only 256 possible hash results, and therefore, only 256 indexes can be reached in the RQT. Block setting the RSS hash function to XOR when the number of channels exceeds 128. Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmitRahul Rameshbabu2-5/+10
Free Tx port timestamping metadata entries in the NAPI poll context and consume metadata enties in the WQE xmit path. Do not free a Tx port timestamping metadata entry in the WQE xmit path even in the error path to avoid a race between two metadata entry producers. Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5e: HTB, Fix inconsistencies with QoS SQs numberCarolina Jubran1-16/+17
When creating a new HTB class while the interface is down, the variable that follows the number of QoS SQs (htb_max_qos_sqs) may not be consistent with the number of HTB classes. Previously, we compared these two values to ensure that the node_qid is lower than the number of QoS SQs, and we allocated stats for that SQ when they are equal. Change the check to compare the node_qid with the current number of leaf nodes and fix the checking conditions to ensure allocation of stats_list and stats for each node. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5e: Fix mlx5e_priv_init() cleanup flowCarolina Jubran2-2/+2
When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup which calls mlx5e_selq_apply() that assures that the `priv->state_lock` is held using lockdep_is_held(). Acquire the state_lock in mlx5e_selq_cleanup(). Kernel log: ============================= WARNING: suspicious RCU usage 6.8.0-rc3_net_next_841a9b5 #1 Not tainted ----------------------------- drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by systemd-modules/293: #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core] #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core] stack backtrace: CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x8a/0xa0 lockdep_rcu_suspicious+0x154/0x1a0 mlx5e_selq_apply+0x94/0xa0 [mlx5_core] mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core] mlx5e_priv_init+0x2be/0x2f0 [mlx5_core] mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core] rdma_init_netdev+0x4e/0x80 [ib_core] ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core] ipoib_intf_init+0x64/0x550 [ib_ipoib] ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib] ipoib_add_one+0xb0/0x360 [ib_ipoib] add_client_context+0x112/0x1c0 [ib_core] ib_register_client+0x166/0x1b0 [ib_core] ? 0xffffffffa0573000 ipoib_init_module+0xeb/0x1a0 [ib_ipoib] do_one_initcall+0x61/0x250 do_init_module+0x8a/0x270 init_module_from_file+0x8b/0xd0 idempotent_init_module+0x17d/0x230 __x64_sys_finit_module+0x61/0xb0 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e </TASK> Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5e: RSS, Block changing channels number when RXFH is configuredCarolina Jubran1-0/+17
Changing the channels number after configuring the receive flow hash indirection table may affect the RSS table size. The previous configuration may no longer be compatible with the new receive flow hash indirection table. Block changing the channels number when RXFH is configured and changing the channels number requires resizing the RSS table size. Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5: Correctly compare pkt reformat idsCosmin Ratiu1-2/+12
struct mlx5_pkt_reformat contains a naked union of a u32 id and a dr_action pointer which is used when the action is SW-managed (when pkt_reformat.owner is set to MLX5_FLOW_RESOURCE_OWNER_SW). Using id directly in that case is incorrect, as it maps to the least significant 32 bits of the 64-bit pointer in mlx5_fs_dr_action and not to the pkt reformat id allocated in firmware. For the purpose of comparing whether two rules are identical, interpreting the least significant 32 bits of the mlx5_fs_dr_action pointer as an id mostly works... until it breaks horribly and produces the outcome described in [1]. This patch fixes mlx5_flow_dests_cmp to correctly compare ids using mlx5_fs_dr_action_get_pkt_reformat_id for the SW-managed rules. Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1] Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5: Properly link new fs rules into the treeCosmin Ratiu1-1/+2
Previously, add_rule_fg would only add newly created rules from the handle into the tree when they had a refcount of 1. On the other hand, create_flow_handle tries hard to find and reference already existing identical rules instead of creating new ones. These two behaviors can result in a situation where create_flow_handle 1) creates a new rule and references it, then 2) in a subsequent step during the same handle creation references it again, resulting in a rule with a refcount of 2 that is not linked into the tree, will have a NULL parent and root and will result in a crash when the flow group is deleted because del_sw_hw_rule, invoked on rule deletion, assumes node->parent is != NULL. This happened in the wild, due to another bug related to incorrect handling of duplicate pkt_reformat ids, which lead to the code in create_flow_handle incorrectly referencing a just-added rule in the same flow handle, resulting in the problem described above. Full details are at [1]. This patch changes add_rule_fg to add new rules without parents into the tree, properly initializing them and avoiding the crash. This makes it more consistent with how rules are added to an FTE in create_flow_handle. Fixes: 74491de93712 ("net/mlx5: Add multi dest support") Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u [1] Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5: offset comp irq index in name by oneMichael Liang1-1/+3
The mlx5 comp irq name scheme is changed a little bit between commit 3663ad34bc70 ("net/mlx5: Shift control IRQ to the last index") and commit 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation"). The index in the comp irq name used to start from 0 but now it starts from 1. There is nothing critical here, but it's harmless to change back to the old behavior, a.k.a starting from 0. Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Michael Liang <mliang@purestorage.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5: Register devlink first under devlink lockShay Drory2-18/+20
In case device is having a non fatal FW error during probe, the driver will report the error to user via devlink. This will trigger a WARN_ON, since mlx5 is calling devlink_register() last. In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register() first under devlink lock. [1] WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0 CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core] RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0 Call Trace: <TASK> ? __warn+0x79/0x120 ? devlink_recover_notify.constprop.0+0xb8/0xc0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? devlink_recover_notify.constprop.0+0xb8/0xc0 devlink_health_report+0x4a/0x1c0 mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core] process_one_work+0x1bb/0x3c0 ? process_one_work+0x3c0/0x3c0 worker_thread+0x4d/0x3c0 ? process_one_work+0x3c0/0x3c0 kthread+0xc6/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11net/mlx5: E-switch, store eswitch pointer before registering devlink_paramShay Drory2-6/+7
Next patch will move devlink register to be first. Therefore, whenever mlx5 will register a param, the user will be notified. In order to notify the user, devlink is using the get() callback of the param. Hence, resources that are being used by the get() callback must be set before the devlink param is registered. Therefore, store eswitch pointer inside mdev before registering the param. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11Merge tag 'probes-fixes-v6.9-rc3' of ↵Linus Torvalds1-6/+12
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: "Fix possible use-after-free issue on kprobe registration. check_kprobe_address_safe() uses `is_module_text_address()` and `__module_text_address()` separately. As a result, if the probed address is in a module that is being unloaded, the first `is_module_text_address()` might return true but then the `__module_text_address()` call might return NULL if the module has been unloaded between the two. The result is that kprobe believes the probe is on the kernel text, and skips getting a module reference. In this case, when it arms a breakpoint on the probe address, it may cause a use-after-free. To fix this issue, only use `__module_text_address()` once and get a reference to the module then. If it fails, reject the probe" * tag 'probes-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Fix possible use-after-free issue on kprobe registration
2024-04-11netfilter: complete validation of user inputEric Dumazet3-0/+12
In my recent commit, I missed that do_replace() handlers use copy_from_sockptr() (which I fixed), followed by unsafe copy_from_sockptr_offset() calls. In all functions, we can perform the @optlen validation before even calling xt_alloc_table_info() with the following check: if ((u64)optlen < (u64)tmp.size + sizeof(tmp)) return -EINVAL; Fixes: 0c83842df40f ("netfilter: validate user input for expected length") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11Merge tag 'bootconfig-fixes-v6.9-rc3' of ↵Linus Torvalds3-6/+12
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fixes from Masami Hiramatsu: - show the original cmdline only once, and only if it was modeified by bootconfig * tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: fs/proc: Skip bootloader comment if no embedded kernel parameters fs/proc: remove redundant comments from /proc/bootconfig
2024-04-11bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()Kent Overstreet1-5/+7
We weren't respecting trans->journal_replay_not_finished - we shouldn't be searching the journal keys unless we have a ref on them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()Kent Overstreet1-27/+1
dropping read locks in bch2_btree_node_lock_write_nofail() dates from before we had the cycle detector; we can now tell the cycle detector directly when taking a lock may not fail because we can't handle transaction restarts. This is needed for adding should_be_locked asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Fix a race in btree_update_nodes_written()Kent Overstreet1-3/+7
One btree update might have terminated in a node update, and then while it is in flight another btree update might free that original node. This race has to be handled in btree_update_nodes_written() - we were missing a READ_ONCE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11r8169: add missing conditional compiling for call to r8169_remove_ledsHeiner Kallweit1-1/+2
Add missing dependency on CONFIG_R8169_LEDS. As-is a link error occurs if config option CONFIG_R8169_LEDS isn't enabled. Fixes: 19fa4f2a85d7 ("r8169: fix LED-related deadlock on module removal") Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/d080038c-eb6b-45ac-9237-b8c1cdd7870f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11platform/chrome: cros_ec_uart: properly fix race conditionNoah Loomans1-14/+14
The cros_ec_uart_probe() function calls devm_serdev_device_open() before it calls serdev_device_set_client_ops(). This can trigger a NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... Call Trace: <TASK> ... ? ttyport_receive_buf A simplified version of crashing code is as follows: static inline size_t serdev_controller_receive_buf(struct serdev_controller *ctrl, const u8 *data, size_t count) { struct serdev_device *serdev = ctrl->serdev; if (!serdev || !serdev->ops->receive_buf) // CRASH! return 0; return serdev->ops->receive_buf(serdev, data, count); } It assumes that if SERPORT_ACTIVE is set and serdev exists, serdev->ops will also exist. This conflicts with the existing cros_ec_uart_probe() logic, as it first calls devm_serdev_device_open() (which sets SERPORT_ACTIVE), and only later sets serdev->ops via serdev_device_set_client_ops(). Commit 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition") attempted to fix a similar race condition, but while doing so, made the window of error for this race condition to happen much wider. Attempt to fix the race condition again, making sure we fully setup before calling devm_serdev_device_open(). Fixes: 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition") Cc: stable@vger.kernel.org Signed-off-by: Noah Loomans <noah@noahloomans.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Link: https://lore.kernel.org/r/20240410182618.169042-2-noah@noahloomans.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2024-04-11net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boardsArınç ÜNAL2-5/+13
The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") brought EEE support but did not enable EEE on MT7531 switch MACs. EEE is enabled on MT7531 switch MACs by pulling the LAN2LED0 pin low on the board (bootstrapping), unsetting the EEE_DIS bit on the trap register, or setting the internal EEE switch bit on the CORE_PLL_GROUP4 register. Thanks to SkyLake Huang (黃啟澤) from MediaTek for providing information on the internal EEE switch bit. There are existing boards that were not designed to pull the pin low. Because of that, the EEE status currently depends on the board design. The EEE_DIS bit on the trap pertains to the LAN2LED0 pin which is usually used to control an LED. Once the bit is unset, the pin will be low. That will make the active low LED turn on. The pin is controlled by the switch PHY. It seems that the PHY controls the pin in the way that it inverts the pin state. That means depending on the wiring of the LED connected to LAN2LED0 on the board, the LED may be on without an active link. To not cause this unwanted behaviour whilst enabling EEE on all boards, set the internal EEE switch bit on the CORE_PLL_GROUP4 register. My testing on MT7531 shows a certain amount of traffic loss when EEE is enabled. That said, I haven't come across a board that enables EEE. So enable EEE on the switch MACs but disable EEE advertisement on the switch PHYs. This way, we don't change the behaviour of the majority of the boards that have this switch. The mediatek-ge PHY driver already disables EEE advertisement on the switch PHYs but my testing shows that it is somehow enabled afterwards. Disabling EEE advertisement before the PHY driver initialises keeps it off. With this change, EEE can now be enabled using ethtool. Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/20240408-for-net-mt7530-fix-eee-for-mt7531-mt7988-v3-1-84fdef1f008b@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()Paulo Alcantara1-1/+2
cifs_get_fattr() may be called with a NULL inode, so check for a non-NULL inode before calling cifs_mark_open_handles_for_deleted_file(). This fixes the following oops: mount.cifs //srv/share /mnt -o ...,vers=3.1.1 cd /mnt touch foo; tail -f foo & rm foo cat foo BUG: kernel NULL pointer dereference, address: 00000000000005c0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 696 Comm: cat Not tainted 6.9.0-rc2 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 RIP: 0010:__lock_acquire+0x5d/0x1c70 Code: 00 00 44 8b a4 24 a0 00 00 00 45 85 f6 0f 84 bb 06 00 00 8b 2d 48 e2 95 01 45 89 c3 41 89 d2 45 89 c8 85 ed 0 0 <48> 81 3f 40 7a 76 83 44 0f 44 d8 83 fe 01 0f 86 1b 03 00 00 31 d2 RSP: 0018:ffffc90000b37490 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff888110021ec0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000005c0 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200 FS: 00007f2a1fa08740(0000) GS:ffff888157a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000005c0 CR3: 000000011ac7c000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x180/0x490 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x70/0x230 ? asm_exc_page_fault+0x26/0x30 ? __lock_acquire+0x5d/0x1c70 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc0/0x2d0 ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? kmem_cache_alloc+0x2d9/0x370 _raw_spin_lock+0x34/0x80 ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs] cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs] cifs_get_fattr+0x24c/0x940 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_get_inode_info+0x96/0x120 [cifs] cifs_lookup+0x16e/0x800 [cifs] cifs_atomic_open+0xc7/0x5d0 [cifs] ? lookup_open.isra.0+0x3ce/0x5f0 ? __pfx_cifs_atomic_open+0x10/0x10 [cifs] lookup_open.isra.0+0x3ce/0x5f0 path_openat+0x42b/0xc30 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 do_filp_open+0xc4/0x170 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x57/0xa0 do_syscall_64+0xc1/0x1e0 entry_SYSCALL_64_after_hwframe+0x72/0x7a Fixes: ffceb7640cbf ("smb: client: do not defer close open handles to deleted files") Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-11Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encryptedMichael Kelley1-1/+3
In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. The VMBus ring buffer code could free decrypted/shared pages if set_memory_decrypted() fails. Check the decrypted field in the struct vmbus_gpadl for the ring buffers to decide whether to free the memory. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-6-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-6-mhklinux@outlook.com>
2024-04-11uio_hv_generic: Don't free decrypted memoryRick Edgecombe1-4/+8
In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. The VMBus device UIO driver could free decrypted/shared pages if set_memory_decrypted() fails. Check the decrypted field in the gpadl to decide whether to free the memory. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-5-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-5-mhklinux@outlook.com>
2024-04-11hv_netvsc: Don't free decrypted memoryRick Edgecombe1-2/+5
In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. The netvsc driver could free decrypted/shared pages if set_memory_decrypted() fails. Check the decrypted field in the gpadl to decide whether to free the memory. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-4-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-4-mhklinux@outlook.com>
2024-04-11Drivers: hv: vmbus: Track decrypted status in vmbus_gpadlRick Edgecombe2-4/+22
In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. In order to make sure callers of vmbus_establish_gpadl() and vmbus_teardown_gpadl() don't return decrypted/shared pages to allocators, add a field in struct vmbus_gpadl to keep track of the decryption status of the buffers. This will allow the callers to know if they should free or leak the pages. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-3-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-3-mhklinux@outlook.com>
2024-04-11Drivers: hv: vmbus: Leak pages if set_memory_encrypted() failsRick Edgecombe1-7/+22
In CoCo VMs it is possible for the untrusted host to cause set_memory_encrypted() or set_memory_decrypted() to fail such that an error is returned and the resulting memory is shared. Callers need to take care to handle these errors to avoid returning decrypted (shared) memory to the page allocator, which could lead to functional or security issues. VMBus code could free decrypted pages if set_memory_encrypted()/decrypted() fails. Leak the pages if this happens. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20240311161558.1310-2-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240311161558.1310-2-mhklinux@outlook.com>
2024-04-11hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile formatShradha Gupta1-41/+172
If the network configuration strings are passed as a combination of IPv4 and IPv6 addresses, the current KVP daemon does not handle processing for the keyfile configuration format. With these changes, the keyfile config generation logic scans through the list twice to generate IPv4 and IPv6 sections for the configuration files to handle this support. Testcases ran:Rhel 9, Hyper-V VMs (IPv4 only, IPv6 only, IPv4 and IPv6 combination) Co-developed-by: Ani Sinha <anisinha@redhat.com> Signed-off-by: Ani Sinha <anisinha@redhat.com> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Tested-by: Ani Sinha <anisinha@redhat.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Link: https://lore.kernel.org/r/1711115162-11629-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1711115162-11629-1-git-send-email-shradhagupta@linux.microsoft.com>
2024-04-11hv: vmbus: Convert sprintf() family to sysfs_emit() familyLi Zhijian1-52/+42
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). sprintf() and scnprintf() will be converted as well if these files have such abused cases. This patch is generated by make coccicheck M=<path/to/file> MODE=patch \ COCCI=scripts/coccinelle/api/device_attr_show.cocci No functional change intended. CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Wei Liu <wei.liu@kernel.org> CC: Dexuan Cui <decui@microsoft.com> CC: linux-hyperv@vger.kernel.org Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240319034350.1574454-1-lizhijian@fujitsu.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240319034350.1574454-1-lizhijian@fujitsu.com>
2024-04-10Merge tag 'media/v6.9-2' of ↵Linus Torvalds11-13/+32
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - some fixes for mediatec vcodec encoder/decoder oopses * tag 'media/v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: mediatek: vcodec: support 36 bits physical address media: mediatek: vcodec: adding lock to protect encoder context list media: mediatek: vcodec: adding lock to protect decoder context list media: mediatek: vcodec: Fix oops when HEVC init fails media: mediatek: vcodec: Handle VP9 superframe bitstream with 8 sub-frames
2024-04-10Merge tag 'hardening-v6.9-rc4' of ↵Linus Torvalds3-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - gcc-plugins/stackleak: Avoid .head.text section (Ard Biesheuvel) - ubsan: fix unused variable warning in test module (Arnd Bergmann) - Improve entropy diffusion in randomize_kstack * tag 'hardening-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randomize_kstack: Improve entropy diffusion ubsan: fix unused variable warning in test module gcc-plugins/stackleak: Avoid .head.text section
2024-04-10Merge tag 'turbostat-2024.04.10' of ↵Linus Torvalds4-369/+1727
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Use of the CPU MSR driver is now optional - Perf is now preferred for many counters - Non-root users can now execute turbostat, though with limited functionality - Add counters for some new GFX hardware - Minor fixes * tag 'turbostat-2024.04.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (26 commits) tools/power turbostat: v2024.04.10 tools/power/turbostat: Add support for Xe sysfs knobs tools/power/turbostat: Add support for new i915 sysfs knobs tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz tools/power/turbostat: Fix uncore frequency file string tools/power/turbostat: Unify graphics sysfs snapshots tools/power/turbostat: Cache graphics sysfs path tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX tools/power turbostat: Add selftests tools/power turbostat: read RAPL counters via perf tools/power turbostat: Add proper re-initialization for perf file descriptors tools/power turbostat: Clear added counters when in no-msr mode tools/power turbostat: add early exits for permission checks tools/power turbostat: detect and disable unavailable BICs at runtime tools/power turbostat: Add reading aperf and mperf via perf API tools/power turbostat: Add --no-perf option tools/power turbostat: Add --no-msr option tools/power turbostat: enhance -D (debug counter dump) output tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read tools/power turbostat: Read base_hz and bclk from CPUID.16H if available ...
2024-04-10Merge tag 'platform-drivers-x86-v6.9-2' of ↵Linus Torvalds5-10/+25
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "Fixes: - intel/hid: Solve spurious hibernation aborts (power button release) - toshiba_acpi: Ignore 2 keys to avoid log noise during suspend/resume - intel-vbtn: Fix probe by restoring VBDL and VGBS evalutation order - lg-laptop: Fix W=1 %s null argument warning New HW Support: - acer-wmi: PH18-71 mode button and fan speed sensor - intel/hid: Lunar Lake and Arrow Lake HID IDs" * tag 'platform-drivers-x86-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: lg-laptop: fix %s null argument warning platform/x86: intel-vbtn: Update tablet mode switch at end of probe platform/x86: intel-vbtn: Use acpi_has_method to check for switch platform/x86: toshiba_acpi: Silence logging for some events platform/x86/intel/hid: Add Lunar Lake and Arrow Lake support platform/x86/intel/hid: Don't wake on 5-button releases platform/x86: acer-wmi: Add support for Acer PH18-71
2024-04-10selftests: timers: Fix valid-adjtimex signed left-shift undefined behaviorJohn Stultz1-37/+36
The struct adjtimex freq field takes a signed value who's units are in shifted (<<16) parts-per-million. Unfortunately for negative adjustments, the straightforward use of: freq = ppm << 16 trips undefined behavior warnings with clang: valid-adjtimex.c:66:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value] -499<<16, ~~~~^ valid-adjtimex.c:67:6: warning: shifting a negative signed value is undefined [-Wshift-negative-value] -450<<16, ~~~~^ .. Fix it by using a multiply by (1 << 16) instead of shifting negative values in the valid-adjtimex test case. Align the values for better readability. Reported-by: Lee Jones <joneslee@google.com> Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240409202222.2830476-1-jstultz@google.com Link: https://lore.kernel.org/lkml/0c6d4f0d-2064-4444-986b-1d1ed782135f@collabora.com/
2024-04-10bug: Fix no-return-statement warning with !CONFIG_BUGAdrian Hunter1-1/+4
BUG() does not return, and arch implementations of BUG() use unreachable() or other non-returning code. However with !CONFIG_BUG, the default implementation is often used instead, and that does not do that. x86 always uses its own implementation, but powerpc with !CONFIG_BUG gives a build error: kernel/time/timekeeping.c: In function ‘timekeeping_debug_get_ns’: kernel/time/timekeeping.c:286:1: error: no return statement in function returning non-void [-Werror=return-type] Add unreachable() to default !CONFIG_BUG BUG() implementation. Fixes: e8e9d21a5df6 ("timekeeping: Refactor timekeeping helpers") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/r/20240410153212.127477-1-adrian.hunter@intel.com Closes: https://lore.kernel.org/all/CA+G9fYvjdZCW=7ZGxS6A_3bysjQ56YF7S-+PNLQ_8a4DKh1Bhg@mail.gmail.com/
2024-04-10Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bitArchie Pusaka1-2/+1
The bit is set and tested inside mgmt_device_connected(), therefore we must not set it just outside the function. Fixes: eeda1bf97bb5 ("Bluetooth: hci_event: Fix not indicating new connection for BIG Sync") Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-10Bluetooth: hci_sock: Fix not validating setsockopt user inputLuiz Augusto von Dentz1-13/+8
Check user input length before copying data. Fixes: 09572fca7223 ("Bluetooth: hci_sock: Add support for BT_{SND,RCV}BUF") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>