summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-02rust: make mutually exclusive with CFI_CLANGConor Dooley1-0/+1
commit 8933cf4651e02853ca679be7b2d978dfcdcc5e0c upstream. On RISC-V and arm64, and presumably x86, if CFI_CLANG is enabled, loading a rust module will trigger a kernel panic. Support for sanitisers, including kcfi (CFI_CLANG), is in the works, but for now they're nightly-only options in rustc. Make RUST depend on !CFI_CLANG to prevent configuring a kernel without symmetrical support for kfi. [ Matthew Maurer writes [1]: This patch is fine by me - the last patch needed for KCFI to be functional in Rust just landed upstream last night, so we should revisit this (in the form of enabling it) once we move to `rustc-1.79.0` or later. Ramon de C Valle also gave feedback [2] on the status of KCFI for Rust and created a tracking issue [3] in upstream Rust. - Miguel ] Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Cc: stable@vger.kernel.org Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/rust-for-linux/CAGSQo024u1gHJgzsO38Xg3c4or+JupoPABQx_+0BLEpPg0cOEA@mail.gmail.com/ [1] Link: https://lore.kernel.org/rust-for-linux/CAOcBZOS2kPyH0Dm7Fuh4GC3=v7nZhyzBj_-dKu3PfAnrHZvaxg@mail.gmail.com/ [2] Link: https://github.com/rust-lang/rust/issues/123479 [3] Link: https://lore.kernel.org/r/20240404-providing-emporium-e652e359c711@spud [ Added feedback from the list, links, and used Cc for the tag. ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02rust: init: remove impl Zeroable for InfallibleLaine Taffin Altman1-2/+9
commit 49ceae68a0df9a92617a61e9ce8a0efcf6419585 upstream. In Rust, producing an invalid value of any type is immediate undefined behavior (UB); this includes via zeroing memory. Therefore, since an uninhabited type has no valid values, producing any values at all for it is UB. The Rust standard library type `core::convert::Infallible` is uninhabited, by virtue of having been declared as an enum with no cases, which always produces uninhabited types in Rust. The current kernel code allows this UB to be triggered, for example by code like `Box::<core::convert::Infallible>::init(kernel::init::zeroed())`. Thus, remove the implementation of `Zeroable` for `Infallible`, thereby avoiding the unsoundness (potential for future UB). Cc: stable@vger.kernel.org Fixes: 38cde0bd7b67 ("rust: init: add `Zeroable` trait and `init::zeroed` function") Closes: https://github.com/Rust-for-Linux/pinned-init/pull/13 Signed-off-by: Laine Taffin Altman <alexanderaltman@me.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Link: https://lore.kernel.org/r/CA160A4E-561E-4918-837E-3DCEBA74F808@me.com [ Reformatted the comment slightly. ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02rust: don't select CONSTRUCTORSAlice Ryhl1-1/+0
commit 7d49f53af4b988b188d3932deac2c9c80fd7d9ce upstream. This was originally part of commit 4b9a68f2e59a0 ("rust: add support for static synchronisation primitives") from the old Rust branch, which used module constructors to initialize globals containing various synchronisation primitives with pin-init. That commit has never been upstreamed, but the `select CONSTRUCTORS` statement ended up being included in the patch that initially added Rust support to the Linux Kernel. We are not using module constructors, so let's remove the select. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Cc: stable@vger.kernel.org Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Link: https://lore.kernel.org/r/20240308-constructors-v1-1-4c811342391c@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02rust: kernel: require `Send` for `Module` implementationsWedson Almeida Filho1-1/+1
commit 323617f649c0966ad5e741e47e27e06d3a680d8f upstream. The thread that calls the module initialisation code when a module is loaded is not guaranteed [in fact, it is unlikely] to be the same one that calls the module cleanup code on module unload, therefore, `Module` implementations must be `Send` to account for them moving from one thread to another implicitly. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Cc: stable@vger.kernel.org # 6.8.x: df70d04d5697: rust: phy: implement `Send` for `Registration` Cc: stable@vger.kernel.org Fixes: 247b365dc8dc ("rust: add `kernel` crate") Link: https://lore.kernel.org/r/20240328195457.225001-3-wedsonaf@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02rust: phy: implement `Send` for `Registration`Wedson Almeida Filho1-0/+4
commit df70d04d56975f527b9c965322cf56e245909071 upstream. In preparation for requiring `Send` for `Module` implementations in the next patch. Cc: FUJITA Tomonori <fujita.tomonori@gmail.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: netdev@vger.kernel.org Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240328195457.225001-2-wedsonaf@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02x86/cpu: Fix check for RDPKRU in __show_regs()David Kaplan1-1/+1
commit b53c6bd5d271d023857174b8fd3e32f98ae51372 upstream. cpu_feature_enabled(X86_FEATURE_OSPKE) does not necessarily reflect whether CR4.PKE is set on the CPU. In particular, they may differ on non-BSP CPUs before setup_pku() is executed. In this scenario, RDPKRU will #UD causing the system to hang. Fix by checking CR4 for PKE enablement which is always correct for the current CPU. The scenario happens by inserting a WARN* before setup_pku() in identiy_cpu() or some other diagnostic which would lead to calling __show_regs(). [ bp: Massage commit message. ] Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240421191728.32239-1-bp@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 rangeWenkuan Wang1-2/+1
commit 2718a7fdf292b2dcb49c856fa8a6a955ebbbc45f upstream. Add some more Zen5 models. Fixes: 3e4147f33f8b ("x86/CPU/AMD: Add X86_FEATURE_ZEN5") Signed-off-by: Wenkuan Wang <Wenkuan.Wang@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240423144111.1362-1-bp@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)Terry Tritton1-1/+8
commit ecaaa55c9fa5e8058445a8b891070b12208cdb6d upstream. unshare(CLONE_NEWPID) can return EINVAL if the kernel does not have the CONFIG_PID_NS option enabled. Add a check on these calls to skip the test if we receive EINVAL. Signed-off-by: Terry Tritton <terry.tritton@linaro.org> Link: https://lore.kernel.org/r/20240124141357.1243457-2-terry.tritton@linaro.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02selftests/seccomp: Change the syscall used in KILL_THREAD testTerry Tritton1-4/+4
commit 471dbc547612adeaa769e48498ef591c6c95a57a upstream. The Bionic version of pthread_create used on Android calls the prctl function to give the stack and thread local storage a useful name. This will cause the KILL_THREAD test to fail as it will kill the thread as soon as it is created. change the test to use getpid instead of prctl. Signed-off-by: Terry Tritton <terry.tritton@linaro.org> Link: https://lore.kernel.org/r/20240124141357.1243457-3-terry.tritton@linaro.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02selftests/seccomp: user_notification_addfd check nextfd is availableTerry Tritton1-5/+19
commit 8e3c9f9f3a0742cd12b682a1766674253b33fcf0 upstream. Currently the user_notification_addfd test checks what the next expected file descriptor will be by incrementing a variable nextfd. This does not account for file descriptors that may already be open before the test is started and will cause the test to fail if any exist. Replace nextfd++ with a function get_next_fd which will check and return the next available file descriptor. Signed-off-by: Terry Tritton <terry.tritton@linaro.org> Link: https://lore.kernel.org/r/20240124141357.1243457-4-terry.tritton@linaro.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02drm/amdgpu: fix visible VRAM handling during faultsChristian König5-57/+53
[ Upstream commit a6ff969fe9cbf369e3cd0ac54261fec1122682ec ] When we removed the hacky start code check we actually didn't took into account that *all* VRAM pages needs to be CPU accessible. Clean up the code and unify the handling into a single helper which checks if the whole resource is CPU accessible. The only place where a partial check would make sense is during eviction, but that is neglitible. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02drm/amdgpu: add shared fdinfo statsAlex Deucher3-0/+21
[ Upstream commit ba1a58d5b907bdf1814f8f57434aebc86233430f ] Add shared stats. Useful for seeing shared memory. v2: take dma-buf into account as well v3: use the new gem helper Link: https://lore.kernel.org/all/20231207180225.439482-1-alexander.deucher@amd.com/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Rob Clark <robdclark@gmail.com> Reviewed-by: Christian König <christian.keonig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Stable-dep-of: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02drm: add drm_gem_object_is_shared_for_memory_stats() helperAlex Deucher1-0/+13
[ Upstream commit b31f5eba32ae8cc28e7cfa5a55ec8670d8c718e2 ] Add a helper so that drm drivers can consistently report shared status via the fdinfo shared memory stats interface. In addition to handle count, show buffers as shared if they are shared via dma-buf as well (e.g., shared with v4l or some other subsystem). v2: switch to inline function Link: https://lore.kernel.org/all/20231207180225.439482-1-alexander.deucher@amd.com/ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.keonig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Stable-dep-of: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"Sean Christopherson1-2/+14
[ Upstream commit de120e1d692d73c7eefa3278837b1eb68f90728a ] Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL when refreshing the PMU to emulate the MSR's architecturally defined post-RESET behavior. Per Intel's SDM: IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits. and Where "n" is the number of general-purpose counters available in the processor. AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many PPRs. Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose counters, although a literal reading of the SDM would require the CPU to set either bits 63:0 or 31:0. The intent of the behavior is to globally enable all GP counters; honor the intent, if not the letter of the law. Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL. This bug was recently exposed when KVM added supported for AMD's PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to guest software that only knew how to program v1 PMUs (that don't support PERF_GLOBAL_CTRL). Failure to emulate the post-RESET behavior results in such guests unknowingly leaving all general purpose counters globally disabled (the entire reason the post-RESET value sets the GP counter enable bits is to maintain backwards compatibility). The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel CPUs, although the old behavior was likely dumb luck. Because (a) that old code was also broken in its own way (the history of this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented as having a value of '0' post-RESET in all SDMs before March 2023. Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") *almost* got it right (again likely by dumb luck), but for some reason only set the bits if the guest PMU was advertised as v1: if (pmu->version == 1) { pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1; return; } Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset") then tried to remedy that goof, presumably because guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling counters. pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) | (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED); pmu->global_ctrl_mask = ~pmu->global_ctrl; That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu: Don't overwrite the pmu->global_ctrl when refreshing") removed *everything*. However, it did so based on the behavior defined by the SDM , which at the time stated that "Global Perf Counter Controls" is '0' at Power-Up and RESET. But then the March 2023 SDM (325462-079US), stealthily changed its "IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT" table to say: IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits. Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a "pure" RESET flow. But it can only be called prior to the first KVM_RUN, i.e. the guest will only ever observe the final value. Note #2, KVM has always cleared global_ctrl during refresh (see commit f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")), i.e. there is no danger of breaking existing setups by clobbering a value set by userspace. Reported-by: Babu Moger <babu.moger@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabledSean Christopherson2-16/+20
[ Upstream commit f933b88e20150f15787390e2a1754a7e412754ed ] Move the purging of common PMU metadata from intel_pmu_refresh() to kvm_pmu_refresh(), and invoke the vendor refresh() hook if and only if the VM is supposed to have a vPMU. KVM already denies access to the PMU based on kvm->arch.enable_pmu, as get_gp_pmc_amd() returns NULL for all PMCs in that case, i.e. KVM already violates AMD's architecture by not virtualizing a PMU (kernels have long since learned to not panic when the PMU is unavailable). But configuring the PMU as if it were enabled causes unwanted side effects, e.g. calls to kvm_pmu_trigger_event() waste an absurd number of cycles due to the all_valid_pmc_idx bitmap being non-zero. Fixes: b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization") Reported-by: Konstantin Khorenko <khorenko@virtuozzo.com> Closes: https://lore.kernel.org/all/20231109180646.2963718-2-khorenko@virtuozzo.com Link: https://lore.kernel.org/r/20231110022857.1273836-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02usb: xhci: correct return value in case of STS_HCEOliver Neukum1-5/+4
[ Upstream commit 5bfc311dd6c376d350b39028b9000ad766ddc934 ] If we get STS_HCE we give up on the interrupt, but for the purpose of IRQ handling that still counts as ours. We may return IRQ_NONE only if we are positive that it wasn't ours. Hence correct the default. Fixes: 2a25e66d676d ("xhci: print warning when HCE was set") Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240404121106.2842417-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02xhci: move event processing for one interrupter to a separate functionMathias Nyman1-45/+42
[ Upstream commit 84ac5e4fa517f5d1da0054547a82ce905678dc08 ] Split the main XHCI interrupt handler into a different API, so that other potential interrupters can utilize similar event ring handling. A scenario would be if a secondary interrupter required to skip pending events in the event ring, which would warrant a similar set of operations. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com> Link: https://lore.kernel.org/r/20240217001017.29969-7-quic_wcheng@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 5bfc311dd6c3 ("usb: xhci: correct return value in case of STS_HCE") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02Revert "drm/amd/display: fix USB-C flag update after enc10 feature init"Alex Deucher2-7/+5
[ Upstream commit 91f10a3d21f2313485178d49efef8a3ba02bd8c7 ] This reverts commit b5abd7f983e14054593dc91d6df2aa5f8cc67652. This change breaks DSC on 4k monitors at 144Hz over USB-C. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3254 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Muhammad Ahmed <ahmed.ahmed@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Charlene Liu <charlene.liu@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02drm/amd/display: Check DP Alt mode DPCS state via DMUBGeorge Shen2-19/+71
[ Upstream commit 7d1e9d0369e4d45738d4b905c3ec92f76d7f91e6 ] [Why] Currently, driver state for DCN3.2 is not strictly matching HW state for the USBC port. To reduce inconsistencies while debugging, the driver should match HW configuration. [How] Update link encoder flag to indicate USBC port. Call into DMUB to check when DP Alt mode is entered, and also to check for 2-lane versuse 4-lane mode. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 91f10a3d21f2 ("Revert "drm/amd/display: fix USB-C flag update after enc10 feature init"") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02netfs: Fix the pre-flush when appending to a file in writethrough modeDavid Howells1-7/+6
[ Upstream commit c97f59e276d4e93480f29a70accbd0d7273cf3f5 ] In netfs_perform_write(), when the file is marked NETFS_ICTX_WRITETHROUGH or O_*SYNC or RWF_*SYNC was specified, write-through caching is performed on a buffered file. When setting up for write-through, we flush any conflicting writes in the region and wait for the write to complete, failing if there's a write error to return. The issue arises if we're writing at or above the EOF position because we skip the flush and - more importantly - the wait. This becomes a problem if there's a partial folio at the end of the file that is being written out and we want to make a write to it too. Both the already-running write and the write we start both want to clear the writeback mark, but whoever is second causes a warning looking something like: ------------[ cut here ]------------ R=00000012: folio 11 is not under writeback WARNING: CPU: 34 PID: 654 at fs/netfs/write_collect.c:105 ... CPU: 34 PID: 654 Comm: kworker/u386:27 Tainted: G S ... ... Workqueue: events_unbound netfs_write_collection_worker ... RIP: 0010:netfs_writeback_lookup_folio Fix this by making the flush-and-wait unconditional. It will do nothing if there are no folios in the pagecache and will return quickly if there are no folios in the region specified. Further, move the WBC attachment above the flush call as the flush is going to attach a WBC and detach it again if it is not present - and since we need one anyway we might as well share it. Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404161031.468b84f-oliver.sang@intel.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2150448.1714130115@warthog.procyon.org.uk Reviewed-by: Jeffrey Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().Kuniyuki Iwashima2-1/+4
[ Upstream commit 1971d13ffa84a551d29a81fdf5b5ec5be166ac83 ] syzbot reported a lockdep splat regarding unix_gc_lock and unix_state_lock(). One is called from recvmsg() for a connected socket, and another is called from GC for TCP_LISTEN socket. So, the splat is false-positive. Let's add a dedicated lock class for the latter to suppress the splat. Note that this change is not necessary for net-next.git as the issue is only applied to the old GC impl. [0]: WARNING: possible circular locking dependency detected 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted ----------------------------------------------------- kworker/u8:1/11 is trying to acquire lock: ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 but task is already holding lock: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (unix_gc_lock){+.+.}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] unix_notinflight+0x13d/0x390 net/unix/garbage.c:140 unix_detach_fds net/unix/af_unix.c:1819 [inline] unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188 skb_release_all net/core/skbuff.c:1200 [inline] __kfree_skb net/core/skbuff.c:1216 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252 kfree_skb include/linux/skbuff.h:1262 [inline] manage_oob net/unix/af_unix.c:2672 [inline] unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749 unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981 do_splice_read fs/splice.c:985 [inline] splice_file_to_pipe+0x299/0x500 fs/splice.c:1295 do_splice+0xf2d/0x1880 fs/splice.c:1379 __do_splice fs/splice.c:1436 [inline] __do_sys_splice fs/splice.c:1652 [inline] __se_sys_splice+0x331/0x4a0 fs/splice.c:1634 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&u->lock){+.+.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(unix_gc_lock); lock(&u->lock); lock(unix_gc_lock); lock(&u->lock); *** DEADLOCK *** 3 locks held by kworker/u8:1/11: #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261 stack backtrace: CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __unix_gc+0x40e/0xf70 net/unix/garbage.c:302 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02tls: fix lockless read of strp->msg_ready in ->pollSabrina Dubroca3-5/+6
[ Upstream commit 0844370f8945086eb9335739d10205dcea8d707b ] tls_sk_poll is called without locking the socket, and needs to read strp->msg_ready (via tls_strp_msg_ready). Convert msg_ready to a bool and use READ_ONCE/WRITE_ONCE where needed. The remaining reads are only performed when the socket is locked. Fixes: 121dca784fc0 ("tls: suppress wakeups unless we have a full record") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/0b7ee062319037cf86af6b317b3d72f7bfcd2e97.1713797701.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02dpll: fix dpll_pin_on_pin_register() for multiple parent pinsArkadiusz Kubalewski1-25/+33
[ Upstream commit 38d7b94e81d068b8d8c8392f421cfd2c3bbfd1a6 ] In scenario where pin is registered with multiple parent pins via dpll_pin_on_pin_register(..), all belonging to the same dpll device. A second call to dpll_pin_on_pin_unregister(..) would cause a call trace, as it tries to use already released registration resources (due to fix introduced in b446631f355e). In this scenario pin was registered twice, so resources are not yet expected to be release until each registered pin/pin pair is unregistered. Currently, the following crash/call trace is produced when ice driver is removed on the system with installed E810T NIC which includes dpll device: WARNING: CPU: 51 PID: 9155 at drivers/dpll/dpll_core.c:809 dpll_pin_ops+0x20/0x30 RIP: 0010:dpll_pin_ops+0x20/0x30 Call Trace: ? __warn+0x7f/0x130 ? dpll_pin_ops+0x20/0x30 dpll_msg_add_pin_freq+0x37/0x1d0 dpll_cmd_pin_get_one+0x1c0/0x400 ? __nlmsg_put+0x63/0x80 dpll_pin_event_send+0x93/0x140 dpll_pin_on_pin_unregister+0x3f/0x100 ice_dpll_deinit_pins+0xa1/0x230 [ice] ice_remove+0xf1/0x210 [ice] Fix by adding a parent pointer as a cookie when creating a registration, also when searching for it. For the regular pins pass NULL, this allows to create separated registration for each parent the pin is registered with. Fixes: b446631f355e ("dpll: fix dpll_xa_ref_*_del() for multiple registrations") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240424101636.1491424-1-arkadiusz.kubalewski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02dpll: check that pin is registered in __dpll_pin_unregister()Jiri Pirko1-0/+3
[ Upstream commit 9736c648370d237f61065a7e45e668e2db4374e9 ] Similar to what is done in dpll_device_unregister(), add assertion to __dpll_pin_unregister() to make sure driver does not try to unregister non-registered pin. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://lore.kernel.org/r/20240206074853.345744-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 38d7b94e81d0 ("dpll: fix dpll_pin_on_pin_register() for multiple parent pins") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02octeontx2-af: fix the double free in rvu_npc_freemem()Su Hui1-1/+0
[ Upstream commit 6e965eba43e9724f3e603d7b7cc83e53b23d155e ] Clang static checker(scan-build) warning: drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c:line 2184, column 2 Attempt to free released memory. npc_mcam_rsrcs_deinit() has released 'mcam->counters.bmap'. Deleted this redundant kfree() to fix this double free problem. Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20240424022724.144587-1-suhui@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packetsJason Reeder1-0/+5
[ Upstream commit 1b9e743e923b256e353a9a644195372285e5a6c0 ] The CPTS, by design, captures the messageType (Sync, Delay_Req, etc.) field from the second nibble of the PTP header which is defined in the PTPv2 (1588-2008) specification. In the PTPv1 (1588-2002) specification the first two bytes of the PTP header are defined as the versionType which is always 0x0001. This means that any PTPv1 packets that are tagged for TX timestamping by the CPTS will have their messageType set to 0x0 which corresponds to a Sync message type. This causes issues when a PTPv1 stack is expecting a Delay_Req (messageType: 0x1) timestamp that never appears. Fix this by checking if the ptp_class of the timestamped TX packet is PTP_CLASS_V1 and then matching the PTP sequence ID to the stored sequence ID in the skb->cb data structure. If the sequence IDs match and the packet is of type PTPv1 then there is a chance that the messageType has been incorrectly stored by the CPTS so overwrite the messageType stored by the CPTS with the messageType from the skb->cb data structure. This allows the PTPv1 stack to receive TX timestamps for Delay_Req packets which are necessary to lock onto a PTP Leader. Signed-off-by: Jason Reeder <jreeder@ti.com> Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Tested-by: Ed Trexel <ed.trexel@hp.com> Fixes: f6bd59526ca5 ("net: ethernet: ti: introduce am654 common platform time sync driver") Link: https://lore.kernel.org/r/20240424071626.32558-1-r-gunasekaran@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02ice: fix LAG and VF lock dependency in ice_reset_vf()Jacob Keller1-8/+8
[ Upstream commit 96fdd1f6b4ed72a741fb0eb705c0e13049b8721f ] 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate"), the ice driver has acquired the LAG mutex in ice_reset_vf(). The commit placed this lock acquisition just prior to the acquisition of the VF configuration lock. If ice_reset_vf() acquires the configuration lock via the ICE_VF_RESET_LOCK flag, this could deadlock with ice_vc_cfg_qs_msg() because it always acquires the locks in the order of the VF configuration lock and then the LAG mutex. Lockdep reports this violation almost immediately on creating and then removing 2 VF: ====================================================== WARNING: possible circular locking dependency detected 6.8.0-rc6 #54 Tainted: G W O ------------------------------------------------------ kworker/60:3/6771 is trying to acquire lock: ff40d43e099380a0 (&vf->cfg_lock){+.+.}-{3:3}, at: ice_reset_vf+0x22f/0x4d0 [ice] but task is already holding lock: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&pf->lag_mutex){+.+.}-{3:3}: __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_vc_cfg_qs_msg+0x45/0x690 [ice] ice_vc_process_vf_msg+0x4f5/0x870 [ice] __ice_clean_ctrlq+0x2b5/0x600 [ice] ice_service_task+0x2c9/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 -> #0 (&vf->cfg_lock){+.+.}-{3:3}: check_prev_add+0xe2/0xc50 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 lock_acquire+0xd4/0x2d0 __mutex_lock+0x9b/0xbf0 ice_reset_vf+0x22f/0x4d0 [ice] ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 kthread+0x104/0x140 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&pf->lag_mutex); lock(&vf->cfg_lock); lock(&pf->lag_mutex); lock(&vf->cfg_lock); *** DEADLOCK *** 4 locks held by kworker/60:3/6771: #0: ff40d43e05428b38 ((wq_completion)ice){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #1: ff50d06e05197e58 ((work_completion)(&pf->serv_task)){+.+.}-{0:0}, at: process_one_work+0x176/0x4d0 #2: ff40d43ea1960e50 (&pf->vfs.table_lock){+.+.}-{3:3}, at: ice_process_vflr_event+0x48/0xd0 [ice] #3: ff40d43ea1961210 (&pf->lag_mutex){+.+.}-{3:3}, at: ice_reset_vf+0xb7/0x4d0 [ice] stack backtrace: CPU: 60 PID: 6771 Comm: kworker/60:3 Tainted: G W O 6.8.0-rc6 #54 Hardware name: Workqueue: ice ice_service_task [ice] Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x12d/0x150 check_prev_add+0xe2/0xc50 ? save_trace+0x59/0x230 ? add_chain_cache+0x109/0x450 validate_chain+0x558/0x800 __lock_acquire+0x4f8/0xb40 ? lockdep_hardirqs_on+0x7d/0x100 lock_acquire+0xd4/0x2d0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? lock_is_held_type+0xc7/0x120 __mutex_lock+0x9b/0xbf0 ? ice_reset_vf+0x22f/0x4d0 [ice] ? ice_reset_vf+0x22f/0x4d0 [ice] ? rcu_is_watching+0x11/0x50 ? ice_reset_vf+0x22f/0x4d0 [ice] ice_reset_vf+0x22f/0x4d0 [ice] ? process_one_work+0x176/0x4d0 ice_process_vflr_event+0x98/0xd0 [ice] ice_service_task+0x1cc/0x480 [ice] process_one_work+0x1e9/0x4d0 worker_thread+0x1e1/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x104/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x31/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> To avoid deadlock, we must acquire the LAG mutex only after acquiring the VF configuration lock. Fix the ice_reset_vf() to acquire the LAG mutex only after we either acquire or check that the VF configuration lock is held. Fixes: 9f74a3dfcf83 ("ice: Fix VF Reset paths when interface in a failed over aggregate") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-5-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02iavf: Fix TC config comparison with existing adapter TC configSudheer Mogilappagari1-1/+29
[ Upstream commit 54976cf58d6168b8d15cebb395069f23b2f34b31 ] Same number of TCs doesn't imply that underlying TC configs are same. The config could be different due to difference in number of queues in each TC. Add utility function to determine if TC configs are same. Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf") Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Mineri Bhange <minerix.bhange@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02i40e: Report MFS in decimal base instead of hexErwan Velu1-2/+2
[ Upstream commit ef3c313119ea448c22da10366faa26b5b4b1a18e ] If the MFS is set below the default (0x2600), a warning message is reported like the following : MFS for port 1 has been set below the default: 600 This message is a bit confusing as the number shown here (600) is in fact an hexa number: 0x600 = 1536 Without any explicit "0x" prefix, this message is read like the MFS is set to 600 bytes. MFS, as per MTUs, are usually expressed in decimal base. This commit reports both current and default MFS values in decimal so it's less confusing for end-users. A typical warning message looks like the following : MFS for port 1 (1536) has been set below the default (9728) Signed-off-by: Erwan Velu <e.velu@criteo.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02i40e: Do not use WQ_MEM_RECLAIM flag for workqueueSindhu Devale1-1/+1
[ Upstream commit 2cc7d150550cc981aceedf008f5459193282425c ] Issue reported by customer during SRIOV testing, call trace: When both i40e and the i40iw driver are loaded, a warning in check_flush_dependency is being triggered. This seems to be because of the i40e driver workqueue is allocated with the WQ_MEM_RECLAIM flag, and the i40iw one is not. Similar error was encountered on ice too and it was fixed by removing the flag. Do the same for i40e too. [Feb 9 09:08] ------------[ cut here ]------------ [ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966 check_flush_dependency+0x10b/0x120 [ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod fuse [ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1 [ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 [ +0.000001] Workqueue: i40e i40e_service_task [i40e] [ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120 [ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90 [ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282 [ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX: 0000000000000027 [ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI: ffff94d47f620bc0 [ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffff7fff [ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12: ffff94c5451ea180 [ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15: ffff94c5f1330ab0 [ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4: 00000000007706f0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] ? __warn+0x80/0x130 [ +0.000003] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? report_bug+0x195/0x1a0 [ +0.000005] ? handle_bug+0x3c/0x70 [ +0.000003] ? exc_invalid_op+0x14/0x70 [ +0.000002] ? asm_exc_invalid_op+0x16/0x20 [ +0.000006] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? check_flush_dependency+0x10b/0x120 [ +0.000002] __flush_workqueue+0x126/0x3f0 [ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core] [ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core] [ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core] [ +0.000020] i40iw_close+0x4b/0x90 [irdma] [ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e] [ +0.000035] i40e_service_task+0x126/0x190 [i40e] [ +0.000024] process_one_work+0x174/0x340 [ +0.000003] worker_thread+0x27e/0x390 [ +0.000001] ? __pfx_worker_thread+0x10/0x10 [ +0.000002] kthread+0xdf/0x110 [ +0.000002] ? __pfx_kthread+0x10/0x10 [ +0.000002] ret_from_fork+0x2d/0x50 [ +0.000003] ? __pfx_kthread+0x10/0x10 [ +0.000001] ret_from_fork_asm+0x1b/0x30 [ +0.000004] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]--- Fixes: 4d5957cbdecd ("i40e: remove WQ_UNBOUND and the task limit of our workqueue") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Robert Ganzynkowicz <robert.ganzynkowicz@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()Dan Carpenter1-3/+5
[ Upstream commit 4dcd0e83ea1d1df9b2e0174a6d3e795b3477d64e ] The rx_chn->irq[] array is unsigned int but it should be signed for the error handling to work. Also if k3_udma_glue_rx_get_irq() returns zero then we should return -ENXIO instead of success. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/05282415-e7f4-42f3-99f8-32fde8f30936@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02net: phy: dp83869: Fix MII mode failureMD Danish Anwar1-1/+2
[ Upstream commit 6c9cd59dbcb09a2122b5ce0dfc07c74e6fc00dc0 ] The DP83869 driver sets the MII bit (needed for PHY to work in MII mode) only if the op-mode is either DP83869_100M_MEDIA_CONVERT or DP83869_RGMII_100_BASE. Some drivers i.e. ICSSG support MII mode with op-mode as DP83869_RGMII_COPPER_ETHERNET for which the MII bit is not set in dp83869 driver. As a result MII mode on ICSSG doesn't work and below log is seen. TI DP83869 300b2400.mdio:0f: selected op-mode is not valid with MII mode icssg-prueth icssg1-eth: couldn't connect to phy ethernet-phy@0 icssg-prueth icssg1-eth: can't phy connect port MII0 Fix this by setting MII bit for DP83869_RGMII_COPPER_ETHERNET op-mode as well. Fixes: 94e86ef1b801 ("net: phy: dp83869: support mii mode when rgmii strap cfg is used") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02netfilter: nf_tables: honor table dormant flag from netdev release event pathPablo Neira Ayuso1-1/+3
[ Upstream commit 8e30abc9ace4f0add4cd761dfdbfaebae5632dd2 ] Check for table dormant flag otherwise netdev release event path tries to unregister an already unregistered hook. [524854.857999] ------------[ cut here ]------------ [524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260 [...] [524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365 [524854.858869] Workqueue: netns cleanup_net [524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260 [524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41 [524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246 [524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a [524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438 [524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34 [524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005 [524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00 [524854.858971] FS: 0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000 [524854.858982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0 [524854.859000] Call Trace: [524854.859006] <TASK> [524854.859013] ? __warn+0x9f/0x1a0 [524854.859027] ? __nf_unregister_net_hook+0x21a/0x260 [524854.859044] ? report_bug+0x1b1/0x1e0 [524854.859060] ? handle_bug+0x3c/0x70 [524854.859071] ? exc_invalid_op+0x17/0x40 [524854.859083] ? asm_exc_invalid_op+0x1a/0x20 [524854.859100] ? __nf_unregister_net_hook+0x6a/0x260 [524854.859116] ? __nf_unregister_net_hook+0x21a/0x260 [524854.859135] nf_tables_netdev_event+0x337/0x390 [nf_tables] [524854.859304] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables] [524854.859461] ? packet_notifier+0xb3/0x360 [524854.859476] ? _raw_spin_unlock_irqrestore+0x11/0x40 [524854.859489] ? dcbnl_netdevice_event+0x35/0x140 [524854.859507] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables] [524854.859661] notifier_call_chain+0x7d/0x140 [524854.859677] unregister_netdevice_many_notify+0x5e1/0xae0 Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02ARM: dts: imx6ull-tarragon: fix USB over-current polarityMichael Heimpold1-0/+1
[ Upstream commit d7f3040a565214a30e2f07dc9b91566d316e2d36 ] Our Tarragon platform uses a active-low signal to inform the i.MX6ULL about the over-current detection. Fixes: 5e4f393ccbf0 ("ARM: dts: imx6ull: Add chargebyte Tarragon support") Signed-off-by: Michael Heimpold <michael.heimpold@chargebyte.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02eth: bnxt: fix counting packets discarded due to OOM and netpollJakub Kicinski1-26/+18
[ Upstream commit 730117730709992c9f6535dd7b47638ee561ec45 ] I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually *a* completion ring but not *the* completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 thru ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure. Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash workIdo Schimmel1-1/+5
[ Upstream commit fb4e2b70a7194b209fc7320bbf33b375f7114bd5 ] The rehash delayed work is rescheduled with a delay if the number of credits at end of the work is not negative as supposedly it means that the migration ended. Otherwise, it is rescheduled immediately. After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash" the above is no longer accurate as a non-negative number of credits is no longer indicative of the migration being done. It can also happen if the work encountered an error in which case the migration will resume the next time the work is scheduled. The significance of the above is that it is possible for the work to be pending and associated with hints that were allocated when the migration started. This leads to the hints being leaked [1] when the work is canceled while pending as part of ACL region dismantle. Fix by freeing the hints if hints are associated with a work that was canceled while pending. Blame the original commit since the reliance on not having a pending work associated with hints is fragile. [1] unreferenced object 0xffff88810e7c3000 (size 256): comm "kworker/0:16", pid 176, jiffies 4295460353 hex dump (first 32 bytes): 00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80 .0......a....... 00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00 ..a.@........... backtrace (crc 2544ddb9): [<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0 [<000000004d9a1ad9>] objagg_hints_get+0x42/0x390 [<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400 [<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160 [<00000000e81fd734>] process_one_work+0x59c/0xf20 [<00000000ceee9e81>] worker_thread+0x799/0x12c0 [<00000000bda6fe39>] kthread+0x246/0x300 [<0000000070056d23>] ret_from_fork+0x34/0x70 [<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30 Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix incorrect list API usageIdo Schimmel1-0/+7
[ Upstream commit b377add0f0117409c418ddd6504bd682ebe0bf79 ] Both the function that migrates all the chunks within a region and the function that migrates all the entries within a chunk call list_first_entry() on the respective lists without checking that the lists are not empty. This is incorrect usage of the API, which leads to the following warning [1]. Fix by returning if the lists are empty as there is nothing to migrate in this case. [1] WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0> Modules linked in: CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted 6.9.0-rc3-custom-00883-g94a65f079ef6 #39 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0 [...] Call Trace: <TASK> mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix warning during rehashIdo Schimmel1-3/+17
[ Upstream commit 743edc8547a92b6192aa1f1b6bb78233fa21dc9b ] As previously explained, the rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. When the work runs out of credits it stores the current chunk and entry as markers in the per-work context so that it would know where to resume the migration from the next time the work is scheduled. Upon error, the chunk marker is reset to NULL, but without resetting the entry markers despite being relative to it. This can result in migration being resumed from an entry that does not belong to the chunk being migrated. In turn, this will eventually lead to a chunk being iterated over as if it is an entry. Because of how the two structures happen to be defined, this does not lead to KASAN splats, but to warnings such as [1]. Fix by creating a helper that resets all the markers and call it from all the places the currently only reset the chunk marker. For good measures also call it when starting a completely new rehash. Add a warning to avoid future cases. [1] WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0 Modules linked in: CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b #29 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_afk_encode+0x242/0x2f0 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 </TASK> Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix memory leak during rehashIdo Schimmel1-0/+4
[ Upstream commit 8ca3f7a7b61393804c46f170743c3b839df13977 ] The rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. If the migration fails, the code tries to migrate the filters back to the old region. However, the rollback itself can also fail in which case another migration will be erroneously performed. Besides the fact that this ping pong is not a very good idea, it also creates a problem. Each virtual chunk references two chunks: The currently used one ('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the first holds the chunk we want to migrate filters to and the second holds the chunk we are migrating filters from. The code currently assumes - but does not verify - that the backup chunk does not exist (NULL) if the currently used chunk does not reference the target region. This assumption breaks when we are trying to rollback a rollback, resulting in the backup chunk being overwritten and leaked [1]. Fix by not rolling back a failed rollback and add a warning to avoid future cases. [1] WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20 Modules linked in: CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G W 6.9.0-rc2-custom-00784-gc6a05c468a0b #14 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:parman_destroy+0x17/0x20 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_region_fini+0x19/0x60 mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0 mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 843500518509 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Rate limit error messageIdo Schimmel1-1/+1
[ Upstream commit 5bcf925587e9b5d36420d572a0b4d131c90fb306 ] In the rare cases when the device resources are exhausted it is likely that the rehash delayed work will fail. An error message will be printed whenever this happens which can be overwhelming considering the fact that the work is per-region and that there can be hundreds of regions. Fix by rate limiting the error message. Fixes: e5e7962ee5c2 ("mlxsw: spectrum_acl: Implement region migration according to hints") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/c510763b2ebd25e7990d80183feff91cde593145.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehashIdo Schimmel1-0/+1
[ Upstream commit 54225988889931467a9b55fdbef534079b665519 ] The rehash delayed work migrates filters from one region to another according to the number of available credits. The migrated from region is destroyed at the end of the work if the number of credits is non-negative as the assumption is that this is indicative of migration being complete. This assumption is incorrect as a non-negative number of credits can also be the result of a failed migration. The destruction of a region that still has filters referencing it can result in a use-after-free [1]. Fix by not destroying the region if migration failed. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230 Read of size 8 at addr ffff8881735319e8 by task kworker/0:31/3858 CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G W 6.9.0-rc2-custom-00782-gf2275c2157d8 #5 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230 mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70 mlxsw_sp_acl_atcam_entry_del+0x81/0x210 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 174: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc+0x19c/0x360 mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0 mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Freed by task 7: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x14/0x30 kfree+0xc1/0x290 mlxsw_sp_acl_tcam_region_destroy+0x272/0x310 mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/3e412b5659ec2310c5c615760dfe5eac18dd7ebd.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity updateIdo Schimmel1-2/+8
[ Upstream commit 79b5b4b18bc85b19d3a518483f9abbbe6d7b3ba4 ] The rule activity update delayed work periodically traverses the list of configured rules and queries their activity from the device. As part of this task it accesses the entry pointed by 'ventry->entry', but this entry can be changed concurrently by the rehash delayed work, leading to a use-after-free [1]. Fix by closing the race and perform the activity query under the 'vregion->lock' mutex. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140 Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181 CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work Call Trace: <TASK> dump_stack_lvl+0xc6/0x120 print_report+0xce/0x670 kasan_report+0xd7/0x110 mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140 mlxsw_sp_acl_rule_activity_update_work+0x219/0x400 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 1039: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc+0x19c/0x360 mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Freed by task 1039: kasan_save_stack+0x33/0x60 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 poison_slab_object+0x102/0x170 __kasan_slab_free+0x14/0x30 kfree+0xc1/0x290 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300 process_one_work+0x8eb/0x19b0 worker_thread+0x6c9/0xf70 kthread+0x2c9/0x3b0 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix race during rehash delayed workIdo Schimmel1-2/+2
[ Upstream commit d90cfe20562407d9f080d24123078d666d730707 ] The purpose of the rehash delayed work is to reduce the number of masks (eRPs) used by an ACL region as the eRP bank is a global and limited resource. This is done in three steps: 1. Creating a new set of masks and a new ACL region which will use the new masks and to which the existing filters will be migrated to. The new region is assigned to 'vregion->region' and the region from which the filters are migrated from is assigned to 'vregion->region2'. 2. Migrating all the filters from the old region to the new region. 3. Destroying the old region and setting 'vregion->region2' to NULL. Only the second steps is performed under the 'vregion->lock' mutex although its comments says that among other things it "Protects consistency of region, region2 pointers". This is problematic as the first step can race with filter insertion from user space that uses 'vregion->region', but under the mutex. Fix by holding the mutex across the entirety of the delayed work and not only during the second step. Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1ec1d54edf2bad0a369e6b4fa030aba64e1f124b.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: spectrum_acl_tcam: Fix race in region ID allocationIdo Schimmel2-36/+30
[ Upstream commit 627f9c1bb882765a84aa78015abbacd783d429be ] Region identifiers can be allocated both when user space tries to insert a new tc filter and when filters are migrated from one region to another as part of the rehash delayed work. There is no lock protecting the bitmap from which these identifiers are allocated from, which is racy and leads to bad parameter errors from the device's firmware. Fix by converting the bitmap to IDA which handles its own locking. For consistency, do the same for the group identifiers that are part of the same structure. Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()") Reported-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/ce494b7940cadfe84f3e18da7785b51ef5f776e3.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02mlxsw: Use refcount_t for reference countingAmit Cohen6-36/+40
[ Upstream commit 1267f7223bec186dc26ef4e6075496c6217355de ] mlxsw driver uses 'unsigned int' for reference counters in several structures. Instead, use refcount_t type which allows us to catch overflow and underflow issues. Change the type of the counters and use the appropriate API. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 627f9c1bb882 ("mlxsw: spectrum_acl_tcam: Fix race in region ID allocation") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02net: openvswitch: Fix Use-After-Free in ovs_ct_exitHyunwoo Kim1-2/+2
[ Upstream commit 5ea7b72d4fac2fdbc0425cd8f2ea33abe95235b2 ] Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal of ovs_ct_limit_exit, is not part of the RCU read critical section, it is possible that the RCU grace period will pass during the traversal and the key will be free. To prevent this, it should be changed to hlist_for_each_entry_safe. Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/ZiYvzQN/Ry5oeFQW@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02ipvs: Fix checksumming on GSO of SCTP packetsIsmael Luceno1-2/+4
[ Upstream commit e10d3ba4d434ed172914617ed8d74bd411421193 ] It was observed in the wild that pairs of consecutive packets would leave the IPVS with the same wrong checksum, and the issue only went away when disabling GSO. IPVS needs to avoid computing the SCTP checksum when using GSO. Fixes: 90017accff61 ("sctp: Add GSO support") Co-developed-by: Firo Yang <firo.yang@suse.com> Signed-off-by: Ismael Luceno <iluceno@suse.de> Tested-by: Andreas Taschner <andreas.taschner@suse.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()Bartosz Golaszewski1-5/+13
[ Upstream commit 3d05fc82237aa97162d0d7dc300b55bb34e91d02 ] Any return value from gpiod_get_optional() other than a pointer to a GPIO descriptor or a NULL-pointer is an error and the driver should abort probing. That being said: commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL() with gpiod_get_optional()") no longer sets power_ctrl_enabled on NULL-pointer returned by devm_gpiod_get_optional(). Restore this behavior but bail-out on errors. While at it: also bail-out on error returned when trying to get the "swctrl" GPIO. Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Zijun Hu <quic_zijuhu@quicinc.com> Closes: https://lore.kernel.org/linux-bluetooth/1713449192-25926-2-git-send-email-quic_zijuhu@quicinc.com/ Fixes: 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL() with gpiod_get_optional()") Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Tested-by: Wren Turkal <wt@penguintechs.org> Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Krzysztof Kozlowski<krzysztof.kozlowski@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv MonitorChun-Yi Lee1-2/+2
[ Upstream commit 88cd6e6b2d327faa13e4505b07f1e380e51b21ff ] Since the d883a4669a1de be introduced in v6.4, bluetooth daemon got the following failed message of MGMT_OP_REMOVE_ADV_MONITOR command when controller is power-off: bluetoothd[20976]: src/adapter.c:reset_adv_monitors_complete() Failed to reset Adv Monitors: Failed> Normally this situation is happened when the bluetoothd deamon be started manually after system booting. Which means that bluetoothd received MGMT_EV_INDEX_ADDED event after kernel runs hci_power_off(). Base on doc/mgmt-api.txt, the MGMT_OP_REMOVE_ADV_MONITOR command can be used when the controller is not powered. This patch changes the code in remove_adv_monitor() to use hci_cmd_sync_submit() instead of hci_cmd_sync_queue(). Fixes: d883a4669a1de ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running") Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Manish Mandlik <mmandlik@google.com> Cc: Archie Pusaka <apusaka@chromium.org> Cc: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Chun-Yi Lee <jlee@suse.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02Bluetooth: btusb: mediatek: Fix double free of skb in coredumpSean Wang1-4/+3
[ Upstream commit 18bdb386a1a30e7a3d7732a98e45e69cf6b5710d ] hci_devcd_append() would free the skb on error so the caller don't have to free it again otherwise it would cause the double free of skb. Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Reported-by : Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>