summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2024-03-02compiler.h: Explain how __is_constexpr() worksKees Cook1-0/+39
The __is_constexpr() macro is dark magic. Shed some light on it with a comment to explain how and why it works. Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240301044428.work.411-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-02overflow: Allow non-type arg to type_max() and type_min()Kees Cook1-5/+7
A common use of type_max() is to find the max for the type of a variable. Using the pattern type_max(typeof(var)) is needlessly verbose. Instead, since typeof(type) == type we can just explicitly call typeof() on the argument to type_max() and type_min(). Add wrappers for readability. We can do some replacements right away: $ git grep '\btype_\(min\|max\)(typeof' | wc -l 11 Link: https://lore.kernel.org/r/20240301062221.work.840-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01lib/string_helpers: Add flags param to string_get_size()Andy Shevchenko1-3/+7
The new flags parameter allows controlling - Whether or not the units suffix is separated by a space, for compatibility with sort -h - Whether or not to append a B suffix - we're not always printing bytes. Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240229205345.93902-1-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01overflow: Use POD in check_shl_overflow()Andy Shevchenko1-1/+1
The check_shl_overflow() uses u64 type that is defined in types.h. Instead of including that header, just switch to use POD type directly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240228204919.3680786-2-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01kernel.h: Move lib/cmdline.c prototypes to string.hAndy Shevchenko2-6/+8
The lib/cmdline.c is basically a set of some small string parsers which are wide used in the kernel. Their prototypes belong to the string.h rather then kernel.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231003130142.2936503-1-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01fortify: Improve buffer overflow reportingKees Cook1-26/+30
Improve the reporting of buffer overflows under CONFIG_FORTIFY_SOURCE to help accelerate debugging efforts. The calculations are all just sitting in registers anyway, so pass them along to the function to be reported. For example, before: detected buffer overflow in memcpy and after: memcpy: detected buffer overflow: 4096 byte read of buffer size 1 Link: https://lore.kernel.org/r/20230407192717.636137-10-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01fortify: Provide KUnit counters for failure testingKees Cook1-20/+23
The standard C string APIs were not designed to have a failure mode; they were expected to always succeed without memory safety issues. Normally, CONFIG_FORTIFY_SOURCE will use fortify_panic() to stop processing, as truncating a read or write may provide an even worse system state. However, this creates a problem for testing under things like KUnit, which needs a way to survive failures. When building with CONFIG_KUNIT, provide a failure path for all users of fortify_panic, and track whether the failure was a read overflow or a write overflow, for KUnit tests to examine. Inspired by similar logic in the slab tests. Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01fortify: Split reporting and avoid passing string pointerKees Cook1-21/+60
In preparation for KUnit testing and further improvements in fortify failure reporting, split out the report and encode the function and access failure (read or write overflow) into a single u8 argument. This mainly ends up saving a tiny bit of space in the data segment. For a defconfig with FORTIFY_SOURCE enabled: $ size gcc/vmlinux.before gcc/vmlinux.after text data bss dec hex filename 26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before 26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01refcount: Annotated intentional signed integer wrap-aroundKees Cook1-3/+6
Mark the various refcount_t functions with __signed_wrap, as we depend on the wrapping behavior to detect the overflow and perform saturation. Silences warnings seen with the LKDTM REFCOUNT_* tests: UBSAN: signed-integer-overflow in ../include/linux/refcount.h:189:11 2147483647 + 1 cannot be represented in type 'int' Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20240221051634.work.287-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01lib/string_choices: Add str_plural() helperMichal Wajdeczko1-0/+11
Add str_plural() helper to replace existing open implementations used by many drivers and help improve future user facing messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20240214165015.1656-1-michal.wajdeczko@intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01overflow: Introduce wrapping_assign_add() and wrapping_assign_sub()Kees Cook1-0/+32
This allows replacements of the idioms "var += offset" and "var -= offset" with the wrapping_assign_add() and wrapping_assign_sub() helpers respectively. They will avoid wrap-around sanitizer instrumentation. Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01overflow: Introduce wrapping_add(), wrapping_sub(), and wrapping_mul()Kees Cook1-0/+48
Provide helpers that will perform wrapping addition, subtraction, or multiplication without tripping the arithmetic wrap-around sanitizers. The first argument is the type under which the wrap-around should happen with. In other words, these two calls will get very different results: wrapping_mul(int, 50, 50) == 2500 wrapping_mul(u8, 50, 50) == 196 Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01overflow: Adjust check_*_overflow() kern-doc to reflect resultsKees Cook1-12/+9
The check_*_overflow() helpers will return results with potentially wrapped-around values. These values have always been checked by the selftests, so avoid the confusing language in the kern-doc. The idea of "safe for use" was relative to the expectation of whether or not the caller wants a wrapped value -- the calculation itself will always follow arithmetic wrapping rules. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01kernel.h: Move upper_*_bits() and lower_*_bits() to wordpart.hAndy Shevchenko2-28/+31
The wordpart.h header is collecting APIs related to the handling parts of the word (usually in byte granularity). The upper_*_bits() and lower_*_bits() are good candidates to be moved to there. This helps to clean up header dependency hell with regard to kernel.h as the latter gathers completely unrelated stuff together and slows down compilation (especially when it's included into other header). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240214172752.3605073-1-andriy.shevchenko@linux.intel.com Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-21string: Allow 2-argument strscpy_pad()Kees Cook1-13/+20
Similar to strscpy(), update strscpy_pad()'s 3rd argument to be optional when the destination is a compile-time known size array. Cc: Andy Shevchenko <andy@kernel.org> Cc: <linux-hardening@vger.kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-21string: Allow 2-argument strscpy()Kees Cook2-23/+37
Using sizeof(dst) for the "size" argument in strscpy() is the overwhelmingly common case. Instead of requiring this everywhere, allow a 2-argument version to be used that will use the sizeof() internally. There are other functions in the kernel with optional arguments[1], so this isn't unprecedented, and improves readability. Update and relocate the kern-doc for strscpy() too, and drop __HAVE_ARCH_STRSCPY as it is unused. Adjust ARCH=um build to notice the changed export name, as it doesn't do full header includes for the string helpers. This could additionally let us save a few hundred lines of code: 1177 files changed, 2455 insertions(+), 3026 deletions(-) with a treewide cleanup using Coccinelle: @needless_arg@ expression DST, SRC; @@ strscpy(DST, SRC -, sizeof(DST) ) Link: https://elixir.bootlin.com/linux/v6.7/source/include/linux/pci.h#L1517 [1] Reviewed-by: Justin Stitt <justinstitt@google.com> Cc: Andy Shevchenko <andy@kernel.org> Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-21string: Redefine strscpy_pad() as a macroKees Cook1-2/+31
In preparation for making strscpy_pad()'s 3rd argument optional, redefine it as a macro. This also has the benefit of allowing greater FORITFY introspection, as it couldn't see into the strscpy() nor the memset() within strscpy_pad(). Cc: Andy Shevchenko <andy@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-hardening@vger.kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-21ubsan: Reintroduce signed overflow sanitizerKees Cook1-1/+8
In order to mitigate unexpected signed wrap-around[1], bring back the signed integer overflow sanitizer. It was removed in commit 6aaa31aeb9cf ("ubsan: remove overflow checks") because it was effectively a no-op when combined with -fno-strict-overflow (which correctly changes signed overflow from being "undefined" to being explicitly "wrap around"). Compilers are adjusting their sanitizers to trap wrap-around and to detecting common code patterns that should not be instrumented (e.g. "var + offset < var"). Prepare for this and explicitly rename the option from "OVERFLOW" to "WRAP" to more accurately describe the behavior. To annotate intentional wrap-around arithmetic, the helpers wrapping_add/sub/mul_wrap() can be used for individual statements. At the function level, the __signed_wrap attribute can be used to mark an entire function as expecting its signed arithmetic to wrap around. For a single object file the Makefile can use "UBSAN_SIGNED_WRAP_target.o := n" to mark it as wrapping, and for an entire directory, "UBSAN_SIGNED_WRAP := n" can be used. Additionally keep these disabled under CONFIG_COMPILE_TEST for now. Link: https://github.com/KSPP/linux/issues/26 [1] Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hao Luo <haoluo@google.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-01kernel.h: removed REPEAT_BYTE from kernel.hTanzir Hasan3-9/+15
This patch creates wordpart.h and includes it in asm/word-at-a-time.h for all architectures. WORD_AT_A_TIME_CONSTANTS depends on kernel.h because of REPEAT_BYTE. Moving this to another header and including it where necessary allows us to not include the bloated kernel.h. Making this implicit dependency on REPEAT_BYTE explicit allows for later improvements in the lib/string.c inclusion list. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Tanzir Hasan <tanzirh@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20231226-libstringheader-v6-1-80aa08c7652c@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-01-28Merge tag 'x86_urgent_for_v6.8_rc2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure 32-bit syscall registers are properly sign-extended - Add detection for AMD's Zen5 generation CPUs and Intel's Clearwater Forest CPU model number - Make a stub function export non-GPL because it is part of the paravirt alternatives and that can be used by non-GPL code * tag 'x86_urgent_for_v6.8_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Add more models to X86_FEATURE_ZEN5 x86/entry/ia32: Ensure s32 is sign extended to s64 x86/cpu: Add model number for Intel Clearwater Forest processor x86/CPU/AMD: Add X86_FEATURE_ZEN5 x86/paravirt: Make BUG_func() usable by non-GPL modules
2024-01-27Merge tag 'ata-6.8-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata updates from Niklas Cassel: - Fix an incorrect link_power_management_policy sysfs attribute value. We were previously using the same attribute value for two different LPM policies (me) - Add a ASMedia ASM1166 quirk. The SATA host controller always reports that it has 32 ports, even though it only has six ports. Add a quirk that overrides the value reported by the controller (Conrad) - Add a ASMedia ASM1061 quirk. The SATA host controller completely ignores the upper 21 bits of the DMA address. This causes IOMMU error events when a (valid) DMA address actually has any of the upper 21 bits set. Add a quirk that limits the dma_mask to 43-bits (Lennert) * tag 'ata-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers ahci: asm1166: correct count of reported ports ata: libata-sata: improve sysfs description for ATA_LPM_UNKNOWN
2024-01-27Merge tag 'drm-fixes-2024-01-27' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-5/+20
Pull drm fixes from Dave Airlie: "Lots going on for rc2, ivpu has a bunch of stabilisation and debugging work, then amdgpu and xe are the main fixes. i915, exynos have a few, then some misc panel and bridge fixes. Worth mentioning are three regressions. One of the nouveau fixes in 6.7 for a serious deadlock had side effects, so I guess we will bring back the deadlock until I can figure out what should be done properly. There was a scheduler regression vs amdgpu which was reported in a few places and is now fixed. There was an i915 vs simpledrm problem resulting in black screens, that is reverted also. I'll be working on a proper nouveau fix, it kinda looks like one of those cases where someone tried to use an atomic where they should have probably used a lock, but I'll see. fb: - fix simpledrm/i915 regression by reverting change scheduler: - fix regression affecting amdgpu users due to sched draining nouveau: - revert 6.7 deadlock fix as it has side effects dp: - fix documentation warning ttm: - fix dummy page read on some platforms bridge: - anx7625 suspend fix - sii902x: fix probing and audio registration - parade-ps8640: fix suspend of bridge, aux fixes - samsung-dsim: avoid using FORCE_STOP_STATE panel: - simple add missing bus flags - fix samsung-s6d7aa0 flags amdgpu: - AC/DC power supply tracking fix - Don't show invalid vram vendor data - SMU 13.0.x fixes - GART fix for umr on systems without VRAM - GFX 10/11 UNORD_DISPATCH fixes - IPS display fixes (required for S0ix on some platforms) - Misc fixes i915: - DSI sequence revert to fix GitLab #10071 and DP test-pattern fix - Drop -Wstringop-overflow (broken on GCC11) ivpu: - fix recovery/reset support - improve submit ioctl stability - fix dev open/close races on unbind - PLL disable reset fix - deprecate context priority param - improve debug buffer logging - disable buffer sharing across VPU contexts - free buffer sgt on unbind - fix missing lock around shmem vmap - add better boot diagnostics - add more debug prints around mapping - dump MMU events in case of timeout v3d: - NULL ptr dereference fix exynos: - fix stack usage - fix incorrect type - fix dt typo - fix gsc runtime resume xe: - Make an ops struct static - Fix an implicit 0 to NULL conversion - A couple of 32-bit fixes - A migration coherency fix for Lunar Lake. - An error path vm id leak fix - Remove PVC references in kunit tests" * tag 'drm-fixes-2024-01-27' of git://anongit.freedesktop.org/drm/drm: (66 commits) Revert "nouveau: push event block/allowing out of the fence context" drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE drm/sched: Drain all entities in DRM sched run job worker drm/amd/display: "Enable IPS by default" drm/amd: Add a DC debug mask for IPS drm/amd/display: Disable ips before dc interrupt setting drm/amd/display: Replay + IPS + ABM in Full Screen VPB drm/amd/display: Add IPS checks before dcn register access drm/amd/display: Add Replay IPS register for DMUB command table drm/amd/display: Allow IPS2 during Replay drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs drm/amd/amdgpu: Assign GART pages to AMD device mapping drm/amd/pm: Fetch current power limit from FW drm/amdgpu: Fix null pointer dereference drm/amdgpu: Show vram vendor only if available drm/amd/pm: update the power cap setting drm/amdgpu: Avoid fetching vram vendor information drm/amdgpu/pm: Fix the power source flag error drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_dpcd()' functions ...
2024-01-26Merge tag 'spi-fix-v6.8-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "As well as a few device IDs and the usual scattering of driver specific fixes this contains a couple of core things. One is a missed case in error handling, the other patch is a change from me raising the number of chip selects allowed by the newly added multi chip select support patches to resolve problems seen on several systems that exceeded the limit. This is not a real solution to the issue but rather just a change to avoid disruption to users, one of the options I am considering is just sending a revert of those changes if we can't come up with something sensible" * tag 'spi-fix-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: fix finalize message on error return spi: cs42l43: Handle error from devm_pm_runtime_enable spi: Raise limit on number of chip selects spi: hisi-sfc-v3xx: Return IRQ_NONE if no interrupts were detected spi: spi-cadence: Reverse the order of interleaved write and read operations spi: spi-imx: Use dev_err_probe for failed DMA channel requests spi: bcm-qspi: fix SFDP BFPT read by usig mspi read spi: intel-pci: Add support for Arrow Lake SPI serial flash spi: intel-pci: Remove Meteor Lake-S SoC PCI ID from the list
2024-01-25Merge tag 'net-6.8-rc2' of ↵Linus Torvalds12-30/+108
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, netfilter and WiFi. Jakub is doing a lot of work to include the self-tests in our CI, as a result a significant amount of self-tests related fixes is flowing in (and will likely continue in the next few weeks). Current release - regressions: - bpf: fix a kernel crash for the riscv 64 JIT - bnxt_en: fix memory leak in bnxt_hwrm_get_rings() - revert "net: macsec: use skb_ensure_writable_head_tail to expand the skb" Previous releases - regressions: - core: fix removing a namespace with conflicting altnames - tc/flower: fix chain template offload memory leak - tcp: - make sure init the accept_queue's spinlocks once - fix autocork on CPUs with weak memory model - udp: fix busy polling - mlx5e: - fix out-of-bound read in port timestamping - fix peer flow lists corruption - iwlwifi: fix a memory corruption Previous releases - always broken: - netfilter: - nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain - nft_limit: reject configurations that cause integer overflow - bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a NULL pointer dereference upon shrinking - llc: make llc_ui_sendmsg() more robust against bonding changes - smc: fix illegal rmb_desc access in SMC-D connection dump - dpll: fix pin dump crash for rebound module - bnxt_en: fix possible crash after creating sw mqprio TCs - hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB Misc: - several self-tests fixes for better integration with the netdev CI - added several missing modules descriptions" * tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits) tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring tsnep: Remove FCS for XDP data path net: fec: fix the unhandled context fault from smmu selftests: bonding: do not test arp/ns target with mode balance-alb/tlb fjes: fix memleaks in fjes_hw_setup i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue i40e: set xdp_rxq_info::frag_size xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers ice: remove redundant xdp_rxq_info registration i40e: handle multi-buffer packets that are shrunk by xdp prog ice: work on pre-XDP prog frag count xsk: fix usage of multi-buffer BPF helpers for ZC XDP xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags xsk: recycle buffer in case Rx queue was full net: fill in MODULE_DESCRIPTION()s for rvu_mbox net: fill in MODULE_DESCRIPTION()s for litex net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio net: fill in MODULE_DESCRIPTION()s for fec ...
2024-01-25Merge tag 'vfs-6.8-rc2.netfs' of ↵Linus Torvalds1-0/+25
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs fixes from Christian Brauner: "This contains various fixes for the netfs work merged earlier this cycle: afs: - Fix locking imbalance in afs_proc_addr_prefs_show() - Remove afs_dynroot_d_revalidate() which is redundant - Fix error handling during lookup - Hide sillyrenames from userspace. This fixes a race between silly-rename files being created/removed and userspace iterating over directory entries - Don't use unnecessary folio_*() functions cifs: - Don't use unnecessary folio_*() functions cachefiles: - erofs: Fix Null dereference when cachefiles are not doing ondemand-mode - Update mailing list netfs library: - Add Jeff Layton as reviewer - Update mailing list - Fix a error checking in netfs_perform_write() - fscache: Check error before dereferencing - Don't use unnecessary folio_*() functions" * tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Fix missing/incorrect unlocking of RCU read lock afs: Remove afs_dynroot_d_revalidate() as it is redundant afs: Fix error handling with lookup via FS.InlineBulkStatus afs: Hide silly-rename files from userspace cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write() netfs, fscache: Prevent Oops in fscache_put_cache() cifs: Don't use certain unnecessary folio_*() functions afs: Don't use certain unnecessary folio_*() functions netfs: Don't use certain unnecessary folio_*() functions netfs: Add Jeff Layton as reviewer netfs, cachefiles: Change mailing list
2024-01-25Merge tag 'mlx5-fixes-2024-01-24' of ↵Paolo Abeni4-4/+11
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-01-24 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. * tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: fix a potential double-free in fs_any_create_groups net/mlx5e: fix a double-free in arfs_create_groups net/mlx5e: Ignore IPsec replay window values on sender side net/mlx5e: Allow software parsing when IPsec crypto is enabled net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO net/mlx5: DR, Can't go to uplink vport on RX rule net/mlx5: DR, Use the right GVMI number for drop action net/mlx5: Bridge, fix multicast packets sent to uplink net/mlx5: Fix a WARN upon a callback command failure net/mlx5e: Fix peer flow lists handling net/mlx5e: Fix inconsistent hairpin RQT sizes net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context net/mlx5: Fix query of sd_group field net/mlx5e: Use the correct lag ports number when creating TISes ==================== Link: https://lore.kernel.org/r/20240124081855.115410-1-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25Merge tag 'for-netdev' of ↵Paolo Abeni1-0/+27
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-01-25 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 2 day(s) which contain a total of 13 files changed, 190 insertions(+), 91 deletions(-). The main changes are: 1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which support XDP multi-buffer. The former triggered a NULL pointer dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar. 2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and epilogue for struct_ops programs, from Pu Lehui. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue i40e: set xdp_rxq_info::frag_size xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers ice: remove redundant xdp_rxq_info registration i40e: handle multi-buffer packets that are shrunk by xdp prog ice: work on pre-XDP prog frag count xsk: fix usage of multi-buffer BPF helpers for ZC XDP xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags xsk: recycle buffer in case Rx queue was full riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops ==================== Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25xsk: fix usage of multi-buffer BPF helpers for ZC XDPMaciej Fijalkowski1-0/+26
Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens: [1136314.192256] BUG: kernel NULL pointer dereference, address: 0000000000000034 [1136314.203943] #PF: supervisor read access in kernel mode [1136314.213768] #PF: error_code(0x0000) - not-present page [1136314.223550] PGD 0 P4D 0 [1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI [1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257 [1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210 [1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86 [1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246 [1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX: 0000000000000000 [1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffc9003168c000 [1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09: 0000000000010000 [1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12: 0000000000000001 [1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15: 0000000000000001 [1136314.373298] FS: 00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000) knlGS:0000000000000000 [1136314.386105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4: 00000000007706f0 [1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1136314.431890] PKRU: 55555554 [1136314.439143] Call Trace: [1136314.446058] <IRQ> [1136314.452465] ? __die+0x20/0x70 [1136314.459881] ? page_fault_oops+0x15b/0x440 [1136314.468305] ? exc_page_fault+0x6a/0x150 [1136314.476491] ? asm_exc_page_fault+0x22/0x30 [1136314.484927] ? __xdp_return+0x6c/0x210 [1136314.492863] bpf_xdp_adjust_tail+0x155/0x1d0 [1136314.501269] bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60 [1136314.511263] ice_clean_rx_irq_zc+0x206/0xc60 [ice] [1136314.520222] ? ice_xmit_zc+0x6e/0x150 [ice] [1136314.528506] ice_napi_poll+0x467/0x670 [ice] [1136314.536858] ? ttwu_do_activate.constprop.0+0x8f/0x1a0 [1136314.546010] __napi_poll+0x29/0x1b0 [1136314.553462] net_rx_action+0x133/0x270 [1136314.561619] __do_softirq+0xbe/0x28e [1136314.569303] do_softirq+0x3f/0x60 This comes from __xdp_return() call with xdp_buff argument passed as NULL which is supposed to be consumed by xsk_buff_free() call. To address this properly, in ZC case, a node that represents the frag being removed has to be pulled out of xskb_list. Introduce appropriate xsk helpers to do such node operation and use them accordingly within bpf_xdp_adjust_tail(). Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> # For the xsk header part Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-4-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-25xsk: make xsk_buff_pool responsible for clearing xdp_buff::flagsMaciej Fijalkowski1-0/+1
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is used by drivers to notify data path whether xdp_buff contains fragments or not. Data path looks up mentioned flag on first buffer that occupies the linear part of xdp_buff, so drivers only modify it there. This is sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on stack or it resides within struct representing driver's queue and fragments are carried via skb_frag_t structs. IOW, we are dealing with only one xdp_buff. ZC mode though relies on list of xdp_buff structs that is carried via xsk_buff_pool::xskb_list, so ZC data path has to make sure that fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise, xsk_buff_free() could misbehave if it would be executed against xdp_buff that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can take place when within supplied XDP program bpf_xdp_adjust_tail() is used with negative offset that would in turn release the tail fragment from multi-buffer frame. Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would result in releasing all the nodes from xskb_list that were produced by driver before XDP program execution, which is not what is intended - only tail fragment should be deleted from xskb_list and then it should be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never make it up to user space, so from AF_XDP application POV there would be no traffic running, however due to free_list getting constantly new nodes, driver will be able to feed HW Rx queue with recycled buffers. Bottom line is that instead of traffic being redirected to user space, it would be continuously dropped. To fix this, let us clear the mentioned flag on xsk_buff_pool side during xdp_buff initialization, which is what should have been done right from the start of XSK multi-buffer support. Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-25Merge tag 'execve-v6.8-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fixes from Kees Cook: - Fix error handling in begin_new_exec() (Bernd Edlinger) - MAINTAINERS: specifically mention ELF (Alexey Dobriyan) - Various cleanups related to earlier open() (Askar Safin, Kees Cook) * tag 'execve-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Distinguish in_execve from in_exec exec: Fix error handling in begin_new_exec() exec: Add do_close_execat() helper exec: remove useless comment ELF, MAINTAINERS: specifically mention ELF
2024-01-24exec: Distinguish in_execve from in_execKees Cook1-1/+1
Just to help distinguish the fs->in_exec flag from the current->in_execve flag, add comments in check_unsafe_exec() and copy_fs() for more context. Also note that in_execve is only used by TOMOYO now. Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-01-24netfilter: nf_tables: cleanup documentationGeorge Guo1-10/+39
- Correct comments for nlpid, family, udlen and udata in struct nft_table, and afinfo is no longer a member of enum nft_set_class. - Add comment for data in struct nft_set_elem. - Add comment for flags in struct nft_ctx. - Add comments for timeout in struct nft_set_iter, and flags is not a member of struct nft_set_iter, remove the comment for it. - Add comments for commit, abort, estimate and gc_init in struct nft_set_ops. - Add comments for pending_update, num_exprs, exprs and catchall_list in struct nft_set. - Add comment for ext_len in struct nft_set_ext_tmpl. - Add comment for inner_ops in struct nft_expr_type. - Add comments for clone, destroy_clone, reduce, gc, offload, offload_action, offload_stats in struct nft_expr_ops. - Add comments for blob_gen_0, blob_gen_1, bound, genmask, udlen, udata, blob_next in struct nft_chain. - Add comment for flags in struct nft_base_chain. - Add comments for udlen, udata in struct nft_object. - Add comment for type in struct nft_object_ops. - Add comment for hook_list in struct nft_flowtable, and remove comments for dev_name and ops which are not members of struct nft_flowtable. Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-24spi: Raise limit on number of chip selectsMark Brown1-1/+1
As reported by Guenter the limit we've got on the number of chip selects is set too low for some systems, raise the limit. We should really remove the hard coded limit but this is needed as a fix so let's do the simple thing and raise the limit for now. Fixes: 4d8ff6b0991d ("spi: Add multi-cs memories support in SPI core") Reported-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/r/20240124-spi-multi-cs-max-v2-1-df6fc5ab1abc@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-01-24x86/entry/ia32: Ensure s32 is sign extended to s64Richard Palethorpe1-0/+1
Presently ia32 registers stored in ptregs are unconditionally cast to unsigned int by the ia32 stub. They are then cast to long when passed to __se_sys*, but will not be sign extended. This takes the sign of the syscall argument into account in the ia32 stub. It still casts to unsigned int to avoid implementation specific behavior. However then casts to int or unsigned int as necessary. So that the following cast to long sign extends the value. This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the maximum number of events. It doesn't appear other systemcalls with signed arguments are effected because they all have compat variants defined and wired up. Fixes: ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/
2024-01-24net/mlx5: Bridge, fix multicast packets sent to uplinkMoshe Shemesh2-1/+2
To enable multicast packets which are offloaded in bridge multicast offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should be set. Add this bit to FTE for the bridge multicast offload rules. Fixes: 18c2916cee12 ("net/mlx5: Bridge, snoop igmp/mld packets") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/mlx5: Fix query of sd_group fieldTariq Toukan2-3/+8
The sd_group field moved in the HW spec from the MPIR register to the vport context. Align the query accordingly. Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/mlx5e: Use the correct lag ports number when creating TISesSaeed Mahameed1-0/+1
The cited commit moved the code of mlx5e_create_tises() and changed the loop to create TISes over MLX5_MAX_PORTS constant value, instead of getting the correct lag ports supported by the device, which can cause FW errors on devices with less than MLX5_MAX_PORTS ports. Change that back to mlx5e_get_num_lag_ports(mdev). Also IPoIB interfaces create there own TISes, they don't use the eth TISes, pass a flag to indicate that. This fixes the following errors that might appear in kernel log: mlx5_cmd_out_err:808:(pid 650): CREATE_TIS(0x912) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x595b5d), err(-22) mlx5e_create_mdev_resources:174:(pid 650): alloc tises failed, -22 Fixes: b25bd37c859f ("net/mlx5: Move TISes from priv to mdev HW resources") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24net/sched: flower: Fix chain template offloadIdo Schimmel1-0/+4
When a qdisc is deleted from a net device the stack instructs the underlying driver to remove its flow offload callback from the associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack then continues to replay the removal of the filters in the block for this driver by iterating over the chains in the block and invoking the 'reoffload' operation of the classifier being used. In turn, the classifier in its 'reoffload' operation prepares and emits a 'FLOW_CLS_DESTROY' command for each filter. However, the stack does not do the same for chain templates and the underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when a qdisc is deleted. This results in a memory leak [1] which can be reproduced using [2]. Fix by introducing a 'tmplt_reoffload' operation and have the stack invoke it with the appropriate arguments as part of the replay. Implement the operation in the sole classifier that supports chain templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}' command based on whether a flow offload callback is being bound to a filter block or being unbound from one. As far as I can tell, the issue happens since cited commit which reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains() in __tcf_block_put(). The order cannot be reversed as the filter block is expected to be freed after flushing all the chains. [1] unreferenced object 0xffff888107e28800 (size 2048): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[...... 01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................ backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab374e>] __kmalloc+0x4e/0x90 [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0 unreferenced object 0xffff88816d2c0400 (size 1024): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): 40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8..... 10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m.... backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90 [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0 [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460 [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0 [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [2] # tc qdisc add dev swp1 clsact # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32 # tc qdisc del dev swp1 clsact # devlink dev reload pci/0000:06:00.0 Fixes: bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-23afs: Fix error handling with lookup via FS.InlineBulkStatusDavid Howells1-0/+25
When afs does a lookup, it tries to use FS.InlineBulkStatus to preemptively look up a bunch of files in the parent directory and cache this locally, on the basis that we might want to look at them too (for example if someone does an ls on a directory, they may want want to then stat every file listed). FS.InlineBulkStatus can be considered a compound op with the normal abort code applying to the compound as a whole. Each status fetch within the compound is then given its own individual abort code - but assuming no error that prevents the bulk fetch from returning the compound result will be 0, even if all the constituent status fetches failed. At the conclusion of afs_do_lookup(), we should use the abort code from the appropriate status to determine the error to return, if any - but instead it is assumed that we were successful if the op as a whole succeeded and we return an incompletely initialised inode, resulting in ENOENT, no matter the actual reason. In the particular instance reported, a vnode with no permission granted to be accessed is being given a UAEACCES abort code which should be reported as EACCES, but is instead being reported as ENOENT. Fix this by abandoning the inode (which will be cleaned up with the op) if file[1] has an abort code indicated and turn that abort code into an error instead. Whilst we're at it, add a tracepoint so that the abort codes of the individual subrequests of FS.InlineBulkStatus can be logged. At the moment only the container abort code can be 0. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-23Merge tag 'for-6.8-rc1-tag' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - zoned mode fixes: - fix slowdown when writing large file sequentially by looking up block groups with enough space faster - locking fixes when activating a zone - new mount API fixes: - preserve mount options for a ro/rw mount of the same subvolume - scrub fixes: - fix use-after-free in case the chunk length is not aligned to 64K, this does not happen normally but has been reported on images converted from ext4 - similar alignment check was missing with raid-stripe-tree - subvolume deletion fixes: - prevent calling ioctl on already deleted subvolume - properly track flag tracking a deleted subvolume - in subpage mode, fix decompression of an inline extent (zlib, lzo, zstd) - fix crash when starting writeback on a folio, after integration with recent MM changes this needs to be started conditionally - reject unknown flags in defrag ioctl - error handling, API fixes, minor warning fixes * tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: limit RST scrub to chunk boundary btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned btrfs: don't unconditionally call folio_start_writeback in subpage btrfs: use the original mount's mount options for the legacy reconfigure btrfs: don't warn if discard range is not aligned to sector btrfs: tree-checker: fix inline ref size in error messages btrfs: zstd: fix and simplify the inline extent decompression btrfs: lzo: fix and simplify the inline extent decompression btrfs: zlib: fix and simplify the inline extent decompression btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted btrfs: don't abort filesystem when attempting to snapshot deleted subvolume btrfs: zoned: fix lock ordering in btrfs_zone_activate() btrfs: fix unbalanced unlock of mapping_tree_lock btrfs: ref-verify: free ref cache before clearing mount opt btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send() btrfs: zoned: optimize hint byte for zoned allocator btrfs: zoned: factor out prepare_allocation_zoned()
2024-01-23ata: libata-sata: improve sysfs description for ATA_LPM_UNKNOWNNiklas Cassel1-1/+1
Currently, both ATA_LPM_UNKNOWN (0) and ATA_LPM_MAX_POWER (1) displays as "max_performance" in sysfs. This is quite misleading as they are not the same. For ATA_LPM_UNKNOWN, ata_eh_set_lpm() will not be called at all, leaving the configuration in unknown state. For ATA_LPM_MAX_POWER, ata_eh_set_lpm() is called, and setting the policy to ATA_LPM_MAX_POWER. This also matches the description of the SATA_MOBILE_LPM_POLICY Kconfig: 0 => Keep firmware settings 1 => Maximum performance Thus, update the sysfs description for ATA_LPM_UNKNOWN to match reality. While at it, update libata.h to mention that the ascii descriptions are in libata-sata.c and not in libata-scsi.c. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-01-22accel/ivpu: Deprecate DRM_IVPU_PARAM_CONTEXT_PRIORITY paramWachowski, Karol1-5/+20
DRM_IVPU_PARAM_CONTEXT_PRIORITY has been deprecated because it has been replaced with DRM_IVPU_JOB_PRIORITY levels set with submit IOCTL and was unused anyway. Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-10-jacek.lawrynowicz@linux.intel.com
2024-01-21Merge tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefsLinus Torvalds1-6/+6
Pull header fix from Kent Overstreet: "Just one small fixup for the RT build" * tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs: spinlock: Fix failing build for PREEMPT_RT
2024-01-21udp: fix busy pollingEric Dumazet3-12/+17
Generic sk_busy_loop_end() only looks at sk->sk_receive_queue for presence of packets. Problem is that for UDP sockets after blamed commit, some packets could be present in another queue: udp_sk(sk)->reader_queue In some cases, a busy poller could spin until timeout expiration, even if some packets are available in udp_sk(sk)->reader_queue. v3: - make sk_busy_loop_end() nicer (Willem) v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats. - add a sk_is_inet() check in sk_is_udp() (Willem feedback) - add a sk_is_inet() check in sk_is_tcp(). Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-21Merge tag 'dmaengine-fix-6.8-rc1' of ↵Linus Torvalds1-0/+21
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New support: - Loongson LS2X APB DMA controller - sf-pdma: mpfs-pdma support - Qualcomm X1E80100 GPI dma controller support Updates: - Xilinx XDMA updates to support interleaved DMA transfers - TI PSIL threads for AM62P and J722S and cfg register regions description - axi-dmac Improving the cyclic DMA transfers - Tegra Support dma-channel-mask property - Remaining platform remove callback returning void conversions Driver fixes for: - Xilinx xdma driver operator precedence and initialization fix - Excess kernel-doc warning fix in imx-sdma xilinx xdma drivers - format-overflow warning fix for rz-dmac, sh usb dmac drivers - 'output may be truncated' fix for shdma, fsl-qdma and dw-edma drivers" * tag 'dmaengine-fix-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (58 commits) dmaengine: dw-edma: increase size of 'name' in debugfs code dmaengine: fsl-qdma: increase size of 'irq_name' dmaengine: shdma: increase size of 'dev_id' dmaengine: xilinx: xdma: Fix kernel-doc warnings dmaengine: usb-dmac: Avoid format-overflow warning dmaengine: sh: rz-dmac: Avoid format-overflow warning dmaengine: imx-sdma: fix Excess kernel-doc warnings dmaengine: xilinx: xdma: Fix initialization location of desc in xdma_channel_isr() dmaengine: xilinx: xdma: Fix operator precedence in xdma_prep_interleaved_dma() dmaengine: xilinx: xdma: statify xdma_prep_interleaved_dma dmaengine: xilinx: xdma: Workaround truncation compilation error dmaengine: pl330: issue_pending waits until WFP state dmaengine: xilinx: xdma: Implement interleaved DMA transfers dmaengine: xilinx: xdma: Prepare the introduction of interleaved DMA transfers dmaengine: xilinx: xdma: Add transfer error reporting dmaengine: xilinx: xdma: Add error checking in xdma_channel_isr() dmaengine: xilinx: xdma: Rework xdma_terminate_all() dmaengine: xilinx: xdma: Ease dma_pool alignment requirements dmaengine: xilinx: xdma: Add necessary macro definitions dmaengine: xilinx: xdma: Get rid of unused code ...
2024-01-20Merge tag 'riscv-for-linus-6.8-mw4' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for tuning for systems with fast misaligned accesses. - Support for SBI-based suspend. - Support for the new SBI debug console extension. - The T-Head CMOs now use PA-based flushes. - Support for enabling the V extension in kernel code. - Optimized IP checksum routines. - Various ftrace improvements. - Support for archrandom, which depends on the Zkr extension. - The build is no longer broken under NET=n, KUNIT=y for ports that don't define their own ipv6 checksum. * tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits) lib: checksum: Fix build with CONFIG_NET=n riscv: lib: Check if output in asm goto supported riscv: Fix build error on rv32 + XIP riscv: optimize ELF relocation function in riscv RISC-V: Implement archrandom when Zkr is available riscv: Optimize hweight API with Zbb extension riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support riscv: ftrace: Make function graph use ftrace directly riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name riscv: Restrict DWARF5 when building with LLVM to known working versions riscv: Hoist linker relaxation disabling logic into Kconfig kunit: Add tests for csum_ipv6_magic and ip_fast_csum riscv: Add checksum library riscv: Add checksum header riscv: Add static key for misaligned accesses asm-generic: Improve csum_fold RISC-V: selftests: cbo: Ensure asm operands match constraints ...
2024-01-20llc: Drop support for ETH_P_TR_802_2.Kuniyuki Iwashima1-4/+2
syzbot reported an uninit-value bug below. [0] llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2 (0x0011), and syzbot abused the latter to trigger the bug. write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16) llc_conn_handler() initialises local variables {saddr,daddr}.mac based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes them to __llc_lookup(). However, the initialisation is done only when skb->protocol is htons(ETH_P_802_2), otherwise, __llc_lookup_established() and __llc_lookup_listener() will read garbage. The missing initialisation existed prior to commit 211ed865108e ("net: delete all instances of special processing for token ring"). It removed the part to kick out the token ring stuff but forgot to close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv(). Let's remove llc_tr_packet_type and complete the deprecation. [0]: BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90 __llc_lookup_established+0xe9d/0xf90 __llc_lookup net/llc/llc_conn.c:611 [inline] llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 __netif_receive_skb_one_core net/core/dev.c:5527 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641 netif_receive_skb_internal net/core/dev.c:5727 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5786 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555 tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x1490 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Local variable daddr created at: llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 Fixes: 211ed865108e ("net: delete all instances of special processing for token ring") Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-20tcp: make sure init the accept_queue's spinlocks onceZhengchao Shao1-0/+8
When I run syz's reproduction C program locally, it causes the following issue: pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0! WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7 30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900 RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000 R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000 FS: 00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0 Call Trace: <IRQ> _raw_spin_unlock (kernel/locking/spinlock.c:186) inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321) inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358) tcp_check_req (net/ipv4/tcp_minisocks.c:868) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205) ip_local_deliver_finish (net/ipv4/ip_input.c:234) __netif_receive_skb_one_core (net/core/dev.c:5529) process_backlog (./include/linux/rcupdate.h:779) __napi_poll (net/core/dev.c:6533) net_rx_action (net/core/dev.c:6604) __do_softirq (./arch/x86/include/asm/jump_label.h:27) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4374) ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469) tcp_rcv_state_process (net/ipv4/tcp_input.c:6657) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929) __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968) release_sock (net/core/sock.c:3536) inet_wait_for_connect (net/ipv4/af_inet.c:609) __inet_stream_connect (net/ipv4/af_inet.c:702) inet_stream_connect (net/ipv4/af_inet.c:748) __sys_connect (./include/linux/file.h:45 net/socket.c:2064) __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7fa10ff05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640 R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20 </TASK> The issue triggering process is analyzed as follows: Thread A Thread B tcp_v4_rcv //receive ack TCP packet inet_shutdown tcp_check_req tcp_disconnect //disconnect sock ... tcp_set_state(sk, TCP_CLOSE) inet_csk_complete_hashdance ... inet_csk_reqsk_queue_add inet_listen //start listen spin_lock(&queue->rskq_lock) inet_csk_listen_start ... reqsk_queue_alloc ... spin_lock_init spin_unlock(&queue->rskq_lock) //warning When the socket receives the ACK packet during the three-way handshake, it will hold spinlock. And then the user actively shutdowns the socket and listens to the socket immediately, the spinlock will be initialized. When the socket is going to release the spinlock, a warning is generated. Also the same issue to fastopenq.lock. Move init spinlock to inet_create and inet_accept to make sure init the accept_queue's spinlocks once. Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue") Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Reported-by: Ming Shu <sming56@aliyun.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240118012019.1751966-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-20Merge tag 'strlcpy-removal-v6.8-rc1' of ↵Linus Torvalds2-54/+0
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull strlcpy removal from Kees Cook: "As promised, this is 'part 2' of the hardening tree, late in -rc1 now that all the other trees with strlcpy() removals have landed. One new user appeared (in bcachefs) but was a trivial refactor. The kernel is now free of the strlcpy() API! - Remove of the final (very recent) user of strlcpy() (in bcachefs) - Remove the strlcpy() API. Long live strscpy()" * tag 'strlcpy-removal-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: string: Remove strlcpy() bcachefs: Replace strlcpy() with strscpy()
2024-01-20Merge tag 'devicetree-for-6.8-2' of ↵Linus Torvalds2-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree header detangling from Rob Herring: "Remove the circular including of of_device.h and of_platform.h along with all of their implicit includes. This is the culmination of several kernel cycles worth of fixing implicit DT includes throughout the tree" * tag 'devicetree-for-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: Stop circularly including of_device.h and of_platform.h clk: qcom: gcc-x1e80100: Replace of_device.h with explicit includes thermal: loongson2: Replace of_device.h with explicit includes net: can: Use device_get_match_data() sparc: Use device_get_match_data()