summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2024-03-13Merge tag 'hardening-v6.9-rc1' of ↵Linus Torvalds4-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ...
2024-03-13Merge tag 'execve-v6.9-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Drop needless error path code in remove_arg_zero() (Li kunyu, Kees Cook) - binfmt_elf_efpic: Don't use missing interpreter's properties (Max Filippov) - Use /bin/bash for execveat selftests * tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Simplify remove_arg_zero() error path selftests/exec: Perform script checks with /bin/bash exec: Delete unnecessary statements in remove_arg_zero() fs: binfmt_elf_efpic: don't use missing interpreter's properties
2024-03-12Merge tag 's390-6.9-1' of ↵Linus Torvalds1-0/+31
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Various virtual vs physical address usage fixes - Fix error handling in Processor Activity Instrumentation device driver, and export number of counters with a sysfs file - Allow for multiple events when Processor Activity Instrumentation counters are monitored in system wide sampling - Change multiplier and shift values of the Time-of-Day clock source to improve steering precision - Remove a couple of unneeded GFP_DMA flags from allocations - Disable mmap alignment if randomize_va_space is also disabled, to avoid a too small heap - Various changes to allow s390 to be compiled with LLVM=1, since ld.lld and llvm-objcopy will have proper s390 support witch clang 19 - Add __uninitialized macro to Compiler Attributes. This is helpful with s390's FPU code where some users have up to 520 byte stack frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the intention (performance improvement) of such code sections. - Convert switch_to() to an out-of-line function, and use the generic switch_to header file - Replace the usage of s390's debug feature with pr_debug() calls within the zcrypt device driver - Improve hotplug support of the Adjunct Processor device driver - Improve retry handling in the zcrypt device driver - Various changes to the in-kernel FPU code: - Make in-kernel FPU sections preemptible - Convert various larger inline assemblies and assembler files to C, mainly by using singe instruction inline assemblies. This increases readability, but also allows makes it easier to add proper instrumentation hooks - Cleanup of the header files - Provide fast variants of csum_partial() and csum_partial_copy_nocheck() based on vector instructions - Introduce and use a lock to synchronize accesses to zpci device data structures to avoid inconsistent states caused by concurrent accesses - Compile the kernel without -fPIE. This addresses the following problems if the kernel is compiled with -fPIE: - It uses dynamic symbols (.dynsym), for which the linker refuses to allow more than 64k sections. This can break features which use '-ffunction-sections' and '-fdata-sections', including kpatch-build and function granular KASLR - It unnecessarily uses GOT relocations, adding an extra layer of indirection for many memory accesses - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were reported as globally shared * tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits) s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64 s390/cache: prevent rebuild of shared_cpu_list s390/crypto: remove retry loop with sleep from PAES pkey invocation s390/pkey: improve pkey retry behavior s390/zcrypt: improve zcrypt retry behavior s390/zcrypt: introduce retries on in-kernel send CPRB functions s390/ap: introduce mutex to lock the AP bus scan s390/ap: rework ap_scan_bus() to return true on config change s390/ap: clarify AP scan bus related functions and variables s390/ap: rearm APQNs bindings complete completion s390/configs: increase number of LOCKDEP_BITS s390/vfio-ap: handle hardware checkstop state on queue reset operation s390/pai: change sampling event assignment for PMU device driver s390/boot: fix minor comment style damages s390/boot: do not check for zero-termination relocation entry s390/boot: make type of __vmlinux_relocs_64_start|end consistent s390/boot: sanitize kaslr_adjust_relocs() function prototype s390/boot: simplify GOT handling s390: vmlinux.lds.S: fix .got.plt assertion s390/boot: workaround current 'llvm-objdump -t -j ...' behavior ...
2024-03-12Merge tag 'x86-core-2024-03-11' of ↵Linus Torvalds3-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: - The biggest change is the rework of the percpu code, to support the 'Named Address Spaces' GCC feature, by Uros Bizjak: - This allows C code to access GS and FS segment relative memory via variables declared with such attributes, which allows the compiler to better optimize those accesses than the previous inline assembly code. - The series also includes a number of micro-optimizations for various percpu access methods, plus a number of cleanups of %gs accesses in assembly code. - These changes have been exposed to linux-next testing for the last ~5 months, with no known regressions in this area. - Fix/clean up __switch_to()'s broken but accidentally working handling of FPU switching - which also generates better code - Propagate more RIP-relative addressing in assembly code, to generate slightly better code - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to make it easier for distros to follow & maintain these options - Rework the x86 idle code to cure RCU violations and to clean up the logic - Clean up the vDSO Makefile logic - Misc cleanups and fixes * tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/idle: Select idle routine only once x86/idle: Let prefer_mwait_c1_over_halt() return bool x86/idle: Cleanup idle_setup() x86/idle: Clean up idle selection x86/idle: Sanitize X86_BUG_AMD_E400 handling sched/idle: Conditionally handle tick broadcast in default_idle_call() x86: Increase brk randomness entropy for 64-bit systems x86/vdso: Move vDSO to mmap region x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o x86/retpoline: Ensure default return thunk isn't used at runtime x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32 x86/vdso: Use $(addprefix ) instead of $(foreach ) x86/vdso: Simplify obj-y addition x86/vdso: Consolidate targets and clean-files x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS ...
2024-03-12Merge tag 'x86-asm-2024-03-11' of ↵Linus Torvalds1-36/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "Two changes to simplify the x86 decoder logic a bit" * tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Directly assign x86_64 state in insn_init() x86/insn: Remove superfluous checks from instruction decoding routines
2024-03-12Merge tag 'x86_sev_for_v6.9_rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add the x86 part of the SEV-SNP host support. This will allow the kernel to be used as a KVM hypervisor capable of running SNP (Secure Nested Paging) guests. Roughly speaking, SEV-SNP is the ultimate goal of the AMD confidential computing side, providing the most comprehensive confidential computing environment up to date. This is the x86 part and there is a KVM part which did not get ready in time for the merge window so latter will be forthcoming in the next cycle. - Rework the early code's position-dependent SEV variable references in order to allow building the kernel with clang and -fPIE/-fPIC and -mcmodel=kernel - The usual set of fixes, cleanups and improvements all over the place * tag 'x86_sev_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/sev: Disable KMSAN for memory encryption TUs x86/sev: Dump SEV_STATUS crypto: ccp - Have it depend on AMD_IOMMU iommu/amd: Fix failure return from snp_lookup_rmpentry() x86/sev: Fix position dependent variable references in startup code crypto: ccp: Make snp_range_list static x86/Kconfig: Remove CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT Documentation: virt: Fix up pre-formatted text block for SEV ioctls crypto: ccp: Add the SNP_SET_CONFIG command crypto: ccp: Add the SNP_COMMIT command crypto: ccp: Add the SNP_PLATFORM_STATUS command x86/cpufeatures: Enable/unmask SEV-SNP CPU feature KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown crypto: ccp: Handle legacy SEV commands when SNP is enabled crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled crypto: ccp: Handle the legacy TMR allocation when SNP is enabled x86/sev: Introduce an SNP leaked pages list crypto: ccp: Provide an API to issue SEV and SNP commands ...
2024-03-12Merge tag 'x86-fred-2024-03-10' of ↵Linus Torvalds5-9/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FRED support from Thomas Gleixner: "Support for x86 Fast Return and Event Delivery (FRED). FRED is a replacement for IDT event delivery on x86 and addresses most of the technical nightmares which IDT exposes: 1) Exception cause registers like CR2 need to be manually preserved in nested exception scenarios. 2) Hardware interrupt stack switching is suboptimal for nested exceptions as the interrupt stack mechanism rewinds the stack on each entry which requires a massive effort in the low level entry of #NMI code to handle this. 3) No hardware distinction between entry from kernel or from user which makes establishing kernel context more complex than it needs to be especially for unconditionally nestable exceptions like NMI. 4) NMI nesting caused by IRET unconditionally reenabling NMIs, which is a problem when the perf NMI takes a fault when collecting a stack trace. 5) Partial restore of ESP when returning to a 16-bit segment 6) Limitation of the vector space which can cause vector exhaustion on large systems. 7) Inability to differentiate NMI sources FRED addresses these shortcomings by: 1) An extended exception stack frame which the CPU uses to save exception cause registers. This ensures that the meta information for each exception is preserved on stack and avoids the extra complexity of preserving it in software. 2) Hardware interrupt stack switching is non-rewinding if a nested exception uses the currently interrupt stack. 3) The entry points for kernel and user context are separate and GS BASE handling which is required to establish kernel context for per CPU variable access is done in hardware. 4) NMIs are now nesting protected. They are only reenabled on the return from NMI. 5) FRED guarantees full restore of ESP 6) FRED does not put a limitation on the vector space by design because it uses a central entry points for kernel and user space and the CPUstores the entry type (exception, trap, interrupt, syscall) on the entry stack along with the vector number. The entry code has to demultiplex this information, but this removes the vector space restriction. The first hardware implementations will still have the current restricted vector space because lifting this limitation requires further changes to the local APIC. 7) FRED stores the vector number and meta information on stack which allows having more than one NMI vector in future hardware when the required local APIC changes are in place. The series implements the initial FRED support by: - Reworking the existing entry and IDT handling infrastructure to accomodate for the alternative entry mechanism. - Expanding the stack frame to accomodate for the extra 16 bytes FRED requires to store context and meta information - Providing FRED specific C entry points for events which have information pushed to the extended stack frame, e.g. #PF and #DB. - Providing FRED specific C entry points for #NMI and #MCE - Implementing the FRED specific ASM entry points and the C code to demultiplex the events - Providing detection and initialization mechanisms and the necessary tweaks in context switching, GS BASE handling etc. The FRED integration aims for maximum code reuse vs the existing IDT implementation to the extent possible and the deviation in hot paths like context switching are handled with alternatives to minimalize the impact. The low level entry and exit paths are seperate due to the extended stack frame and the hardware based GS BASE swichting and therefore have no impact on IDT based systems. It has been extensively tested on existing systems and on the FRED simulation and as of now there are no outstanding problems" * tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/fred: Fix init_task thread stack pointer initialization MAINTAINERS: Add a maintainer entry for FRED x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly x86/fred: Invoke FRED initialization code to enable FRED x86/fred: Add FRED initialization functions x86/syscall: Split IDT syscall setup code into idt_syscall_init() KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled x86/traps: Add sysvec_install() to install a system interrupt handler x86/fred: FRED entry/exit and dispatch code x86/fred: Add a machine check entry stub for FRED x86/fred: Add a NMI entry stub for FRED x86/fred: Add a debug fault entry stub for FRED x86/idtentry: Incorporate definitions/declarations of the FRED entries x86/fred: Make exc_page_fault() work for FRED x86/fred: Allow single-step trap and NMI when starting a new task x86/fred: No ESPFIX needed when FRED is enabled ...
2024-03-12Merge tag 'timers-core-2024-03-10' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A large set of updates and features for timers and timekeeping: - The hierarchical timer pull model When timer wheel timers are armed they are placed into the timer wheel of a CPU which is likely to be busy at the time of expiry. This is done to avoid wakeups on potentially idle CPUs. This is wrong in several aspects: 1) The heuristics to select the target CPU are wrong by definition as the chance to get the prediction right is close to zero. 2) Due to #1 it is possible that timers are accumulated on a single target CPU 3) The required computation in the enqueue path is just overhead for dubious value especially under the consideration that the vast majority of timer wheel timers are either canceled or rearmed before they expire. The timer pull model avoids the above by removing the target computation on enqueue and queueing timers always on the CPU on which they get armed. This is achieved by having separate wheels for CPU pinned timers and global timers which do not care about where they expire. As long as a CPU is busy it handles both the pinned and the global timers which are queued on the CPU local timer wheels. When a CPU goes idle it evaluates its own timer wheels: - If the first expiring timer is a pinned timer, then the global timers can be ignored as the CPU will wake up before they expire. - If the first expiring timer is a global timer, then the expiry time is propagated into the timer pull hierarchy and the CPU makes sure to wake up for the first pinned timer. The timer pull hierarchy organizes CPUs in groups of eight at the lowest level and at the next levels groups of eight groups up to the point where no further aggregation of groups is required, i.e. the number of levels is log8(NR_CPUS). The magic number of eight has been established by experimention, but can be adjusted if needed. In each group one busy CPU acts as the migrator. It's only one CPU to avoid lock contention on remote timer wheels. The migrator CPU checks in its own timer wheel handling whether there are other CPUs in the group which have gone idle and have global timers to expire. If there are global timers to expire, the migrator locks the remote CPU timer wheel and handles the expiry. Depending on the group level in the hierarchy this handling can require to walk the hierarchy downwards to the CPU level. Special care is taken when the last CPU goes idle. At this point the CPU is the systemwide migrator at the top of the hierarchy and it therefore cannot delegate to the hierarchy. It needs to arm its own timer device to expire either at the first expiring timer in the hierarchy or at the first CPU local timer, which ever expires first. This completely removes the overhead from the enqueue path, which is e.g. for networking a true hotpath and trades it for a slightly more complex idle path. This has been in development for a couple of years and the final series has been extensively tested by various teams from silicon vendors and ran through extensive CI. There have been slight performance improvements observed on network centric workloads and an Intel team confirmed that this allows them to power down a die completely on a mult-die socket for the first time in a mostly idle scenario. There is only one outstanding ~1.5% regression on a specific overloaded netperf test which is currently investigated, but the rest is either positive or neutral performance wise and positive on the power management side. - Fixes for the timekeeping interpolation code for cross-timestamps: cross-timestamps are used for PTP to get snapshots from hardware timers and interpolated them back to clock MONOTONIC. The changes address a few corner cases in the interpolation code which got the math and logic wrong. - Simplifcation of the clocksource watchdog retry logic to automatically adjust to handle larger systems correctly instead of having more incomprehensible command line parameters. - Treewide consolidation of the VDSO data structures. - The usual small improvements and cleanups all over the place" * tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) timer/migration: Fix quick check reporting late expiry tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n vdso/datapage: Quick fix - use asm/page-def.h for ARM64 timers: Assert no next dyntick timer look-up while CPU is offline tick: Assume timekeeping is correctly handed over upon last offline idle call tick: Shut down low-res tick from dying CPU tick: Split nohz and highres features from nohz_mode tick: Move individual bit features to debuggable mask accesses tick: Move got_idle_tick away from common flags tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING tick: Move tick cancellation up to CPUHP_AP_TICK_DYING tick: Start centralizing tick related CPU hotplug operations tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick() tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick() tick: Use IS_ENABLED() whenever possible tick/sched: Remove useless oneshot ifdeffery tick/nohz: Remove duplicate between lowres and highres handlers tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer() hrtimer: Select housekeeping CPU during migration ...
2024-03-11Merge tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-14/+90
Pull workqueue updates from Tejun Heo: "This cycle, a lot of workqueue changes including some that are significant and invasive. - During v6.6 cycle, unbound workqueues were updated so that they are more topology aware and flexible, which among other things improved workqueue behavior on modern multi-L3 CPUs. In the process, commit 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues") switched unbound workqueues to use per-CPU frontend pool_workqueues as a part of increasing front-back mapping flexibility. An unwelcome side effect of this change was that this made max concurrency enforcement per-CPU blowing up the maximum number of allowed concurrent executions. I incorrectly assumed that this wouldn't cause practical problems as most unbound workqueue users are self-regulate max concurrency; however, there definitely are which don't (e.g. on IO paths) and the drastic increase in the allowed max concurrency led to noticeable perf regressions in some use cases. This is now addressed by separating out max concurrency enforcement to a separate struct - wq_node_nr_active - which makes @max_active consistently mean system-wide max concurrency regardless of the number of CPUs or (finally) NUMA nodes. This is a rather invasive and, in places, a bit clunky; however, the clunkiness rises from the the inherent requirement to handle the disagreement between the execution locality domain and max concurrency enforcement domain on some modern machines. See commit 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") for more details. - BH workqueue support is added. They are similar to per-CPU workqueues but execute work items in the softirq context. This is expected to replace tasklet. However, currently, it's missing the ability to disable and enable work items which is needed to convert many tasklet users. To avoid crowding this merge window too much, this will be included in the next merge window. A separate pull request will be sent for the couple conversion patches that are currently pending. - Waiman plugged a long-standing hole in workqueue CPU isolation where ordered workqueues didn't follow wq_unbound_cpumask updates. Ordered workqueues now follow the same rules as other unbound workqueues. - More CPU isolation improvements: Juri fixed another deficit in workqueue isolation where unbound rescuers don't respect wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on isolated CPUs. - Other misc changes" * tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits) workqueue: Drain BH work items on hot-unplugged CPUs workqueue: Introduce from_work() helper for cleaner callback declarations workqueue: Control intensive warning threshold through cmdline workqueue: Make @flags handling consistent across set_work_data() and friends workqueue: Remove clear_work_data() workqueue: Factor out work_grab_pending() from __cancel_work_sync() workqueue: Clean up enum work_bits and related constants workqueue: Introduce work_cancel_flags workqueue: Use variable name irq_flags for saving local irq flags workqueue: Reorganize flush and cancel[_sync] functions workqueue: Rename __cancel_work_timer() to __cancel_timer_sync() workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held() workqueue: Cosmetic changes workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK workqueue: Fix queue_work_on() with BH workqueues async: Use a dedicated unbound workqueue with raised min_active workqueue: Implement workqueue_set_min_active() workqueue: Fix kernel-doc comment of unplug_oldest_pwq() workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask kernel/workqueue: Let rescuers follow unbound wq cpumask changes ...
2024-03-11Merge tag 'vfs-6.9.pidfd' of ↵Linus Torvalds1-1/+30
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pdfd updates from Christian Brauner: - Until now pidfds could only be created for thread-group leaders but not for threads. There was no technical reason for this. We simply had no users that needed support for this. Now we do have users that need support for this. This introduces a new PIDFD_THREAD flag for pidfd_open(). If that flag is set pidfd_open() creates a pidfd that refers to a specific thread. In addition, we now allow clone() and clone3() to be called with CLONE_PIDFD | CLONE_THREAD which wasn't possible before. A pidfd that refers to an individual thread differs from a pidfd that refers to a thread-group leader: (1) Pidfds are pollable. A task may poll a pidfd and get notified when the task has exited. For thread-group leader pidfds the polling task is woken if the thread-group is empty. In other words, if the thread-group leader task exits when there are still threads alive in its thread-group the polling task will not be woken when the thread-group leader exits but rather when the last thread in the thread-group exits. For thread-specific pidfds the polling task is woken if the thread exits. (2) Passing a thread-group leader pidfd to pidfd_send_signal() will generate thread-group directed signals like kill(2) does. Passing a thread-specific pidfd to pidfd_send_signal() will generate thread-specific signals like tgkill(2) does. The default scope of the signal is thus determined by the type of the pidfd. Since use-cases exist where the default scope of the provided pidfd needs to be overriden the following flags are added to pidfd_send_signal(): - PIDFD_SIGNAL_THREAD Send a thread-specific signal. - PIDFD_SIGNAL_THREAD_GROUP Send a thread-group directed signal. - PIDFD_SIGNAL_PROCESS_GROUP Send a process-group directed signal. The scope change will only work if the struct pid is actually used for this scope. For example, in order to send a thread-group directed signal the provided pidfd must be used as a thread-group leader and similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be used as a process group leader. - Move pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds to become useful for the first time. They can now be compared by inode number which are unique for the system lifetime. Instead of stashing struct pid in file->private_data we can now stash it in inode->i_private. This makes it possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. Another side-effect is that file->private_data is now freed up for per-file options for pidfds. Now, each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode and dentry for each struct pid and we reuse that inode and dentry for all pidfds that refer to the same struct pid. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs. The dentry and inode allocation mechanism is moved into generic infrastructure that is now shared between nsfs and pidfs. The path_from_stashed() helper must be provided with a stashing location, an inode number, a mount, and the private data that is supposed to be used and it will provide a path that can be passed to dentry_open(). The helper will try retrieve an existing dentry from the provided stashing location. If a valid dentry is found it is reused. If not a new one is allocated and we try to stash it in the provided location. If this fails we retry until we either find an existing dentry or the newly allocated dentry could be stashed. Subsequent openers of the same namespace or task are then able to reuse it. - Currently it is only possible to get notified when a task has exited, i.e., become a zombie and userspace gets notified with EPOLLIN. We now also support waiting until the task has been reaped, notifying userspace with EPOLLHUP. - Ensure that ESRCH is reported for getfd if a task is exiting instead of the confusing EBADF. - Various smaller cleanups to pidfd functions. * tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits) libfs: improve path_from_stashed() libfs: add stashed_dentry_prune() libfs: improve path_from_stashed() helper pidfs: convert to path_from_stashed() helper nsfs: convert to path_from_stashed() helper libfs: add path_from_stashed() pidfd: add pidfs pidfd: move struct pidfd_fops pidfd: allow to override signal scope in pidfd_send_signal() pidfd: change pidfd_send_signal() to respect PIDFD_THREAD signal: fill in si_code in prepare_kill_siginfo() selftests: add ESRCH tests for pidfd_getfd() pidfd: getfd should always report ESRCH if a task is exiting pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together pidfd: exit: kill the no longer used thread_group_exited() pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN)) pid: kill the obsolete PIDTYPE_PID code in transfer_pid() pidfd: kill the no longer needed do_notify_pidfd() in de_thread() pidfd_poll: report POLLHUP when pid_task() == NULL pidfd: implement PIDFD_THREAD flag for pidfd_open() ...
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds2-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-03-11Merge tag 'linux_kselftest-kunit-6.9-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - fix to make kunit_bus_type const - kunit tool change to Print UML command - DRM device creation helpers are now using the new kunit device creation helpers. This change resulted in DRM helpers switching from using a platform_device, to a dedicated bus and device type used by kunit. kunit devices don't set DMA mask and this caused regression on some drm tests as they can't allocate DMA buffers. Fix this problem by setting DMA masks on the kunit device during initialization. - KUnit has several macros which accept a log message, which can contain printf format specifiers. Some of these (the explicit log macros) already use the __printf() gcc attribute to ensure the format specifiers are valid, but those which could fail the test, and hence used __kunit_do_failed_assertion() behind the scenes, did not. These include: KUNIT_EXPECT_*_MSG(), KUNIT_ASSERT_*_MSG(), and KUNIT_FAIL() A nine-patch series adds the __printf() attribute, and fixes all of the issues uncovered. * tag 'linux_kselftest-kunit-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Annotate _MSG assertion variants with gnu printf specifiers drm: tests: Fix invalid printf format specifiers in KUnit tests drm/xe/tests: Fix printf format specifiers in xe_migrate test net: test: Fix printf format specifier in skb_segment kunit test rtc: test: Fix invalid format specifier. time: test: Fix incorrect format specifier lib: memcpy_kunit: Fix an invalid format specifier in an assertion msg lib/cmdline: Fix an invalid format specifier in an assertion msg kunit: test: Log the correct filter string in executor_test kunit: Setup DMA masks on the kunit device kunit: make kunit_bus_type const kunit: Mark filter* params as rw kunit: tool: Print UML command
2024-03-11Merge tag 'linux_kselftest-next-6.9-rc1' of ↵Linus Torvalds56-842/+3014
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest update from Shuah Khan: - livepatch restructuring to move the module out of lib to be built as a out-of-tree modules during kselftest build. This makes it easier change, debug and rebuild the tests by running make on the selftests/livepatch directory, which is not currently possible since the modules on lib/livepatch are build and installed using the main makefile modules target. - livepatch restructuring fixes for problems found by kernel test robot. The change skips the test if kernel-devel isn't installed (default value of KDIR), or if KDIR variable passed doesn't exists. - resctrl test restructuring and new non-contiguous CBMs CAT test - new ktap_helpers to print diagnostic messages, pass/fail tests based on exit code, abort test, and finish the test. - a new test verify power supply properties. - a new ftrace to exercise function tracer across cpu hotplug. - timeout increase for mqueue test to allow the test to run on i3.metal AWS instances. - minor spelling corrections in several tests. - missing gitignore files and changes to existing gitignore files. * tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits) kselftest: Add basic test for probing the rust sample modules selftests: lib.mk: Do not process TEST_GEN_MODS_DIR selftests: livepatch: Avoid running the tests if kernel-devel is missing selftests: livepatch: Add initial .gitignore selftests/resctrl: Add non-contiguous CBMs CAT test selftests/resctrl: Add resource_info_file_exists() selftests/resctrl: Split validate_resctrl_feature_request() selftests/resctrl: Add a helper for the non-contiguous test selftests/resctrl: Add test groups and name L3 CAT test L3_CAT selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy" selftests/mqueue: Set timeout to 180 seconds selftests/ftrace: Add test to exercize function tracer across cpu hotplug selftest: ftrace: fix minor typo in log selftests: thermal: intel: workload_hint: add missing gitignore selftests: thermal: intel: power_floor: add missing gitignore selftests: uevent: add missing gitignore selftests: Add test to verify power supply properties selftests: ktap_helpers: Add a helper to finish the test selftests: ktap_helpers: Add a helper to abort the test selftests: ktap_helpers: Add helper to pass/fail test based on exit code ...
2024-03-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+11
Pull kvm fixes from Paolo Bonzini: "KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable from userspace, so there would be no way to write to a read-only guest_memfd). - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely for development and testing. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes. x86 fixes: - Fix missing marking of a guest page as dirty when emulating an atomic access. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC in non-preemptible mode). - Disable AMD DebugSwap by default, it breaks VMSA signing and will be re-enabled with a better VM creation API in 6.10. - Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock, to avoid a race with unregistering of the same region and the consequent use-after-free issue" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: SEV: disable SEV-ES DebugSwap by default KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-07Merge tag 'net-6.8-rc8' of ↵Linus Torvalds3-11/+78
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, ipsec and netfilter. No solution yet for the stmmac issue mentioned in the last PR, but it proved to be a lockdep false positive, not a blocker. Current release - regressions: - dpll: move all dpll<>netdev helpers to dpll code, fix build regression with old compilers Current release - new code bugs: - page_pool: fix netlink dump stop/resume Previous releases - regressions: - bpf: fix verifier to check bpf_func_state->callback_depth when pruning states as otherwise unsafe programs could get accepted - ipv6: avoid possible UAF in ip6_route_mpath_notify() - ice: reconfig host after changing MSI-X on VF - mlx5: - e-switch, change flow rule destination checking - add a memory barrier to prevent a possible null-ptr-deref - switch to using _bh variant of of spinlock where needed Previous releases - always broken: - netfilter: nf_conntrack_h323: add protection for bmp length out of range - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP program in CPU map which led to random xdp_md fields - xfrm: fix UDP encapsulation in TX packet offload - netrom: fix data-races around sysctls - ice: - fix potential NULL pointer dereference in ice_bridge_setlink() - fix uninitialized dplls mutex usage - igc: avoid returning frame twice in XDP_REDIRECT - i40e: disable NAPI right after disabling irqs when handling xsk_pool - geneve: make sure to pull inner header in geneve_rx() - sparx5: fix use after free inside sparx5_del_mact_entry - dsa: microchip: fix register write order in ksz8_ind_write8() Misc: - selftests: mptcp: fixes for diag.sh" * tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) net: pds_core: Fix possible double free in error handling path netrom: Fix data-races around sysctl_net_busy_read netrom: Fix a data-race around sysctl_netrom_link_fails_count netrom: Fix a data-race around sysctl_netrom_routing_control netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size netrom: Fix a data-race around sysctl_netrom_transport_busy_delay netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries netrom: Fix a data-race around sysctl_netrom_transport_timeout netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser netrom: Fix a data-race around sysctl_netrom_default_path_quality netfilter: nf_conntrack_h323: Add protection for bmp length out of range netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout netfilter: nft_ct: fix l3num expectations with inet pseudo family netfilter: nf_tables: reject constant set with timeout netfilter: nf_tables: disallow anonymous set with timeout flag net/rds: fix WARNING in rds_conn_connect_if_down net: dsa: microchip: fix register write order in ksz8_ind_write8() ...
2024-03-06selftests/bpf: Fix up xdp bonding test wrt feature flagsDaniel Borkmann1-2/+2
Adjust the XDP feature flags for the bond device when no bond slave devices are attached. After 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0 as flags instead of NETDEV_XDP_ACT_MASK. # ./vmtest.sh -- ./test_progs -t xdp_bond [...] [ 3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface [ 3.995434] bond1 (unregistering): Released all slaves [ 4.022311] bond2: (slave veth2_1): Releasing backup interface #507/1 xdp_bonding/xdp_bonding_attach:OK #507/2 xdp_bonding/xdp_bonding_nested:OK #507/3 xdp_bonding/xdp_bonding_features:OK #507/4 xdp_bonding/xdp_bonding_roundrobin:OK #507/5 xdp_bonding/xdp_bonding_activebackup:OK #507/6 xdp_bonding/xdp_bonding_xor_layer2:OK #507/7 xdp_bonding/xdp_bonding_xor_layer23:OK #507/8 xdp_bonding/xdp_bonding_xor_layer34:OK #507/9 xdp_bonding/xdp_bonding_redirect_multi:OK #507 xdp_bonding:OK Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED [ 4.185255] bond2 (unregistering): Released all slaves [...] Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Message-ID: <20240305090829.17131-2-daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-06selftests/bpf: test case for callback_depth states pruning logicEduard Zingerman1-0/+70
The test case was minimized from mailing list discussion [0]. It is equivalent to the following C program: struct iter_limit_bug_ctx { __u64 a; __u64 b; __u64 c; }; static __naked void iter_limit_bug_cb(void) { switch (bpf_get_prandom_u32()) { case 1: ctx->a = 42; break; case 2: ctx->b = 42; break; default: ctx->c = 42; break; } } int iter_limit_bug(struct __sk_buff *skb) { struct iter_limit_bug_ctx ctx = { 7, 7, 7 }; bpf_loop(2, iter_limit_bug_cb, &ctx, 0); if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7) asm volatile("r1 /= 0;":::"r1"); return 0; } The main idea is that each loop iteration changes one of the state variables in a non-deterministic manner. Hence it is premature to prune the states that have two iterations left comparing them to states with one iteration left. E.g. {{7,7,7}, callback_depth=0} can reach state {42,42,7}, while {{7,7,7}, callback_depth=1} can't. [0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/ Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240222154121.6991-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-06selftests/exec: Perform script checks with /bin/bashKees Cook1-1/+1
It seems some shells linked to /bin/sh don't have consistent behavior with error codes on execution failures. Explicitly use /bin/bash so that "not found" errors are correctly generated. Repeating the comment from the test: /* * Execute as a long pathname relative to "/". If this is a script, * the interpreter will launch but fail to open the script because its * name ("/dev/fd/5/xxx....") is bigger than PATH_MAX. * * The failure code is usually 127 (POSIX: "If a command is not found, * the exit status shall be 127."), but some systems give 126 (POSIX: * "If the command name is found, but it is not an executable utility, * the exit status shall be 126."), so allow either. */ Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Closes: https://lore.kernel.org/lkml/02c8bf8e-1934-44ab-a886-e065b37366a7@collabora.com/ Signed-off-by: Kees Cook <keescook@chromium.org> --- Cc: Eric Biederman <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-mm@kvack.org Cc: linux-kselftest@vger.kernel.org
2024-03-05selftests/powerpc: Fix load_unaligned_zeropad build failureMichael Ellerman2-0/+1
This test is userspace code, but uses some kernel headers via symlinks, and mocks other headers, in order to test load_unaligned_zeropad(). Currently the test fails to build with: In file included from load_unaligned_zeropad.c:26: word-at-a-time.h:7:10: fatal error: linux/bitops.h: No such file or directory 7 | #include <linux/bitops.h> This is due to the recent changes to the kernel headers. Fix it by symlinking the new wordpart.h, and creating an empty stub for bitops.h which is all that's needed. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 66a5c40f60f5 ("kernel.h: removed REPEAT_BYTE from kernel.h") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20240305125644.3315910-1-mpe@ellerman.id.au Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-04kselftest: Add basic test for probing the rust sample modulesLaura Nao4-0/+51
Add new basic kselftest that checks if the available rust sample modules can be added and removed correctly. Signed-off-by: Laura Nao <laura.nao@collabora.com> Reviewed-by: Sergio Gonzalez Collado <sergio.collado@gmail.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-04selftests: mptcp: diag: avoid extra waitingMatthieu Baerts (NGI0)1-6/+3
When creating a lot of listener sockets, it is enough to wait only for the last one, like we are doing before in diag.sh for other subtests. If we do a check for each listener sockets, each time listing all available sockets, it can take a very long time in very slow environments, at the point we can reach some timeout. When using the debug kconfig, the waiting time switches from more than 8 sec to 0.1 sec on my side. In slow/busy environments, and with a poll timeout set to 30 ms, the waiting time could go up to ~100 sec because the listener socket would timeout and stop, while the script would still be checking one by one if all sockets are ready. The result is that after having waited for everything to be ready, all sockets have been stopped due to a timeout, and it is too late for the script to check how many there were. While at it, also removed ss options we don't need: we only need the filtering options, to count how many listener sockets have been created. We don't need to ask ss to display internal TCP information, and the memory if the output is dropped by the 'wc -l' command anyway. Fixes: b4b51d36bbaa ("selftests: mptcp: explicitly trigger the listener diag code-path") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/r/20240301063754.2ecefecf@kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04selftests: mptcp: diag: return KSFT_FAIL not test_cntGeliang Tang1-3/+3
The test counter 'test_cnt' should not be returned in diag.sh, e.g. what if only the 4th test fail? Will do 'exit 4' which is 'exit ${KSFT_SKIP}', the whole test will be marked as skipped instead of 'failed'! So we should do ret=${KSFT_FAIL} instead. Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests") Cc: stable@vger.kernel.org Fixes: 42fb6cddec3b ("selftests: mptcp: more stable diag tests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-03Merge tag 'powerpc-6.8-5' of ↵Linus Torvalds1-10/+6
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix IOMMU table initialisation when doing kdump over SR-IOV - Fix incorrect RTAS function name for resetting TCE tables - Fix fpu_signal selftest failures since a recent change Thanks to Gaurav Batra and Nathan Lynch. * tag 'powerpc-6.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix fpu_signal failures powerpc/rtas: use correct function name for resetting TCE tables powerpc/pseries/iommu: IOMMU table is not initialized for kdump over SR-IOV
2024-03-01selftests/powerpc: Fix fpu_signal failuresMichael Ellerman1-10/+6
My recent commit e5d00aaac651 ("selftests/powerpc: Check all FPRs in fpu_preempt") inadvertently broke the fpu_signal test. It needs to take into account that fpu_preempt now loads 32 FPRs, so enlarge darray. Also use the newly added randomise_darray() to properly randomise darray. Finally the checking done in signal_fpu_sig() needs to skip checking f30/f31, because they are used as scratch registers in check_all_fprs(), called by preempt_fpu(), and so could hold other values when the signal is taken. Fixes: e5d00aaac651 ("selftests/powerpc: Check all FPRs in fpu_preempt") Reported-by: Spoorthy <spoorthy@linux.ibm.com> Depends-on: 2ba107f6795d ("selftests/powerpc: Generate better bit patterns for FPU tests") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240301101035.1230024-1-mpe@ellerman.id.au
2024-03-01objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocksJosh Poimboeuf1-0/+12
If SAVE and RESTORE unwind hints are in different basic blocks, and objtool sees the RESTORE before the SAVE, it errors out with: vmlinux.o: warning: objtool: vmw_port_hb_in+0x242: objtool isn't smart enough to handle this CFI save/restore combo In such a case, defer following the RESTORE block until the straight-line path gets followed later. Fixes: 8faea26e6111 ("objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402240702.zJFNmahW-lkp@intel.com/ Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20240227073527.avcm5naavbv3cj5s@treble Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01fortify: Split reporting and avoid passing string pointerKees Cook1-1/+1
In preparation for KUnit testing and further improvements in fortify failure reporting, split out the report and encode the function and access failure (read or write overflow) into a single u8 argument. This mainly ends up saving a tiny bit of space in the data segment. For a defconfig with FORTIFY_SOURCE enabled: $ size gcc/vmlinux.before gcc/vmlinux.after text data bss dec hex filename 26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before 26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds7-16/+257
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29Merge tag 'nf-24-02-29' of ↵Paolo Abeni2-1/+190
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin. Patch #2 fixes an issue with bridge netfilter and broadcast/multicast packets. There is a day 0 bug in br_netfilter when used with connection tracking. Conntrack assumes that an nf_conn structure that is not yet added to hash table ("unconfirmed"), is only visible by the current cpu that is processing the sk_buff. For bridge this isn't true, sk_buff can get cloned in between, and clones can be processed in parallel on different cpu. This patch disables NAT and conntrack helpers for multicast packets. Patch #3 adds a selftest to cover for the br_netfilter bug. netfilter pull request 24-02-29 * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() ==================== Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29tools: ynl: fix handling of multiple mcast groupsJakub Kicinski1-0/+1
We never increment the group number iterator, so all groups get recorded into index 0 of the mcast_groups[] array. As a result YNL can only handle using the last group. For example using the "netdev" sample on kernel with page pool commands results in: $ ./samples/netdev YNL: Multicast group 'mgmt' not found Most families have only one multicast group, so this hasn't been noticed. Plus perhaps developers usually test the last group which would have worked. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240226214019.1255242-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29selftests: netfilter: add bridge conntrack + multicast test caseFlorian Westphal2-1/+190
Add test case for multicast packet confirm race. Without preceding patch, this should result in: WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0 Workqueue: events_unbound macvlan_process_broadcast RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0 ? __nf_conntrack_confirm+0x3ed/0x5f0 nf_confirm+0x2ad/0x2d0 nf_hook_slow+0x36/0xd0 ip_local_deliver+0xce/0x110 __netif_receive_skb_one_core+0x4f/0x70 process_backlog+0x8c/0x130 [..] Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-28selftests: lib.mk: Do not process TEST_GEN_MODS_DIRMarcos Paulo de Souza1-3/+0
The directory itself doesn't need have path handling, since it's only to mean where is the directory that contains modules to be built. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-28selftests: livepatch: Avoid running the tests if kernel-devel is missingMarcos Paulo de Souza2-0/+19
By checking if KDIR is a valid directory we can safely skip the tests if kernel-devel isn't installed (default value of KDIR), or if KDIR variable passed doesn't exists. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402191417.XULH88Ct-lkp@intel.com/ Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-28selftests: livepatch: Add initial .gitignoreMarcos Paulo de Souza1-0/+1
Ignore the binary used to test livepatching a syscall. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-28kunit: tool: Print UML commandMickaël Salaün1-0/+1
As for the Qemu command, print the command used to run tests with UML. Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-27selftests: mptcp: explicitly trigger the listener diag code-pathPaolo Abeni1-1/+29
The mptcp diag interface already experienced a few locking bugs that lockdep and appropriate coverage have detected in advance. Let's add a test-case triggering the relevant code path, to prevent similar issues in the future. Be careful to cope with very slow environments. Note that we don't need an explicit timeout on the mptcp_connect subprocess to cope with eventual bug/hang-up as the final cleanup terminating the child processes will take care of that. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27selftests: mptcp: join: add ss mptcp support checkGeliang Tang1-0/+5
Commands 'ss -M' are used in script mptcp_join.sh to display only MPTCP sockets. So it must be checked if ss tool supports MPTCP in this script. Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-7-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27selftests: mptcp: rm subflow with v4/v4mapped addrGeliang Tang2-14/+18
Now both a v4 address and a v4-mapped address are supported when destroying a userspace pm subflow, this patch adds a second subflow to "userspace pm add & remove address" test, and two subflows could be removed two different ways, one with the v4mapped and one with v4. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 48d73f609dcc ("selftests: mptcp: update userspace pm addr tests") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-2-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: net: veth: test syncing GRO and XDP state while device is downJakub Kicinski1-0/+14
Test that we keep GRO flag in sync when XDP is disabled while the device is closed. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-25Merge tag 'cxl-fixes-6.8-rc6' of ↵Linus Torvalds4-9/+70
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "A collection of significant fixes for the CXL subsystem. The largest change in this set, that bordered on "new development", is the fix for the fact that the location of the new qos_class attribute did not match the Documentation. The fix ends up deleting more code than it added, and it has a new unit test to backstop basic errors in this interface going forward. So the "red-diff" and unit test saved the "rip it out and try again" response. In contrast, the new notification path for firmware reported CXL errors (CXL CPER notifications) has a locking context bug that can not be fixed with a red-diff. Given where the release cycle stands, it is not comfortable to squeeze in that fix in these waning days. So, that receives the "back it out and try again later" treatment. There is a regression fix in the code that establishes memory NUMA nodes for platform CXL regions. That has an ack from x86 folks. There are a couple more fixups for Linux to understand (reassemble) CXL regions instantiated by platform firmware. The policy around platforms that do not match host-physical-address with system-physical-address (i.e. systems that have an address translation mechanism between the address range reported in the ACPI CEDT.CFMWS and endpoint decoders) has been softened to abort driver load rather than teardown the memory range (can cause system hangs). Lastly, there is a robustness / regression fix for cases where the driver would previously continue in the face of error, and a fixup for PCI error notification handling. Summary: - Fix NUMA initialization from ACPI CEDT.CFMWS - Fix region assembly failures due to async init order - Fix / simplify export of qos_class information - Fix cxl_acpi initialization vs single-window-init failures - Fix handling of repeated 'pci_channel_io_frozen' notifications - Workaround platforms that violate host-physical-address == system-physical address assumptions - Defer CXL CPER notification handling to v6.9" * tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/acpi: Fix load failures due to single window creation failure acpi/ghes: Remove CXL CPER notifications cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window cxl/test: Add support for qos_class checking cxl: Fix sysfs export of qos_class for memdev cxl: Remove unnecessary type cast in cxl_qos_class_verify() cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf' cxl/region: Allow out of order assembly of autodiscovered regions cxl/region: Handle endpoint decoders in cxl_region_find_decoder() x86/numa: Fix the sort compare func used in numa_fill_memblks() x86/numa: Fix the address overlap check in numa_fill_memblks() cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24selftests/resctrl: Add non-contiguous CBMs CAT testMaciej Wieczor-Retman3-0/+93
Add tests for both L2 and L3 CAT to verify the return values generated by writing non-contiguous CBMs don't contradict the reported non-contiguous support information. Use a logical XOR to confirm return value of write_schemata() and non-contiguous CBMs support information match. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-24selftests/resctrl: Add resource_info_file_exists()Maciej Wieczor-Retman2-0/+26
Feature checking done by resctrl_mon_feature_exists() covers features represented by the feature name presence inside the 'mon_features' file in /sys/fs/resctrl/info/L3_MON directory. There exists a different way to represent feature support and that is by the presence of 0 or 1 in a single file in the info/resource directory. In this case the filename represents what feature support is being indicated. Add a generic function to check file presence in the /sys/fs/resctrl/info/<RESOURCE> directory. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-24selftests/resctrl: Split validate_resctrl_feature_request()Maciej Wieczor-Retman5-17/+31
validate_resctrl_feature_request() is used to test both if a resource is present in the info directory, and if a passed monitoring feature is present in the mon_features file. Refactor validate_resctrl_feature_request() into two smaller functions that each accomplish one check to give feature checking more granularity: - Resource directory presence in the /sys/fs/resctrl/info directory. - Feature name presence in the /sys/fs/resctrl/info/<RESOURCE>/mon_features file. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-24selftests/resctrl: Add a helper for the non-contiguous testMaciej Wieczor-Retman2-0/+37
The CAT non-contiguous selftests have to read the file responsible for reporting support of non-contiguous CBMs in kernel (resctrl). Then the test compares if that information matches what is reported by CPUID output. Add a generic helper function to read an unsigned number from /sys/fs/resctrl/info/<RESOURCE>/<FILE>. Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-24selftests/resctrl: Add test groups and name L3 CAT test L3_CATIlpo Järvinen3-6/+16
To select test to run -t parameter can be used. However, -t cat currently maps to L3 CAT test which will be confusing after more CAT related tests will be added. Allow selecting tests as groups and call L3 CAT test "L3_CAT", "CAT" group will enable all CAT related tests. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-23Merge tag 'mm-hotfixes-stable-2024-02-22-15-02' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "A batch of MM (and one non-MM) hotfixes. Ten are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: kasan: guard release_free_meta() shadow access with kasan_arch_is_ready() mm/damon/lru_sort: fix quota status loss due to online tunings mm/damon/reclaim: fix quota stauts loss due to online tunings MAINTAINERS: mailmap: update Shakeel's email address mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals mm: memcontrol: clarify swapaccount=0 deprecation warning mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array mm/zswap: invalidate duplicate entry when !zswap_enabled lib/Kconfig.debug: TEST_IOV_ITER depends on MMU mm/swap: fix race when skipping swapcache mm/swap_state: update zswap LRU's protection range with the folio locked selftests/mm: uffd-unit-test check if huge page size is 0 mm/damon/core: check apply interval in damon_do_apply_schemes() mm: zswap: fix missing folio cleanup in writeback race path
2024-02-23KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusiveSean Christopherson1-0/+6
Extend set_memory_region_test's invalid flags subtest to verify that GUEST_MEMFD is incompatible with READONLY. GUEST_MEMFD doesn't currently support writes from userspace and KVM doesn't support emulated MMIO on private accesses, and so KVM is supposed to reject the GUEST_MEMFD+READONLY in order to avoid configuration that KVM can't support. Link: https://lore.kernel.org/r/20240222190612.2942589-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-23KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcasesSean Christopherson1-1/+5
Actually create a GUEST_MEMFD instance and pass it to KVM when doing negative tests for KVM_SET_USER_MEMORY_REGION2 + KVM_MEM_GUEST_MEMFD. Without a valid GUEST_MEMFD file descriptor, KVM_SET_USER_MEMORY_REGION2 will always fail with -EINVAL, resulting in false passes for any and all tests of illegal combinations of KVM_MEM_GUEST_MEMFD and other flags. Fixes: 5d74316466f4 ("KVM: selftests: Add a memory region subtest to validate invalid flags") Link: https://lore.kernel.org/r/20240222190612.2942589-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-22Merge tag 'for-linus-iommufd' of ↵Linus Torvalds3-31/+91
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: - Fix dirty tracking bitmap collection when using reporting bitmaps that are not neatly aligned to u64's or match the IO page table radix tree layout. - Add self tests to cover the cases that were found to be broken. - Add missing enforcement of invalidation type in the uapi. - Fix selftest config generation * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: selftests/iommu: fix the config fragment iommufd: Reject non-zero data_type if no data_len is provided iommufd/iova_bitmap: Consider page offset for the pages to be pinned iommufd/selftest: Add mock IO hugepages tests iommufd/selftest: Hugepage mock domain support iommufd/selftest: Refactor mock_domain_read_and_clear_dirty() iommufd/selftest: Refactor dirty bitmap tests iommufd/iova_bitmap: Handle recording beyond the mapped pages iommufd/selftest: Test u64 unaligned bitmaps iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array iommufd/iova_bitmap: Bounds check mapped::pages access
2024-02-22Merge tag 'net-6.8.0-rc6' of ↵Linus Torvalds16-97/+345
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - af_unix: fix another unix GC hangup Previous releases - regressions: - core: fix a possible AF_UNIX deadlock - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready() - netfilter: nft_flow_offload: release dst in case direct xmit path is used - bridge: switchdev: ensure MDB events are delivered exactly once - l2tp: pass correct message length to ip6_append_data - dccp/tcp: unhash sk from ehash for tb2 alloc failure after check_estalblished() - tls: fixes for record type handling with PEEK - devlink: fix possible use-after-free and memory leaks in devlink_init() Previous releases - always broken: - bpf: fix an oops when attempting to read the vsyscall page through bpf_probe_read_kernel - sched: act_mirred: use the backlog for mirred ingress - netfilter: nft_flow_offload: fix dst refcount underflow - ipv6: sr: fix possible use-after-free and null-ptr-deref - mptcp: fix several data races - phonet: take correct lock to peek at the RX queue Misc: - handful of fixes and reliability improvements for selftests" * tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) l2tp: pass correct message length to ip6_append_data net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY selftests: ioam: refactoring to align with the fix Fix write to cloned skb in ipv6_hop_ioam() phonet/pep: fix racy skb_queue_empty() use phonet: take correct lock to peek at the RX queue net: sparx5: Add spinlock for frame transmission from CPU net/sched: flower: Add lock protection when remove filter handle devlink: fix port dump cmd type net: stmmac: Fix EST offset for dwmac 5.10 tools: ynl: don't leak mcast_groups on init error tools: ynl: make sure we always pass yarg to mnl_cb_run net: mctp: put sock on tag allocation failure netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure selftests: tls: add test for peeking past a record of a different type selftests: tls: add test for merging of same-type control messages ...
2024-02-22KVM: s390: selftests: memop: add a simple AR testEric Farman1-0/+31
There is a selftest that checks for an (expected) error when an invalid AR is specified, but not one that exercises the AR path. Add a simple test that mirrors the vanilla write/read test while providing an AR. An AR that contains zero will direct the CPU to use the primary address space normally used anyway. AR[1] is selected for this test because the host AR[1] is usually non-zero, and KVM needs to correctly swap those values. Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20240220211211.3102609-3-farman@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>