summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/kernel-parameters.txt
AgeCommit message (Collapse)AuthorFilesLines
2023-08-18Documentation: Fix typosBjorn Helgaas1-1/+1
Fix typos in Documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-08-18doc: update params of memhp_default_state=Ma Wupeng1-1/+1
Commit 5f47adf762b7 ("mm/memory_hotplug: allow to specify a default online_type") allows to specify a default online_type which make online memory to kernel or movable zone possible but fail to update to doc. Update doc to fit this change. Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230802074312.2111074-1-mawupeng1@huawei.com
2023-07-22docs: panic: cleanups for panic paramsRandy Dunlap1-15/+15
Move 'panic_print' to its correct place in alphabetical order. Add parameter format for 'pause_on_oops'. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230715034811.9665-1-rdunlap@infradead.org
2023-07-21Docs: kernel-parameters: sort arm64 entriesRandy Dunlap1-6/+6
Put the arm64 kernel-parameters entries into alphabetical order. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230715235105.17966-1-rdunlap@infradead.org
2023-07-07Merge tag 'docs-6.5-2' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull mode documentation updates from Jonathan Corbet: "A half-dozen late arriving docs patches. They are mostly fixes, but we also have a kernel-doc tweak for enums and the long-overdue removal of the outdated and redundant patch-submission comments at the top of the MAINTAINERS file" * tag 'docs-6.5-2' of git://git.lwn.net/linux: scripts: kernel-doc: support private / public marking for enums Documentation: KVM: SEV: add a missing backtick Documentation: ACPI: fix typo in ssdt-overlays.rst Fix documentation of panic_on_warn docs: remove the tips on how to submit patches from MAINTAINERS docs: fix typo in zh_TW and zh_CN translation
2023-07-04Fix documentation of panic_on_warnOlaf Hering1-1/+1
The kernel cmdline option panic_on_warn expects an integer, it is not a plain option as documented. A number of uses in the tree figured this already, and use panic_on_warn=1 for their purpose. Adjust a comment which otherwise may mislead people in the future. Fixes: 9e3961a09798 ("kernel: add panic_on_warn") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-06-30Merge tag 'riscv-for-linus-6.5-mw1' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for ACPI - Various cleanups to the ISA string parsing, including making them case-insensitive - Support for the vector extension - Support for independent irq/softirq stacks - Our CPU DT binding now has "unevaluatedProperties: false" * tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits) riscv: hibernate: remove WARN_ON in save_processor_state dt-bindings: riscv: cpus: switch to unevaluatedProperties: false dt-bindings: riscv: cpus: add a ref the common cpu schema riscv: stack: Add config of thread stack size riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK RISC-V: always report presence of extensions formerly part of the base ISA dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support RISC-V: remove decrement/increment dance in ISA string parser RISC-V: rework comments in ISA string parser RISC-V: validate riscv,isa at boot, not during ISA string parsing RISC-V: split early & late of_node to hartid mapping RISC-V: simplify register width check in ISA string parsing perf: RISC-V: Limit the number of counters returned from SBI riscv: replace deprecated scall with ecall riscv: uprobes: Restore thread.bad_cause riscv: mm: try VMA lock-based page fault handling first riscv: mm: Pre-allocate PGD entries for vmalloc/modules area RISC-V: hwprobe: Expose Zba, Zbb, and Zbs RISC-V: Track ISA extensions per hart ...
2023-06-30Merge tag 'iommu-updates-v6.5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - iova_magazine_alloc() optimization - Make flush-queue an IOMMU driver capability - Consolidate the error handling around device attachment AMD IOMMU changes: - AVIC Interrupt Remapping Improvements - Some minor fixes and cleanups Intel VT-d changes from Lu Baolu: - Small and misc cleanups ARM-SMMU changes from Will Deacon: - Device-tree binding updates: - Add missing clocks for SC8280XP and SA8775 Adreno SMMUs - Add two new Qualcomm SMMUs in SDX75 and SM6375 - Workarounds for Arm MMU-700 errata: - 1076982: Avoid use of SEV-based cmdq wakeup - 2812531: Terminate command batches with a CMD_SYNC - Enforce single-stage translation to avoid nesting-related errata - Set the correct level hint for range TLB invalidation on teardown .. and some other minor fixes and cleanups (including Freescale PAMU and virtio-iommu changes)" * tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (50 commits) iommu/vt-d: Remove commented-out code iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one() iommu/vt-d: Handle the failure case of dmar_reenable_qi() iommu/vt-d: Remove unnecessary (void*) conversions iommu/amd: Remove extern from function prototypes iommu/amd: Use BIT/BIT_ULL macro to define bit fields iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro iommu/amd: Fix compile error for unused function iommu/amd: Improving Interrupt Remapping Table Invalidation iommu/amd: Do not Invalidate IRT when IRTE caching is disabled iommu/amd: Introduce Disable IRTE Caching Support iommu/amd: Remove the unused struct amd_ir_data.ref iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga() iommu/arm-smmu-v3: Set TTL invalidation hint better iommu/arm-smmu-v3: Document nesting-related errata iommu/arm-smmu-v3: Add explicit feature for nesting iommu/arm-smmu-v3: Document MMU-700 erratum 2812531 iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982 dt-bindings: arm-smmu: Add SDX75 SMMU compatible dt-bindings: arm-smmu: Add SM6375 GPU SMMU ...
2023-06-30Merge tag 'mips_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds1-4/+4
Pull MIPS updates from Thomas Bogendoerfer: - add support for TP-Link HC220 G5 v1 - add support for Wifi/Bluetooth on CI20 - rework Ralink clock and reset handling - cleanups and fixes * tag 'mips_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (58 commits) MIPS: Loongson64: DTS: Add RTC support to Loongson-2K1000 MIPS: Loongson64: DTS: Add RTC support to LS7A PCH MIPS: OCTEON: octeon-usb: cleanup divider calculation MIPS: OCTEON: octeon-usb: introduce dwc3_octeon_{read,write}q MIPS: OCTEON: octeon-usb: move gpio config to separate function MIPS: OCTEON: octeon-usb: use bitfields for shim register MIPS: OCTEON: octeon-usb: use bitfields for host config register MIPS: OCTEON: octeon-usb: use bitfields for control register MIPS: OCTEON: octeon-usb: add all register offsets mips: ralink: match all supported system controller compatible strings MIPS: dec: prom: Address -Warray-bounds warning MIPS: DTS: CI20: Raise VDDCORE voltage to 1.125 volts clk: ralink: mtmips: Fix uninitialized use of ret in mtmips_register_{fixed,factor}_clocks() mips: ralink: introduce commonly used remap node function mips: pci-mt7620: use dev_info() to log PCIe device detection result mips: pci-mt7620: do not print NFTS register value as error log MAINTAINERS: add Mediatek MTMIPS Clock maintainer mips: ralink: get cpu rate from new driver code mips: ralink: remove reset related code mips: ralink: mt7620: remove clock related code ...
2023-06-28Merge tag 'docs-arm64-move' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull arm64 documentation move from Jonathan Corbet: "Move the arm64 architecture documentation under Documentation/arch/. This brings some order to the documentation directory, declutters the top-level directory, and makes the documentation organization more closely match that of the source" * tag 'docs-arm64-move' of git://git.lwn.net/linux: perf arm-spe: Fix a dangling Documentation/arm64 reference mm: Fix a dangling Documentation/arm64 reference arm64: Fix dangling references to Documentation/arm64 dt-bindings: fix dangling Documentation/arm64 reference docs: arm64: Move arm64 documentation under Documentation/arch/
2023-06-28Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-0/+12
Pull workqueue updates from Tejun Heo: - Concurrency-managed per-cpu work items that hog CPUs and delay the execution of other work items are now automatically detected and excluded from concurrency management. Reporting on such work items can also be enabled through a config option. - Added tools/workqueue/wq_monitor.py which improves visibility into workqueue usages and behaviors. - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles, superseded by commit afa4bb778e48 in mainline. * tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0 workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle() workqueue: fix enum type for gcc-13 workqueue: Track and monitor per-workqueue CPU time usage workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE workqueue: Improve locking rule description for worker fields workqueue: Move worker_set/clr_flags() upwards workqueue: Re-order struct worker fields workqueue: Add pwq->stats[] and a monitoring script Further upgrade queue_work_on() comment
2023-06-28Merge tag 'objtool-core-2023-06-27' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molar: "Build footprint & performance improvements: - Reduce memory usage with CONFIG_DEBUG_INFO=y In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel, DWARF creates almost 200 million relocations, ballooning objtool's peak heap usage to 53GB. These patches reduce that to 25GB. On a distro-type kernel with kernel IBT enabled, they reduce objtool's peak heap usage from 4.2GB to 2.8GB. These changes also improve the runtime significantly. Debuggability improvements: - Add the unwind_debug command-line option, for more extend unwinding debugging output - Limit unreachable warnings to once per function - Add verbose option for disassembling affected functions - Include backtrace in verbose mode - Detect missing __noreturn annotations - Ignore exc_double_fault() __noreturn warnings - Remove superfluous global_noreturns entries - Move noreturn function list to separate file - Add __kunit_abort() to noreturns Unwinder improvements: - Allow stack operations in UNWIND_HINT_UNDEFINED regions - drm/vmwgfx: Add unwind hints around RBP clobber Cleanups: - Move the x86 entry thunk restore code into thunk functions - x86/unwind/orc: Use swap() instead of open coding it - Remove unnecessary/unused variables Fixes for modern stack canary handling" * tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/orc: Make the is_callthunk() definition depend on CONFIG_BPF_JIT=y objtool: Skip reading DWARF section data objtool: Free insns when done objtool: Get rid of reloc->rel[a] objtool: Shrink elf hash nodes objtool: Shrink reloc->sym_reloc_entry objtool: Get rid of reloc->jump_table_start objtool: Get rid of reloc->addend objtool: Get rid of reloc->type objtool: Get rid of reloc->offset objtool: Get rid of reloc->idx objtool: Get rid of reloc->list objtool: Allocate relocs in advance for new rela sections objtool: Add for_each_reloc() objtool: Don't free memory in elf_close() objtool: Keep GElf_Rel[a] structs synced objtool: Add elf_create_section_pair() objtool: Add mark_sec_changed() objtool: Fix reloc_hash size objtool: Consolidate rel/rela handling ...
2023-06-27Merge tag 'x86_mtrr_for_v6.5' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mtrr updates from Borislav Petkov: "A serious scrubbing of the MTRR code including adding a new map mechanism in order to look up the memory type of a region easily. Also address memory range lookup issues like returning an invalid memory type. Furthermore, this handles the decoupling of PAT from MTRR more naturally. All work by Juergen Gross" * tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/xen: Set default memory type for PV guests to WB x86/mtrr: Unify debugging printing x86/mtrr: Remove unused code x86/mm: Only check uniform after calling mtrr_type_lookup() x86/mtrr: Don't let mtrr_type_lookup() return MTRR_TYPE_INVALID x86/mtrr: Use new cache_map in mtrr_type_lookup() x86/mtrr: Add mtrr=debug command line option x86/mtrr: Construct a memory map with cache modes x86/mtrr: Add get_effective_type() service function x86/mtrr: Allocate mtrr_value array dynamically x86/mtrr: Move 32-bit code from mtrr.c to legacy.c x86/mtrr: Have only one set_mtrr() variant x86/mtrr: Replace vendor tests in MTRR code x86/xen: Set MTRR state when running as Xen PV initial domain x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest x86/mtrr: Support setting MTRR state for software defined MTRRs x86/mtrr: Replace size_or_mask and size_and_mask with a much easier concept x86/mtrr: Remove physical address size calculation
2023-06-27Merge tag 'docs-6.5' of git://git.lwn.net/linuxLinus Torvalds1-24/+39
Pull documentation updates from Jonathan Corbet: "It's been a relatively calm cycle in docsland. We do have: - Some initial page-table documentation from Linus (the other Linus) - Regression-handling documentation improvements from Thorsten - Addition of kerneldoc documentation for the ERR_PTR() and related macros from James Seo ... and the usual collection of fixes and updates" * tag 'docs-6.5' of git://git.lwn.net/linux: docs: consolidate storage interfaces Documentation: update git configuration for Link: tag Documentation: KVM: make corrections to vcpu-requests.rst Documentation: KVM: make corrections to ppc-pv.rst Documentation: KVM: make corrections to locking.rst Documentation: KVM: make corrections to halt-polling.rst Documentation: virt: correct location of haltpoll module params Documentation/mm: Initial page table documentation docs: crypto: async-tx-api: fix typo in struct name docs/doc-guide: Clarify how to write tables docs: handling-regressions: rework section about fixing procedures docs: process: fix a typoed cross-reference docs: submitting-patches: Discuss interleaved replies MAINTAINERS: direct process doc changes to a dedicated ML Documentation: core-api: Add error pointer functions to kernel-api err.h: Add missing kerneldocs for error pointer functions Documentation: conf.py: Add __force to c_id_attributes docs: clarify KVM related kernel parameters' descriptions docs: consolidate human interface subsystems docs: admin-guide: Add information about intel_pstate active mode
2023-06-27Merge tag 'rcu.2023.06.22a' of ↵Linus Torvalds1-62/+78
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "Documentation updates Miscellaneous fixes, perhaps most notably: - Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API. - Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT. - Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.) - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering. kvfree_rcu updates: - Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep(). - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses. Callback-offloading updates: - Fix a number of bugs involving the shrinker and lazy callbacks. Tasks RCU updates Torture-test updates" * tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits) torture: Remove duplicated argument -enable-kvm for ppc64 doc/rcutorture: Add description of rcutorture.stall_cpu_block rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcutorture: Correct name of use_softirq module parameter locktorture: Add long_hold to adjust lock-hold delays rcu/nocb: Make shrinker iterate only over NOCB CPUs rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs rcu: Make rcu_cpu_starting() rely on interrupts being disabled rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp rcu: Employ jiffies-based backstop to callback time limit rcu: Check callback-invocation time limit for rcuc kthreads rcu: Remove RCU_NONIDLE() rcu: Add more RCU files to kernel-api.rst rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker rcu/nocb: Fix shrinker race against callback enqueuer rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading ...
2023-06-27Merge tag 'arm64-upstream' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Notable features are user-space support for the memcpy/memset instructions and the permission indirection extension. - Support for the Armv8.9 Permission Indirection Extensions. While this feature doesn't add new functionality, it enables future support for Guarded Control Stacks (GCS) and Permission Overlays - User-space support for the Armv8.8 memcpy/memset instructions - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and cleanups - Removal of superfluous ISBs on context switch (following retrospective architecture tightening) - Decode the ISS2 register during faults for additional information to help with debugging - KPTI clean-up/simplification of the trampoline exit code - Addressing several -Wmissing-prototype warnings - Kselftest improvements for signal handling and ptrace - Fix TPIDR2_EL0 restoring on sigreturn - Clean-up, robustness improvements of the module allocation code - More sysreg conversions to the automatic register/bitfields generation - CPU capabilities handling cleanup - Arm documentation updates: ACPI, ptdump" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits) kselftest/arm64: Add a test case for TPIDR2 restore arm64/signal: Restore TPIDR2 register rather than memory state arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe Documentation/arm64: Add ptdump documentation arm64: hibernate: remove WARN_ON in save_processor_state kselftest/arm64: Log signal code and address for unexpected signals docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst arm64/fpsimd: Exit streaming mode when flushing tasks docs: perf: Add new description for HiSilicon UC PMU drivers/perf: hisi: Add support for HiSilicon UC PMU driver drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE perf/arm-cmn: Add sysfs identifier perf/arm-cmn: Revamp model detection perf/arm_dmc620: Add cpumask arm64: mm: fix VA-range sanity check arm64/mm: remove now-superfluous ISBs from TTBR writes Documentation/arm64: Update ACPI tables from BBR Documentation/arm64: Update references in arm-acpi Documentation/arm64: Update ARM and arch reference ...
2023-06-26Merge tag 'smp-core-2023-06-26' of ↵Linus Torvalds1-14/+6
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP updates from Thomas Gleixner: "A large update for SMP management: - Parallel CPU bringup The reason why people are interested in parallel bringup is to shorten the (kexec) reboot time of cloud servers to reduce the downtime of the VM tenants. The current fully serialized bringup does the following per AP: 1) Prepare callbacks (allocate, intialize, create threads) 2) Kick the AP alive (e.g. INIT/SIPI on x86) 3) Wait for the AP to report alive state 4) Let the AP continue through the atomic bringup 5) Let the AP run the threaded bringup to full online state There are two significant delays: #3 The time for an AP to report alive state in start_secondary() on x86 has been measured in the range between 350us and 3.5ms depending on vendor and CPU type, BIOS microcode size etc. #4 The atomic bringup does the microcode update. This has been measured to take up to ~8ms on the primary threads depending on the microcode patch size to apply. On a two socket SKL server with 56 cores (112 threads) the boot CPU spends on current mainline about 800ms busy waiting for the APs to come up and apply microcode. That's more than 80% of the actual onlining procedure. This can be reduced significantly by splitting the bringup mechanism into two parts: 1) Run the prepare callbacks and kick the AP alive for each AP which needs to be brought up. The APs wake up, do their firmware initialization and run the low level kernel startup code including microcode loading in parallel up to the first synchronization point. (#1 and #2 above) 2) Run the rest of the bringup code strictly serialized per CPU (#3 - #5 above) as it's done today. Parallelizing that stage of the CPU bringup might be possible in theory, but it's questionable whether required surgery would be justified for a pretty small gain. If the system is large enough the first AP is already waiting at the first synchronization point when the boot CPU finished the wake-up of the last AP. That reduces the AP bringup time on that SKL from ~800ms to ~80ms, i.e. by a factor ~10x. The actual gain varies wildly depending on the system, CPU, microcode patch size and other factors. There are some opportunities to reduce the overhead further, but that needs some deep surgery in the x86 CPU bringup code. For now this is only enabled on x86, but the core functionality obviously works for all SMP capable architectures. - Enhancements for SMP function call tracing so it is possible to locate the scheduling and the actual execution points. That allows to measure IPI delivery time precisely" * tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) trace,smp: Add tracepoints for scheduling remotelly called functions trace,smp: Add tracepoints around remotelly called functions MAINTAINERS: Add CPU HOTPLUG entry x86/smpboot: Fix the parallel bringup decision x86/realmode: Make stack lock work in trampoline_compat() x86/smp: Initialize cpu_primary_thread_mask late cpu/hotplug: Fix off by one in cpuhp_bringup_mask() x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it x86/smpboot: Support parallel startup of secondary CPUs x86/smpboot: Implement a bit spinlock to protect the realmode stack x86/apic: Save the APIC virtual base address cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE x86/apic: Provide cpu_primary_thread mask x86/smpboot: Enable split CPU startup cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism cpu/hotplug: Reset task stack state in _cpu_up() cpu/hotplug: Remove unused state functions riscv: Switch to hotplug core state synchronization parisc: Switch to hotplug core state synchronization ...
2023-06-21docs: arm64: Move arm64 documentation under Documentation/arch/Jonathan Corbet1-1/+1
Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Move Documentation/arm64 into arch/ (along with the Chinese equvalent translations) and fix up documentation references. Cc: Will Deacon <will@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Hu Haowen <src.res@email.cn> Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Yantengsi <siyanteng@loongson.cn> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-06-09iommu/amd: Introduce Disable IRTE Caching SupportSuravee Suthikulpanit1-0/+1
An Interrupt Remapping Table (IRT) stores interrupt remapping configuration for each device. In a normal operation, the AMD IOMMU caches the table to optimize subsequent data accesses. This requires the IOMMU driver to invalidate IRT whenever it updates the table. The invalidation process includes issuing an INVALIDATE_INTERRUPT_TABLE command following by a COMPLETION_WAIT command. However, there are cases in which the IRT is updated at a high rate. For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large amount of vcpus and VFIO PCI pass-through devices, the invalidation process could potentially become a performance bottleneck. Introducing a new kernel boot option: amd_iommu=irtcachedis which disables IRTE caching by setting the IRTCachedis bit in each IOMMU Control register, and bypass the IRT invalidation process. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09MIPS: Select CONFIG_GENERIC_IDLE_POLL_SETUPJiaxun Yang1-2/+2
hlt,nohlt paramaters are useful when debugging cpuidle related issues. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-06-09MIPS: Rework smt cmdline parametersJiaxun Yang1-2/+2
Provide a generic smt parameters interface aligned with s390 to allow users to limit smt usage and threads per core. It replaced previous undocumented "nothreads" parameter for smp-cps which is ambiguous and does not cover smp-mt. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-06-07Merge branches 'doc.2023.05.10a', 'fixes.2023.05.11a', 'kvfree.2023.05.10a', ↵Paul E. McKenney1-2/+11
'nocb.2023.05.11a', 'rcu-tasks.2023.05.10a', 'torture.2023.05.15a' and 'rcu-urgent.2023.06.06a' into HEAD doc.2023.05.10a: Documentation updates fixes.2023.05.11a: Miscellaneous fixes kvfree.2023.05.10a: kvfree_rcu updates nocb.2023.05.11a: Callback-offloading updates rcu-tasks.2023.05.10a: Tasks RCU updates torture.2023.05.15a: Torture-test updates rcu-urgent.2023.06.06a: Urgent SRCU fix
2023-06-05block: move the code to do early boot lookup of block devices to block/Christoph Hellwig1-2/+2
Create a new block/early-lookup.c to keep the early block device lookup code instead of having this code sit with the early mount code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05init: improve the name_to_dev_t interfaceChristoph Hellwig1-2/+2
name_to_dev_t has a very misleading name, that doesn't make clear it should only be used by the early init code, and also has a bad calling convention that doesn't allow returning different kinds of errors. Rename it to early_lookup_bdev to make the use case clear, and return an errno, where -EINVAL means the string could not be parsed, and -ENODEV means it the string was valid, but there was no device found for it. Also stub out the whole call for !CONFIG_BLOCK as all the non-block root cases are always covered in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05init: move the nfs/cifs/ram special cases out of name_to_dev_tChristoph Hellwig1-1/+6
The nfs/cifs/ram special case only needs to be parsed once, and only in the boot code. Move them out of name_to_dev_t and into prepare_namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05arm64: mops: allow disabling MOPS from the kernel command lineKristina Martsenko1-0/+3
Make it possible to disable the MOPS extension at runtime using the kernel command line. This can be useful for testing or working around hardware issues. For example it could be used to test new memory copy routines that do not use MOPS instructions (e.g. from Arm Optimized Routines). Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20230509142235.3284028-11-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-01RISC-V: Add ACPI initialization in setup_arch()Sunil V L1-4/+4
Initialize the ACPI core for RISC-V during boot. ACPI tables and interpreter are initialized based on the information passed from the firmware and the value of the kernel parameter 'acpi'. With ACPI support added for RISC-V, the kernel parameter 'acpi' is also supported on RISC-V. Hence, update the documentation. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230515054928.2079268-9-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01x86/mtrr: Add mtrr=debug command line optionJuergen Gross1-0/+4
Add a new command line option "mtrr=debug" for getting debug output after building the new cache mode map. The output will include MTRR register values and the resulting map. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20230502120931.20719-13-jgross@suse.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-05-19docs: clarify KVM related kernel parameters' descriptionsYan-Jie Wang1-24/+29
The descriptions of certain KVM related kernel parameters can be confusing. They state "Disable ...," which may make people think that setting them to 1 will disable the associated feature when in fact the opposite is true. This commit addresses this issue by revising the descriptions of these parameters by using "Control..." rather than "Enable/Disable...". 1==enabled or 0==disabled can be communicated by the description of default value such as "1 (enabled)" or "0 (disabled)". Also update the description of KVM's default value for kvm-intel.nested as it is enabled by default. Signed-off-by: Yan-Jie Wang <yanjiewtw@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230503081530.19956-1-yanjiewtw@gmail.com
2023-05-19docs: admin-guide: Add information about intel_pstate active modeNatesh Sharma1-0/+10
Information about intel_pstate active mode is added in the doc. This operation mode could be used to set on the hardware when it's not activated. Status of the mode could be checked from sysfs file i.e., /sys/devices/system/cpu/intel_pstate/status. The information is already available in cpu-freq/intel-pstate.txt documentation. Signed-off-by: Natesh Sharma <nsharma@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> [jc: reformatted for width ] Link: https://lore.kernel.org/r/20230427083706.49882-1-nsharma@redhat.com
2023-05-18workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanismTejun Heo1-0/+5
Workqueue now automatically marks per-cpu work items that hog CPU for too long as CPU_INTENSIVE, which excludes them from concurrency management and prevents stalling other concurrency-managed work items. If a work function keeps running over the thershold, it likely needs to be switched to use an unbound workqueue. This patch adds a debug mechanism which tracks the work functions which trigger the automatic CPU_INTENSIVE mechanism and report them using pr_warn() with exponential backoff. v3: Documentation update. v2: Drop bouncing to kthread_worker for printing messages. It was to avoid introducing circular locking dependency through printk but not effective as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's just print directly using printk_deferred(). Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org>
2023-05-18workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVETejun Heo1-0/+7
If a per-cpu work item hogs the CPU, it can prevent other work items from starting through concurrency management. A per-cpu workqueue which intends to host such CPU-hogging work items can choose to not participate in concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be error-prone and difficult to debug when missed. This patch adds an automatic CPU usage based detection. If a concurrency-managed work item consumes more CPU time than the threshold (10ms by default) continuously without intervening sleeps, wq_worker_tick() which is called from scheduler_tick() will detect the condition and automatically mark it CPU_INTENSIVE. The mechanism isn't foolproof: * Detection depends on tick hitting the work item. Getting preempted at the right timings may allow a violating work item to evade detection at least temporarily. * nohz_full CPUs may not be running ticks and thus can fail detection. * Even when detection is working, the 10ms detection delays can add up if many CPU-hogging work items are queued at the same time. However, in vast majority of cases, this should be able to detect violations reliably and provide reasonable protection with a small increase in code complexity. If some work items trigger this condition repeatedly, the bigger problem likely is the CPU being saturated with such per-cpu work items and the solution would be making them UNBOUND. The next patch will add a debug mechanism to help spot such cases. v4: Documentation for workqueue.cpu_intensive_thresh_us added to kernel-parameters.txt. v3: Switch to use wq_worker_tick() instead of hooking into preemptions as suggested by Peter. v2: Lai pointed out that wq_worker_stopping() also needs to be called from preemption and rtlock paths and an earlier patch was updated accordingly. This patch adds a comment describing the risk of infinte recursions and how they're avoided. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-16x86/unwind/orc: Add 'unwind_debug' cmdline optionJosh Poimboeuf1-0/+6
Sometimes the one-line ORC unwinder warnings aren't very helpful. Add a new 'unwind_debug' cmdline option which will dump the full stack contents of the current task when an error condition is encountered. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/6afb9e48a05fd2046bfad47e69b061b43dfd0e0e.1681331449.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-15doc/rcutorture: Add description of rcutorture.stall_cpu_blockZqiang1-2/+11
If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y, then run the rcutorture tests specifying stalls as follows: runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \ bootparams="console=ttyS0 rcutorture.stall_cpu=30 \ rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d The tests will produce the following splat: [ 10.841071] rcu-torture: rcu_torture_stall begin CPU stall [ 10.841073] rcu_torture_stall start on CPU 3. [ 10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000 .... [ 10.841108] Call Trace: [ 10.841110] <TASK> [ 10.841112] dump_stack_lvl+0x64/0xb0 [ 10.841118] dump_stack+0x10/0x20 [ 10.841121] __schedule_bug+0x8b/0xb0 [ 10.841126] __schedule+0x2172/0x2940 [ 10.841157] schedule+0x9b/0x150 [ 10.841160] schedule_timeout+0x2e8/0x4f0 [ 10.841192] schedule_timeout_uninterruptible+0x47/0x50 [ 10.841195] rcu_torture_stall+0x2e8/0x300 [ 10.841199] kthread+0x175/0x1a0 [ 10.841206] ret_from_fork+0x2c/0x50 This is because the rcutorture.stall_cpu_block=1 module parameter causes rcu_torture_stall() to invoke schedule_timeout_uninterruptible() within an RCU read-side critical section. This in turn results in a quiescent state (which prevents the stall) and a sleep in an atomic context (which produces the above splat). Although this code is operating as designed, the design has proven to be counterintuitive to many. This commit therefore updates the description in kernel-parameters.txt accordingly. [ paulmck: Apply Joel Fernandes feedback. ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-05-15cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATEThomas Gleixner1-0/+6
There is often significant latency in the early stages of CPU bringup, and time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and then waiting for it to respond before moving on to the next. Allow a platform to enable parallel setup which brings all to be onlined CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the control CPU (BP) is single-threaded the important part is the last state CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up. This allows the CPUs to run up to the first sychronization point cpuhp_ap_sync_alive() where they wait for the control CPU to release them one by one for the full onlining procedure. This parallelism depends on the CPU hotplug core sync mechanism which ensures that the parallel brought up CPUs wait for release before touching any state which would make the CPU visible to anything outside the hotplug control mechanism. To handle the SMT constraints of X86 correctly the bringup happens in two iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up the primary SMT threads of each core first, which can load the microcode without the need to rendevouz with the thread siblings. Once that's completed it brings up the secondary SMT threads. Co-developed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
2023-05-15x86/topology: Remove CPU0 hotplug optionThomas Gleixner1-14/+0
This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support physical hotplug of CPU0: "We'll change this code in the future to wake up hard offlined CPU0 if real platform and request are available." 11 years later this has not happened and physical hotplug is not officially supported. Remove the cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.715707999@linutronix.de
2023-05-10doc: Document the rcutree.rcu_resched_ns module parameterPaul E. McKenney1-0/+7
This commit adds kernel-parameters.txt documentation for the rcutree.rcu_resched_ns module parameter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-05-10doc: Get rcutree module parameters back into alpha orderPaul E. McKenney1-60/+60
This commit puts the rcutree module parameters back into proper alphabetical order. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-04-29Merge tag 'riscv-for-linus-6.4-mw1' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for runtime detection of the Svnapot extension - Support for Zicboz when clearing pages - We've moved to GENERIC_ENTRY - Support for !MMU on rv32 systems - The linear region is now mapped via huge pages - Support for building relocatable kernels - Support for the hwprobe interface - Various fixes and cleanups throughout the tree * tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits) RISC-V: hwprobe: Explicity check for -1 in vdso init RISC-V: hwprobe: There can only be one first riscv: Allow to downgrade paging mode from the command line dt-bindings: riscv: add sv57 mmu-type RISC-V: hwprobe: Remove __init on probe_vendor_features() riscv: Use --emit-relocs in order to move .rela.dyn in init riscv: Check relocations at compile time powerpc: Move script to check relocations at compile time in scripts/ riscv: Introduce CONFIG_RELOCATABLE riscv: Move .rela.dyn outside of init to avoid empty relocations riscv: Prepare EFI header for relocatable kernels riscv: Unconditionnally select KASAN_VMALLOC if KASAN riscv: Fix ptdump when KASAN is enabled riscv: Fix EFI stub usage of KASAN instrumented strcmp function riscv: Move DTB_EARLY_BASE_VA to the kernel address space riscv: Rework kasan population functions riscv: Split early and final KASAN population functions riscv: Use PUD/P4D/PGD pages for the linear mapping riscv: Move the linear mapping creation in its own function riscv: Get rid of riscv_pfn_base variable ...
2023-04-29Merge tag 'smp-core-2023-04-27' of ↵Linus Torvalds1-9/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28Merge tag 'mm-stable-2023-04-27-15-30' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
2023-04-28Merge tag 'modules-6.4-rc1' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The summary of the changes for this pull requests is: - Song Liu's new struct module_memory replacement - Nick Alcock's MODULE_LICENSE() removal for non-modules - My cleanups and enhancements to reduce the areas where we vmalloc module memory for duplicates, and the respective debug code which proves the remaining vmalloc pressure comes from userspace. Most of the changes have been in linux-next for quite some time except the minor fixes I made to check if a module was already loaded prior to allocating the final module memory with vmalloc and the respective debug code it introduces to help clarify the issue. Although the functional change is small it is rather safe as it can only *help* reduce vmalloc space for duplicates and is confirmed to fix a bootup issue with over 400 CPUs with KASAN enabled. I don't expect stable kernels to pick up that fix as the cleanups would have also had to have been picked up. Folks on larger CPU systems with modules will want to just upgrade if vmalloc space has been an issue on bootup. Given the size of this request, here's some more elaborate details: The functional change change in this pull request is the very first patch from Song Liu which replaces the 'struct module_layout' with a new 'struct module_memory'. The old data structure tried to put together all types of supported module memory types in one data structure, the new one abstracts the differences in memory types in a module to allow each one to provide their own set of details. This paves the way in the future so we can deal with them in a cleaner way. If you look at changes they also provide a nice cleanup of how we handle these different memory areas in a module. This change has been in linux-next since before the merge window opened for v6.3 so to provide more than a full kernel cycle of testing. It's a good thing as quite a bit of fixes have been found for it. Jason Baron then made dynamic debug a first class citizen module user by using module notifier callbacks to allocate / remove module specific dynamic debug information. Nick Alcock has done quite a bit of work cross-tree to remove module license tags from things which cannot possibly be module at my request so to: a) help him with his longer term tooling goals which require a deterministic evaluation if a piece a symbol code could ever be part of a module or not. But quite recently it is has been made clear that tooling is not the only one that would benefit. Disambiguating symbols also helps efforts such as live patching, kprobes and BPF, but for other reasons and R&D on this area is active with no clear solution in sight. b) help us inch closer to the now generally accepted long term goal of automating all the MODULE_LICENSE() tags from SPDX license tags In so far as a) is concerned, although module license tags are a no-op for non-modules, tools which would want create a mapping of possible modules can only rely on the module license tag after the commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"). Nick has been working on this *for years* and AFAICT I was the only one to suggest two alternatives to this approach for tooling. The complexity in one of my suggested approaches lies in that we'd need a possible-obj-m and a could-be-module which would check if the object being built is part of any kconfig build which could ever lead to it being part of a module, and if so define a new define -DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've suggested would be to have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as well but that means getting kconfig symbol names mapping to modules always, and I don't think that's the case today. I am not aware of Nick or anyone exploring either of these options. Quite recently Josh Poimboeuf has pointed out that live patching, kprobes and BPF would benefit from resolving some part of the disambiguation as well but for other reasons. The function granularity KASLR (fgkaslr) patches were mentioned but Joe Lawrence has clarified this effort has been dropped with no clear solution in sight [1]. In the meantime removing module license tags from code which could never be modules is welcomed for both objectives mentioned above. Some developers have also welcomed these changes as it has helped clarify when a module was never possible and they forgot to clean this up, and so you'll see quite a bit of Nick's patches in other pull requests for this merge window. I just picked up the stragglers after rc3. LWN has good coverage on the motivation behind this work [2] and the typical cross-tree issues he ran into along the way. The only concrete blocker issue he ran into was that we should not remove the MODULE_LICENSE() tags from files which have no SPDX tags yet, even if they can never be modules. Nick ended up giving up on his efforts due to having to do this vetting and backlash he ran into from folks who really did *not understand* the core of the issue nor were providing any alternative / guidance. I've gone through his changes and dropped the patches which dropped the module license tags where an SPDX license tag was missing, it only consisted of 11 drivers. To see if a pull request deals with a file which lacks SPDX tags you can just use: ./scripts/spdxcheck.py -f \ $(git diff --name-only commid-id | xargs echo) You'll see a core module file in this pull request for the above, but that's not related to his changes. WE just need to add the SPDX license tag for the kernel/module/kmod.c file in the future but it demonstrates the effectiveness of the script. Most of Nick's changes were spread out through different trees, and I just picked up the slack after rc3 for the last kernel was out. Those changes have been in linux-next for over two weeks. The cleanups, debug code I added and final fix I added for modules were motivated by David Hildenbrand's report of boot failing on a systems with over 400 CPUs when KASAN was enabled due to running out of virtual memory space. Although the functional change only consists of 3 lines in the patch "module: avoid allocation if module is already present and ready", proving that this was the best we can do on the modules side took quite a bit of effort and new debug code. The initial cleanups I did on the modules side of things has been in linux-next since around rc3 of the last kernel, the actual final fix for and debug code however have only been in linux-next for about a week or so but I think it is worth getting that code in for this merge window as it does help fix / prove / evaluate the issues reported with larger number of CPUs. Userspace is not yet fixed as it is taking a bit of time for folks to understand the crux of the issue and find a proper resolution. Worst come to worst, I have a kludge-of-concept [3] of how to make kernel_read*() calls for modules unique / converge them, but I'm currently inclined to just see if userspace can fix this instead" Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0] Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1] Link: https://lwn.net/Articles/927569/ [2] Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3] * tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits) module: add debugging auto-load duplicate module support module: stats: fix invalid_mod_bytes typo module: remove use of uninitialized variable len module: fix building stats for 32-bit targets module: stats: include uapi/linux/module.h module: avoid allocation if module is already present and ready module: add debug stats to help identify memory pressure module: extract patient module check into helper modules/kmod: replace implementation with a semaphore Change DEFINE_SEMAPHORE() to take a number argument module: fix kmemleak annotations for non init ELF sections module: Ignore L0 and rename is_arm_mapping_symbol() module: Move is_arm_mapping_symbol() to module_symbol.h module: Sync code of is_arm_mapping_symbol() scripts/gdb: use mem instead of core_layout to get the module address interconnect: remove module-related code interconnect: remove MODULE_LICENSE in non-modules zswap: remove MODULE_LICENSE in non-modules zpool: remove MODULE_LICENSE in non-modules x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules ...
2023-04-27Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds1-9/+14
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-26riscv: Allow to downgrade paging mode from the command lineAlexandre Ghiti1-1/+4
Add 2 early command line parameters that allow to downgrade satp mode (using the same naming as x86): - "no5lvl": use a 4-level page table (down from sv57 to sv48) - "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39) Note that going through the device tree to get the kernel command line works with ACPI too since the efi stub creates a device tree anyway with the command line. In KASAN kernels, we can't use the libfdt that early in the boot process since we are not ready to execute instrumented functions. So instead of using the "generic" libfdt, we compile our own versions of those functions that are not instrumented and that are prefixed so that they do not conflict with the generic ones. We also need the non-instrumented versions of the string functions and the prefixed versions of memcpy/memmove. This is largely inspired by commit aacd149b6238 ("arm64: head: avoid relocating the kernel twice for KASLR") from which I removed compilation flags that were not relevant to RISC-V at the moment (LTO, SCS). Also note that we have to link with -z norelro to avoid ld.lld to throw a warning with the new .got sections, like in commit 311bea3cb9ee ("arm64: link with -z norelro for LLD or aarch64-elf"). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-26Merge tag 'pm-6.4-rc1' of ↵Linus Torvalds1-17/+23
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These update several cpufreq drivers and the cpufreq core, add sysfs interface for exposing the time really spent in the platform low-power state during suspend-to-idle, update devfreq (core and drivers) and the pm-graph suite of tools and clean up code. Specifics: - Fix the frequency unit in cpufreq_verify_current_freq checks() Sanjay Chandrashekara) - Make mode_state_machine in amd-pstate static (Tom Rix) - Make the cpufreq core require drivers with target_index() to set freq_table (Viresh Kumar) - Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry (Jingyu Wang) - Use of_property_read_bool() for boolean properties in the pmac32 cpufreq driver (Rob Herring) - Make the cpufreq sysfs interface return proper error codes on obviously invalid input (qinyu) - Add guided autonomous mode support to the AMD P-state driver (Wyes Karny) - Make the Intel P-state driver enable HWP IO boost on all server platforms (Srinivas Pandruvada) - Add opp and bandwidth support to tegra194 cpufreq driver (Sumit Gupta) - Use of_property_present() for testing DT property presence (Rob Herring) - Remove MODULE_LICENSE in non-modules (Nick Alcock) - Add SM7225 to cpufreq-dt-platdev blocklist (Luca Weiss) - Optimizations and fixes for qcom-cpufreq-hw driver (Krzysztof Kozlowski, Konrad Dybcio, and Bjorn Andersson) - DT binding updates for qcom-cpufreq-hw driver (Konrad Dybcio and Bartosz Golaszewski) - Updates and fixes for mediatek driver (Jia-Wei Chang and AngeloGioacchino Del Regno) - Use of_property_present() for testing DT property presence in the cpuidle code (Rob Herring) - Drop unnecessary (void *) conversions from the PM core (Li zeming) - Add sysfs files to represent time spent in a platform sleep state during suspend-to-idle and make AMD and Intel PMC drivers use them Mario Limonciello) - Use of_property_present() for testing DT property presence (Rob Herring) - Add set_required_opps() callback to the 'struct opp_table', to make the code paths cleaner (Viresh Kumar) - Update the pm-graph siute of utilities to v5.11 with the following changes: * New script which allows users to install the latest pm-graph from the upstream github repo. * Update all the dmesg suspend/resume PM print formats to be able to process recent timelines using dmesg only. * Add ethtool output to the log for the system's ethernet device if ethtool exists. * Make the tool more robustly handle events where mangled dmesg or ftrace outputs do not include all the requisite data. - Make the sleepgraph utility recognize "CPU killed" messages (Xueqin Luo) - Remove unneeded SRCU selection in Kconfig because it's always set from devfreq core (Paul E. McKenney) - Drop of_match_ptr() macro from exynos-bus.c because this driver is always using the DT table for driver probe (Krzysztof Kozlowski) - Use the preferred of_property_present() instead of the low-level of_get_property() on exynos-bus.c (Rob Herring) - Use devm_platform_get_and_ioream_resource() in exyno-ppmu.c (Yang Li)" * tag 'pm-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits) platform/x86/intel/pmc: core: Report duration of time in HW sleep state platform/x86/intel/pmc: core: Always capture counters on suspend platform/x86/amd: pmc: Report duration of time in hw sleep state PM: Add sysfs files to represent time spent in hardware sleep state cpufreq: use correct unit when verify cur freq cpufreq: tegra194: add OPP support and set bandwidth cpufreq: amd-pstate: Make varaiable mode_state_machine static PM: core: Remove unnecessary (void *) conversions cpufreq: drivers with target_index() must set freq_table PM / devfreq: exynos-ppmu: Use devm_platform_get_and_ioremap_resource() OPP: Move required opps configuration to specialized callback OPP: Handle all genpd cases together in _set_required_opps() cpufreq: qcom-cpufreq-hw: Revert adding cpufreq qos dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCM2290 dt-bindings: cpufreq: cpufreq-qcom-hw: Sanitize data per compatible dt-bindings: cpufreq: cpufreq-qcom-hw: Allow just 1 frequency domain cpufreq: Add SM7225 to cpufreq-dt-platdev blocklist cpufreq: qcom-cpufreq-hw: fix double IO unmap and resource release on exit cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623 cpufreq: mediatek: raise proc/sram max voltage for MT8516 ...
2023-04-24Merge tag 'docs-6.4' of git://git.lwn.net/linuxLinus Torvalds1-172/+169
Pull documentation updates from Jonathan Corbet: "Commit volume in documentation is relatively low this time, but there is still a fair amount going on, including: - Reorganize the architecture-specific documentation under Documentation/arch This makes the structure match the source directory and helps to clean up the mess that is the top-level Documentation directory a bit. This work creates the new directory and moves x86 and most of the less-active architectures there. The current plan is to move the rest of the architectures in 6.5, with the patches going through the appropriate subsystem trees. - Some more Spanish translations and maintenance of the Italian translation - A new "Kernel contribution maturity model" document from Ted - A new tutorial on quickly building a trimmed kernel from Thorsten Plus the usual set of updates and fixes" * tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits) media: Adjust column width for pdfdocs media: Fix building pdfdocs docs: clk: add documentation to log which clocks have been disabled docs: trace: Fix typo in ftrace.rst Documentation/process: always CC responsible lists docs: kmemleak: adjust to config renaming ELF: document some de-facto PT_* ABI quirks Documentation: arm: remove stih415/stih416 related entries docs: turn off "smart quotes" in the HTML build Documentation: firmware: Clarify firmware path usage docs/mm: Physical Memory: Fix grammar Documentation: Add document for false sharing dma-api-howto: typo fix docs: move m68k architecture documentation under Documentation/arch/ docs: move parisc documentation under Documentation/arch/ docs: move ia64 architecture docs under Documentation/arch/ docs: Move arc architecture docs under Documentation/arch/ docs: move nios2 documentation under Documentation/arch/ docs: move openrisc documentation under Documentation/arch/ docs: move superh documentation under Documentation/arch/ ...
2023-04-20module: add debugging auto-load duplicate module supportLuis Chamberlain1-0/+6
The finit_module() system call can in the worst case use up to more than twice of a module's size in virtual memory. Duplicate finit_module() system calls are non fatal, however they unnecessarily strain virtual memory during bootup and in the worst case can cause a system to fail to boot. This is only known to currently be an issue on systems with larger number of CPUs. To help debug this situation we need to consider the different sources for finit_module(). Requests from the kernel that rely on module auto-loading, ie, the kernel's *request_module() API, are one source of calls. Although modprobe checks to see if a module is already loaded prior to calling finit_module() there is a small race possible allowing userspace to trigger multiple modprobe calls racing against modprobe and this not seeing the module yet loaded. This adds debugging support to the kernel module auto-loader (*request_module() calls) to easily detect duplicate module requests. To aid with possible bootup failure issues incurred by this, it will converge duplicates requests to a single request. This avoids any possible strain on virtual memory during bootup which could be incurred by duplicate module autoloading requests. Folks debugging virtual memory abuse on bootup can and should enable this to see what pr_warn()s come on, to see if module auto-loading is to blame for their wores. If they see duplicates they can further debug this by enabling the module.enable_dups_trace kernel parameter or by enabling CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE. Current evidence seems to point to only a few duplicates for module auto-loading. And so the source for other duplicates creating heavy virtual memory pressure due to larger number of CPUs should becoming from another place (likely udev). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-18LoongArch: Make WriteCombine configurable for ioremap()Huacai Chen1-0/+6
LoongArch maintains cache coherency in hardware, but when paired with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar to WriteCombine) is out of the scope of cache coherency machanism for PCIe devices (this is a PCIe protocol violation, which may be fixed in newer chipsets). This means WUC can only used for write-only memory regions now, so this option is disabled by default, making WUC silently fallback to SUC for ioremap(). You can enable this option if the kernel is ensured to run on hardware without this bug. Kernel parameter writecombine=on/off can be used to override the Kconfig option. Cc: stable@vger.kernel.org Suggested-by: WANG Xuerui <kernel@xen0n.name> Reviewed-by: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-04-06mm, treewide: redefine MAX_ORDER sanelyKirill A. Shutemov1-1/+1
MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-30docs: move x86 documentation into Documentation/arch/Jonathan Corbet1-4/+4
Move the x86 documentation under Documentation/arch/ as a way of cleaning up the top-level directory and making the structure of our docs more closely match the structure of the source directories it describes. All in-kernel references to the old paths have been updated. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/ Signed-off-by: Jonathan Corbet <corbet@lwn.net>