summaryrefslogtreecommitdiff
path: root/arch/s390/include
AgeCommit message (Collapse)AuthorFilesLines
2023-07-06Merge tag 's390-6.5-2' of ↵Linus Torvalds11-103/+100
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Alexander Gordeev: - Fix virtual vs physical address confusion in vmem_add_range() and vmem_remove_range() functions - Include <linux/io.h> instead of <asm/io.h> and <asm-generic/io.h> throughout s390 code - Make all PSW related defines also available for assembler files. Remove PSW_DEFAULT_KEY define from uapi for that - When adding an undefined symbol the build still succeeds, but userspace crashes trying to execute VDSO, because the symbol is not resolved. Add undefined symbols check to prevent that - Use kvmalloc_array() instead of kzalloc() for allocaton of 256k memory when executing s390 crypto adapter IOCTL - Add -fPIE flag to prevent decompressor misaligned symbol build error with clang - Use .balign instead of .align everywhere. This is a no-op for s390, but with this there no mix in using .align and .balign anymore - Filter out -mno-pic-data-is-text-relative flag when compiling kernel to prevent VDSO build error - Rework entering of DAT-on mode on CPU restart to use PSW_KERNEL_BITS mask directly - Do not retry administrative requests to some s390 crypto cards, since the firmware assumes replay attacks - Remove most of the debug code, which is build in when kernel config option CONFIG_ZCRYPT_DEBUG is enabled - Remove CONFIG_ZCRYPT_MULTIDEVNODES kernel config option and switch off the multiple devices support for the s390 zcrypt device driver - With the conversion to generic entry machine checks are accounted to the current context instead of irq time. As result, the STCKF instruction at the beginning of the machine check handler and the lowcore member are no longer required, therefore remove it - Fix various typos found with codespell - Minor cleanups to CPU-measurement Counter and Sampling Facilities code - Revert patch that removes VMEM_MAX_PHYS macro, since it causes a regression * tag 's390-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (25 commits) Revert "s390/mm: get rid of VMEM_MAX_PHYS macro" s390/cpum_sf: remove check on CPU being online s390/cpum_sf: handle casts consistently s390/cpum_sf: remove unnecessary debug statement s390/cpum_sf: remove parameter in call to pr_err s390/cpum_sf: simplify function setup_pmu_cpu s390/cpum_cf: remove unneeded debug statements s390/entry: remove mcck clock s390: fix various typos s390/zcrypt: remove ZCRYPT_MULTIDEVNODES kernel config option s390/zcrypt: do not retry administrative requests s390/zcrypt: cleanup some debug code s390/entry: rework entering DAT-on mode on CPU restart s390/mm: fence off VM macros from asm and linker s390: include linux/io.h instead of asm/io.h s390/ptrace: make all psw related defines also available for asm s390/ptrace: remove PSW_DEFAULT_KEY from uapi s390/vdso: filter out mno-pic-data-is-text-relative cflag s390: consistently use .balign instead of .align s390/decompressor: fix misaligned symbol build error ...
2023-07-04Revert "s390/mm: get rid of VMEM_MAX_PHYS macro"Alexander Gordeev1-0/+2
This reverts commit 456be42aa713e7f83b467db66ceae779431c7d9d. The assumption VMEM_MAX_PHYS should match ident_map_size is wrong. At least discontiguous saved segments (DCSS) could be loaded at addresses beyond ident_map_size and dcssblk device driver might fail as result. Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-3/+82
Pull kvm updates from Paolo Bonzini: "ARM64: - Eager page splitting optimization for dirty logging, optionally allowing for a VM to avoid the cost of hugepage splitting in the stage-2 fault path. - Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with services that live in the Secure world. pKVM intervenes on FF-A calls to guarantee the host doesn't misuse memory donated to the hyp or a pKVM guest. - Support for running the split hypervisor with VHE enabled, known as 'hVHE' mode. This is extremely useful for testing the split hypervisor on VHE-only systems, and paves the way for new use cases that depend on having two TTBRs available at EL2. - Generalized framework for configurable ID registers from userspace. KVM/arm64 currently prevents arbitrary CPU feature set configuration from userspace, but the intent is to relax this limitation and allow userspace to select a feature set consistent with the CPU. - Enable the use of Branch Target Identification (FEAT_BTI) in the hypervisor. - Use a separate set of pointer authentication keys for the hypervisor when running in protected mode, as the host is untrusted at runtime. - Ensure timer IRQs are consistently released in the init failure paths. - Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps (FEAT_EVT), as it is a register commonly read from userspace. - Erratum workaround for the upcoming AmpereOne part, which has broken hardware A/D state management. RISC-V: - Redirect AMO load/store misaligned traps to KVM guest - Trap-n-emulate AIA in-kernel irqchip for KVM guest - Svnapot support for KVM Guest s390: - New uvdevice secret API - CMM selftest and fixes - fix racy access to target CPU for diag 9c x86: - Fix missing/incorrect #GP checks on ENCLS - Use standard mmu_notifier hooks for handling APIC access page - Drop now unnecessary TR/TSS load after VM-Exit on AMD - Print more descriptive information about the status of SEV and SEV-ES during module load - Add a test for splitting and reconstituting hugepages during and after dirty logging - Add support for CPU pinning in demand paging test - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes included along the way - Add a "nx_huge_pages=never" option to effectively avoid creating NX hugepage recovery threads (because nx_huge_pages=off can be toggled at runtime) - Move handling of PAT out of MTRR code and dedup SVM+VMX code - Fix output of PIC poll command emulation when there's an interrupt - Add a maintainer's handbook to document KVM x86 processes, preferred coding style, testing expectations, etc. - Misc cleanups, fixes and comments Generic: - Miscellaneous bugfixes and cleanups Selftests: - Generate dependency files so that partial rebuilds work as expected" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits) Documentation/process: Add a maintainer handbook for KVM x86 Documentation/process: Add a label for the tip tree handbook's coding style KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index RISC-V: KVM: Remove unneeded semicolon RISC-V: KVM: Allow Svnapot extension for Guest/VM riscv: kvm: define vcpu_sbi_ext_pmu in header RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip RISC-V: KVM: Add in-kernel emulation of AIA APLIC RISC-V: KVM: Implement device interface for AIA irqchip RISC-V: KVM: Skeletal in-kernel AIA irqchip support RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero RISC-V: KVM: Add APLIC related defines RISC-V: KVM: Add IMSIC related defines RISC-V: KVM: Implement guest external interrupt line management KVM: x86: Remove PRIx* definitions as they are solely for user space s390/uv: Update query for secret-UVCs s390/uv: replace scnprintf with sysfs_emit s390/uvdevice: Add 'Lock Secret Store' UVC ...
2023-07-03s390/entry: remove mcck clockSven Schnelle1-2/+2
In the past machine checks where accounted as irq time. With the conversion to generic entry, it was decided to account machine checks to the current context. The stckf at the beginning of the machine check handler and the lowcore member is no longer required, therefore remove it. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03s390: fix various typosHeiko Carstens4-6/+6
Fix various typos found with codespell. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03s390/mm: fence off VM macros from asm and linkerAlexander Gordeev1-2/+2
Prevent assembler and linker scripts compilation errors by fencing it off with __ASSEMBLY__ define. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03s390: include linux/io.h instead of asm/io.hHeiko Carstens2-2/+2
Include linux/io.h instead of asm/io.h everywhere. linux/io.h includes asm/io.h, so this shouldn't cause any problems. Instead this might help for some randconfig build errors which were reported due to some undefined io related functions. Also move the changed include so it stays grouped together with other includes from the same directory. For ctcm_mpc.c also remove not needed comments (actually questions). Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03s390/ptrace: make all psw related defines also available for asmHeiko Carstens2-85/+84
Use the _AC() macro to make all psw related defines also available for assembler files. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-03s390/ptrace: remove PSW_DEFAULT_KEY from uapiHeiko Carstens3-5/+3
Move PSW_DEFAULT_KEY from uapi/asm/ptrace.h to asm/ptrace.h. This is possible, since it depends on PAGE_DEFAULT_ACC which is not part of uapi. Or in other words: this define cannot be used without error. Therefore remove it from uapi. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-07-01Merge tag 'kvm-s390-next-6.5-1' of ↵Paolo Bonzini2-3/+82
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD * New uvdevice secret API * New CMM selftest * cmm fix * diag 9c racy access of target cpu fix
2023-06-28Merge tag 'mm-nonmm-stable-2023-06-24-19-23' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-mm updates from Andrew Morton: - Arnd Bergmann has fixed a bunch of -Wmissing-prototypes in top-level directories - Douglas Anderson has added a new "buddy" mode to the hardlockup detector. It permits the detector to work on architectures which cannot provide the required interrupts, by having CPUs periodically perform checks on other CPUs - Zhen Lei has enhanced kexec's ability to support two crash regions - Petr Mladek has done a lot of cleanup on the hard lockup detector's Kconfig entries - And the usual bunch of singleton patches in various places * tag 'mm-nonmm-stable-2023-06-24-19-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits) kernel/time/posix-stubs.c: remove duplicated include ocfs2: remove redundant assignment to variable bit_off watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h devres: show which resource was invalid in __devm_ioremap_resource() watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64 watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h watchdog/hardlockup: make the config checks more straightforward watchdog/hardlockup: sort hardlockup detector related config values a logical way watchdog/hardlockup: move SMP barriers from common code to buddy code watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY watchdog/buddy: don't copy the cpumask in watchdog_next_cpu() watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog() watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy() watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick() watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails ...
2023-06-28s390: consistently use .balign instead of .alignHeiko Carstens1-2/+2
The .align directive has inconsistent behavior across architectures. Use .balign instead everywhere. This is a no-op for s390, but with this there is no mix in using .align and .balign anymore. Future code is supposed to use only .balign. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-28s390/mm: get rid of VMEM_MAX_PHYS macroAlexander Gordeev1-2/+0
VMEM_MAX_PHYS is supposed to be the highest physical address that can be added to the identity mapping. It should match ident_map_size, which has the same meaning. However, unlike ident_map_size it is not adjusted against various limiting factors (see the comment to setup_ident_map_size() function). That renders all checks against VMEM_MAX_PHYS invalid. Further, VMEM_MAX_PHYS is currently set to vmemmap, which is an address in virtual memory space. However, it gets compared against physical addresses in various locations. That works, because both address spaces are the same on s390, but otherwise it is wrong. Instead of fixing VMEM_MAX_PHYS misuse and semantics just remove it. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-28Merge tag 's390-6.5-1' of ↵Linus Torvalds5-10/+27
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Fix the style of protected key API driver source: use x-mas tree for all local variable declarations - Rework protected key API driver to not use the struct pkey_protkey and pkey_clrkey anymore. Both structures have a fixed size buffer, but with the support of ECC protected key these buffers are not big enough. Use dynamic buffers internally and transparently for userspace - Add support for a new 'non CCA clear key token' with ECC clear keys supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448. This makes it possible to derive a protected key from the ECC clear key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way to derive is via PCKMO instruction - The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t for reference counting. Replace this with the proper data type refcount_t - Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since gcc generates inefficient code, which may lead to stack overflows - Replace one-element array with flexible-array member in struct vfio_ccw_parent and refactor the rest of the code accordingly. Also, prefer struct_size() over sizeof() open- coded versions - Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the system memory should be cleared or not once dumped - Fix a hang when a user attempts to remove a VFIO-AP mediated device attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback to request a release of the device - Fix calculation for R_390_GOTENT relocations for modules - Allow any user space process with CAP_PERFMON capability read and display the CPU Measurement facility counter sets - Rework large statically-defined per-CPU cpu_cf_events data structure and replace it with dynamically allocated structures created when a perf_event_open() system call is invoked or /dev/hwctr device is accessed * tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process s390/module: fix rela calculation for R_390_GOTENT s390/vfio-ap: wire in the vfio_device_ops request callback s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl s390/pkey: add support for ecc clear key s390/pkey: do not use struct pkey_protkey s390/pkey: introduce reverse x-mas trees s390/zcore: conditionally clear memory on reipl s390/ipl: add REIPL_CLEAR flag to os_info vfio/ccw: use struct_size() helper vfio/ccw: replace one-element array with flexible-array member s390: select ARCH_SUPPORTS_INT128 s390/pai_ext: replace atomic_t with refcount_t s390/pai_crypto: replace atomic_t with refcount_t
2023-06-28Merge tag 'locking-core-2023-06-27' of ↵Linus Torvalds3-45/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface. Instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. The generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. * tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg() locking/atomic: treewide: delete arch_atomic_*() kerneldoc locking/atomic: docs: Add atomic operations to the driver basic API documentation locking/atomic: scripts: generate kerneldoc comments docs: scripts: kernel-doc: accept bitwise negation like ~@var locking/atomic: scripts: simplify raw_atomic*() definitions locking/atomic: scripts: simplify raw_atomic_long*() definitions locking/atomic: scripts: split pfx/name/sfx/order locking/atomic: scripts: restructure fallback ifdeffery locking/atomic: scripts: build raw_atomic_long*() directly locking/atomic: treewide: use raw_atomic*_<op>() locking/atomic: scripts: add trivial raw_atomic*_<op>() locking/atomic: scripts: factor out order template generation locking/atomic: scripts: remove leftover "${mult}" locking/atomic: scripts: remove bogus order parameter locking/atomic: xtensa: add preprocessor symbols locking/atomic: x86: add preprocessor symbols locking/atomic: sparc: add preprocessor symbols locking/atomic: sh: add preprocessor symbols ...
2023-06-28Merge tag 'sched-core-2023-06-27' of ↵Linus Torvalds1-4/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Scheduler SMP load-balancer improvements: - Avoid unnecessary migrations within SMT domains on hybrid systems. Problem: On hybrid CPU systems, (processors with a mixture of higher-frequency SMT cores and lower-frequency non-SMT cores), under the old code lower-priority CPUs pulled tasks from the higher-priority cores if more than one SMT sibling was busy - resulting in many unnecessary task migrations. Solution: The new code improves the load balancer to recognize SMT cores with more than one busy sibling and allows lower-priority CPUs to pull tasks, which avoids superfluous migrations and lets lower-priority cores inspect all SMT siblings for the busiest queue. - Implement the 'runnable boosting' feature in the EAS balancer: consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection. This improves CPU utilization for certain workloads, while leaves other key workloads unchanged. Scheduler infrastructure improvements: - Rewrite the scheduler topology setup code by consolidating it into the build_sched_topology() helper function and building it dynamically on the fly. - Resolve the local_clock() vs. noinstr complications by rewriting the code: provide separate sched_clock_noinstr() and local_clock_noinstr() functions to be used in instrumentation code, and make sure it is all instrumentation-safe. Fixes: - Fix a kthread_park() race with wait_woken() - Fix misc wait_task_inactive() bugs unearthed by the -rt merge: - Fix UP PREEMPT bug by unifying the SMP and UP implementations - Fix task_struct::saved_state handling - Fix various rq clock update bugs, unearthed by turning on the rq clock debugging code. - Fix the PSI WINDOW_MIN_US trigger limit, which was easy to trigger by creating enough cgroups, by removing the warnign and restricting window size triggers to PSI file write-permission or CAP_SYS_RESOURCE. - Propagate SMT flags in the topology when removing degenerate domain - Fix grub_reclaim() calculation bug in the deadline scheduler code - Avoid resetting the min update period when it is unnecessary, in psi_trigger_destroy(). - Don't balance a task to its current running CPU in load_balance(), which was possible on certain NUMA topologies with overlapping groups. - Fix the sched-debug printing of rq->nr_uninterruptible Cleanups: - Address various -Wmissing-prototype warnings, as a preparation to (maybe) enable this warning in the future. - Remove unused code - Mark more functions __init - Fix shadow-variable warnings" * tag 'sched-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle() sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop() sched/core: Fixed missing rq clock update before calling set_rq_offline() sched/deadline: Update GRUB description in the documentation sched/deadline: Fix bandwidth reclaim equation in GRUB sched/wait: Fix a kthread_park race with wait_woken() sched/topology: Mark set_sched_topology() __init sched/fair: Rename variable cpu_util eff_util arm64/arch_timer: Fix MMIO byteswap sched/fair, cpufreq: Introduce 'runnable boosting' sched/fair: Refactor CPU utilization functions cpuidle: Use local_clock_noinstr() sched/clock: Provide local_clock_noinstr() x86/tsc: Provide sched_clock_noinstr() clocksource: hyper-v: Provide noinstr sched_clock() clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX x86/vdso: Fix gettimeofday masking math64: Always inline u128 version of mul_u64_u64_shr() s390/time: Provide sched_clock_noinstr() loongarch: Provide noinstr sched_clock_read() ...
2023-06-26Merge tag 'v6.5/vfs.misc' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Miscellaneous features, cleanups, and fixes for vfs and individual fs Features: - Use mode 0600 for file created by cachefilesd so it can be run by unprivileged users. This aligns them with directories which are already created with mode 0700 by cachefilesd - Reorder a few members in struct file to prevent some false sharing scenarios - Indicate that an eventfd is used a semaphore in the eventfd's fdinfo procfs file - Add a missing uapi header for eventfd exposing relevant uapi defines - Let the VFS protect transitions of a superblock from read-only to read-write in addition to the protection it already provides for transitions from read-write to read-only. Protecting read-only to read-write transitions allows filesystems such as ext4 to perform internal writes, keeping writers away until the transition is completed Cleanups: - Arnd removed the architecture specific arch_report_meminfo() prototypes and added a generic one into procfs.h. Note, we got a report about a warning in amdpgpu codepaths that suggested this was bisectable to this change but we concluded it was a false positive - Remove unused parameters from split_fs_names() - Rename put_and_unmap_page() to unmap_and_put_page() to let the name reflect the order of the cleanup operation that has to unmap before the actual put - Unexport buffer_check_dirty_writeback() as it is not used outside of block device aops - Stop allocating aio rings from highmem - Protecting read-{only,write} transitions in the VFS used open-coded barriers in various places. Replace them with proper little helpers and document both the helpers and all barrier interactions involved when transitioning between read-{only,write} states - Use flexible array members in old readdir codepaths Fixes: - Use the correct type __poll_t for epoll and eventfd - Replace all deprecated strlcpy() invocations, whose return value isn't checked with an equivalent strscpy() call - Fix some kernel-doc warnings in fs/open.c - Reduce the stack usage in jffs2's xattr codepaths finally getting rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] royally annoying compilation warning - Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not fmode_t is required to avoid fmode_t to integer degradation warnings - Create coredumps with O_WRONLY instead of O_RDWR. There's a long explanation in that commit how O_RDWR is actually a bug which we found out with the help of Linus and git archeology - Fix "no previous prototype" warnings in the pipe codepaths - Add overflow calculations for remap_verify_area() as a signed addition overflow could be triggered in xfstests - Fix a null pointer dereference in sysv - Use an unsigned variable for length calculations in jfs avoiding compilation warnings with gcc 13 - Fix a dangling pipe pointer in the watch queue codepath - The legacy mount option parser provided as a fallback by the VFS for filesystems not yet converted to the new mount api did prefix the generated mount option string with a leading ',' causing issues for some filesystems - Fix a repeated word in a comment in fs.h - autofs: Update the ctime when mtime is updated as mandated by POSIX" * tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) readdir: Replace one-element arrays with flexible-array members fs: Provide helpers for manipulating sb->s_readonly_remount fs: Protect reconfiguration of sb read-write from racing writes eventfd: add a uapi header for eventfd userspace APIs autofs: set ctime as well when mtime changes on a dir eventfd: show the EFD_SEMAPHORE flag in fdinfo fs/aio: Stop allocating aio rings from HIGHMEM fs: Fix comment typo fs: unexport buffer_check_dirty_writeback fs: avoid empty option when generating legacy mount string watch_queue: prevent dangling pipe pointer fs.h: Optimize file struct to prevent false sharing highmem: Rename put_and_unmap_page() to unmap_and_put_page() cachefiles: Allow the cache to be non-root init: remove unused names parameter in split_fs_names() jfs: Use unsigned variable for length calculations fs/sysv: Null check to prevent null-ptr-deref bug fs: use UB-safe check for signed addition overflow in remap_verify_area procfs: consolidate arch_report_meminfo declaration fs: pipe: reveal missing function protoypes ...
2023-06-20s390/boot: fix physmem_info virtual vs physical address confusionAlexander Gordeev1-2/+3
Fix virtual vs physical address confusion (which currently are the same). Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-16s390/uv: Update query for secret-UVCsSteffen Eiden1-2/+11
Update the query struct such that secret-UVC related information can be parsed. Add sysfs files for these new values. 'supp_add_secret_req_ver' notes the supported versions for the Add Secret UVC. Bit 0 indicates that version 0x100 is supported, bit 1 indicates 0x200, and so on. 'supp_add_secret_pcf' notes the supported plaintext flags for the Add Secret UVC. 'supp_secret_types' notes the supported types of secrets. Bit 0 indicates secret type 1, bit 1 indicates type 2, and so on. 'max_secrets' notes the maximum amount of secrets the secret store can store per pv guest. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230615100533.3996107-8-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20230615100533.3996107-8-seiden@linux.ibm.com>
2023-06-16s390/uvdevice: Add 'Lock Secret Store' UVCSteffen Eiden2-0/+5
Userspace can call the Lock Secret Store Ultravisor Call using IOCTLs on the uvdevice. The Lock Secret Store UV call disables all additions of secrets for the future. The uvdevice is merely transporting the request from userspace to the Ultravisor. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230615100533.3996107-6-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20230615100533.3996107-6-seiden@linux.ibm.com>
2023-06-16s390/uvdevice: Add 'List Secrets' UVCSteffen Eiden2-0/+7
Userspace can call the List Secrets Ultravisor Call using IOCTLs on the uvdevice. The List Secrets UV call lists the identifier of the secrets in the UV secret store. The uvdevice is merely transporting the request from userspace to Ultravisor. It's neither checking nor manipulating the request or response data. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230615100533.3996107-5-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20230615100533.3996107-5-seiden@linux.ibm.com>
2023-06-16s390/uvdevice: Add 'Add Secret' UVCSteffen Eiden2-0/+18
Userspace can call the Add Secret Ultravisor Call using IOCTLs on the uvdevice. The Add Secret UV call sends an encrypted and cryptographically verified request to the Ultravisor. The request inserts a protected guest's secret into the Ultravisor for later use. The uvdevice is merely transporting the request from userspace to the Ultravisor. It's neither checking nor manipulating the request data. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230615100533.3996107-4-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20230615100533.3996107-4-seiden@linux.ibm.com>
2023-06-16s390/uvdevice: Add info IOCTLSteffen Eiden1-1/+41
Add an IOCTL that allows userspace to find out which IOCTLs the uvdevice supports without trial and error. Explicitly expose the IOCTL nr for the request types. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230615100533.3996107-3-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-Id: <20230615100533.3996107-3-seiden@linux.ibm.com>
2023-06-10thread_info: move function declarations to linux/thread_info.hArnd Bergmann1-3/+0
There are a few __weak functions in kernel/fork.c, which architectures can override. If there is no prototype, the compiler warns about them: kernel/fork.c:164:13: error: no previous prototype for 'arch_release_task_struct' [-Werror=missing-prototypes] kernel/fork.c:991:20: error: no previous prototype for 'arch_task_cache_init' [-Werror=missing-prototypes] kernel/fork.c:1086:12: error: no previous prototype for 'arch_dup_task_struct' [-Werror=missing-prototypes] There are already prototypes in a number of architecture specific headers that have addressed those warnings before, but it's much better to have these in a single place so the warning no longer shows up anywhere. Link: https://lkml.kernel.org/r/20230517131102.934196-14-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Eric Paris <eparis@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-06Merge branch 'protected-key' into featuresAlexander Gordeev3-8/+18
Harald Freudenberger says: =================== This patches do some cleanup and reorg of the pkey module code and extend the existing ioctl with supporting derivation of protected key material from clear key material for some ECC curves with the help of the PCKMO instruction. Please note that 'protected key' is a special type of key only available on s390. It is similar to an secure key which is encrypted by a master key sitting inside an HSM. In contrast to secure keys a protected key is encrypted by a random key located in a hidden firmware memory accessible by the CPU and thus much faster but less secure. =================== The merged updates are: - Fix the style of protected key API driver source: use x-mas tree for all local variable declarations. - Rework protected key API driver to not use the struct pkey_protkey and pkey_clrkey anymore. Both structures have a fixed size buffer, but with the support of ECC protected key these buffers are not big enough. Use dynamic buffers internally and transparently for userspace. - Add support for a new 'non CCA clear key token' with ECC clear keys supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448. This makes it possible to derive a protected key from the ECC clear key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way to derive is via PCKMO instruction. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-05s390/time: Provide sched_clock_noinstr()Peter Zijlstra1-4/+9
With the intent to provide local_clock_noinstr(), a variant of local_clock() that's safe to be called from noinstr code (with the assumption that any such code will already be non-preemptible), prepare for things by providing a noinstr sched_clock_noinstr() function. Specifically, preempt_enable_*() calls out to schedule(), which upsets noinstr validation efforts. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.570170436@infradead.org
2023-06-05s390/cpum_sf: Convert to cmpxchg128()Peter Zijlstra1-1/+1
Now that there is a cross arch u128 and cmpxchg128(), use those instead of the custom CDSG helper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132324.058821078@infradead.org
2023-06-05arch: Remove cmpxchg_doublePeter Zijlstra2-52/+0
No moar users, remove the monster. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
2023-06-05percpu: Wire up cmpxchg128Peter Zijlstra1-0/+16
In order to replace cmpxchg_double() with the newly minted cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.654945124@infradead.org
2023-06-05arch: Introduce arch_{,try_}_cmpxchg128{,_local}()Peter Zijlstra1-0/+14
For all architectures that currently support cmpxchg_double() implement the cmpxchg128() family of functions that is basically the same but with a saner interface. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.452120708@infradead.org
2023-06-01s390/pkey: add support for ecc clear keyHarald Freudenberger2-6/+16
Add support for a new 'non CCA clear key token' with these ECC clear keys supported: - ECC P256 - ECC P384 - ECC P521 - ECC ED25519 - ECC ED448 This makes it possible to derive a protected key from this ECC clear key input via PKEY_KBLOB2PROTK3 ioctl. As of now the only way to derive protected keys from these clear key tokens is via PCKMO instruction. For AES keys an alternate path via creating a secure key from the clear key and then derive a protected key from the secure key exists. This alternate path is not implemented for ECC keys as it would require to rearrange and maybe recalculate the clear key material for input to derive an CCA or EP11 ECC secure key. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-01s390/pkey: do not use struct pkey_protkeyHarald Freudenberger1-2/+2
This is an internal rework of the pkey code to not use the struct pkey_protkey internal any more. This struct has a hard coded protected key buffer with MAXPROTKEYSIZE = 64 bytes. However, with support for ECC protected key, this limit is too short and thus this patch reworks all the internal code to use the triple u8 *protkey, u32 protkeylen, u32 protkeytype instead. So the ioctl which still has to deal with this struct coming from userspace and/or provided to userspace invoke all the internal functions now with the triple instead of passing a pointer to struct pkey_protkey. Also the struct pkey_clrkey has been internally replaced in a similar way. This struct also has a hard coded clear key buffer of MAXCLRKEYSIZE = 32 bytes and thus is not usable with e.g. ECC clear key material. This is a transparent rework for userspace applications using the pkey API. The internal kernel API used by the PAES crypto ciphers has been adapted to this change to make it possible to provide ECC protected keys via this interface in the future. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-01s390/ipl: add REIPL_CLEAR flag to os_infoMikhail Zaslonko1-2/+5
Introduce new OS_INFO_FLAGS_ENTRY to os_info pointing to the field with bit flags. Add OS_INFO_FLAGS_ENTRY upon dump_reipl shutdown action processing and set OS_INFO_FLAG_REIPL_CLEAR flag indicating 'clear' sysfs attribute has been set on the panicked system for specified ipl type. This flag can be used to inform the dumper whether LOAD_CLEAR or LOAD_NORMAL diag308 subcode to be used for ipl after dumping the memory. Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17s390/uapi: cover statfs padding by growing f_spareIlya Leoshkevich2-3/+3
pahole says: struct compat_statfs64 { ... u32 f_spare[4]; /* 68 16 */ /* size: 88, cachelines: 1, members: 12 */ /* padding: 4 */ struct statfs { ... unsigned int f_spare[4]; /* 68 16 */ /* size: 88, cachelines: 1, members: 12 */ /* padding: 4 */ struct statfs64 { ... unsigned int f_spare[4]; /* 68 16 */ /* size: 88, cachelines: 1, members: 12 */ /* padding: 4 */ One has to keep the existence of padding in mind when working with these structs. Grow f_spare arrays to 5 in order to simplify things. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230504144021.808932-3-iii@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-17procfs: consolidate arch_report_meminfo declarationArnd Bergmann1-3/+0
The arch_report_meminfo() function is provided by four architectures, with a __weak fallback in procfs itself. On architectures that don't have a custom version, the __weak version causes a warning because of the missing prototype. Remove the architecture specific prototypes and instead add one in linux/proc_fs.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for arch/x86 Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Message-Id: <20230516195834.551901-1-arnd@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-15s390: select ARCH_SUPPORTS_INT128Heiko Carstens1-0/+4
s390 has instructions to support 128 bit arithmetics, e.g. a 64 bit multiply instruction with a 128 bit result. Also 128 bit integer artithmetics are already used in s390 specific architecture code (see e.g. read_persistent_clock64()). Therefore select ARCH_SUPPORTS_INT128. However limit this to clang for now, since gcc generates inefficient code, which may lead to stack overflows, when compiling lib/crypto/curve25519-hacl64.c which depends on ARCH_SUPPORTS_INT128. The gcc generated functions have 6kb stack frames, compared to only 1kb of the code generated with clang. If the kernel is compiled with -Os library calls for __ashlti3(), __ashrti3(), and __lshrti3() may be generated. Similar to arm64 and riscv provide assembler implementations for these functions. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-05-05Merge tag 'locking-core-2023-05-05' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation - Misc cleanups/fixes * tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: Correct (cmp)xchg() instrumentation locking/x86: Define arch_try_cmpxchg_local() locking/arch: Wire up local_try_cmpxchg() locking/generic: Wire up local{,64}_try_cmpxchg() locking/atomic: Add generic try_cmpxchg{,64}_local() support locking/rwbase: Mitigate indefinite writer starvation locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-04-30Merge tag 's390-6.4-1' of ↵Linus Torvalds18-237/+441
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for stackleak feature. Also allow specifying architecture-specific stackleak poison function to enable faster implementation. On s390, the mvc-based implementation helps decrease typical overhead from a factor of 3 to just 25% - Convert all assembler files to use SYM* style macros, deprecating the ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS - Improve KASLR to also randomize module and special amode31 code base load addresses - Rework decompressor memory tracking to support memory holes and improve error handling - Add support for protected virtualization AP binding - Add support for set_direct_map() calls - Implement set_memory_rox() and noexec module_alloc() - Remove obsolete overriding of mem*() functions for KASAN - Rework kexec/kdump to avoid using nodat_stack to call purgatory - Convert the rest of the s390 code to use flexible-array member instead of a zero-length array - Clean up uaccess inline asm - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable DEBUG_FORCE_FUNCTION_ALIGN_64B - Resolve last_break in userspace fault reports - Simplify one-level sysctl registration - Clean up branch prediction handling - Rework CPU counter facility to retrieve available counter sets just once - Other various small fixes and improvements all over the code * tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits) s390/stackleak: provide fast __stackleak_poison() implementation stackleak: allow to specify arch specific stackleak poison function s390: select ARCH_USE_SYM_ANNOTATIONS s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc() s390: wire up memfd_secret system call s390/mm: enable ARCH_HAS_SET_DIRECT_MAP s390/mm: use BIT macro to generate SET_MEMORY bit masks s390/relocate_kernel: adjust indentation s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc. s390/entry: use SYM* macros instead of ENTRY(), etc. s390/purgatory: use SYM* macros instead of ENTRY(), etc. s390/kprobes: use SYM* macros instead of ENTRY(), etc. s390/reipl: use SYM* macros instead of ENTRY(), etc. s390/head64: use SYM* macros instead of ENTRY(), etc. s390/earlypgm: use SYM* macros instead of ENTRY(), etc. s390/mcount: use SYM* macros instead of ENTRY(), etc. s390/crc32le: use SYM* macros instead of ENTRY(), etc. s390/crc32be: use SYM* macros instead of ENTRY(), etc. s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc. s390/amode31: use SYM* macros instead of ENTRY(), etc. ...
2023-04-29locking/arch: Rename all internal __xchg() names to __arch_xchg()Andrzej Hajda1-4/+4
Decrease the probability of this internal facility to be used by driver code. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv] Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-28Merge tag 'mm-stable-2023-04-27-15-30' of ↵Linus Torvalds1-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
2023-04-20s390/stackleak: provide fast __stackleak_poison() implementationHeiko Carstens1-0/+35
Provide an s390 specific __stackleak_poison() implementation which is faster than the generic variant. For the original implementation with an enforced 4kb stackframe for the getpid() system call the system call overhead increases by a factor of 3 if the stackleak feature is enabled. Using the s390 mvc based variant this is reduced to an increase of 25% instead. This is within the expected area, since the mvc based implementation is more or less a memset64() variant which comes with similar results. See commit 0b77d6701cf8 ("s390: implement memset16, memset32 & memset64"). Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20230405130841.1350565-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20s390/mm: enable ARCH_HAS_SET_DIRECT_MAPHeiko Carstens1-0/+7
Implement the set_direct_map_*() API, which allows to invalidate and set default permissions to pages within the direct mapping. Note that kernel_page_present(), which is also supposed to be part of this API, is intentionally not implemented. The reason for this is that kernel_page_present() is only used (and currently only makes sense) for suspend/resume, which isn't supported on s390. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-20s390/mm: use BIT macro to generate SET_MEMORY bit masksHeiko Carstens1-5/+13
Use BIT macro to generate SET_MEMORY bit masks, which is easier to maintain if bits get added, or removed. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19s390/kasan: remove override of mem*() functionsHeiko Carstens1-12/+3
The kasan mem*() functions are not used anymore since s390 has switched to GENERIC_ENTRY and commit 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions"). Therefore remove the now dead code, similar to x86. While at it also use the SYM* macros in mem.S. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19s390/kdump: remove nodat stack restriction for calling nodat functionsAlexander Gordeev1-1/+3
To allow calling of DAT-off code from kernel the stack needs to be switched to nodat_stack (or other stack mapped as 1:1). Before call_nodat() macro was introduced that was necessary to provide the very same memory address for STNSM and STOSM instructions. If the kernel would stay on a random stack (e.g. a virtually mapped one) then a virtual address provided for STNSM instruction could differ from the physical address needed for the corresponding STOSM instruction. After call_nodat() macro is introduced the kernel stack does not need to be mapped 1:1 anymore, since the macro stores the physical memory address of return PSW in a register before entering DAT-off mode. This way the return LPSWE instruction is able to pick the correct memory location and restore the DAT-on mode. That however might fail in case the 16-byte return PSW happened to cross page boundary: PSW mask and PSW address could end up in two separate non-contiguous physical pages. Align the return PSW on 16-byte boundary so it always fits into a single physical page. As result any stack (including the virtually mapped one) could be used for calling DAT-off code and prior switching to nodat_stack becomes unnecessary. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-19s390/kdump: rework invocation of DAT-off codeAlexander Gordeev1-0/+47
Calling kdump kernel is a two-step process that involves invocation of the purgatory code: first time - to verify the new kernel checksum and second time - to call the new kernel itself. The purgatory code operates on real addresses and does not expect any memory protection. Therefore, before the purgatory code is entered the DAT mode is always turned off. However, it is only restored upon return from the new kernel checksum verification. In case the purgatory was called to start the new kernel and failed the control is returned to the old kernel, but the DAT mode continues staying off. The new kernel start failure is unlikely and leads to the disabled wait state anyway. Still that poses a risk, since the kernel code in general is not DAT-off safe and even calling the disabled_wait() function might crash. Introduce call_nodat() macro that allows entering DAT-off mode, calling an arbitrary function and restoring DAT mode back on. Switch all invocations of DAT-off code to that macro and avoid the above described scenario altogether. Name the call_nodat() macro in small letters after the already existing call_on_stack() and put it to the same header file. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> [hca@linux.ibm.com: some small modifications to call_nodat() macro] Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13s390/fcx: replace zero-length array with flexible-array memberGustavo A. R. Silva1-1/+1
Zero-length arrays are deprecated [1] and have to be replaced by C99 flexible-array members. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help to make progress towards globally enabling -fstrict-flex-arrays=3 [2] Link: https://github.com/KSPP/linux/issues/78 [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZC7XT5prvoE4Yunm@work Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13s390/diag: replace zero-length array with flexible-array memberGustavo A. R. Silva1-1/+1
Zero-length arrays are deprecated [1] and have to be replaced by C99 flexible-array members. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help to make progress towards globally enabling -fstrict-flex-arrays=3 [2] Link: https://github.com/KSPP/linux/issues/78 [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZC7XGpUtVhqlRLhH@work Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13s390/mm: fix direct map accountingHeiko Carstens1-1/+1
Commit bb1520d581a3 ("s390/mm: start kernel with DAT enabled") did not implement direct map accounting in the early page table setup code. In result the reported values are bogus now: $cat /proc/meminfo ... DirectMap4k: 5120 kB DirectMap1M: 18446744073709546496 kB DirectMap2G: 0 kB Fix this by adding the missing accounting. The result looks sane again: $cat /proc/meminfo ... DirectMap4k: 6156 kB DirectMap1M: 2091008 kB DirectMap2G: 6291456 kB Fixes: bb1520d581a3 ("s390/mm: start kernel with DAT enabled") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-04-13s390/mm: implement set_memory_rwnx()Heiko Carstens1-0/+5
Given that set_memory_rox() is implemented, provide also set_memory_rwnx(). This allows to get rid of all open coded __set_memory() usages in s390 architecture code. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>