path: root/arch/s390
Age | Commit message | Author | Files | Lines
2024-03-13 | Merge tag 'hardening-v6.9-rc1' of ↵ | Linus Torvalds | 2 | -2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "As is pretty normal for this tree, there are changes all over the place, especially for small fixes, selftest improvements, and improved macro usability. Some header changes ended up landing via this tree as they depended on the string header cleanups. Also, a notable set of changes is the work for the reintroduction of the UBSAN signed integer overflow sanitizer so that we can continue to make improvements on the compiler side to make this sanitizer a more viable future security hardening option. Summary: - string.h and related header cleanups (Tanzir Hasan, Andy Shevchenko) - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev, Harshit Mogalapalli) - selftests/powerpc: Fix load_unaligned_zeropad build failure (Michael Ellerman) - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn) - Handle tail call optimization better in LKDTM (Douglas Anderson) - Use long form types in overflow.h (Andy Shevchenko) - Add flags param to string_get_size() (Andy Shevchenko) - Add Coccinelle script for potential struct_size() use (Jacob Keller) - Fix objtool corner case under KCFI (Josh Poimboeuf) - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng) - Add str_plural() helper (Michal Wajdeczko, Kees Cook) - Ignore relocations in .notes section - Add comments to explain how __is_constexpr() works - Fix m68k stack alignment expectations in stackinit Kunit test - Convert string selftests to KUnit - Add KUnit tests for fortified string functions - Improve reporting during fortified string warnings - Allow non-type arg to type_max() and type_min() - Allow strscpy() to be called with only 2 arguments - Add binary mode to leaking_addresses scanner - Various small cleanups to leaking_addresses scanner - Adding wrapping_*() arithmetic helper - Annotate initial signed integer wrap-around in refcount_t - Add explicit UBSAN section to MAINTAINERS - Fix UBSAN self-test warnings - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL - Reintroduce UBSAN's signed overflow sanitizer" * tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits) selftests/powerpc: Fix load_unaligned_zeropad build failure string: Convert helpers selftest to KUnit string: Convert selftest to KUnit sh: Fix build with CONFIG_UBSAN=y compiler.h: Explain how __is_constexpr() works overflow: Allow non-type arg to type_max() and type_min() VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() lib/string_helpers: Add flags param to string_get_size() x86, relocs: Ignore relocations in .notes section objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks overflow: Use POD in check_shl_overflow() lib: stackinit: Adjust target string to 8 bytes for m68k sparc: vdso: Disable UBSAN instrumentation kernel.h: Move lib/cmdline.c prototypes to string.h leaking_addresses: Provide mechanism to scan binary files leaking_addresses: Ignore input device status lines leaking_addresses: Use File::Temp for /tmp files MAINTAINERS: Update LEAKING_ADDRESSES details fortify: Improve buffer overflow reporting fortify: Add KUnit tests for runtime overflows ...
2024-03-12 | Merge tag 'asm-generic-6.9' of ↵ | Linus Torvalds | 2 | -1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Just two small updates this time:

 - A series I did to unify the definition of PAGE_SIZE through Kconfig, intended to help with a vdso rework that needs the constant but cannot include the normal kernel headers when building the compat VDSO on arm64 and potentially others

 - a patch from Yan Zhao to remove the pfn_to_virt() definitions from a couple of architectures after finding they were both incorrect and entirely unused"

* tag 'asm-generic-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: define CONFIG_PAGE_SIZE_*KB on all architectures
  arch: simplify architecture specific page size configuration
  arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
  mm: Remove broken pfn_to_virt() on arch csky/hexagon/openrisc
2024-03-12 | Merge tag 's390-6.9-1' of ↵ | Linus Torvalds | 87 | -1281/+2451
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens:

 - Various virtual vs physical address usage fixes

 - Fix error handling in Processor Activity Instrumentation device driver, and export number of counters with a sysfs file

 - Allow for multiple events when Processor Activity Instrumentation counters are monitored in system wide sampling

 - Change multiplier and shift values of the Time-of-Day clock source to improve steering precision

 - Remove a couple of unneeded GFP_DMA flags from allocations

 - Disable mmap alignment if randomize_va_space is also disabled, to avoid a too small heap

 - Various changes to allow s390 to be compiled with LLVM=1, since ld.lld and llvm-objcopy will have proper s390 support with clang 19

 - Add __uninitialized macro to Compiler Attributes. This is helpful with s390's FPU code where some users have up to 520 byte stack frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the intention (performance improvement) of such code sections.

 - Convert switch_to() to an out-of-line function, and use the generic switch_to header file

 - Replace the usage of s390's debug feature with pr_debug() calls within the zcrypt device driver

 - Improve hotplug support of the Adjunct Processor device driver

 - Improve retry handling in the zcrypt device driver

 - Various changes to the in-kernel FPU code:
   - Make in-kernel FPU sections preemptible
   - Convert various larger inline assemblies and assembler files to C, mainly by using single instruction inline assemblies. This increases readability, but also makes it easier to add proper instrumentation hooks
   - Cleanup of the header files
   - Provide fast variants of csum_partial() and csum_partial_copy_nocheck() based on vector instructions

 - Introduce and use a lock to synchronize accesses to zpci device data structures to avoid inconsistent states caused by concurrent accesses

 - Compile the kernel without -fPIE. This addresses the following problems if the kernel is compiled with -fPIE:
   - It uses dynamic symbols (.dynsym), for which the linker refuses to allow more than 64k sections.
This can break features which use '-ffunction-sections' and '-fdata-sections', including kpatch-build and function granular KASLR - It unnecessarily uses GOT relocations, adding an extra layer of indirection for many memory accesses - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were reported as globally shared * tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits) s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64 s390/cache: prevent rebuild of shared_cpu_list s390/crypto: remove retry loop with sleep from PAES pkey invocation s390/pkey: improve pkey retry behavior s390/zcrypt: improve zcrypt retry behavior s390/zcrypt: introduce retries on in-kernel send CPRB functions s390/ap: introduce mutex to lock the AP bus scan s390/ap: rework ap_scan_bus() to return true on config change s390/ap: clarify AP scan bus related functions and variables s390/ap: rearm APQNs bindings complete completion s390/configs: increase number of LOCKDEP_BITS s390/vfio-ap: handle hardware checkstop state on queue reset operation s390/pai: change sampling event assignment for PMU device driver s390/boot: fix minor comment style damages s390/boot: do not check for zero-termination relocation entry s390/boot: make type of __vmlinux_relocs_64_start|end consistent s390/boot: sanitize kaslr_adjust_relocs() function prototype s390/boot: simplify GOT handling s390: vmlinux.lds.S: fix .got.plt assertion s390/boot: workaround current 'llvm-objdump -t -j ...' behavior ...
2024-03-12 | Merge tag 'timers-core-2024-03-10' of ↵ | Linus Torvalds | 2 | -5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A large set of updates and features for timers and timekeeping:

 - The hierarchical timer pull model

   When timer wheel timers are armed they are placed into the timer wheel of a CPU which is likely to be busy at the time of expiry. This is done to avoid wakeups on potentially idle CPUs. This is wrong in several aspects:

     1) The heuristics to select the target CPU are wrong by definition as the chance to get the prediction right is close to zero.

     2) Due to #1 it is possible that timers are accumulated on a single target CPU.

     3) The required computation in the enqueue path is just overhead for dubious value, especially under the consideration that the vast majority of timer wheel timers are either canceled or rearmed before they expire.

   The timer pull model avoids the above by removing the target computation on enqueue and queueing timers always on the CPU on which they get armed. This is achieved by having separate wheels for CPU pinned timers and global timers which do not care about where they expire. As long as a CPU is busy it handles both the pinned and the global timers which are queued on the CPU local timer wheels.

   When a CPU goes idle it evaluates its own timer wheels:

     - If the first expiring timer is a pinned timer, then the global timers can be ignored as the CPU will wake up before they expire.

     - If the first expiring timer is a global timer, then the expiry time is propagated into the timer pull hierarchy and the CPU makes sure to wake up for the first pinned timer.

   The timer pull hierarchy organizes CPUs in groups of eight at the lowest level, and at the next levels in groups of eight groups, up to the point where no further aggregation of groups is required, i.e. the number of levels is log8(NR_CPUS). The magic number of eight has been established by experimentation, but can be adjusted if needed. In each group one busy CPU acts as the migrator. It's only one CPU to avoid lock contention on remote timer wheels.

   The migrator CPU checks in its own timer wheel handling whether there are other CPUs in the group which have gone idle and have global timers to expire. If there are global timers to expire, the migrator locks the remote CPU timer wheel and handles the expiry. Depending on the group level in the hierarchy this handling can require walking the hierarchy downwards to the CPU level.

   Special care is taken when the last CPU goes idle. At this point the CPU is the systemwide migrator at the top of the hierarchy and it therefore cannot delegate to the hierarchy. It needs to arm its own timer device to expire either at the first expiring timer in the hierarchy or at the first CPU local timer, whichever expires first.

   This completely removes the overhead from the enqueue path, which is e.g. for networking a true hotpath, and trades it for a slightly more complex idle path.

   This has been in development for a couple of years and the final series has been extensively tested by various teams from silicon vendors and ran through extensive CI. There have been slight performance improvements observed on network centric workloads and an Intel team confirmed that this allows them to power down a die completely on a multi-die socket for the first time in a mostly idle scenario. There is only one outstanding ~1.5% regression on a specific overloaded netperf test which is currently being investigated, but the rest is either positive or neutral performance wise and positive on the power management side.
 - Fixes for the timekeeping interpolation code for cross-timestamps: cross-timestamps are used for PTP to get snapshots from hardware timers and interpolate them back to clock MONOTONIC. The changes address a few corner cases in the interpolation code which got the math and logic wrong.

 - Simplification of the clocksource watchdog retry logic to automatically adjust to handle larger systems correctly instead of having more incomprehensible command line parameters.

 - Treewide consolidation of the VDSO data structures.

 - The usual small improvements and cleanups all over the place"

* tag 'timers-core-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  timer/migration: Fix quick check reporting late expiry
  tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
  vdso/datapage: Quick fix - use asm/page-def.h for ARM64
  timers: Assert no next dyntick timer look-up while CPU is offline
  tick: Assume timekeeping is correctly handed over upon last offline idle call
  tick: Shut down low-res tick from dying CPU
  tick: Split nohz and highres features from nohz_mode
  tick: Move individual bit features to debuggable mask accesses
  tick: Move got_idle_tick away from common flags
  tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
  tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
  tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
  tick: Start centralizing tick related CPU hotplug operations
  tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
  tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
  tick: Use IS_ENABLED() whenever possible
  tick/sched: Remove useless oneshot ifdeffery
  tick/nohz: Remove duplicate between lowres and highres handlers
  tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
  hrtimer: Select housekeeping CPU during migration
  ...
2024-03-11 | Merge tag 'linux_kselftest-next-6.9-rc1' of ↵ | Linus Torvalds | 2 | -2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest update from Shuah Khan:

 - livepatch restructuring to move the module out of lib to be built as out-of-tree modules during the kselftest build. This makes it easier to change, debug and rebuild the tests by running make on the selftests/livepatch directory, which is not currently possible since the modules on lib/livepatch are built and installed using the main makefile modules target.

 - livepatch restructuring fixes for problems found by kernel test robot. The change skips the test if kernel-devel isn't installed (default value of KDIR), or if the KDIR variable passed doesn't exist.

 - resctrl test restructuring and new non-contiguous CBMs CAT test

 - new ktap_helpers to print diagnostic messages, pass/fail tests based on exit code, abort a test, and finish a test.

 - a new test to verify power supply properties.

 - a new ftrace test to exercise the function tracer across cpu hotplug.

 - timeout increase for the mqueue test to allow the test to run on i3.metal AWS instances.

 - minor spelling corrections in several tests.

 - missing gitignore files and changes to existing gitignore files.

* tag 'linux_kselftest-next-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (57 commits)
  kselftest: Add basic test for probing the rust sample modules
  selftests: lib.mk: Do not process TEST_GEN_MODS_DIR
  selftests: livepatch: Avoid running the tests if kernel-devel is missing
  selftests: livepatch: Add initial .gitignore
  selftests/resctrl: Add non-contiguous CBMs CAT test
  selftests/resctrl: Add resource_info_file_exists()
  selftests/resctrl: Split validate_resctrl_feature_request()
  selftests/resctrl: Add a helper for the non-contiguous test
  selftests/resctrl: Add test groups and name L3 CAT test L3_CAT
  selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"
  selftests/mqueue: Set timeout to 180 seconds
  selftests/ftrace: Add test to exercize function tracer across cpu hotplug
  selftest: ftrace: fix minor typo in log
  selftests: thermal: intel: workload_hint: add missing gitignore
  selftests: thermal: intel: power_floor: add missing gitignore
  selftests: uevent: add missing gitignore
  selftests: Add test to verify power supply properties
  selftests: ktap_helpers: Add a helper to finish the test
  selftests: ktap_helpers: Add a helper to abort the test
  selftests: ktap_helpers: Add helper to pass/fail test based on exit code
  ...
2024-03-07 | s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64 | Sumanth Korikkar | 1 | -0/+2
lkp test robot reported an unhandled relocation type, R_390_GOTPCDBL, when the kernel is built with -fno-PIE. The relocs tool reads vmlinux and handles absolute relocations. PC relative relocs don't need adjustment. Also, the R_390_GOTPCDBL/R_390_GOTOFF64 relocations are currently only present when KASAN is enabled. The following program can create an R_390_GOTPCDBL/R_390_GOTOFF64 reloc (with fPIE/fPIC):

  void funcb(int *b)
  {
          *b = *b + 100;
  }

  void gen_gotoff(void)
  {
          int b = 10;
          funcb(&b);
  }

  gcc -c sample.c -fPIC -fsanitize=kernel-address --param asan-stack=1

The above example (built with -fPIC) was linked to one of the built-in.a (built with -fno-PIE) and checked for correctness with kaslr enabled. Both relocs turn out to be relative and can be skipped. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402221404.T2TGs8El-lkp@intel.com/ Fixes: 55dc65b46023 ("s390: add relocs tool") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 | s390/cache: prevent rebuild of shared_cpu_list | Heiko Carstens | 1 | -0/+1
With commit 36bbc5b4ffab ("cacheinfo: Allow early detection and population of cache attributes") the shared cpu list for each cache level higher than L1 is rebuilt even if the list has already been set up. This is caused by the removal of the cpumask_empty() check within cache_shared_cpu_map_setup(). However, architectures can enforce that the shared cpu list is not rebuilt by simply setting cpu_map_populated of the per cpu cache info structure to true, which is also the fix for this problem.

Before:
  $ cat /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list
  0-7

After:
  $ cat /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list
  1

Fixes: 36bbc5b4ffab ("cacheinfo: Allow early detection and population of cache attributes") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
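A minimal sketch of the fix described above, assuming the generic <linux/cacheinfo.h> helpers; the function name is hypothetical, only the cpu_map_populated member is taken from the message:

  #include <linux/cacheinfo.h>

  /*
   * Sketch: mark the per-CPU cache info as already populated so that
   * cache_shared_cpu_map_setup() does not rebuild the shared CPU list
   * for levels the architecture has already set up itself.
   */
  static void mark_cache_info_populated(unsigned int cpu)
  {
          struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);

          this_cpu_ci->cpu_map_populated = true;
  }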
2024-03-07 | s390/crypto: remove retry loop with sleep from PAES pkey invocation | Harald Freudenberger | 1 | -14/+2
Upon calling the pkey module to (re-)derive a protected key from a secure key, the PAES implementation retried 3 times with a 1000 ms sleep after each failure. This patch removes this retry loop - retries should, if needed, be done in a lower layer, but the consumer of the pkey module functions should not be bothered with retries. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 | s390/configs: increase number of LOCKDEP_BITS | Heiko Carstens | 1 | -0/+2
Set LOCKDEP_BITS to 16 and LOCKDEP_CHAINS_BITS to 17, since test systems frequently run out of lockdep entries and lockdep chains. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 | s390/pai: change sampling event assignment for PMU device driver | Thomas Richter | 2 | -3/+6
Currently only one PAI sampling event can be created and active at any one time. The PMU device drivers store a pointer to this event in their data structures even when the event is created for counting, although the reference to a counting event is never needed. Change this and assign the pointer in the PMU device driver only when a sampling event is created. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
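A rough sketch of the changed assignment; the structure and function names are hypothetical stand-ins for the driver internals, only the sampling-vs-counting distinction comes from the message above:

  /* Sketch: remember the event only when it is a sampling event. */
  static void pai_assign_event(struct pai_device_data *data,
                               struct perf_event *event)
  {
          if (is_sampling_event(event))
                  data->event = event;    /* counting events are no longer stored */
  }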
2024-03-06 | arch: define CONFIG_PAGE_SIZE_*KB on all architectures | Arnd Bergmann | 2 | -1/+2
Most architectures only support a single hardcoded page size. In order to ensure that each one of these sets the corresponding Kconfig symbols, change over the PAGE_SHIFT definition to the common one and allow only the hardware page size to be selected. Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Stafford Horne <shorne@gmail.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-02-26 | s390/boot: fix minor comment style damages | Alexander Gordeev | 1 | -3/+3
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 | s390/boot: do not check for zero-termination relocation entry | Alexander Gordeev | 1 | -3/+1
The relocation table is not expected to contain a zero-termination entry. The existing check is likely a left-over from similar x86 code that uses zero entries as delimiters. s390 does not have such entries, and therefore the check can be avoided. Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 | s390/boot: make type of __vmlinux_relocs_64_start|end consistent | Alexander Gordeev | 2 | -8/+6
Make the type of the __vmlinux_relocs_64_start|end symbols a char array, just like it is done for all other sections. The rescue_relocs() function is simplified as a result. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 | s390/boot: sanitize kaslr_adjust_relocs() function prototype | Alexander Gordeev | 1 | -4/+3
Do not use vmlinux.image_size within the kaslr_adjust_relocs() function to calculate the upper relocation table boundary. Instead, make both the lower and upper boundaries function input parameters. Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 | s390/boot: simplify GOT handling | Alexander Gordeev | 3 | -9/+7
The end of GOT is calculated dynamically on boot. The size of GOT is calculated on build from the start and end of GOT. Avoid both calculations and use the end of GOT directly. Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-25 | s390: vmlinux.lds.S: fix .got.plt assertion | Heiko Carstens | 2 | -6/+16
Naresh reported this build error on linux-next:

  s390x-linux-gnu-ld: Unexpected GOT/PLT entries detected!
  make[3]: *** [/builds/linux/arch/s390/boot/Makefile:87: arch/s390/boot/vmlinux.syms] Error 1
  make[3]: Target 'arch/s390/boot/bzImage' not remade because of errors.

The reason for the build error is an incorrect/incomplete assertion which checks the size of the .got.plt section. Similar to x86 the size is either zero or 24 bytes (three entries). See commit 262b5cae67a6 ("x86/boot/compressed: Move .got.plt entries out of the .got section") for more details. The three reserved/additional entries for s390 are described in chapter 3.2.2 of the s390x ABI [1] (thanks to Andreas Krebbel for pointing this out!). [1] https://github.com/IBM/s390x-abi/releases/download/v1.6.1/lzsabi_s390x.pdf Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYvWp8TY-fMEvc3UhoVtoR_eM5VsfHj3+n+kexcfJJ+Cvw@mail.gmail.com Fixes: 30226853d6ec ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-23 | Merge tag 's390-6.8-3' of ↵ | Linus Torvalds | 5 | -17/+7
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens:

 - Fix invalid -EBUSY on ccw_device_start() which can lead to failing device initialization

 - Add missing multiplication by 8 in __iowrite64_copy() to get the correct byte length before calling zpci_memcpy_toio()

 - Various config updates

* tag 's390-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: fix invalid -EBUSY on ccw_device_start
  s390: use the correct count for __iowrite64_copy()
  s390/configs: update default configurations
  s390/configs: enable INIT_STACK_ALL_ZERO in all configurations
  s390/configs: provide compat topic configuration target
2024-02-22 | s390/boot: workaround current 'llvm-objdump -t -j ...' behavior | Nathan Chancellor | 1 | -2/+2
When building with OBJDUMP=llvm-objdump, there are a series of warnings from the section comparisons that arch/s390/boot/Makefile performs between vmlinux and arch/s390/boot/vmlinux:

  llvm-objdump: warning: section '.boot.preserved.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.preserved.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.data' mentioned in a -j/--section option, but not found in any input file

The warning is a little misleading, as these sections do exist in the input files. It is really pointing out that llvm-objdump does not match GNU objdump's behavior of respecting '-j' / '--section' in combination with '-t' / '--syms':

  $ s390x-linux-gnu-objdump -t -j .boot.data vmlinux.full
  vmlinux.full: file format elf64-s390
  SYMBOL TABLE:
  0000000001951000 l O .boot.data 0000000000003000 sclp_info_sccb
  00000000019550e0 l O .boot.data 0000000000000001 sclp_info_sccb_valid
  00000000019550e2 g O .boot.data 0000000000001000 early_command_line
  ...

  $ llvm-objdump -t -j .boot.data vmlinux.full
  vmlinux.full: file format elf64-s390
  SYMBOL TABLE:
  0000000000100040 l O .text 0000000000000010 dw_psw
  0000000000000000 l df *ABS* 0000000000000000 main.c
  00000000001001b0 l F .text 00000000000000c6 trace_event_raw_event_initcall_level
  0000000000100280 l F .text 0000000000000100 perf_trace_initcall_level
  ...

It may be possible to change llvm-objdump's behavior to match GNU objdump's behavior but the difficulty of that task has not yet been explored. The combination of '$(OBJDUMP) -t -j' is not common in the kernel tree as a whole, so work around this tool difference by grepping for the sections in the full symbol table output in a similar manner to the sed invocation. This results in no visible change for GNU objdump users while fixing the warnings for OBJDUMP=llvm-objdump, further enabling use of LLVM=1 for ARCH=s390 with versions of LLVM that have support for s390 in ld.lld and llvm-objcopy. Reported-by: Heiko Carstens <hca@linux.ibm.com> Closes: https://lore.kernel.org/20240219113248.16287-C-hca@linux.ibm.com/ Link: https://github.com/ClangBuiltLinux/linux/issues/859 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240220-s390-work-around-llvm-objdump-t-j-v1-1-47bb0366a831@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22 | KVM: s390: fix access register usage in ioctls | Eric Farman | 3 | -1/+7
The routine ar_translation() can be reached by both the instruction intercept path (where the access registers had been loaded with the guest register contents), and the MEM_OP ioctls (which hadn't). Since this routine saves the current registers to vcpu->run, this routine erroneously saves host registers into the guest space. Introduce a boolean in the kvm_vcpu_arch struct to indicate whether the registers contain guest contents. If they do (the instruction intercept path), the save can be performed and the AR translation is done just as it is today. If they don't (the MEM_OP path), the AR can be read from vcpu->run without stashing the current contents. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20240220211211.3102609-2-farman@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
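A simplified sketch of the idea; save_access_regs() is the existing s390 helper, while the flag name and the wrapper function are illustrative assumptions:

  /*
   * Sketch: only stash the access registers into vcpu->run when they
   * actually hold guest contents (instruction intercept path). On the
   * MEM_OP ioctl path the guest values are already in vcpu->run.
   */
  static void save_guest_acrs_if_loaded(struct kvm_vcpu *vcpu)
  {
          if (vcpu->arch.acrs_loaded)
                  save_access_regs(vcpu->run->s.regs.acrs);
  }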
2024-02-21 | KVM: s390: introduce kvm_s390_fpu_(store|load) | Janosch Frank | 3 | -20/+22
It's a bit nicer than having multiple lines and will help if there's another re-work since we'll only have to change one location. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-21 | s390: use the correct count for __iowrite64_copy() | Jason Gunthorpe | 1 | -1/+1
The signature for __iowrite64_copy() requires the number of 64 bit quantities, not bytes. Multiply by 8 to get a byte length before invoking zpci_memcpy_toio(). Fixes: 87bc359b9822 ("s390/pci: speed up __iowrite64_copy by using pci store block insn") Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-9223d11a7662+1d7785-s390_iowrite64_jgg@nvidia.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
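The fix boils down to converting the count of 64-bit words into a byte length; roughly (a sketch, not the verbatim patch):

  /* count is the number of 64-bit quantities, zpci_memcpy_toio() wants bytes */
  void __iowrite64_copy(void __iomem *to, const void *from, size_t count)
  {
          zpci_memcpy_toio(to, from, count * 8);
  }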
2024-02-20 | s390/vdso: Use generic union vdso_data_store | Anna-Maria Behnsen | 1 | -4/+1
There is already a generic union definition for vdso_data_store in the vdso datapage header. Use this definition to prevent code duplication. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20240219153939.75719-8-anna-maria@linutronix.de
2024-02-20 | s390/vdso/data: Drop unnecessary header include | Anna-Maria Behnsen | 1 | -1/+0
vdso/datapage.h includes the architecture specific vdso/data.h header file. So there is no need to also do it the other way round and include the generic vdso/datapage.h header file inside the architecture specific data.h header file. Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20240219153939.75719-3-anna-maria@linutronix.de
2024-02-20 | s390: compile relocatable kernel without -fPIE | Josh Poimboeuf | 9 | -13/+145
On s390, the kernel currently uses the '-fPIE' compiler flag for compiling vmlinux. This has a few problems:

 - It uses dynamic symbols (.dynsym), for which the linker refuses to allow more than 64k sections. This can break features which use '-ffunction-sections' and '-fdata-sections', including kpatch-build [1] and Function Granular KASLR.

 - It unnecessarily uses GOT relocations, adding an extra layer of indirection for many memory accesses.

Instead of using '-fPIE', resolve all the relocations at link time and then manually adjust any absolute relocations (R_390_64) during boot. This is done by first telling the linker to preserve all relocations during the vmlinux link. (Note this is harmless: they are later stripped in the vmlinux.bin link.) Then use the 'relocs' tool to find all absolute relocations (R_390_64) which apply to allocatable sections. The offsets of those relocations are saved in a special section which is then used to adjust the relocations during boot. (Note: For some reason, Clang occasionally creates a GOT reference, even without '-fPIE'. So Clang-compiled kernels have a GOT, which needs to be adjusted.) On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.

[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

Compiler consideration: Gcc recently implemented an optimization [2] for loading symbols without explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates symbols to reside on a 2-byte boundary, enabling the use of the larl instruction. However, kernel linker scripts may still generate unaligned symbols. To address this, a new -munaligned-symbols option has been introduced [3] in recent gcc versions. This option has to be used with future gcc versions. Older Clang lacks support for handling unaligned symbols generated by kernel linker scripts when the kernel is built without -fPIE. However, future versions of Clang will include support for the -munaligned-symbols option. When the support is unavailable, compile the kernel with -fPIE to maintain the existing behavior.

In addition to that, move vmlinux.relocs to a safe location: When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire uncompressed vmlinux.bin is positioned in the bzImage decompressor image at the default kernel LMA of 0x100000, enabling it to be executed in-place. However, the size of .vmlinux.relocs could be large enough to cause an overlap with the uncompressed kernel at the address 0x100000. To address this issue, .vmlinux.relocs is positioned after the .rodata.compressed section in the bzImage. Nevertheless, in this configuration, vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To overcome that, move vmlinux.relocs to a safe location before clearing .bss and handling relocs.
Compile warning fix from Sumanth Korikkar: When the kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are several warnings:

  ld: warning: orphan section `.rela.iplt' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
  ld: warning: orphan section `.rela.head.text' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
  ld: warning: orphan section `.rela.init.text' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
  ld: warning: orphan section `.rela.rodata.cst8' from `arch/s390/kernel/head64.o' being placed in section `.rela.dyn'

Orphan sections are sections that exist in an object file but don't have a corresponding output section in the final executable. ld raises a warning when it identifies such sections. Eliminate the warning by placing all .rela orphan sections in .rela.dyn and raise an error when the size of .rela.dyn is greater than zero, i.e. don't just neglect orphan sections. This is similar to the adjustment performed on x86, where the kernel is built with -fno-PIE: commit 5354e84598f2 ("x86/build: Add asserts for unwanted sections") [sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move vmlinux.relocs to safe location] [hca@linux.ibm.com: merged compile warning fix from Sumanth] Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
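A condensed sketch of the boot-time adjustment described in the messages above: walk the offsets recorded by the relocs tool and add the KASLR displacement to every R_390_64 location. The names and the 32-bit offset encoding are assumptions for illustration; the real code lives in the s390 decompressor:

  /* Sketch: apply the KASLR offset to all recorded absolute relocations. */
  static void kaslr_adjust_relocs(unsigned long min_addr, unsigned long max_addr,
                                  unsigned long offset)
  {
          unsigned long loc;
          int *reloc;

          /* .vmlinux.relocs holds the locations of all R_390_64 relocations */
          for (reloc = (int *)__vmlinux_relocs_64_start;
               reloc < (int *)__vmlinux_relocs_64_end; reloc++) {
                  loc = (unsigned long)*reloc + offset;
                  if (loc < min_addr || loc > max_addr)
                          continue;
                  *(u64 *)loc += offset;  /* link-time address -> runtime address */
          }
  }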
2024-02-20 | s390: add relocs tool | Josh Poimboeuf | 3 | -0/+391
This 'relocs' tool is copied from the x86 version, ported for s390, and greatly simplified to remove unnecessary features. It reads vmlinux and outputs assembly to create a .vmlinux.relocs_64 section which contains the offsets of all R_390_64 relocations which apply to allocatable sections. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: https://lore.kernel.org/r/20240219132734.22881-3-sumanthk@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/vdso64: filter out munaligned-symbols flag for vdso | Sumanth Korikkar | 1 | -0/+1
Gcc recently implemented an optimization [1] for loading symbols without explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates symbols to reside on a 2-byte boundary, enabling the use of the larl instruction. However, kernel linker scripts may still generate unaligned symbols. To address this, a new -munaligned-symbols option has been introduced [2] in recent gcc versions. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html However, when -munaligned-symbols is used in vdso code, it leads to the following compilation error: `.data.rel.ro.local' referenced in section `.text' of arch/s390/kernel/vdso64/vdso64_generic.o: defined in discarded section `.data.rel.ro.local' of arch/s390/kernel/vdso64/vdso64_generic.o vdso linker script discards .data section to make it lightweight. However, -munaligned-symbols in vdso object files references literal pool and accesses _vdso_data. Hence, compile vdso code without -munaligned-symbols. This means in the future, vdso code should deal with alignment of newly introduced unaligned linker symbols. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: https://lore.kernel.org/r/20240219132734.22881-2-sumanthk@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/boot: add 'alloc' to info.bin .vmlinux.info section flags | Nathan Chancellor | 1 | -1/+1
When attempting to boot a kernel compiled with OBJCOPY=llvm-objcopy, there is a crash right at boot: Out of memory allocating 6d7800 bytes 8 aligned in range 0:20000000 Reserved memory ranges: 0000000000000000 a394c3c30d90cdaf DECOMPRESSOR Usable online memory ranges (info source: sclp read info [3]): 0000000000000000 0000000020000000 Usable online memory total: 20000000 Reserved: a394c3c30d90cdaf Free: 0 Call Trace: (sp:0000000000033e90 [<0000000000012fbc>] physmem_alloc_top_down+0x5c/0x104) sp:0000000000033f00 [<0000000000011d56>] startup_kernel+0x3a6/0x77c sp:0000000000033f60 [<00000000000100f4>] startup_normal+0xd4/0xd4 GNU objcopy does not have any issues. Looking at differences between the object files in each build reveals info.bin does not get properly populated with llvm-objcopy, which results in an empty .vmlinux.info section. $ file {gnu,llvm}-objcopy/arch/s390/boot/info.bin gnu-objcopy/arch/s390/boot/info.bin: data llvm-objcopy/arch/s390/boot/info.bin: empty $ llvm-readelf --section-headers {gnu,llvm}-objcopy/arch/s390/boot/vmlinux | rg 'File:|\.vmlinux\.info|\.decompressor\.syms' File: gnu-objcopy/arch/s390/boot/vmlinux [12] .vmlinux.info PROGBITS 0000000000034000 035000 000078 00 WA 0 0 1 [13] .decompressor.syms PROGBITS 0000000000034078 035078 000b00 00 WA 0 0 1 File: llvm-objcopy/arch/s390/boot/vmlinux [12] .vmlinux.info PROGBITS 0000000000034000 035000 000000 00 WA 0 0 1 [13] .decompressor.syms PROGBITS 0000000000034000 035000 000b00 00 WA 0 0 1 Ulrich points out that llvm-objcopy only copies sections marked as alloc with a binary output target, whereas the .vmlinux.info section is only marked as load. Add 'alloc' in addition to 'load', so that both objcopy implementations work properly: $ file {gnu,llvm}-objcopy/arch/s390/boot/info.bin gnu-objcopy/arch/s390/boot/info.bin: data llvm-objcopy/arch/s390/boot/info.bin: data $ llvm-readelf --section-headers {gnu,llvm}-objcopy/arch/s390/boot/vmlinux | rg 'File:|\.vmlinux\.info|\.decompressor\.syms' File: gnu-objcopy/arch/s390/boot/vmlinux [12] .vmlinux.info PROGBITS 0000000000034000 035000 000078 00 WA 0 0 1 [13] .decompressor.syms PROGBITS 0000000000034078 035078 000b00 00 WA 0 0 1 File: llvm-objcopy/arch/s390/boot/vmlinux [12] .vmlinux.info PROGBITS 0000000000034000 035000 000078 00 WA 0 0 1 [13] .decompressor.syms PROGBITS 0000000000034078 035078 000b00 00 WA 0 0 1 Closes: https://github.com/ClangBuiltLinux/linux/issues/1996 Link: https://github.com/llvm/llvm-project/commit/3c02cb7492fc78fb678264cebf57ff88e478e14f Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240216-s390-fix-boot-with-llvm-objcopy-v1-1-0ac623daf42b@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/pci: fix three typos in comments | Gerd Bayer | 2 | -3/+3
Found and fixed these while working on synchronizing the state handling of zpci_dev's. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/pci: remove hotplug slot when releasing the device | Gerd Bayer | 1 | -2/+3
Centralize the removal so all paths are covered and the hotplug slot will remain active until the device is really destroyed. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/pci: introduce lock to synchronize state of zpci_dev's | Gerd Bayer | 4 | -30/+63
There are a number of tasks that need the state of a zpci device to be stable. Other tasks need to be synchronized as they change the state. State changes could be generated by the system as availability or error events, or be requested by the user through manipulations in sysfs. Some other actions accessible through sysfs - like device resets - need the state to be stable. Unsynchronized state handling could lead to unusable devices. This has been observed in cases of concurrent state changes through systemd udev rules and DPM boot control. Some breakage can be provoked by artificial tests, e.g. through repetitively injecting "recover" on a PCI function through sysfs while running a "hotplug remove/add" in a loop through a PCI slot's "power" attribute in sysfs. After a few iterations this could result in a kernel oops. So introduce a new mutex "state_lock" to guard the state property of the struct zpci_dev. Acquire this lock in all tasks that modify the state:

 - hotplug add and remove, through the PCI hotplug slot entry,
 - availability events, as reported by the platform,
 - error events, as reported by the platform,
 - during device resets, explicitly through sysfs requests or implicitly through the common PCI layer.

Break an inner _do_recover() routine out of recover_store() to separate the necessary synchronizations from the actual manipulations of the zpci_dev required for the reset. With the following changes I was able to run the inject loops for hours without hitting an error. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
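Schematically, the sysfs recover path described above ends up shaped like this; _do_recover() is named in the message, the remaining details are a sketch:

  /* Sketch: serialize user-triggered recovery against other state changes. */
  static ssize_t recover_store(struct device *dev, struct device_attribute *attr,
                               const char *buf, size_t count)
  {
          struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
          int ret;

          mutex_lock(&zdev->state_lock);          /* new mutex guarding zdev->state */
          ret = _do_recover(dev, zdev);           /* actual reset manipulations */
          mutex_unlock(&zdev->state_lock);

          return ret ? ret : count;
  }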
2024-02-20 | s390/pci: rename lock member in struct zpci_dev | Gerd Bayer | 3 | -7/+7
Since this guards only the Function Measurement Block, rename the generic lock to fmb_lock, in preparation for introducing another lock that guards the state member. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/pai: adjust whitespace indentation | Thomas Richter | 1 | -1/+1
Adjust whitespace indentation. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 | s390/pai: simplify event start function for perf stat | Thomas Richter | 2 | -15/+4
When an event is started, read the current value of the PAI counter. This value is saved in event::hw.prev_count. When an event is stopped, this value is subtracted from the current value read out at event stop time. The difference is the delta of this counter. Simplify the logic and read the event value every time the event is started. This scheme is identical to other device drivers. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
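The description above is the usual prev_count/delta pattern used by other PMU drivers; a hedged sketch, where the read helper is a hypothetical stand-in for however the driver reads the PAI counter:

  /* Sketch: read the counter at event start, compute the delta at stop. */
  static void pai_event_start_sketch(struct perf_event *event)
  {
          local64_set(&event->hw.prev_count, pai_read_counter(event));
  }

  static void pai_event_stop_sketch(struct perf_event *event)
  {
          u64 now = pai_read_counter(event);
          u64 prev = local64_read(&event->hw.prev_count);

          local64_add(now - prev, &event->count); /* delta since start */
  }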
2024-02-20 | s390/pai: save PAI counter value page in event structure | Thomas Richter | 3 | -16/+60
When the PAI events ALL_CRYPTO or ALL_NNPA are created for system wide sampling, all PAI counters are monitored. On each process schedule out, the values of all PAI counters are investigated. Non-zero values are saved in the event's ring buffer as raw data. This scheme expects the start value of each counter to be reset to zero after each read operation performed by the PAI PMU device driver. This allows for only one active event at any one time as it relies on the start value of counters to be reset to zero. Create a save area for each installed PAI XXXX_ALL event and save all PAI counter values in this save area. Instead of clearing the PAI counter lowcore area to zero after each read operation, copy the counter values from the lowcore area to the event's save area at process schedule out time. The delta of each PAI counter is calculated by subtracting the old counter's value stored in the event's save area from the current value stored in the lowcore area. With this scheme, multiple events of the PAI counters XXXX_ALL can be handled at the same time. This will be addressed in a follow-on patch. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
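The bookkeeping change can be pictured as follows; all names are hypothetical, only the "save area instead of resetting to zero" logic comes from the message above:

  /*
   * Sketch: at process schedule out, diff the current lowcore counter
   * value against the event's private save area and remember the new
   * value, instead of resetting the lowcore counter to zero.
   */
  static u64 pai_counter_delta(u64 *save_area, const u64 *lowcore, int idx)
  {
          u64 delta = lowcore[idx] - save_area[idx];

          save_area[idx] = lowcore[idx];
          return delta;
  }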
2024-02-16 | s390/crc32le: convert to C | Heiko Carstens | 3 | -146/+109
Convert CRC-32 LE variants to C. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/crc32be: convert to C | Heiko Carstens | 3 | -110/+80
Convert CRC-32 BE variant to C. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: add vector instruction inline assemblies for crc32 | Heiko Carstens | 1 | -0/+56
Provide various vector instruction inline assemblies for crc32 calculations. This is just preparation to keep the conversion of the existing crc32 implementations from assembly to C small. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/sysinfo: convert bogomips calculation to C | Heiko Carstens | 2 | -15/+43
Provide several one-instruction fpu inline assemblies and use them to implement the bogomips calculation in a C-like style. This is mostly for illustration purposes, showing how kernel fpu code can be written in C. This has the advantage that the author only has to take care of the floating point instructions, but doesn't need to take care of general purpose register allocation (if needed), and the semantics of all other instructions not related to fpu. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/raid6: convert to use standard fpu_*() inline assemblies | Heiko Carstens | 1 | -0/+48
Move the s390 specific raid6 inline assemblies, make them generic, and reuse them to implement the raid6 gen/xor implementation. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/checksum: provide csum_partial_copy_nocheck() | Heiko Carstens | 4 | -13/+112
With csum_partial(), which reads all bytes into registers, it is easy to also implement csum_partial_copy_nocheck(), which copies the buffer while calculating its checksum. For a 512 byte buffer this reduces the runtime by 19%. Compared to the old generic variant (memcpy() + cksm instruction) the runtime is reduced by 42%. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/checksum: provide vector register variant of csum_partial() | Heiko Carstens | 6 | -16/+187
Provide a faster variant of csum_partial() which uses vector registers instead of the cksm instruction. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/checksum: provide and use cksm() inline assembly | Heiko Carstens | 3 | -16/+20
Convert those callers of csum_partial() which are either very early or in critical paths, like panic/dump, to use the cksm instruction, so they don't have to rely on a working kernel infrastructure, which will be introduced with a subsequent patch. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
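For reference, a cksm()-style wrapper typically looks roughly like this (a sketch following the long-standing s390 csum_partial() pattern; the exact helper added by this patch may differ):

  #include <asm/types.h>

  /* Sketch: checksum a buffer with the cksm instruction. */
  static inline __wsum cksm_sketch(const void *buff, int len, __wsum sum)
  {
          union register_pair rp = {
                  .even = (unsigned long)buff,
                  .odd  = (unsigned long)len,
          };

          asm volatile(
                  "0:     cksm    %[sum],%[rp]\n"
                  "       jo      0b\n"                   /* cc == 3: not done yet */
                  : [sum] "+&d" (sum), [rp] "+&d" (rp.pair));
          return sum;
  }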
2024-02-16 | s390/checksum: call instrument_read() instead of kasan_check_read() | Heiko Carstens | 1 | -2/+2
Call instrument_read() from csum_partial() instead of kasan_check_read(). instrument_read() covers all memory access instrumentation methods. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
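The change itself is essentially a one-liner at the top of csum_partial(); sketched here with the cksm() helper introduced earlier in this series (signature assumed):

  #include <linux/instrumented.h>

  static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
  {
          instrument_read(buff, len);     /* replaces kasan_check_read(buff, len) */
          return cksm(buff, len, sum);
  }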
2024-02-16 | s390/fpu: remove TIF_FPU | Heiko Carstens | 1 | -2/+0
TIF_FPU is unused - remove it. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: limit save and restore to used registers | Heiko Carstens | 4 | -69/+128
The first invocation of kernel_fpu_begin() after switching from user to kernel context will save all vector registers, even if only parts of the vector registers are used within the kernel fpu context. Given that save and restore of all vector registers is quite expensive, change the current approach in several ways:

 - Instead of saving and restoring all user registers, limit this to those registers which are actually used within a kernel fpu context.

 - On context switch, save all remaining user fpu registers, so they can be restored when the task is rescheduled.

 - Saving user registers within kernel_fpu_begin() is done without disabling and enabling interrupts - which also slightly reduces runtime. In the worst case (e.g. interrupt context uses the same registers) this may lead to the situation that registers are saved several times, however the assumption is that this will not happen frequently, so that the new method is faster in nearly all cases.

 - save_user_fpu_regs() can still be called from all contexts and saves all (or all remaining) user registers to a task's ufpu user fpu save area.

Overall this reduces the time required to save and restore the user fpu context for nearly all cases. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: decrease stack usage for some cases | Heiko Carstens | 7 | -40/+94
The kernel_fpu structure has a quite large size of 520 bytes. In order to reduce the stack footprint, introduce several kernel fpu structures with different and also smaller sizes. This way every kernel fpu user must use the correct variant. A compile time check verifies that the correct variant is used. There are several users which use only 16 instead of all 32 vector registers. For those users the new kernel_fpu_16 structure with a size of only 266 bytes can be used. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: let fpu_vlm() and fpu_vstm() return number of registers | Heiko Carstens | 1 | -8/+16
Let the fpu_vlm() and fpu_vstm() macros return the number of registers saved / loaded. This helps to write easy-to-read code in case there are several subsequent fpu_vlm() or fpu_vstm() calls:

  __vector128 *vxrs = ....
  vxrs += fpu_vstm(0, 15, vxrs);
  vxrs += fpu_vstm(16, 31, vxrs);

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: remove anonymous union from struct fpu | Heiko Carstens | 5 | -109/+78
The anonymous union within struct fpu contains a floating point register array and a vector register array. Given that the vector register array is always present, remove the floating point register array. For configurations without vector registers, save the floating point register contents within their corresponding vector register location. This allows removing the union, and also simplifies the ptrace and perf code. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: remove regs member from struct fpu | Heiko Carstens | 4 | -10/+4
KVM was the only user which modified the regs pointer in struct fpu. Remove the pointer and convert the rest of the core fpu code to directly access the save area embedded within struct fpu. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>