summaryrefslogtreecommitdiff
path: root/include/asm-generic/vmlinux.lds.h
AgeCommit message (Collapse)AuthorFilesLines
2023-12-18kunit: add KUNIT_INIT_TABLE to init linker sectionRae Moar1-1/+8
Add KUNIT_INIT_TABLE to the INIT_DATA linker section. Alter the KUnit macros to create init tests: kunit_test_init_section_suites Update lib/kunit/executor.c to run both the suites in KUNIT_TABLE and KUNIT_INIT_TABLE. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-12-18kunit: move KUNIT_TABLE out of INIT_DATARae Moar1-3/+3
Alter the linker section of KUNIT_TABLE to move it out of INIT_DATA and into DATA_DATA. Data for KUnit tests does not need to be in the init section. In order to run tests again after boot the KUnit data cannot be labeled as init data as the kernel could write over it. Add a KUNIT_INIT_TABLE in the next patch for KUnit tests that test init data/functions. Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-28linux/init: remove __memexit* annotationsMasahiro Yamada1-6/+0
We have never used __memexit, __memexitdata, or __memexitconst. These were unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de>
2023-10-01vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macrosMasahiro Yamada1-7/+0
Remove the left-over of commit e24f6628811e ("modpost: remove all traces of cpuinit/cpuexit sections"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2023-07-12vmlinux.lds.h: Remove a reference to no longer used sections .text..refcountPetr Pavlu1-1/+0
Sections .text..refcount were previously used to hold an error path code for fast refcount overflow protection on x86, see commit 7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection") and commit 564c9cc84e2a ("locking/refcounts, x86/asm: Use unique .text section for refcount exceptions"). The code was replaced and removed in commit fb041bb7c0a9 ("locking/refcount: Consolidate implementations of refcount_t") and no sections .text..refcount are present since then. Remove then a relic referencing these sections from TEXT_TEXT to avoid confusing people, like me. This is a non-functional change. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Link: https://lore.kernel.org/r/20230711125054.9000-1-petr.pavlu@suse.com Signed-off-by: Kees Cook <keescook@chromium.org>
2023-07-07Merge tag 'riscv-for-linus-6.5-mw2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - A bunch of fixes/cleanups from the first part of the merge window, mostly related to ACPI and vector as those were large - Some documentation improvements, mostly related to the new code - The "riscv,isa" DT key is deprecated - Support for link-time dead code elimination - Support for minor fault registration in userfaultd - A handful of cleanups around CMO alternatives * tag 'riscv-for-linus-6.5-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (23 commits) riscv: mm: mark noncoherent_supported as __ro_after_init riscv: mm: mark CBO relate initialization funcs as __init riscv: errata: thead: only set cbom size & noncoherent during boot riscv: Select HAVE_ARCH_USERFAULTFD_MINOR RISC-V: Document the ISA string parsing rules for ACPI risc-v: Fix order of IPI enablement vs RCU startup mm: riscv: fix an unsafe pte read in huge_pte_alloc() dt-bindings: riscv: deprecate riscv,isa RISC-V: drop error print from riscv_hartid_to_cpuid() riscv: Discard vector state on syscalls riscv: move memblock_allow_resize() after linear mapping is ready riscv: Enable ARCH_SUSPEND_POSSIBLE for s2idle riscv: vdso: include vdso/vsyscall.h for vdso_data selftests: Test RISC-V Vector's first-use handler riscv: vector: clear V-reg in the first-use trap riscv: vector: only enable interrupts in the first-use trap RISC-V: Fix up some vector state related build failures RISC-V: Document that V registers are clobbered on syscalls riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION ...
2023-07-01Merge tag 'kbuild-v6.5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove the deprecated rule to build *.dtbo from *.dts - Refactor section mismatch detection in modpost - Fix bogus ARM section mismatch detections - Fix error of 'make gtags' with O= option - Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error with the latest LLVM version - Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed - Ignore more compiler-generated symbols for kallsyms - Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles - Enable more kernel-doc warnings with W=2 - Refactor <linux/export.h> by generating KSYMTAB data by modpost - Deprecate <asm/export.h> and <asm-generic/export.h> - Remove the EXPORT_DATA_SYMBOL macro - Move the check for static EXPORT_SYMBOL back to modpost, which makes the build faster - Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm - Warn missing MODULE_DESCRIPTION when building modules with W=1 - Make 'make clean' robust against too long argument error - Exclude more objects from GCOV to fix CFI failures with GCOV - Allow 'make modules_install' to install modules.builtin and modules.builtin.modinfo even when CONFIG_MODULES is disabled - Include modules.builtin and modules.builtin.modinfo in the linux-image Debian package even when CONFIG_MODULES is disabled - Revive "Entering directory" logging for the latest Make version * tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (72 commits) modpost: define more R_ARM_* for old distributions kbuild: revive "Entering directory" for Make >= 4.4.1 kbuild: set correct abs_srctree and abs_objtree for package builds scripts/mksysmap: Ignore prefixed KCFI symbols kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb kbuild: builddeb: always make modules_install, to install modules.builtin* modpost: continue even with unknown relocation type modpost: factor out Elf_Sym pointer calculation to section_rel() modpost: factor out inst location calculation to section_rel() kbuild: Disable GCOV for *.mod.o kbuild: Fix CFI failures with GCOV kbuild: make clean rule robust against too long argument error script: modpost: emit a warning when the description is missing kbuild: make modules_install copy modules.builtin(.modinfo) linux/export.h: rename 'sec' argument to 'license' modpost: show offset from symbol for section mismatch warnings modpost: merge two similar section mismatch warnings kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion modpost: use null string instead of NULL pointer for default namespace modpost: squash sym_update_namespace() into sym_add_exported() ...
2023-07-01Merge patch series "riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION"Palmer Dabbelt1-1/+1
Jisheng Zhang <jszhang@kernel.org> says: When trying to run linux with various opensource riscv core on resource limited FPGA platforms, for example, those FPGAs with less than 16MB SDRAM, I want to save mem as much as possible. One of the major technologies is kernel size optimizations, I found that riscv does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which passes -fdata-sections, -ffunction-sections to CFLAGS and passes the --gc-sections flag to the linker. This not only benefits my case on FPGA but also benefits defconfigs. Here are some notable improvements from enabling this with defconfigs: nommu_k210_defconfig: text data bss dec hex 1112009 410288 59837 1582134 182436 before 962838 376656 51285 1390779 1538bb after rv32_defconfig: text data bss dec hex 8804455 2816544 290577 11911576 b5c198 before 8692295 2779872 288977 11761144 b375f8 after defconfig: text data bss dec hex 9438267 3391332 485333 13314932 cb2b74 before 9285914 3350052 483349 13119315 c82f53 after patch1 and patch2 are clean ups. patch3 fixes a typo. patch4 finally enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for riscv. * b4-shazam-merge: riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION vmlinux.lds.h: use correct .init.data.* section name riscv: vmlinux-xip.lds.S: remove .alternative section riscv: move options to keep entries sorted riscv: Fix orphan section warnings caused by kernel/pi Link: https://lore.kernel.org/r/20230523165502.2592-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-26vmlinux.lds.h: use correct .init.data.* section nameJisheng Zhang1-1/+1
If building with -fdata-sections on riscv, LD_ORPHAN_WARN will warn similar as below: riscv64-linux-gnu-ld: warning: orphan section `.init.data.efi_loglevel' from `./drivers/firmware/efi/libstub/printk.stub.o' being placed in section `.init.data.efi_loglevel' I believe this is caused by a a typo: init.data.* should be .init.data.* Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build Link: https://lore.kernel.org/r/20230523165502.2592-4-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-22kbuild: generate KSYMTAB entries by modpostMasahiro Yamada1-0/+1
Commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") made modpost output CRCs in the same way whether the EXPORT_SYMBOL() is placed in *.c or *.S. For further cleanups, this commit applies a similar approach to the entire data structure of EXPORT_SYMBOL(). The EXPORT_SYMBOL() compilation is split into two stages. When a source file is compiled, EXPORT_SYMBOL() will be converted into a dummy symbol in the .export_symbol section. For example, EXPORT_SYMBOL(foo); EXPORT_SYMBOL_NS_GPL(bar, BAR_NAMESPACE); will be encoded into the following assembly code: .section ".export_symbol","a" __export_symbol_foo: .asciz "" /* license */ .asciz "" /* name space */ .balign 8 .quad foo /* symbol reference */ .previous .section ".export_symbol","a" __export_symbol_bar: .asciz "GPL" /* license */ .asciz "BAR_NAMESPACE" /* name space */ .balign 8 .quad bar /* symbol reference */ .previous They are mere markers to tell modpost the name, license, and namespace of the symbols. They will be dropped from the final vmlinux and modules because the *(.export_symbol) will go into /DISCARD/ in the linker script. Then, modpost extracts all the information about EXPORT_SYMBOL() from the .export_symbol section, and generates the final C code: KSYMTAB_FUNC(foo, "", ""); KSYMTAB_FUNC(bar, "_gpl", "BAR_NAMESPACE"); KSYMTAB_FUNC() (or KSYMTAB_DATA() if it is data) is expanded to struct kernel_symbol that will be linked to the vmlinux or a module. With this change, EXPORT_SYMBOL() works in the same way for *.c and *.S files, providing the following benefits. [1] Deprecate EXPORT_DATA_SYMBOL() In the old days, EXPORT_SYMBOL() was only available in C files. To export a symbol in *.S, EXPORT_SYMBOL() was placed in a separate *.c file. arch/arm/kernel/armksyms.c is one example written in the classic manner. Commit 22823ab419d8 ("EXPORT_SYMBOL() for asm") removed this limitation. Since then, EXPORT_SYMBOL() can be placed close to the symbol definition in *.S files. It was a nice improvement. However, as that commit mentioned, you need to use EXPORT_DATA_SYMBOL() for data objects on some architectures. In the new approach, modpost checks symbol's type (STT_FUNC or not), and outputs KSYMTAB_FUNC() or KSYMTAB_DATA() accordingly. There are only two users of EXPORT_DATA_SYMBOL: EXPORT_DATA_SYMBOL_GPL(empty_zero_page) (arch/ia64/kernel/head.S) EXPORT_DATA_SYMBOL(ia64_ivt) (arch/ia64/kernel/ivt.S) They are transformed as follows and output into .vmlinux.export.c KSYMTAB_DATA(empty_zero_page, "_gpl", ""); KSYMTAB_DATA(ia64_ivt, "", ""); The other EXPORT_SYMBOL users in ia64 assembly are output as KSYMTAB_FUNC(). EXPORT_DATA_SYMBOL() is now deprecated. [2] merge <linux/export.h> and <asm-generic/export.h> There are two similar header implementations: include/linux/export.h for .c files include/asm-generic/export.h for .S files Ideally, the functionality should be consistent between them, but they tend to diverge. Commit 8651ec01daed ("module: add support for symbol namespaces.") did not support the namespace for *.S files. This commit shifts the essential implementation part to C, which supports EXPORT_SYMBOL_NS() for *.S files. <asm/export.h> and <asm-generic/export.h> will remain as a wrapper of <linux/export.h> for a while. They will be removed after #include <asm/export.h> directives are all replaced with #include <linux/export.h>. [3] Implement CONFIG_TRIM_UNUSED_KSYMS in one-pass algorithm (by a later commit) When CONFIG_TRIM_UNUSED_KSYMS is enabled, Kbuild recursively traverses the directory tree to determine which EXPORT_SYMBOL to trim. If an EXPORT_SYMBOL turns out to be unused by anyone, Kbuild begins the second traverse, where some source files are recompiled with their EXPORT_SYMBOL() tuned into a no-op. We can do this better now; modpost can selectively emit KSYMTAB entries that are really used by modules. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2023-06-16x86/unwind/orc: Add ELF section with ORC version identifierOmar Sandoval1-0/+3
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") changed the ORC format. Although ORC is internal to the kernel, it's the only way for external tools to get reliable kernel stack traces on x86-64. In particular, the drgn debugger [1] uses ORC for stack unwinding, and these format changes broke it [2]. As the drgn maintainer, I don't care how often or how much the kernel changes the ORC format as long as I have a way to detect the change. It suffices to store a version identifier in the vmlinux and kernel module ELF files (to use when parsing ORC sections from ELF), and in kernel memory (to use when parsing ORC from a core dump+symbol table). Rather than hard-coding a version number that needs to be manually bumped, Peterz suggested hashing the definitions from orc_types.h. If there is a format change that isn't caught by this, the hashing script can be updated. This patch adds an .orc_header allocated ELF section containing the 20-byte hash to vmlinux and kernel modules, along with the corresponding __start_orc_header and __stop_orc_header symbols in vmlinux. 1: https://github.com/osandov/drgn 2: https://github.com/osandov/drgn/issues/303 Fixes: ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") Fixes: fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com
2023-05-16vmlinux.lds.h: Discard .note.gnu.property sectionJosh Poimboeuf1-1/+8
When tooling reads ELF notes, it assumes each note entry is aligned to the value listed in the .note section header's sh_addralign field. The kernel-created ELF notes in the .note.Linux and .note.Xen sections are aligned to 4 bytes. This causes the toolchain to set those sections' sh_addralign values to 4. On the other hand, the GCC-created .note.gnu.property section has an sh_addralign value of 8 for some reason, despite being based on struct Elf32_Nhdr which only needs 4-byte alignment. When the mismatched input sections get linked together into the vmlinux .notes output section, the higher alignment "wins", resulting in an sh_addralign of 8, which confuses tooling. For example: $ readelf -n .tmp_vmlinux.btf ... readelf: .tmp_vmlinux.btf: Warning: note with invalid namesz and/or descsz found at offset 0x170 readelf: .tmp_vmlinux.btf: Warning: type: 0x4, namesize: 0x006e6558, descsize: 0x00008801, alignment: 8 In this case readelf thinks there's alignment padding where there is none, so it starts reading an ELF note in the middle. With newer toolchains (e.g., latest Fedora Rawhide), a similar mismatch triggers a build failure when combined with CONFIG_X86_KERNEL_IBT: btf_encoder__encode: btf__dedup failed! Failed to encode BTF libbpf: failed to find '.BTF' ELF section in vmlinux FAILED: load BTF from vmlinux: No data available make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 255 This latter error was caused by pahole crashing when it encountered the corrupt .notes section. This crash has been fixed in dwarves version 1.25. As Tianyi Liu describes: "Pahole reads .notes to look for LINUX_ELFNOTE_BUILD_LTO. When LTO is enabled, pahole needs to call cus__merge_and_process_cu to merge compile units, at which point there should only be one unspecified type (used to represent some compilation information) in the global context. However, when the kernel is compiled without LTO, if pahole calls cus__merge_and_process_cu due to alignment issues with notes, multiple unspecified types may appear after merging the cus, and older versions of pahole only support up to one. This is why pahole 1.24 crashes, while newer versions support multiple. However, the latest version of pahole still does not solve the problem of incorrect LTO recognition, so compiling the kernel may be slower than normal." Even with the newer pahole, the note section misaligment issue still exists and pahole is misinterpreting the LTO note. Fix it by discarding the .note.gnu.property section. While GNU properties are important for user space (and VDSO), they don't seem to have any use for vmlinux. (In fact, they're already getting (inadvertently) stripped from vmlinux when CONFIG_DEBUG_INFO_BTF is enabled. The BTF data is extracted from vmlinux.o with "objcopy --only-section=.BTF" into .btf.vmlinux.bin.o. That file doesn't have .note.gnu.property, so when it gets modified and linked back into the main object, the linker automatically strips it (see "How GNU properties are merged" in the ld man page).) Reported-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lkml.kernel.org/bpf/57830c30-cd77-40cf-9cd1-3bb608aa602e@app.fastmail.com Debugged-by: Tianyi Liu <i.pear@outlook.com> Suggested-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Link: https://lore.kernel.org/r/20230418214925.ay3jpf2zhw75kgmd@treble Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-01-31Merge tag 'v6.2-rc6' into sched/core, to pick up fixesIngo Molnar1-0/+5
Pick up fixes before merging another batch of cpuidle updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-13objtool/idle: Validate __cpuidle code as noinstrPeter Zijlstra1-6/+3
Idle code is very like entry code in that RCU isn't available. As such, add a little validation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
2022-12-30arch: fix broken BuildID for arm64 and riscvMasahiro Yamada1-0/+5
Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux since commit 994b7ac1697b ("arm64: remove special treatment for the link order of head.o"). The issue is that the type of .notes section, which contains the BuildID, changed from NOTES to PROGBITS. Ard Biesheuvel figured out that whichever object gets linked first gets to decide the type of a section. The PROGBITS type is the result of the compiler emitting .note.GNU-stack as PROGBITS rather than NOTE. While Ard provided a fix for arm64, I want to fix this globally because the same issue is happening on riscv since commit 2348e6bf4421 ("riscv: remove special treatment for the link order of head.o"). This problem will happen in general for other architectures if they start to drop unneeded entries from scripts/head-object-list.txt. Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h. Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/ Fixes: 994b7ac1697b ("arm64: remove special treatment for the link order of head.o") Fixes: 2348e6bf4421 ("riscv: remove special treatment for the link order of head.o") Reported-by: Dennis Gilmore <dennis@ausil.us> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-16Merge tag 'driver-core-6.2-rc1' of ↵Linus Torvalds1-140/+94
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.2-rc1. The "big" change in here is the addition of a new macro, container_of_const() that will preserve the "const-ness" of a pointer passed into it. The "problem" of the current container_of() macro is that if you pass in a "const *", out of it can comes a non-const pointer unless you specifically ask for it. For many usages, we want to preserve the "const" attribute by using the same call. For a specific example, this series changes the kobj_to_dev() macro to use it, allowing it to be used no matter what the const value is. This prevents every subsystem from having to declare 2 different individual macros (i.e. kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce the const value at build time, which having 2 macros would not do either. The driver for all of this have been discussions with the Rust kernel developers as to how to properly mark driver core, and kobject, objects as being "non-mutable". The changes to the kobject and driver core in this pull request are the result of that, as there are lots of paths where kobjects and device pointers are not modified at all, so marking them as "const" allows the compiler to enforce this. So, a nice side affect of the Rust development effort has been already to clean up the driver core code to be more obvious about object rules. All of this has been bike-shedded in quite a lot of detail on lkml with different names and implementations resulting in the tiny version we have in here, much better than my original proposal. Lots of subsystem maintainers have acked the changes as well. Other than this change, included in here are smaller stuff like: - kernfs fixes and updates to handle lock contention better - vmlinux.lds.h fixes and updates - sysfs and debugfs documentation updates - device property updates All of these have been in the linux-next tree for quite a while with no problems" * tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits) device property: Fix documentation for fwnode_get_next_parent() firmware_loader: fix up to_fw_sysfs() to preserve const usb.h: take advantage of container_of_const() device.h: move kobj_to_dev() to use container_of_const() container_of: add container_of_const() that preserves const-ness of the pointer driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion. driver core: fix up missed scsi/cxlflash class.devnode() conversion. driver core: fix up some missing class.devnode() conversions. driver core: make struct class.devnode() take a const * driver core: make struct class.dev_uevent() take a const * cacheinfo: Remove of_node_put() for fw_token device property: Add a blank line in Kconfig of tests device property: Rename goto label to be more precise device property: Move PROPERTY_ENTRY_BOOL() a bit down device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*() kernfs: fix all kernel-doc warnings and multiple typos driver core: pass a const * into of_device_uevent() kobject: kset_uevent_ops: make name() callback take a const * kobject: kset_uevent_ops: make filter() callback take a const * kobject: make kobject_namespace take a const * ...
2022-12-15Merge tag 'x86_core_for_v6.2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-12-12Merge tag 'arm64-upstream' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights this time are support for dynamically enabling and disabling Clang's Shadow Call Stack at boot and a long-awaited optimisation to the way in which we handle the SVE register state on system call entry to avoid taking unnecessary traps from userspace. Summary: ACPI: - Enable FPDT support for boot-time profiling - Fix CPU PMU probing to work better with PREEMPT_RT - Update SMMUv3 MSI DeviceID parsing to latest IORT spec - APMT support for probing Arm CoreSight PMU devices CPU features: - Advertise new SVE instructions (v2.1) - Advertise range prefetch instruction - Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount - Enable DIT (Data Independent Timing) when running in the kernel - More conversion of system register fields over to the generated header CPU misfeatures: - Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: - Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: - Remove static ftrace in favour of, err, dynamic ftrace! - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS - Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: - Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: - Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: - Support for TLP filters in Hisilicon's PCIe PMU device - Support for the DDR PMU present in Amlogic Meson G12 SoCs - Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: - Tighten up our boot protocol for systems with memory above 52 bits physical - Const-ify static keys to satisty jump label asm constraints - Trivial FFA driver cleanups in preparation for v1.1 support - Export the kernel_neon_* APIs as GPL symbols - Harden our instruction generation routines against instrumentation - A bunch of robustness improvements to our arch-specific selftests - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits) arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler arm64: Prohibit instrumentation on arch_stack_walk() arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: alternatives: add __init/__initconst to some functions/variables arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() kselftest/arm64: Allow epoll_wait() to return more than one result kselftest/arm64: Don't drain output while spawning children kselftest/arm64: Hold fp-stress children until they're all spawned arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation ...
2022-11-22Merge tag 'v6.1-rc6' into x86/core, to resolve conflictsIngo Molnar1-1/+1
Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c: # upstream: debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file") # retbleed work in x86/core: 5d8213864ade ("x86/retbleed: Add SKL return thunk") ... and these commits in include/linux/bpf.h: # upstram: 18acb7fac22f ("bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")") # x86/core commits: 931ab63664f0 ("x86/ibt: Implement FineIBT") bea75b33895f ("x86/Kconfig: Introduce function padding") The latter two modify BPF_DISPATCHER_ATTRIBUTES(), which was removed upstream. Conflicts: arch/x86/kernel/asm-offsets.c include/linux/bpf.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-11-21Merge 6.1-rc6 into driver-core-nextGreg Kroah-Hartman1-7/+13
We need the kernfs changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-17vmlinux.lds.h: add HEADERED_SECTION_* macrosJim Cromie1-0/+15
These macros elaborate on BOUNDED_SECTION_(PRE|POST)_LABEL macros, prepending an optional KEEP(.gnu.linkonce##_sec_) reservation, and a linker-symbol to address it. This allows a developer to define a header struct (which must fit with the section's base struct-type), and could contain: 1- fields whose value is common to the entire set of data-records. This allows the header & data structs to specialize, complement each other, and shrink. 2- an uplink pointer to an organizing struct which refs other related/sub data-tables header record is addressable via the extern'd header linker-symbol Once the linker-symbols created by the macro are ref'd extern in code, that code can compute a record's index (ptr - start) in the "primary" table, then use it to index into the related/sub tables. Adding a primary.map_* field foreach sub-table would then allow deduplication and remapping of that sub-table. This is aimed at dyndbg's struct _ddebug __dyndbg[] section, whose 3 columns: function, file, module are 50%, 90%, 100% redundant. The module column is fully recoverable after dynamic_debug_init() saves it to each ddebug_table.module as the builtin __dyndbg[] table is parsed. Given that those 3 columns use 24/56 of a _ddebug record, a dyndbg=y kernel with ~5k callsites could reduce kernel memory substantially. Returning that memory to the kernel buddy-allocator? is then possible. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20221117171633.923628-3-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-17vmlinux.lds.h: fix BOUNDED_SECTION_(PRE|POST)_LABEL macrosJim Cromie1-8/+6
Commit 2f465b921bb8 ("vmlinux.lds.h: place optional header space in BOUNDED_SECTION") added BOUNDED_SECTION_(PRE|POST)_LABEL macros, encapsulating the basic boilerplate to KEEP/pack records into a section, and to mark the begin and end of the section with linker-symbols. But it tried to do extra, adding KEEP(*(.gnu.linkonce.##_sec_)) to optionally reserve a header record in front of the data. It wrongly placed the KEEP after the linker-symbol starting the section, so if a header was added, it would wind up in the data. Moving the KEEP to the "correct" place proved brittle, and too clever by half. The obvious safe fix is to remove the KEEP and restore the plain old boilerplate. The header can be added later, with separate macros. Also, the macro var-names: _s_, _e_ are nearly invisible, change them to more obvious names: _BEGIN_, _END_ Fixes: 2f465b921bb8 ("vmlinux.lds.h: place optional header space in BOUNDED_SECTION") Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20221117171633.923628-2-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-11Merge tag 'hardening-v6.1-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fix from Kees Cook: - Fix !SMP placement of '.data..decrypted' section (Nathan Chancellor) * tag 'hardening-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: vmlinux.lds.h: Fix placement of '.data..decrypted' section
2022-11-10vmlinux.lds.h: place optional header space in BOUNDED_SECTIONJim Cromie1-0/+2
Extend recently added BOUNDED_SECTION(_name) macro by adding a KEEP(*(.gnu.linkonce.##_name)) before the KEEP(*(_name)). This does nothing by itself, vmlinux is the same before and after this patch. But if a developer adds a .gnu.linkonce.foo record, that record is placed in the front of the section, where it can be used as a header for the table. The intent is to create an up-link to another organizing struct, from where related tables can be referenced. And since every item in a table has a known offset from its header, that same offset can be used to fetch records from the related tables. By itself, this doesnt gain much, unless maybe the pattern of access is to scan 1 or 2 fields in each fat record, but with 2 16 bit .map* fields added, we could de-duplicate 2 related tables. The use case here is struct _ddebug, which has 3 pointers (function, file, module) with substantial repetition; respectively 53%, 90%, and the module column is fully recoverable after dynamic_debug_init() splits the table into a linked list of "module" chunks. On a DYNAMIC_DEBUG=y kernel with 5k pr_debugs, the memory savings should be ~100 KiB. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20221022225637.1406715-3-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10vmlinux.lds.h: add BOUNDED_SECTION* macrosJim Cromie1-140/+79
vmlinux.lds.h has ~45 occurrences of this general pattern: __start_foo = .; KEEP(*(foo)) __stop_foo = .; Reduce this pattern to a (group of 4) macros, and use them to reduce linecount. This was inspired by the codetag patchset. no functional change. CC: Suren Baghdasaryan <surenb@google.com> CC: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20221022225637.1406715-2-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-09arm64: unwind: add asynchronous unwind tables to kernel and modulesArd Biesheuvel1-2/+7
Enable asynchronous unwind table generation for both the core kernel as well as modules, and emit the resulting .eh_frame sections as init code so we can use the unwind directives for code patching at boot or module load time. This will be used by dynamic shadow call stack support, which will rely on code patching rather than compiler codegen to emit the shadow call stack push and pop instructions. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-11-08vmlinux.lds.h: Fix placement of '.data..decrypted' sectionNathan Chancellor1-1/+1
Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP") fixed an orphan section warning by adding the '.data..decrypted' section to the linker script under the PERCPU_DECRYPTED_SECTION define but that placement introduced a panic with !SMP, as the percpu sections are not instantiated with that configuration so attempting to access variables defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault. Move the '.data..decrypted' section to the DATA_MAIN define so that the variables in it are properly instantiated at boot time with CONFIG_SMP=n. Cc: stable@vger.kernel.org Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP") Link: https://lore.kernel.org/cbbd3548-880c-d2ca-1b67-5bb93b291d5f@huawei.com/ Debugged-by: Ard Biesheuvel <ardb@kernel.org> Reported-by: Zhao Wenhui <zhaowenhui8@huawei.com> Tested-by: xiafukun <xiafukun@huawei.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221108174934.3384275-1-nathan@kernel.org
2022-10-22Merge branch 'x86/urgent' into x86/core, to resolve conflictIngo Molnar1-6/+12
There's a conflict between the call-depth tracking commits in x86/core: ee3e2469b346 ("x86/ftrace: Make it call depth tracking aware") 36b64f101219 ("x86/ftrace: Rebalance RSB") eac828eaef29 ("x86/ftrace: Remove ftrace_epilogue()") And these fixes in x86/urgent: 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()") b5f1fc318440 ("x86/ftrace: Remove ftrace_epilogue()") It's non-trivial overlapping modifications - resolve them. Conflicts: arch/x86/kernel/ftrace_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-10-20ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()Peter Zijlstra1-6/+12
Different function signatures means they needs to be different functions; otherwise CFI gets upset. As triggered by the ftrace boot tests: [] CFI failure at ftrace_return_to_handler+0xac/0x16c (target: ftrace_stub+0x0/0x14; expected type: 0x0a5d5347) Fixes: 3c516f89e17e ("x86: Add support for CONFIG_CFI_CLANG") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/Y06dg4e1xF6JTdQq@hirez.programming.kicks-ass.net
2022-10-17arch: Introduce CONFIG_FUNCTION_ALIGNMENTPeter Zijlstra1-2/+2
Generic function-alignment infrastructure. Architectures can select FUNCTION_ALIGNMENT_xxB symbols; the FUNCTION_ALIGNMENT symbol is then set to the largest such selected size, 0 otherwise. From this the -falign-functions compiler argument and __ALIGN macro are set. This incorporates the DEBUG_FORCE_FUNCTION_ALIGN_64B knob and future alignment requirements for x86_64 (later in this series) into a single place. NOTE: also removes the 0x90 filler byte from the generic __ALIGN primitive, that value makes no sense outside of x86. NOTE: .balign 0 reverts to a no-op. Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111143.719248727@infradead.org
2022-10-08Merge tag 'driver-core-6.1-rc1' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core and debug printk changes for 6.1-rc1. Included in here is: - dynamic debug updates for the core and the drm subsystem. The drm changes have all been acked by the relevant maintainers - kernfs fixes for syzbot reported problems - kernfs refactors and updates for cgroup requirements - magic number cleanups and removals from the kernel tree (they were not being used and they really did not actually do anything) - other tiny cleanups All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits) docs: filesystems: sysfs: Make text and code for ->show() consistent Documentation: NBD_REQUEST_MAGIC isn't a magic number a.out: restore CMAGIC device property: Add const qualifier to device_get_match_data() parameter drm_print: add _ddebug descriptor to drm_*dbg prototypes drm_print: prefer bare printk KERN_DEBUG on generic fn drm_print: optimize drm_debug_enabled for jump-label drm-print: add drm_dbg_driver to improve namespace symmetry drm-print.h: include dyndbg header drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro drm_print: interpose drm_*dbg with forwarding macros drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers. drm_print: condense enum drm_debug_category debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs() Documentation: ENI155_MAGIC isn't a magic number Documentation: NBD_REPLY_MAGIC isn't a magic number nbd: remove define-only NBD_MAGIC, previously magic number Documentation: FW_HEADER_MAGIC isn't a magic number Documentation: EEPROM_MAGIC_VALUE isn't a magic number ...
2022-10-04Merge tag 'net-next-6.1' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ...
2022-10-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-1/+10
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26cfi: Switch to -fsanitize=kcfiSami Tolvanen1-18/+19
Switch from Clang's original forward-edge control-flow integrity implementation to -fsanitize=kcfi, which is better suited for the kernel, as it doesn't require LTO, doesn't use a jump table that requires altering function references, and won't break cross-module function address equality. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-6-samitolvanen@google.com
2022-09-23vmlinux.lds.h: CFI: Reduce alignment of jump-table to function alignmentWill Deacon1-2/+1
Due to undocumented, hysterical raisins on x86, the CFI jump-table sections in .text are needlessly aligned to PMD_SIZE in the vmlinux linker script. When compiling a CFI-enabled arm64 kernel with a 64KiB page-size, a PMD maps 512MiB of virtual memory and so the .text section increases to a whopping 940MiB and blows the final Image up to 960MiB. Others report a link failure. Since the CFI jump-table requires only instruction alignment, reduce the alignment directives to function alignment for parity with other parts of the .text section. This reduces the size of the .text section for the aforementioned 64KiB page size arm64 kernel to 19MiB for a much more reasonable total Image size of 39MiB. Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Mohan Rao .vanimina" <mailtoc.mohanrao@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/all/CAL_GTzigiNOMYkOPX1KDnagPhJtFNqSK=1USNbS0wUL4PW6-Uw@mail.gmail.com/ Fixes: cf68fffb66d6 ("add support for Clang CFI") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220922215715.13345-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-09-16ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLEPeter Zijlstra (Intel)1-1/+10
x86 will shortly start using -fpatchable-function-entry for purposes other than ftrace, make sure the __patchable_function_entry section isn't merged in the mcount_loc section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
2022-09-07kernel/module: add __dyndbg_classes sectionJim Cromie1-0/+3
Add __dyndbg_classes section, using __dyndbg as a model. Use it: vmlinux.lds.h: KEEP the new section, which also silences orphan section warning on loadable modules. Add (__start_/__stop_)__dyndbg_classes linker symbols for the c externs (below). kernel/module/main.c: - fill new fields in find_module_sections(), using section_objs() - extend callchain prototypes to pass classes, length load_module(): pass new info to dynamic_debug_setup() dynamic_debug_setup(): new params, pass through to ddebug_add_module() dynamic_debug.c: - add externs to the linker symbols. ddebug_add_module(): - It currently builds a debug_table, and *will* find and attach classes. dynamic_debug_init(): - add class fields to the _ddebug_info cursor var: di. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220904214134.408619-16-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-20sched: Reverse sched_class layoutPeter Zijlstra1-6/+6
Because GCC-12 is fully stupid about array bounds and it's just really hard to get a solid array definition from a linker script, flip the array order to avoid needing negative offsets :-/ This makes the whole relational pointer magic a little less obvious, but alas. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/YoOLLmLG7HRTXeEm@hirez.programming.kicks-ass.net
2022-03-27Merge tag 'x86_core_for_5.18_rc1' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra: "Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP. Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is limited to 2 instructions (and typically fewer) on branch targets not starting with ENDBR. CET-IBT also limits speculation of the next sequential instruction after the indirect CALL/JMP [1]. CET-IBT is fundamentally incompatible with retpolines, but provides, as described above, speculation limits itself" [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html * tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) kvm/emulate: Fix SETcc emulation for ENDBR x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0 x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0 kbuild: Fixup the IBT kbuild changes x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy x86: Remove toolchain check for X32 ABI capability x86/alternative: Use .ibt_endbr_seal to seal indirect calls objtool: Find unused ENDBR instructions objtool: Validate IBT assumptions objtool: Add IBT/ENDBR decoding objtool: Read the NOENDBR annotation x86: Annotate idtentry_df() x86,objtool: Move the ASM_REACHABLE annotation to objtool.h x86: Annotate call_on_stack() objtool: Rework ASM_REACHABLE x86: Mark __invalid_creds() __noreturn exit: Mark do_group_exit() __noreturn x86: Mark stop_this_cpu() __noreturn objtool: Ignore extra-symbol code objtool: Rename --duplicate to --lto ...
2022-03-15static_call: Avoid building empty .static_call_sitesPeter Zijlstra1-0/+4
Without CONFIG_HAVE_STATIC_CALL_INLINE there's no point in creating the .static_call_sites section and it's related symbols. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154317.223798256@infradead.org
2022-02-04powercap/drivers/dtpm: Convert the init table section to a simple arrayDaniel Lezcano1-11/+0
The init table section is freed after the system booted. However the next changes will make per module the DTPM description, so the table won't be accessible when the module is loaded. In order to fix that, we should move the table to the data section where there are very few entries and that makes strange to add it there. The main goal of the table was to keep self-encapsulated code and we can keep it almost as it by using an array instead. Suggested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20220128163537.212248-2-daniel.lezcano@linaro.org
2021-11-04Merge tag 'driver-core-5.16-rc1' of ↵Linus Torvalds1-7/+13
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 5.16-rc1. All of these have been in linux-next for a while now with no reported problems. Included in here are: - big update and cleanup of the sysfs abi documentation files and scripts from Mauro. We are almost at the place where we can properly check that the running kernel's sysfs abi is documented fully. - firmware loader updates - dyndbg updates - kernfs cleanups and fixes from Christoph - device property updates - component fix - other minor driver core cleanups and fixes" * tag 'driver-core-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (122 commits) device property: Drop redundant NULL checks x86/build: Tuck away built-in firmware under FW_LOADER vmlinux.lds.h: wrap built-in firmware support under FW_LOADER firmware_loader: move struct builtin_fw to the only place used x86/microcode: Use the firmware_loader built-in API firmware_loader: remove old DECLARE_BUILTIN_FIRMWARE() firmware_loader: formalize built-in firmware API component: do not leave master devres group open after bind dyndbg: refine verbosity 1-4 summary-detail gpiolib: acpi: Replace custom code with device_match_acpi_handle() i2c: acpi: Replace custom function with device_match_acpi_handle() driver core: Provide device_match_acpi_handle() helper dyndbg: fix spurious vNpr_info change dyndbg: no vpr-info on empty queries dyndbg: vpr-info on remove-module complete, not starting device property: Add missed header in fwnode.h Documentation: dyndbg: Improve cli param examples dyndbg: Remove support for ddebug_query param dyndbg: make dyndbg a known cli param dyndbg: show module in vpr-info in dd-exec-queries ...
2021-11-02Merge tag 'x86_core_for_v5.16_rc1' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to keep old userspace from breaking. Adjust the corresponding iopl selftest to that. - Improve stack overflow warnings to say which stack got overflowed and raise the exception stack sizes to 2 pages since overflowing the single page of exception stack is very easy to do nowadays with all the tracing machinery enabled. With that, rip out the custom mapping of AMD SEV's too. - A bunch of changes in preparation for FGKASLR like supporting more than 64K section headers in the relocs tool, correct ORC lookup table size to cover the whole kernel .text and other adjustments. * tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage vmlinux.lds.h: Have ORC lookup cover entire _etext - _stext x86/boot/compressed: Avoid duplicate malloc() implementations x86/boot: Allow a "silent" kaslr random byte fetch x86/tools/relocs: Support >64K section headers x86/sev: Make the #VC exception stacks part of the default stacks storage x86: Increase exception stack sizes x86/mm/64: Improve stack overflow warnings x86/iopl: Fake iopl(3) CLI/STI usage
2021-10-27vmlinux.lds.h: Have ORC lookup cover entire _etext - _stextKristen Carlson Accardi1-1/+2
When using -ffunction-sections to place each function in its own text section (so it can be randomized at load time in the future FGKASLR series), the linker will place most of the functions into separate .text.* sections. SIZEOF(.text) won't work here for calculating the ORC lookup table size, so the total text size must be calculated to include .text AND all .text.* sections. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> [ alobakin: move it to vmlinux.lds.h and make arch-indep ] Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211013175742.1197608-5-keescook@chromium.org
2021-10-22vmlinux.lds.h: wrap built-in firmware support under FW_LOADERLuis Chamberlain1-7/+13
The firmware loader built-in firmware is only available when FW_LOADER is built-in, so tuck away the sections for built-in firmware under it. This ensures no oddball user tries to uses these sections without first enabling FW_LOADER=y. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211021155843.1969401-6-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-20tracing: Use linker magic instead of recasting ftrace_ops_list_func()Steven Rostedt (VMware)1-2/+8
In an effort to enable -Wcast-function-type in the top-level Makefile to support Control Flow Integrity builds, all function casts need to be removed. This means that ftrace_ops_list_func() can no longer be defined as ftrace_ops_no_ops(). The reason for ftrace_ops_no_ops() is to use that when an architecture calls ftrace_ops_list_func() with only two parameters (called from assembly). And to make sure there's no C side-effects, those archs call ftrace_ops_no_ops() which only has two parameters, as ftrace_ops_list_func() has four parameters. Instead of a typecast, use vmlinux.lds.h to define ftrace_ops_list_func() to arch_ftrace_ops_list_func() that will define the proper set of parameters. Link: https://lore.kernel.org/r/20200614070154.6039-1-oscar.carter@gmx.com Link: https://lkml.kernel.org/r/20200617165616.52241bde@oasis.local.home Link: https://lore.kernel.org/all/20211005053922.GA702049@embeddedor/ Requested-by: Oscar Carter <oscar.carter@gmx.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-13vmlinux.lds.h: remove old check for GCC 4.9Nick Desaulniers1-4/+0
Now that GCC 5.1 is the minimally supported version of GCC, we can effectively revert commit 85c2ce9104eb ("sched, vmlinux.lds: Increase STRUCT_ALIGNMENT to 64 bytes for GCC-4.9") Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-02Merge tag 'printk-for-5.15' of ↵Linus Torvalds1-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages has changed or are not longer available by a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows to generate crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. Also it allows to serialize syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-08-11vmlinux.lds.h: Handle clang's module.{c,d}tor sectionsNathan Chancellor1-0/+1
A recent change in LLVM causes module_{c,d}tor sections to appear when CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings because these are not handled anywhere: ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor' ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN flag, so it is in a separate section even with -fno-function-sections (default)". Place them in the TEXT_TEXT section so that these technologies continue to work with the newer compiler versions. All of the KASAN and KCSAN KUnit tests continue to pass after this change. Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1432 Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08a59a01865 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Fangrui Song <maskray@google.com> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org
2021-07-19printk: Userspace format indexing supportChris Down1-0/+13
We have a number of systems industry-wide that have a subset of their functionality that works as follows: 1. Receive a message from local kmsg, serial console, or netconsole; 2. Apply a set of rules to classify the message; 3. Do something based on this classification (like scheduling a remediation for the machine), rinse, and repeat. As a couple of examples of places we have this implemented just inside Facebook, although this isn't a Facebook-specific problem, we have this inside our netconsole processing (for alarm classification), and as part of our machine health checking. We use these messages to determine fairly important metrics around production health, and it's important that we get them right. While for some kinds of issues we have counters, tracepoints, or metrics with a stable interface which can reliably indicate the issue, in order to react to production issues quickly we need to work with the interface which most kernel developers naturally use when developing: printk. Most production issues come from unexpected phenomena, and as such usually the code in question doesn't have easily usable tracepoints or other counters available for the specific problem being mitigated. We have a number of lines of monitoring defence against problems in production (host metrics, process metrics, service metrics, etc), and where it's not feasible to reliably monitor at another level, this kind of pragmatic netconsole monitoring is essential. As one would expect, monitoring using printk is rather brittle for a number of reasons -- most notably that the message might disappear entirely in a new version of the kernel, or that the message may change in some way that the regex or other classification methods start to silently fail. One factor that makes this even harder is that, under normal operation, many of these messages are never expected to be hit. For example, there may be a rare hardware bug which one wants to detect if it was to ever happen again, but its recurrence is not likely or anticipated. This precludes using something like checking whether the printk in question was printed somewhere fleetwide recently to determine whether the message in question is still present or not, since we don't anticipate that it should be printed anywhere, but still need to monitor for its future presence in the long-term. This class of issue has happened on a number of occasions, causing unhealthy machines with hardware issues to remain in production for longer than ideal. As a recent example, some monitoring around blk_update_request fell out of date and caused semi-broken machines to remain in production for longer than would be desirable. Searching through the codebase to find the message is also extremely fragile, because many of the messages are further constructed beyond their callsite (eg. btrfs_printk and other module-specific wrappers, each with their own functionality). Even if they aren't, guessing the format and formulation of the underlying message based on the aesthetics of the message emitted is not a recipe for success at scale, and our previous issues with fleetwide machine health checking demonstrate as much. This provides a solution to the issue of silently changed or deleted printks: we record pointers to all printk format strings known at compile time into a new .printk_index section, both in vmlinux and modules. At runtime, this can then be iterated by looking at <debugfs>/printk/index/<module>, which emits the following format, both readable by humans and able to be parsed by machines: $ head -1 vmlinux; shuf -n 5 vmlinux # <level[,flags]> filename:line function "format" <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n" <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n" <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n" <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n" <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n" This mitigates the majority of cases where we have a highly-specific printk which we want to match on, as we can now enumerate and check whether the format changed or the printk callsite disappeared entirely in userspace. This allows us to catch changes to printks we monitor earlier and decide what to do about it before it becomes problematic. There is no additional runtime cost for printk callers or printk itself, and the assembly generated is exactly the same. Signed-off-by: Chris Down <chris@chrisdown.name> Cc: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h} Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name