summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2021-05-27Merge tag 'net-5.13-rc4' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.13-rc4, including fixes from bpf, netfilter, can and wireless trees. Notably including fixes for the recently announced "FragAttacks" WiFi vulnerabilities. Rather large batch, touching some core parts of the stack, too, but nothing hair-raising. Current release - regressions: - tipc: make node link identity publish thread safe - dsa: felix: re-enable TAS guard band mode - stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid() - stmmac: fix system hang if change mac address after interface ifdown Current release - new code bugs: - mptcp: avoid OOB access in setsockopt() - bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers - ethtool: stats: fix a copy-paste error - init correct array size Previous releases - regressions: - sched: fix packet stuck problem for lockless qdisc - net: really orphan skbs tied to closing sk - mlx4: fix EEPROM dump support - bpf: fix alu32 const subreg bound tracking on bitwise operations - bpf: fix mask direction swap upon off reg sign change - bpf, offload: reorder offload callback 'prepare' in verifier - stmmac: Fix MAC WoL not working if PHY does not support WoL - packetmmap: fix only tx timestamp on request - tipc: skb_linearize the head skb when reassembling msgs Previous releases - always broken: - mac80211: address recent "FragAttacks" vulnerabilities - mac80211: do not accept/forward invalid EAPOL frames - mptcp: avoid potential error message floods - bpf, ringbuf: deny reserve of buffers larger than ringbuf to prevent out of buffer writes - bpf: forbid trampoline attach for functions with variable arguments - bpf: add deny list of functions to prevent inf recursion of tracing programs - tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT - can: isotp: prevent race between isotp_bind() and isotp_setsockopt() - netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version Misc: - bpf: add kconfig knob for disabling unpriv bpf by default" * tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (172 commits) net: phy: Document phydev::dev_flags bits allocation mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer mptcp: avoid error message on infinite mapping mptcp: drop unconditional pr_warn on bad opt mptcp: avoid OOB access in setsockopt() nfp: update maintainer and mailing list addresses net: mvpp2: add buffer header handling in RX bnx2x: Fix missing error code in bnx2x_iov_init_one() net: zero-initialize tc skb extension on allocation net: hns: Fix kernel-doc sctp: fix the proc_handler for sysctl encap_port sctp: add the missing setting for asoc encap_port bpf, selftests: Adjust few selftest result_unpriv outcomes bpf: No need to simulate speculative domain for immediates bpf: Fix mask direction swap upon off reg sign change bpf: Wrap aux data inside bpf_sanitize_info container bpf: Fix BPF_LSM kconfig symbol dependency selftests/bpf: Add test for l3 use of bpf_redirect_peer bpftool: Add sock_release help info for cgroup attach/prog load command net: dsa: microchip: enable phy errata workaround on 9567 ...
2021-05-23Merge tag 'perf-urgent-2021-05-23' of ↵Linus Torvalds4-9/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Two perf fixes: - Do not check the LBR_TOS MSR when setting up unrelated LBR MSRs as this can cause malfunction when TOS is not supported - Allocate the LBR XSAVE buffers along with the DS buffers upfront because allocating them when adding an event can deadlock" * tag 'perf-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic context perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
2021-05-23Merge tag 'irq-urgent-2021-05-23' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A few fixes for irqchip drivers: - Allocate interrupt descriptors correctly on Mainstone PXA when SPARSE_IRQ is enabled; otherwise the interrupt association fails - Make the APPLE AIC chip driver depend on APPLE - Remove redundant error output on devm_ioremap_resource() failure" * tag 'irq-urgent-2021-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: Remove redundant error printing irqchip/apple-aic: APPLE_AIC should depend on ARCH_APPLE ARM: PXA: Fix cplds irqdesc allocation when using legacy mode
2021-05-23Merge tag 'x86_urgent_for_v5.13_rc3' of ↵Linus Torvalds3-57/+92
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix how SEV handles MMIO accesses by forwarding potential page faults instead of killing the machine and by using the accessors with the exact functionality needed when accessing memory. - Fix a confusion with Clang LTO compiler switches passed to the it - Handle the case gracefully when VMGEXIT has been executed in userspace * tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Use __put_user()/__get_user() for data accesses x86/sev-es: Forward page-faults which happen during emulation x86/sev-es: Don't return NULL from sev_es_get_ghcb() x86/build: Fix location of '-plugin-opt=' flags x86/sev-es: Invalidate the GHCB after completing VMGEXIT x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
2021-05-23Merge tag 'powerpc-5.13-4' of ↵Linus Torvalds3-37/+54
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix breakage of strace (and other ptracers etc.) when using the new scv ABI (Power9 or later with glibc >= 2.33). - Fix early_ioremap() on 64-bit, which broke booting on some machines. Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and Christophe Leroy. * tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls powerpc: Fix early setup to make early_ioremap() work
2021-05-22Merge tag 'for-linus-5.13b-rc3-tag' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a fix for a boot regression when running as PV guest on hardware without NX support - a small series fixing a bug in the Xen pciback driver when configuring a PCI card with multiple virtual functions * tag 'for-linus-5.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: reconfigure also from backend watch handler xen-pciback: redo VF placement in the virtual topology x86/Xen: swap NX determination and GDT setup on BSP
2021-05-21Merge branch 'for-v5.13-rc3' of ↵Linus Torvalds2-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo fix from Eric Biederman: "During the merge window an issue with si_perf and the siginfo ABI came up. The alpha and sparc siginfo structure layout had changed with the addition of SIGTRAP TRAP_PERF and the new field si_perf. The reason only alpha and sparc were affected is that they are the only architectures that use si_trapno. Looking deeper it was discovered that si_trapno is used for only a few select signals on alpha and sparc, and that none of the other _sigfault fields past si_addr are used at all. Which means technically no regression on alpha and sparc. While the alignment concerns might be dismissed the abuse of si_errno by SIGTRAP TRAP_PERF does have the potential to cause regressions in existing userspace. While we still have time before userspace starts using and depending on the new definition siginfo for SIGTRAP TRAP_PERF this set of changes cleans up siginfo_t. - The si_trapno field is demoted from magic alpha and sparc status and made an ordinary union member of the _sigfault member of siginfo_t. Without moving it of course. - si_perf is replaced with si_perf_data and si_perf_type ending the abuse of si_errno. - Unnecessary additions to signalfd_siginfo are removed" * 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo signal: Deliver all of the siginfo perf data in _perf signal: Factor force_sig_perf out of perf_sigtrap signal: Implement SIL_FAULT_TRAPNO siginfo: Move si_trapno inside the union inside _si_fault
2021-05-21Merge tag 'for-linus' of git://github.com/openrisc/linuxLinus Torvalds3-5/+12
Pull OpenRISC fixes from Stafford Horne: "A few fixes that came in around the time of the merge window" * tag 'for-linus' of git://github.com/openrisc/linux: openrisc: Define memory barrier mb openrisc: mm/init.c: remove unused variable 'end' in paging_init() openrisc: mm/init.c: remove unused memblock_region variable in map_ram() openrisc: Fix a memory leak
2021-05-21x86/Xen: swap NX determination and GDT setup on BSPJan Beulich1-4/+4
xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-21Merge tag 'arm-soc-fixes-5.13-1' of ↵Linus Torvalds17-8/+86
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Only a small number of fixes so far, including some that I had applied during the merge window, so this is based on the original merge of the other branches. - The largest change is a fix for a reference counting bug in the AMD TEE driver. - Neil Armstrong now co-maintains Amlogic SoC support - Two build warning fixes for renesas device tree files - A sign expansion bug for optee - A DT binding fix for a mismerge" * tag 'arm-soc-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: npcm: wpcm450: select interrupt controller driver MAINTAINERS: ARM/Amlogic SoCs: add Neil as primary maintainer tee: amdtee: unload TA only when its refcount becomes 0 dt-bindings: nvmem: mediatek: remove duplicate mt8192 line firmware: arm_scmi: Remove duplicate declaration of struct scmi_protocol_handle firmware: arm_scpi: Prevent the ternary sign expansion bug arm64: dts: renesas: Add port@0 node for all CSI-2 nodes to dtsi arm64: dts: renesas: aistarvision-mipi-adapter-2.1: Fix CSI40 ports
2021-05-21bpf: Fix BPF_JIT kconfig symbol dependencyDaniel Borkmann1-2/+1
Randy reported a randconfig build error recently on i386: ld: arch/x86/net/bpf_jit_comp32.o: in function `do_jit': bpf_jit_comp32.c:(.text+0x28c9): undefined reference to `__bpf_call_base' ld: arch/x86/net/bpf_jit_comp32.o: in function `bpf_int_jit_compile': bpf_jit_comp32.c:(.text+0x3694): undefined reference to `bpf_jit_blind_constants' ld: bpf_jit_comp32.c:(.text+0x3719): undefined reference to `bpf_jit_binary_free' ld: bpf_jit_comp32.c:(.text+0x3745): undefined reference to `bpf_jit_binary_alloc' ld: bpf_jit_comp32.c:(.text+0x37d3): undefined reference to `bpf_jit_prog_release_other' [...] The cause was that b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf with core options") moved BPF_JIT from net/Kconfig into kernel/bpf/Kconfig and previously BPF_JIT was guarded by a 'if NET'. However, there is no actual dependency on NET, it's just that menuconfig NET selects BPF. And the latter in turn causes kernel/bpf/core.o to be built which contains above symbols. Randy's randconfig didn't have NET set, and BPF wasn't either, but BPF_JIT otoh was. Detangle this by making BPF_JIT depend on BPF instead. arm64 was the only arch that pulled in its JIT in net/ via obj-$(CONFIG_NET), all others unconditionally pull this dir in via obj-y. Do the same since CONFIG_NET guard there is really useless as we compiled the JIT via obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o anyway. Fixes: b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf with core options") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org>
2021-05-20Merge tag 'quota_for_v5.13-rc3' of ↵Linus Torvalds17-18/+17
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fixes from Jan Kara: "The most important part in the pull is disablement of the new syscall quotactl_path() which was added in rc1. The reason is some people at LWN discussion pointed out dirfd would be useful for this path based syscall and Christian Brauner agreed. Without dirfd it may be indeed problematic for containers. So let's just disable the syscall for now when it doesn't have users yet so that we have more time to mull over how to best specify the filesystem we want to work on" * tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Disable quotactl_path syscall quota: Use 'hlist_for_each_entry' to simplify code
2021-05-20powerpc/64s/syscall: Fix ptrace syscall info with scv syscallsNicholas Piggin2-35/+52
The scv implementation missed updating syscall return value and error value get/set functions to deal with the changed register ABI. This broke ptrace PTRACE_GET_SYSCALL_INFO as well as some kernel auditing and tracing functions. Fix. tools/testing/selftests/ptrace/get_syscall_info now passes when scv is used. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520111931.2597127-2-npiggin@gmail.com
2021-05-20powerpc: Fix early setup to make early_ioremap() workAlexey Kardashevskiy1-2/+2
The immediate problem is that after commit 0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()") the kernel silently reboots on some systems. The reason is that early_ioremap() returns broken addresses as it uses slot_virt[] array which initialized with offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE == KERN_IO_END - FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0 when early_ioremap_setup() is called. __kernel_io_end is initialized little bit later in early_init_mmu(). This fixes the initialization by swapping early_ioremap_setup() and early_init_mmu(). Fixes: 265c3491c4bc ("powerpc: Add support for GENERIC_EARLY_IOREMAP") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop unrelated cleanup & cleanup change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520032919.358935-1-aik@ozlabs.ru
2021-05-19x86/sev-es: Use __put_user()/__get_user() for data accessesJoerg Roedel1-20/+46
The put_user() and get_user() functions do checks on the address which is passed to them. They check whether the address is actually a user-space address and whether its fine to access it. They also call might_fault() to indicate that they could fault and possibly sleep. All of these checks are neither wanted nor needed in the #VC exception handler, which can be invoked from almost any context and also for MMIO instructions from kernel space on kernel memory. All the #VC handler wants to know is whether a fault happened when the access was tried. This is provided by __put_user()/__get_user(), which just do the access no matter what. Also add comments explaining why __get_user() and __put_user() are the best choice here and why it is safe to use them in this context. Also explain why copy_to/from_user can't be used. In addition, also revert commit 7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly") because using __get_user()/__put_user() fixes the same problem while the above commit introduced several problems: 1) It uses access_ok() which is only allowed in task context. 2) It uses memcpy() which has no fault handling at all and is thus unsafe to use here. [ bp: Fix up commit ID of the reverted commit above. ] Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org
2021-05-19x86/sev-es: Forward page-faults which happen during emulationJoerg Roedel1-0/+4
When emulating guest instructions for MMIO or IOIO accesses, the #VC handler might get a page-fault and will not be able to complete. Forward the page-fault in this case to the correct handler instead of killing the machine. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org
2021-05-19x86/sev-es: Don't return NULL from sev_es_get_ghcb()Joerg Roedel1-13/+12
sev_es_get_ghcb() is called from several places but only one of them checks the return value. The reaction to returning NULL is always the same: calling panic() and kill the machine. Instead of adding checks to all call sites, move the panic() into the function itself so that it will no longer return NULL. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org
2021-05-19x86/build: Fix location of '-plugin-opt=' flagsNathan Chancellor1-6/+6
Commit b33fff07e3e3 ("x86, build: allow LTO to be selected") added a couple of '-plugin-opt=' flags to KBUILD_LDFLAGS because the code model and stack alignment are not stored in LLVM bitcode. However, these flags were added to KBUILD_LDFLAGS prior to the emulation flag assignment, which uses ':=', so they were overwritten and never added to $(LD) invocations. The absence of these flags caused misalignment issues in the AMDGPU driver when compiling with CONFIG_LTO_CLANG, resulting in general protection faults. Shuffle the assignment below the initial one so that the flags are properly passed along and all of the linker flags stay together. At the same time, avoid any future issues with clobbering flags by changing the emulation flag assignment to '+=' since KBUILD_LDFLAGS is already defined with ':=' in the main Makefile before being exported for modification here as a result of commit: ce99d0bf312d ("kbuild: clear LDFLAGS in the top Makefile") Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected") Reported-by: Anthony Ruhier <aruhier@mailbox.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Anthony Ruhier <aruhier@mailbox.org> Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1374 Link: https://lore.kernel.org/r/20210518190106.60935-1-nathan@kernel.org
2021-05-19signal: Deliver all of the siginfo perf data in _perfEric W. Biederman2-3/+6
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-19siginfo: Move si_trapno inside the union inside _si_faultEric W. Biederman1-0/+3
It turns out that linux uses si_trapno very sparingly, and as such it can be considered extra information for a very narrow selection of signals, rather than information that is present with every fault reported in siginfo. As such move si_trapno inside the union inside of _si_fault. This results in no change in placement, and makes it eaiser to extend _si_fault in the future as this reduces the number of special cases. In particular with si_trapno included in the union it is no longer a concern that the union must be pointer aligned on most architectures because the union follows immediately after si_addr which is a pointer. This change results in a difference in siginfo field placement on sparc and alpha for the fields si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. These architectures do not implement the signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. Further these architecture have not yet implemented the userspace that would use si_perf. The point of this change is in fact to correct these placement issues before sparc or alpha grow userspace that cares. This change was discussed[1] and the agreement is that this change is currently safe. [1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.com Acked-by: Marco Elver <elver@google.com> v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18ARM: npcm: wpcm450: select interrupt controller driverJonathan Neuschäfer1-0/+1
The interrupt controller driver is necessary in order to have a functioning Linux system on WPCM450. Select it in mach-npcm/Kconfig. Fixes: ece3fe93e8f4 ("ARM: npcm: Introduce Nuvoton WPCM450 SoC") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/20210513165627.1767093-1-j.neuschaefer@gmx.net Link: https://lore.kernel.org/r/20210518071514.604492-1-joel@jms.id.au' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-05-18perf/x86/lbr: Remove cpuc->lbr_xsave allocation from atomic contextLike Xu3-8/+30
If the kernel is compiled with the CONFIG_LOCKDEP option, the conditional might_sleep_if() deep in kmem_cache_alloc() will generate the following trace, and potentially cause a deadlock when another LBR event is added: [] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196 [] Call Trace: [] kmem_cache_alloc+0x36/0x250 [] intel_pmu_lbr_add+0x152/0x170 [] x86_pmu_add+0x83/0xd0 Make it symmetric with the release_lbr_buffers() call and mirror the existing DS buffers. Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simplified] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20210430052247.3079672-2-like.xu@linux.intel.com
2021-05-18perf/x86: Avoid touching LBR_TOS MSR for Arch LBRLike Xu1-1/+1
The Architecture LBR does not have MSR_LBR_TOS (0x000001c9). In a guest that should support Architecture LBR, check_msr() will be a non-related check for the architecture MSR 0x0 (IA32_P5_MC_ADDR) that is also not supported by KVM. The failure will cause x86_pmu.lbr_nr = 0, thereby preventing the initialization of the guest Arch LBR. Fix it by avoiding this extraneous check in intel_pmu_init() for Arch LBR. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Like Xu <like.xu@linux.intel.com> [peterz: simpler still] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210430052247.3079672-1-like.xu@linux.intel.com
2021-05-18x86/sev-es: Invalidate the GHCB after completing VMGEXITTom Lendacky2-0/+6
Since the VMGEXIT instruction can be issued from userspace, invalidate the GHCB after performing VMGEXIT processing in the kernel. Invalidation is only required after userspace is available, so call vc_ghcb_invalidate() from sev_es_put_ghcb(). Update vc_ghcb_invalidate() to additionally clear the GHCB exit code so that it is always presented as 0 when VMGEXIT has been issued by anything else besides the kernel. Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/5a8130462e4f0057ee1184509cd056eedd78742b.1621273353.git.thomas.lendacky@amd.com
2021-05-18x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patchTom Lendacky1-18/+18
Move the location of sev_es_put_ghcb() in preparation for an update to it in a follow-on patch. This will better highlight the changes being made to the function. No functional change. Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/8c07662ec17d3d82e5c53841a1d9e766d3bdbab6.1621273353.git.thomas.lendacky@amd.com
2021-05-17Merge tag 'irqchip-fixes-5.13-1' of ↵Thomas Gleixner1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Fix PXA Mainstone CPLD irq allocation in legacy mode - Restrict the Apple AIC controller to the Apple platform - Remove a few supperfluous messages on devm_ioremap_resource() failure Link: https://lore.kernel.org/r/20210516122217.13234-1-maz@kernel.org
2021-05-17quota: Disable quotactl_path syscallJan Kara17-18/+17
In commit fa8b90070a80 ("quota: wire up quotactl_path") we have wired up new quotactl_path syscall. However some people in LWN discussion have objected that the path based syscall is missing dirfd and flags argument which is mostly standard for contemporary path based syscalls. Indeed they have a point and after a discussion with Christian Brauner and Sascha Hauer I've decided to disable the syscall for now and update its API. Since there is no userspace currently using that syscall and it hasn't been released in any major release, we should be fine. CC: Christian Brauner <christian.brauner@ubuntu.com> CC: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-16Merge tag 'timers-urgent-2021-05-16' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes for timers: - Use the ALARM feature check in the alarmtimer core code insted of the old method of checking for the set_alarm() callback. Drivers can have that callback set but the feature bit cleared. If such a RTC device is selected then alarms wont work. - Use a proper define to let the preprocessor check whether Hyper-V VDSO clocksource should be active. The code used a constant in an enum with #ifdef, which evaluates to always false and disabled the clocksource for VDSO" * tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86 alarmtimer: Check RTC features instead of ops
2021-05-16Merge tag 'for-linus-5.13b-rc2-tag' of ↵Linus Torvalds2-14/+9
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - two patches for error path fixes - a small series for fixing a regression with swiotlb with Xen on Arm * tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/swiotlb: check if the swiotlb has already been initialized arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h xen/unpopulated-alloc: fix error return code in fill_list() xen/gntdev: fix gntdev_mmap() error exit path
2021-05-16Merge tag 'x86_urgent_for_v5.13_rc2' of ↵Linus Torvalds25-105/+116
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The three SEV commits are not really urgent material. But we figured since getting them in now will avoid a huge amount of conflicts between future SEV changes touching tip, the kvm and probably other trees, sending them to you now would be best. The idea is that the tip, kvm etc branches for 5.14 will all base ontop of -rc2 and thus everything will be peachy. What is more, those changes are purely mechanical and defines movement so they should be fine to go now (famous last words). Summary: - Enable -Wundef for the compressed kernel build stage - Reorganize SEV code to streamline and simplify future development" * tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/compressed: Enable -Wundef x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG x86/sev: Move GHCB MSR protocol and NAE definitions in a common header x86/sev-es: Rename sev-es.{ch} to sev.{ch}
2021-05-16Merge tag 'powerpc-5.13-3' of ↵Linus Torvalds13-75/+175
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a regression in the conversion of the 64-bit BookE interrupt entry to C. - Fix KVM hosts running with the hash MMU since the recent KVM gfn changes. - Fix a deadlock in our paravirt spinlocks when hcall tracing is enabled. - Several fixes for oopses in our runtime code patching for security mitigations. - A couple of minor fixes for the recent conversion of 32-bit interrupt entry/exit to C. - Fix __get_user() causing spurious crashes in sigreturn due to a bad inline asm constraint, spotted with GCC 11. - A fix for the way we track IRQ masking state vs NMI interrupts when using the new scv system call entry path. - A couple more minor fixes. Thanks to Cédric Le Goater, Christian Zigotzky, Christophe Leroy, Naveen N. Rao, Nicholas Piggin Paul Menzel, and Sean Christopherson. * tag 'powerpc-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64e/interrupt: Fix nvgprs being clobbered powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled powerpc/64s: Fix stf mitigation patching w/strict RWX & hash powerpc/64s: Fix entry flush patching w/strict RWX & hash powerpc/64s: Fix crashes when toggling entry flush barrier powerpc/64s: Fix crashes when toggling stf barrier KVM: PPC: Book3S HV: Fix kvm_unmap_gfn_range_hv() for Hash MMU powerpc/legacy_serial: Fix UBSAN: array-index-out-of-bounds powerpc/signal: Fix possible build failure with unsafe_copy_fpr_{to/from}_user powerpc/uaccess: Fix __get_user() with CONFIG_CC_HAS_ASM_GOTO_OUTPUT powerpc/pseries: warn if recursing into the hcall tracing code powerpc/pseries: use notrace hcall variant for H_CEDE idle powerpc/pseries: Don't trace hcall tracing wrapper powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks powerpc/syscall: Calling kuap_save_and_lock() is wrong powerpc/interrupts: Fix kuep_unlock() call
2021-05-15Merge tag 'sched-urgent-2021-05-15' of ↵Linus Torvalds3-1/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Fix an idle CPU selection bug, and an AMD Ryzen maximum frequency enumeration bug" * tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu()
2021-05-15Merge tag 'irq-urgent-2021-05-15' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix build warning on SH" * tag 'irq-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sh: Remove unused variable
2021-05-15Merge tag 'arc-5.13-rc2' of ↵Linus Torvalds12-25/+41
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - PAE fixes - syscall num check off-by-one bug - misc fixes * tag 'arc-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: mm: Use max_high_pfn as a HIGHMEM zone border ARC: mm: PAE: use 40-bit physical page mask ARC: entry: fix off-by-one error in syscall number validation ARC: kgdb: add 'fallthrough' to prevent a warning arc: Fix typos/spellos
2021-05-15openrisc: Define memory barrier mbPeter Zijlstra1-0/+9
This came up in the discussion of the requirements of qspinlock on an architecture. OpenRISC uses qspinlock, but it was noticed that the memmory barrier was not defined. Peter defined it in the mail thread writing: As near as I can tell this should do. The arch spec only lists this one instruction and the text makes it sound like a completion barrier. This is correct so applying this patch. Signed-off-by: Peter Zijlstra <peterz@infradead.org> [shorne@gmail.com:Turned the mail into a patch] Signed-off-by: Stafford Horne <shorne@gmail.com>
2021-05-14Merge tag 'arm64-fixes' of ↵Linus Torvalds8-75/+147
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "Fixes and cpucaps.h automatic generation: - Generate cpucaps.h at build time rather than carrying lots of #defines. Merged at -rc1 to avoid some conflicts during the merge window. - Initialise RGSR_EL1.SEED in __cpu_setup() as it may be left as 0 out of reset and the IRG instruction would not function as expected if only the architected pseudorandom number generator is implemented. - Fix potential race condition in __sync_icache_dcache() where the PG_dcache_clean page flag is set before the actual cache maintenance. - Fix header include in BTI kselftests" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache() arm64: tools: Add __ASM_CPUCAPS_H to the endif in cpucaps.h arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup kselftest/arm64: Add missing stddef.h include to BTI tests arm64: Generate cpucaps.h
2021-05-14arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()Catalin Marinas1-1/+3
To ensure that instructions are observable in a new mapping, the arm64 set_pte_at() implementation cleans the D-cache and invalidates the I-cache to the PoU. As an optimisation, this is only done on executable mappings and the PG_dcache_clean page flag is set to avoid future cache maintenance on the same page. When two different processes map the same page (e.g. private executable file or shared mapping) there's a potential race on checking and setting PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the fault paths the page is locked (PG_locked), mprotect() does not take the page lock. The result is that one process may see the PG_dcache_clean flag set but the I/D cache maintenance not yet performed. Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit() and set_bit(). In the rare event of a race, the cache maintenance is done twice. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-05-14xen/swiotlb: check if the swiotlb has already been initializedStefano Stabellini1-1/+7
xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with -ENOMEM if the swiotlb has already been initialized. Add an explicit check io_tlb_default_mem != NULL at the beginning of xen_swiotlb_init. If the swiotlb is already initialized print a warning and return -EEXIST. On x86, the error propagates. On ARM, we don't actually need a special swiotlb buffer (yet), any buffer would do. So ignore the error and continue. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14arm64: do not set SWIOTLB_NO_FORCE when swiotlb is requiredChristoph Hellwig1-1/+2
Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init, today dma_direct_map_page returns error if SWIOTLB_NO_FORCE. For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it is going to be required later (e.g. Xen requires it). CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: catalin.marinas@arm.com CC: will@kernel.org CC: linux-arm-kernel@lists.infradead.org Fixes: 2726bf3ff252 ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.hStefano Stabellini1-12/+0
Move xen_swiotlb_detect to a static inline function to make it available to !CONFIG_XEN builds. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210512201823.1963-1-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86Vitaly Kuznetsov1-0/+2
Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029) the commit e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") broke vDSO on x86. The problem appears to be that VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and '#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not a define). Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead. Fixes: e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") Reported-by: Mohammed Gamal <mgamal@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
2021-05-14powerpc/64e/interrupt: Fix nvgprs being clobberedNicholas Piggin1-14/+24
Some interrupt handlers have an "extra" that saves 1 or 2 registers (r14, r15) in the paca save area and makes them available to use by the handler. The change to always save nvgprs in exception handlers lead to some interrupt handlers saving those scratch r14 / r15 registers into the interrupt frame's GPR saves, which get restored on interrupt exit. Fix this by always reloading those scratch registers from paca before the EXCEPTION_COMMON that saves nvgprs. Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
2021-05-14powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabledNicholas Piggin1-0/+7
scv support introduced the notion of code that implicitly soft-masks irqs due to the instruction addresses. This is required because scv enters the kernel with MSR[EE]=1. If a NMI (including soft-NMI) interrupt hits when we are implicitly soft-masked then its regs->softe does not reflect this because it is derived from the explicit soft mask state (paca->irq_soft_mask). This makes arch_irq_disabled_regs(regs) return false. This can trigger a warning in the soft-NMI watchdog code (shown below). Fix it by having NMI interrupts set regs->softe to disabled in case of interrupting an implicit soft-masked region. ------------[ cut here ]------------ WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0 CPU: 41 PID: 1103 Comm: (spawn) Not tainted NIP: c000000000039534 LR: c000000000039234 CTR: c000000000009a00 REGS: c000007fffbcf940 TRAP: 0700 Not tainted MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 22042482 XER: 200400ad CFAR: c000000000039260 IRQMASK: 3 GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003 GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020 GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265 GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000 GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040 GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029 GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60 NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0 LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0 Call Trace: [c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable) [c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4 --- interrupt: 900 at end_real_trampolines+0x0/0x1000 NIP: c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033 REGS: c000007fffbcfd60 TRAP: 0900 MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44042482 XER: 200400ad CFAR: 00007ca426946020 IRQMASK: 0 GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035 GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020 GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110 GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000 GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040 GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68 NIP [c000000000003000] end_real_trampolines+0x0/0x1000 LR [00007ca426adb03c] 0x7ca426adb03c --- interrupt: 900 Instruction dump: 60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008 7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8 ---[ end trace dc75f67d819779da ]--- Fixes: 118178e62e2e ("powerpc: move NMI entry/exit code into wrapper") Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
2021-05-14powerpc/64s: Fix stf mitigation patching w/strict RWX & hashMichael Ellerman1-10/+10
The stf entry barrier fallback is unsafe to execute in a semi-patched state, which can happen when enabling/disabling the mitigation with strict kernel RWX enabled and using the hash MMU. See the previous commit for more details. Fix it by changing the order in which we patch the instructions. Note the stf barrier fallback is only used on Power6 or earlier. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
2021-05-14powerpc/64s: Fix entry flush patching w/strict RWX & hashMichael Ellerman1-16/+43
The entry flush mitigation can be enabled/disabled at runtime. When this happens it results in the kernel patching its own instructions to enable/disable the mitigation sequence. With strict kernel RWX enabled instruction patching happens via a secondary mapping of the kernel text, so that we don't have to make the primary mapping writable. With the hash MMU this leads to a hash fault, which causes us to execute the exception entry which contains the entry flush mitigation. This means we end up executing the entry flush in a semi-patched state, ie. after we have patched the first instruction but before we patch the second or third instruction of the sequence. On machines with updated firmware the entry flush is a series of special nops, and it's safe to to execute in a semi-patched state. However when using the fallback flush the sequence is mflr/branch/mtlr, and so it's not safe to execute if we have patched out the mflr but not the other two instructions. Doing so leads to us corrputing LR, leading to an oops, for example: # echo 0 > /sys/kernel/debug/powerpc/entry_flush kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc000000002971000 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1 NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40 REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce) MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000 ... NIP 0xc000000002971000 LR 0xc000000002971000 Call Trace: do_patch_instruction+0xc4/0x340 (unreliable) do_entry_flush_fixups+0x100/0x3b0 entry_flush_set+0x50/0xe0 simple_attr_write+0x160/0x1a0 full_proxy_write+0x8c/0x110 vfs_write+0xf0/0x340 ksys_write+0x84/0x140 system_call_exception+0x164/0x2d0 system_call_common+0xec/0x278 The simplest fix is to change the order in which we patch the instructions, so that the sequence is always safe to execute. For the non-fallback flushes it doesn't matter what order we patch in. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
2021-05-14powerpc/64s: Fix crashes when toggling entry flush barrierMichael Ellerman1-1/+15
The entry flush mitigation can be enabled/disabled at runtime via a debugfs file (entry_flush), which causes the kernel to patch itself to enable/disable the relevant mitigations. However depending on which mitigation we're using, it may not be safe to do that patching while other CPUs are active. For example the following crash: sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20 Shows that we returned to userspace with a corrupted LR that points into the kernel, due to executing the partially patched call to the fallback entry flush (ie. we missed the LR restore). Fix it by doing the patching under stop machine. The CPUs that aren't doing the patching will be spinning in the core of the stop machine logic. That is currently sufficient for our purposes, because none of the patching we do is to that code or anywhere in the vicinity. Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
2021-05-14powerpc/64s: Fix crashes when toggling stf barrierMichael Ellerman1-2/+17
The STF (store-to-load forwarding) barrier mitigation can be enabled/disabled at runtime via a debugfs file (stf_barrier), which causes the kernel to patch itself to enable/disable the relevant mitigations. However depending on which mitigation we're using, it may not be safe to do that patching while other CPUs are active. For example the following crash: User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0) segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1 code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018 code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c Shows that we returned to userspace without restoring the user r13 value, due to executing the partially patched STF exit code. Fix it by doing the patching under stop machine. The CPUs that aren't doing the patching will be spinning in the core of the stop machine logic. That is currently sufficient for our purposes, because none of the patching we do is to that code or anywhere in the vicinity. Fixes: a048a07d7f45 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au
2021-05-13arm64: tools: Add __ASM_CPUCAPS_H to the endif in cpucaps.hMark Brown1-1/+1
Anshuman suggested this. Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210513151819.12526-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-05-13x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen ↵Huang Rui3-1/+19
generations Some AMD Ryzen generations has different calculation method on maximum performance. 255 is not for all ASICs, some specific generations should use 166 as the maximum performance. Otherwise, it will report incorrect frequency value like below: ~ → lscpu | grep MHz CPU MHz: 3400.000 CPU max MHz: 7228.3198 CPU min MHz: 2200.0000 [ mingo: Tidied up whitespace use. ] [ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ] Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com> Fixed-by: Alexander Monakov <amonakov@ispras.ru> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-05-12x86/boot/compressed: Enable -WundefNick Desaulniers3-2/+3
A discussion around -Wundef showed that there were still a few boolean Kconfigs where #if was used rather than #ifdef to guard different code. Kconfig doesn't define boolean configs, which can result in -Wundef warnings. arch/x86/boot/compressed/Makefile resets the CFLAGS used for this directory, and doesn't re-enable -Wundef as the top level Makefile does. If re-added, with RANDOMIZE_BASE and X86_NEED_RELOCS disabled, the following warnings are visible. arch/x86/boot/compressed/misc.h:82:5: warning: 'CONFIG_RANDOMIZE_BASE' is not defined, evaluates to 0 [-Wundef] ^ arch/x86/boot/compressed/misc.c:175:5: warning: 'CONFIG_X86_NEED_RELOCS' is not defined, evaluates to 0 [-Wundef] ^ Simply fix these and re-enable this warning for this directory. Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20210422190450.3903999-1-ndesaulniers@google.com