summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2022-05-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-69/+190
Pull kvm fixes from Paolo Bonzini: "x86: - Account for family 17h event renumberings in AMD PMU emulation - Remove CPUID leaf 0xA on AMD processors - Fix lockdep issue with locking all vCPUs - Fix loss of A/D bits in SPTEs - Fix syzkaller issue with invalid guest state" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Exit to userspace if vCPU has injected exception and invalid state KVM: SEV: Mark nested locking of vcpu->lock kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bits KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits() KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)
2022-05-06Merge tag 'riscv-for-linus-5.18-rc6' of ↵Linus Torvalds1-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Palmer Dabbelt: - A fix to relocate the DTB early in boot, in cases where the bootloader doesn't put the DTB in a region that will end up mapped by the kernel. This manifests as a crash early in boot on a handful of configurations. * tag 'riscv-for-linus-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: relocate DTB if it's outside memory region
2022-05-06KVM: VMX: Exit to userspace if vCPU has injected exception and invalid stateSean Christopherson1-1/+1
Exit to userspace with an emulation error if KVM encounters an injected exception with invalid guest state, in addition to the existing check of bailing if there's a pending exception (KVM doesn't support emulating exceptions except when emulating real mode via vm86). In theory, KVM should never get to such a situation as KVM is supposed to exit to userspace before injecting an exception with invalid guest state. But in practice, userspace can intervene and manually inject an exception and/or stuff registers to force invalid guest state while a previously injected exception is awaiting reinjection. Fixes: fc4fad79fc3d ("KVM: VMX: Reject KVM_RUN if emulation is required with pending exception") Reported-by: syzbot+cfafed3bb76d3e37581b@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220502221850.131873-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-06KVM: SEV: Mark nested locking of vcpu->lockPeter Gonda1-4/+38
svm_vm_migrate_from() uses sev_lock_vcpus_for_migration() to lock all source and target vcpu->locks. Unfortunately there is an 8 subclass limit, so a new subclass cannot be used for each vCPU. Instead maintain ownership of the first vcpu's mutex.dep_map using a role specific subclass: source vs target. Release the other vcpu's mutex.dep_maps. Fixes: b56639318bb2b ("KVM: SEV: Add support for SEV intra host migration") Reported-by: John Sperbeck<jsperbeck@google.com> Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Gonda <pgonda@google.com> Message-Id: <20220502165807.529624-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-05Merge tag 's390-5.18-4' of ↵Linus Torvalds3-1/+27
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Disable -Warray-bounds warning for gcc12, since the only known way to workaround false positive warnings on lowcore accesses would result in worse code on fast paths. - Avoid lockdep_assert_held() warning in kvm vm memop code. - Reduce overhead within gmap_rmap code to get rid of long latencies when e.g. shutting down 2nd level guests. * tag 's390-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: KVM: s390: vsie/gmap: reduce gmap_rmap overhead KVM: s390: Fix lockdep issue in vm memop s390: disable -Warray-bounds
2022-05-05Merge tag 'mips-fixes_5.18_1' of ↵Linus Torvalds2-12/+7
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Extend R4000/R4400 CPU erratum workaround to all revisions" * tag 'mips-fixes_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Fix CP0 counter erratum detection for R4k CPUs
2022-05-03KVM: s390: vsie/gmap: reduce gmap_rmap overheadChristian Borntraeger1-0/+7
there are cases that trigger a 2nd shadow event for the same vmaddr/raddr combination. (prefix changes, reboots, some known races) This will increase memory usages and it will result in long latencies when cleaning up, e.g. on shutdown. To avoid cases with a list that has hundreds of identical raddrs we check existing entries at insert time. As this measurably reduces the list length this will be faster than traversing the list at shutdown time. In the long run several places will be optimized to create less entries and a shrinker might be necessary. Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20220429151526.1560-1-borntraeger@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-03Merge branch 'kvm-amd-pmu-fixes' into HEADPaolo Bonzini2-3/+30
2022-05-03kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMUSandipan Das1-0/+5
On some x86 processors, CPUID leaf 0xA provides information on Architectural Performance Monitoring features. It advertises a PMU version which Qemu uses to determine the availability of additional MSRs to manage the PMCs. Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for the same, the kernel constructs return values based on the x86_pmu_capability irrespective of the vendor. This leaf and the additional MSRs are not supported on AMD and Hygon processors. If AMD PerfMonV2 is detected, the PMU version is set to 2 and guest startup breaks because of an attempt to access a non-existent MSR. Return zeros to avoid this. Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf") Reported-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_idKyle Huey1-3/+25
Zen renumbered some of the performance counters that correspond to the well known events in perf_hw_id. This code in KVM was never updated for that, so guest that attempt to use counters on Zen that correspond to the pre-Zen perf_hw_id values will silently receive the wrong values. This has been observed in the wild with rr[0] when running in Zen 3 guests. rr uses the retired conditional branch counter 00d1 which is incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND. [0] https://rr-project.org/ Signed-off-by: Kyle Huey <me@kylehuey.com> Message-Id: <20220503050136.86298-1-khuey@kylehuey.com> Cc: stable@vger.kernel.org [Check guest family, not host. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03Merge branch 'kvm-tdp-mmu-atomicity-fix' into HEADPaolo Bonzini5-61/+121
We are dropping A/D bits (and W bits) in the TDP MMU. Even if mmu_lock is held for write, as volatile SPTEs can be written by other tasks/vCPUs outside of mmu_lock. Attempting to prove that bug exposed another notable goof, which has been lurking for a decade, give or take: KVM treats _all_ MMU-writable SPTEs as volatile, even though KVM never clears WRITABLE outside of MMU lock. As a result, the legacy MMU (and the TDP MMU if not fixed) uses XCHG to update writable SPTEs. The fix does not seem to have an easily-measurable affect on performance; page faults are so slow that wasting even a few hundred cycles is dwarfed by the base cost.
2022-05-03KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bitsSean Christopherson2-31/+85
Use an atomic XCHG to write TDP MMU SPTEs that have volatile bits, even if mmu_lock is held for write, as volatile SPTEs can be written by other tasks/vCPUs outside of mmu_lock. If a vCPU uses the to-be-modified SPTE to write a page, the CPU can cache the translation as WRITABLE in the TLB despite it being seen by KVM as !WRITABLE, and/or KVM can clobber the Accessed/Dirty bits and not properly tag the backing page. Exempt non-leaf SPTEs from atomic updates as KVM itself doesn't modify non-leaf SPTEs without holding mmu_lock, they do not have Dirty bits, and KVM doesn't consume the Accessed bit of non-leaf SPTEs. Dropping the Dirty and/or Writable bits is most problematic for dirty logging, as doing so can result in a missed TLB flush and eventually a missed dirty page. In the unlikely event that the only dirty page(s) is a clobbered SPTE, clear_dirty_gfn_range() will see the SPTE as not dirty (based on the Dirty or Writable bit depending on the method) and so not update the SPTE and ultimately not flush. If the SPTE is cached in the TLB as writable before it is clobbered, the guest can continue writing the associated page without ever taking a write-protect fault. For most (all?) file back memory, dropping the Dirty bit is a non-issue. The primary MMU write-protects its PTEs on writeback, i.e. KVM's dirty bit is effectively ignored because the primary MMU will mark that page dirty when the write-protection is lifted, e.g. when KVM faults the page back in for write. The Accessed bit is a complete non-issue. Aside from being unused for non-leaf SPTEs, KVM doesn't do a TLB flush when aging SPTEs, i.e. the Accessed bit may be dropped anyways. Lastly, the Writable bit is also problematic as an extension of the Dirty bit, as KVM (correctly) treats the Dirty bit as volatile iff the SPTE is !DIRTY && WRITABLE. If KVM fixes an MMU-writable, but !WRITABLE, SPTE out of mmu_lock, then it can allow the CPU to set the Dirty bit despite the SPTE being !WRITABLE when it is checked by KVM. But that all depends on the Dirty bit being problematic in the first place. Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits()Sean Christopherson3-27/+32
Move the is_shadow_present_pte() check out of spte_has_volatile_bits() and into its callers. Well, caller, since only one of its two callers doesn't already do the shadow-present check. Opportunistically move the helper to spte.c/h so that it can be used by the TDP MMU, which is also the primary motivation for the shadow-present change. Unlike the legacy MMU, the TDP MMU uses a single path for clear leaf and non-leaf SPTEs, and to avoid unnecessary atomic updates, the TDP MMU will need to check is_last_spte() prior to calling spte_has_volatile_bits(), and calling is_last_spte() without first calling is_shadow_present_spte() is at best odd, and at worst a violation of KVM's loosely defines SPTE rules. Note, mmu_spte_clear_track_bits() could likely skip the write entirely for SPTEs that are not shadow-present. Leave that cleanup for a future patch to avoid introducing a functional change, and because the shadow-present check can likely be moved further up the stack, e.g. drop_large_spte() appears to be the only path that doesn't already explicitly check for a shadow-present SPTE. No functional change intended. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-03KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)Sean Christopherson2-9/+10
Don't treat SPTEs that are truly writable, i.e. writable in hardware, as being volatile (unless they're volatile for other reasons, e.g. A/D bits). KVM _sets_ the WRITABLE bit out of mmu_lock, but never _clears_ the bit out of mmu_lock, so if the WRITABLE bit is set, it cannot magically get cleared just because the SPTE is MMU-writable. Rename the wrapper of MMU-writable to be more literal, the previous name of spte_can_locklessly_be_made_writable() is wrong and misleading. Fixes: c7ba5b48cc8d ("KVM: MMU: fast path of handling guest page fault") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220423034752.1161007-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-02KVM: s390: Fix lockdep issue in vm memopJanis Schoetterl-Glausch1-1/+10
Issuing a memop on a protected vm does not make sense, neither is the memory readable/writable, nor does it make sense to check storage keys. This is why the ioctl will return -EINVAL when it detects the vm to be protected. However, in order to ensure that the vm cannot become protected during the memop, the kvm->lock would need to be taken for the duration of the ioctl. This is also required because kvm_s390_pv_is_protected asserts that the lock must be held. Instead, don't try to prevent this. If user space enables secure execution concurrently with a memop it must accecpt the possibility of the memop failing. Still check if the vm is currently protected, but without locking and consider it a heuristic. Fixes: ef11c9463ae0 ("KVM: s390: Add vm IOCTL for key checked guest absolute memory access") Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220322153204.2637400-1-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds15-54/+187
Pull kvm fixes from Paolo Bonzini: "ARM: - Take care of faults occuring between the PARange and IPA range by injecting an exception - Fix S2 faults taken from a host EL0 in protected mode - Work around Oops caused by a PMU access from a 32bit guest when PMU has been created. This is a temporary bodge until we fix it for good. x86: - Fix potential races when walking host page table - Fix shadow page table leak when KVM runs nested - Work around bug in userspace when KVM synthesizes leaf 0x80000021 on older (pre-EPYC) or Intel processors Generic (but affects only RISC-V): - Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: work around QEMU issue with synthetic CPUID leaves Revert "x86/mm: Introduce lookup_address_in_mm()" KVM: x86/mmu: fix potential races when walking host page table KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR KVM: arm64: Inject exception on out-of-IPA-range translation fault KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set KVM: arm64: Handle host stage-2 faults from 32-bit EL0
2022-05-01Merge tag 'x86_urgent_for_v5.18_rc5' of ↵Linus Torvalds13-11/+38
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is solely controlled by the hypervisor - A build fix to make the function prototype (__warn()) as visible as the definition itself - A bunch of objtool annotation fixes which have accumulated over time - An ORC unwinder fix to handle bad input gracefully - Well, we thought the microcode gets loaded in time in order to restore the microcode-emulated MSRs but we thought wrong. So there's a fix for that to have the ordering done properly - Add new Intel model numbers - A spelling fix * tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests bug: Have __warn() prototype defined unconditionally x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config objtool: Use offstr() to print address of missing ENDBR objtool: Print data address for "!ENDBR" data warnings x86/xen: Add ANNOTATE_NOENDBR to startup_xen() x86/uaccess: Add ENDBR to __put_user_nocheck*() x86/retpoline: Add ANNOTATE_NOENDBR for retpolines x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline objtool: Enable unreachable warnings for CLANG LTO x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE x86,objtool: Mark cpu_startup_entry() __noreturn x86,xen,objtool: Add UNWIND hint lib/strn*,objtool: Enforce user_access_begin() rules MAINTAINERS: Add x86 unwinding entry x86/unwind/orc: Recheck address range after stack info was updated x86/cpu: Load microcode during restore_processor_state() x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
2022-05-01Merge tag 'objtool_urgent_for_v5.18_rc5' of ↵Linus Torvalds1-35/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Borislav Petkov: "A bunch of objtool fixes to improve unwinding, sibling call detection, fallthrough detection and relocation handling of weak symbols when the toolchain strips section symbols" * tag 'objtool_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix code relocs vs weak symbols objtool: Fix type of reloc::addend objtool: Fix function fallthrough detection for vmlinux objtool: Fix sibling call detection in alternatives objtool: Don't set 'jump_dest' for sibling calls x86/uaccess: Don't jump between functions
2022-04-30Merge tag 'soc-fixes-5.18-3' of ↵Linus Torvalds59-208/+235
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: - A fix for a regression caused by the previous set of bugfixes changing tegra and at91 pinctrl properties. More work is needed to figure out what this should actually be, but a revert makes it work for the moment. - Defconfig regression fixes for tegra after renamed symbols - Build-time warning and static checker fixes for imx, op-tee, sunxi, meson, at91, and omap - More at91 DT fixes for audio, regulator and spi nodes - A regression fix for Renesas Hyperflash memory probe - A stability fix for amlogic boards, modifying the allowed cpufreq states - Multiple fixes for system suspend on omap2+ - DT fixes for various i.MX bugs - A probe error fix for imx6ull-colibri MMC - A MAINTAINERS file entry for samsung bug reports * tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits) Revert "arm: dts: at91: Fix boolean properties with values" bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create() Revert "arm64: dts: tegra: Fix boolean properties with values" arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock ARM: dts: imx6ull-colibri: fix vqmmc regulator MAINTAINERS: add Bug entry for Samsung and memory controller drivers memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35 ARM: dts: am3517-evm: Fix misc pinmuxing ARM: dts: am33xx-l4: Add missing touchscreen clock properties ARM: dts: Fix mmc order for omap3-gta04 ARM: dts: at91: fix pinctrl phandles ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name ARM: dts: at91: Describe regulators on at91sam9g20ek ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek ARM: dts: at91: Fix boolean properties with values ARM: dts: at91: use generic node name for dataflash ARM: dts: at91: align SPI NOR node name with dtschema ARM: dts: at91: sama7g5ek: Align the impedance of the QSPI0's HSIO and PCB lines ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines ...
2022-04-30Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds3-8/+20
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A semi-large pile of clk driver fixes this time around. Nothing is touching the core so these fixes are fairly well contained to specific devices that use these clk drivers. - Some Allwinner SoC fixes to gracefully handle errors and mark an RTC clk as critical so that the RTC keeps ticking. - Fix AXI bus clks and RTC clk design for Microchip PolarFire SoC driver introduced this cycle. This has some devicetree bits acked by riscv maintainers. We're fixing it now so that the prior bindings aren't released in a major kernel version. - Remove a reset on Microchip PolarFire SoCs that broke when enabling CONFIG_PM. - Set a min/max for the Qualcomm graphics clk. This got broken by the clk rate range patches introduced this cycle" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical riscv: dts: microchip: reparent mpfs clocks clk: microchip: mpfs: add RTCREF clock control clk: microchip: mpfs: re-parent the configurable clocks dt-bindings: rtc: add refclk to mpfs-rtc dt-bindings: clk: mpfs: add defines for two new clocks dt-bindings: clk: mpfs document msspll dri registers riscv: dts: microchip: fix usage of fic clocks on mpfs clk: microchip: mpfs: mark CLK_ATHENA as critical clk: microchip: mpfs: fix parents for FIC clocks clk: qcom: clk-rcg2: fix gfx3d frequency calculation clk: microchip: mpfs: don't reset disabled peripherals clk: sunxi-ng: fix not NULL terminated coccicheck error
2022-04-30Revert "arm: dts: at91: Fix boolean properties with values"Arnd Bergmann2-2/+2
This reverts commit 0dc23d1a8e17, which caused another regression as the pinctrl code actually expects an integer value of 0 or 1 rather than a simple boolean property. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29KVM: x86: work around QEMU issue with synthetic CPUID leavesPaolo Bonzini1-5/+14
Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU, which assumes the *host* CPUID[0x80000000].EAX is higher or equal to what KVM_GET_SUPPORTED_CPUID reports. This causes QEMU to issue bogus host CPUIDs when preparing the input to KVM_SET_CPUID2. It can even get into an infinite loop, which is only terminated by an abort(): cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e) To work around this, only synthesize those leaves if 0x8000001d exists on the host. The synthetic 0x80000021 leaf is mostly useful on Zen2, which satisfies the condition. Fixes: f144c49e8c39 ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29Merge tag 'riscv-for-linus-5.18-rc5' of ↵Linus Torvalds3-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix to properly ensure a single CPU is running during patch_text(). - A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set, necessary after a recent refactoring. * tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL riscv: patch_text: Fixup last cpu should be master
2022-04-29Merge tag 'arm64-fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type. This is a fix to the MTE ELF ABI for a bug that was added during the most recent merge window as part of the coredump support. The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE segment type has already been allocated to PT_AARCH64_UNWIND by the ELF ABI, so we've bumped the value and changed the name of the identifier to be better aligned with the existing one" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: elf: Fix the arm64 MTE ELF segment name and value
2022-04-29Revert "x86/mm: Introduce lookup_address_in_mm()"Sean Christopherson2-15/+0
Drop lookup_address_in_mm() now that KVM is providing it's own variant of lookup_address_in_pgd() that is safe for use with user addresses, e.g. guards against page tables being torn down. A variant that provides a non-init mm is inherently dangerous and flawed, as the only reason to use an mm other than init_mm is to walk a userspace mapping, and lookup_address_in_pgd() does not play nice with userspace mappings, e.g. doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE() to ensure an upper level entry isn't converted to a huge page between checking the PAGE_SIZE bit and grabbing the address of the next level down. This reverts commit 13c72c060f1ba6f4eddd7b1c4f52a8aded43d6d9. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <YmwIi3bXr/1yhYV/@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29Merge branch 'kvm-fixes-for-5.18-rc5' into HEADPaolo Bonzini7-24/+94
Fixes for (relatively) old bugs, to be merged in both the -rc and next development trees: * Fix potential races when walking host page table * Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT * Fix shadow page table leak when KVM runs nested
2022-04-29KVM: x86/mmu: fix potential races when walking host page tableMingwei Zhang1-5/+42
KVM uses lookup_address_in_mm() to detect the hugepage size that the host uses to map a pfn. The function suffers from several issues: - no usage of READ_ONCE(*). This allows multiple dereference of the same page table entry. The TOCTOU problem because of that may cause KVM to incorrectly treat a newly generated leaf entry as a nonleaf one, and dereference the content by using its pfn value. - the information returned does not match what KVM needs; for non-present entries it returns the level at which the walk was terminated, as long as the entry is not 'none'. KVM needs level information of only 'present' entries, otherwise it may regard a non-present PXE entry as a present large page mapping. - the function is not safe for mappings that can be torn down, because it does not disable IRQs and because it returns a PTE pointer which is never safe to dereference after the function returns. So implement the logic for walking host page tables directly in KVM, and stop using lookup_address_in_mm(). Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220429031757.2042406-1-mizhang@google.com> [Inline in host_pfn_mapping_level, ensure no semantic change for its callers. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENTPaolo Bonzini3-3/+7
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags member that at the time was unused. Unfortunately this extensibility mechanism has several issues: - x86 is not writing the member, so it would not be possible to use it on x86 except for new events - the member is not aligned to 64 bits, so the definition of the uAPI struct is incorrect for 32- on 64-bit userspace. This is a problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately usage of flags was only introduced in 5.18. Since padding has to be introduced, place a new field in there that tells if the flags field is valid. To allow further extensibility, in fact, change flags to an array of 16 values, and store how many of the values are valid. The availability of the new ndata field is tied to a system capability; all architectures are changed to fill in the field. To avoid breaking compilation of userspace that was using the flags field, provide a userspace-only union to overlap flags with data[0]. The new field is placed at the same offset for both 32- and 64-bit userspace. Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220422103013.34832-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDRSean Christopherson5-16/+45
Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR. The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the guest, possibly with help from userspace, manages to coerce KVM into creating a SPTE for an "impossible" gfn, KVM will leak the associated shadow pages (page tables): WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57 kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm] Modules linked in: kvm_intel kvm irqbypass CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G W 5.18.0-rc1+ #293 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2d0 [kvm] kvm_vm_release+0x1d/0x30 [kvm] __fput+0x82/0x240 task_work_run+0x5b/0x90 exit_to_user_mode_prepare+0xd2/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> On bare metal, encountering an impossible gpa in the page fault path is well and truly impossible, barring CPU bugs, as the CPU will signal #PF during the gva=>gpa translation (or a similar failure when stuffing a physical address into e.g. the VMCS/VMCB). But if KVM is running as a VM itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR of the underlying hardware, in which case the hardware will not fault on the illegal-from-KVM's-perspective gpa. Alternatively, KVM could continue allowing the dodgy behavior and simply zap the max possible range. But, for hosts with MAXPHYADDR < 52, that's a (minor) waste of cycles, and more importantly, KVM can't reasonably support impossible memslots when running on bare metal (or with an accurate MAXPHYADDR as a VM). Note, limiting the overhead by checking if KVM is running as a guest is not a safe option as the host isn't required to announce itself to the guest in any way, e.g. doesn't need to set the HYPERVISOR CPUID bit. A second alternative to disallowing the memslot behavior would be to disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR. That restriction is undesirable as there are legitimate use cases for doing so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous systems so that VMs can be migrated between hosts with different MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess. Note that any guest.MAXPHYADDR is valid with shadow paging, and it is even useful in order to test KVM with MAXPHYADDR=52 (i.e. without any reserved physical address bits). The now common kvm_mmu_max_gfn() is inclusive instead of exclusive. The memslot and TDP MMU code want an exclusive value, but the name implies the returned value is inclusive, and the MMIO path needs an inclusive check. Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Ben Gardon <bgardon@google.com> Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220428233416.2446833-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29Merge tag 'kvmarm-fixes-5.18-2' of ↵Paolo Bonzini5-10/+79
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.18, take #2 - Take care of faults occuring between the PARange and IPA range by injecting an exception - Fix S2 faults taken from a host EL0 in protected mode - Work around Oops caused by a PMU access from a 32bit guest when PMU has been created. This is a temporary bodge until we fix it for good.
2022-04-29RISC-V: relocate DTB if it's outside memory regionNick Kossifidis1-2/+19
In case the DTB provided by the bootloader/BootROM is before the kernel image or outside /memory, we won't be able to access it through the linear mapping, and get a segfault on setup_arch(). Currently OpenSBI relocates DTB but that's not always the case (e.g. if FW_JUMP_FDT_ADDR is not specified), and it's also not the most portable approach since the default FW_JUMP_FDT_ADDR of the generic platform relocates the DTB at a specific offset that may not be available. To avoid this situation copy DTB so that it's visible through the linear mapping. Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Link: https://lore.kernel.org/r/20220322132839.3653682-1-mick@ics.forth.gr Tested-by: Conor Dooley <conor.dooley@microchip.com> Fixes: f105aa940e78 ("riscv: add BUILTIN_DTB support for MMU-enabled targets") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-29Merge tag 'tegra-for-5.18-arm-defconfig-fixes' of ↵Arnd Bergmann2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes ARM: tegra: Default configuration fixes for v5.18 This contains two updates to the default configuration needed because of a Kconfig symbol name change. This fixes a failure that was detected in the NVIDIA automated test farm. * tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: ARM: config: multi v7: Enable NVIDIA Tegra video decoder driver ARM: tegra_defconfig: Update CONFIG_TEGRA_VDE option Link: https://lore.kernel.org/r/20220429080626.494150-1-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29Merge tag 'imx-fixes-5.18-2' of ↵Arnd Bergmann2-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.18, 2nd round: - Fix one sparse warning on imx-weim driver. - Fix vqmmc regulator to get UHS-I mode work on imx6ull-colibri board. - Add missing 32.768 kHz PMIC clock for imx8mn-ddr4-evk board to fix bd718xx-clk probe error. * tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock ARM: dts: imx6ull-colibri: fix vqmmc regulator bus: imx-weim: make symbol 'weim_of_notifier' static Link: https://lore.kernel.org/r/20220426013427.GB14615@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29MIPS: Fix CP0 counter erratum detection for R4k CPUsMaciej W. Rozycki2-12/+7
Fix the discrepancy between the two places we check for the CP0 counter erratum in along with the incorrect comparison of the R4400 revision number against 0x30 which matches none and consistently consider all R4000 and R4400 processors affected, as documented in processor errata publications[1][2][3], following the mapping between CP0 PRId register values and processor models: PRId | Processor Model ---------+-------------------- 00000422 | R4000 Revision 2.2 00000430 | R4000 Revision 3.0 00000440 | R4400 Revision 1.0 00000450 | R4400 Revision 2.0 00000460 | R4400 Revision 3.0 No other revision of either processor has ever been spotted. Contrary to what has been stated in commit ce202cbb9e0b ("[MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug") marking the CP0 counter as buggy does not preclude it from being used as either a clock event or a clock source device. It just cannot be used as both at a time, because in that case clock event interrupts will be occasionally lost, and the use as a clock event device takes precedence. Compare against 0x4ff in `can_use_mips_counter' so that a single machine instruction is produced. References: [1] "MIPS R4000PC/SC Errata, Processor Revision 2.2 and 3.0", MIPS Technologies Inc., May 10, 1994, Erratum 53, p.13 [2] "MIPS R4400PC/SC Errata, Processor Revision 1.0", MIPS Technologies Inc., February 9, 1994, Erratum 21, p.4 [3] "MIPS R4400PC/SC Errata, Processor Revision 2.0 & 3.0", MIPS Technologies Inc., January 24, 1995, Erratum 14, p.3 Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: ce202cbb9e0b ("[MIPS] Assume R4000/R4400 newer than 3.0 don't have the mfc0 count bug") Cc: stable@vger.kernel.org # v2.6.24+ Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2022-04-29x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guestsThomas Gleixner1-1/+5
When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI layer. This can lead to a situation where the PCI/MSI layer masks an MSI[-X] interrupt and the hypervisor grants the write despite the fact that it already requested the interrupt. As a consequence interrupt delivery on the affected device is not happening ever. Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests already. Fixes: 809f9267bbab ("xen: map MSIs into pirqs") Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: Dusty Mabe <dustymabe@redhat.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Noah Meyerhans <noahm@debian.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
2022-04-28elf: Fix the arm64 MTE ELF segment name and valueCatalin Marinas1-1/+1
Unfortunately, the name/value choice for the MTE ELF segment type (PT_ARM_MEMTAG_MTE) was pretty poor: LOPROC+1 is already in use by PT_AARCH64_UNWIND, as defined in the AArch64 ELF ABI (https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst). Update the ELF segment type value to LOPROC+2 and also change the define to PT_AARCH64_MEMTAG_MTE to match the AArch64 ELF ABI namespace. The AArch64 ELF ABI document is updating accordingly (segment type not previously mentioned in the document). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 761b9b366cec ("elf: Introduce the ARM MTE ELF segment type") Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Machado <luis.machado@arm.com> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com> Link: https://lore.kernel.org/r/20220425151833.2603830-1-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-04-28KVM: arm64: Inject exception on out-of-IPA-range translation faultMarc Zyngier3-0/+48
When taking a translation fault for an IPA that is outside of the range defined by the hypervisor (between the HW PARange and the IPA range), we stupidly treat it as an IO and forward the access to userspace. Of course, userspace can't do much with it, and things end badly. Arguably, the guest is braindead, but we should at least catch the case and inject an exception. Check the faulting IPA against: - the sanitised PARange: inject an address size fault - the IPA size: inject an abort Reported-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-04-28KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not setAlexandru Elisei1-1/+22
kvm->arch.arm_pmu is set when userspace attempts to set the first PMU attribute. As certain attributes are mandatory, arm_pmu ends up always being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU. However, this only happens if the VCPU has the PMU feature. If the VCPU doesn't have the feature bit set, kvm->arch.arm_pmu will be left uninitialized and equal to NULL. KVM doesn't do ID register emulation for 32-bit guests and accesses to the PMU registers aren't gated by the pmu_visibility() function. This is done to prevent injecting unexpected undefined exceptions in guests which have detected the presence of a hardware PMU. But even though the VCPU feature is missing, KVM still attempts to emulate certain aspects of the PMU when PMU registers are accessed. This leads to a NULL pointer dereference like this one, which happens on an odroid-c4 board when running the kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU feature being set: [ 454.402699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150 [ 454.405865] Mem abort info: [ 454.408596] ESR = 0x96000004 [ 454.411638] EC = 0x25: DABT (current EL), IL = 32 bits [ 454.416901] SET = 0, FnV = 0 [ 454.419909] EA = 0, S1PTW = 0 [ 454.423010] FSC = 0x04: level 0 translation fault [ 454.427841] Data abort info: [ 454.430687] ISV = 0, ISS = 0x00000004 [ 454.434484] CM = 0, WnR = 0 [ 454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000c924000 [ 454.443800] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000 [ 454.450528] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 454.456036] Modules linked in: [ 454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113 [ 454.465697] Hardware name: Hardkernel ODROID-C4 (DT) [ 454.470612] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74 [ 454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80 [ 454.487775] sp : ffff80000a9839c0 [ 454.491050] x29: ffff80000a9839c0 x28: ffff000000a83a00 x27: 0000000000000000 [ 454.498127] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000a510000 [ 454.505198] x23: ffff000000a83a00 x22: ffff000003b01000 x21: 0000000000000000 [ 454.512271] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000 [ 454.519343] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000 [ 454.526416] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 454.533489] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b [ 454.540561] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00 [ 454.547633] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000 [ 454.554705] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000 [ 454.561779] Call trace: [ 454.564191] kvm_pmu_event_mask.isra.0+0x14/0x74 [ 454.568764] kvm_pmu_set_counter_event_type+0x2c/0x80 [ 454.573766] access_pmu_evtyper+0x128/0x170 [ 454.577905] perform_access+0x34/0x80 [ 454.581527] kvm_handle_cp_32+0x13c/0x160 [ 454.585495] kvm_handle_cp15_32+0x1c/0x30 [ 454.589462] handle_exit+0x70/0x180 [ 454.592912] kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0 [ 454.597485] kvm_vcpu_ioctl+0x23c/0x940 [ 454.601280] __arm64_sys_ioctl+0xa8/0xf0 [ 454.605160] invoke_syscall+0x48/0x114 [ 454.608869] el0_svc_common.constprop.0+0xd4/0xfc [ 454.613527] do_el0_svc+0x28/0x90 [ 454.616803] el0_svc+0x34/0xb0 [ 454.619822] el0t_64_sync_handler+0xa4/0x130 [ 454.624049] el0t_64_sync+0x18c/0x190 [ 454.627675] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001) [ 454.633714] ---[ end trace 0000000000000000 ]--- In this particular case, Linux hasn't detected the presence of a hardware PMU because the PMU node is missing from the DTB, so userspace would have been unable to set the VCPU PMU feature even if it attempted it. What happens is that the 32-bit guest reads ID_DFR0, which advertises the presence of the PMU, and when it tries to program a counter, it triggers the NULL pointer dereference because kvm->arch.arm_pmu is NULL. kvm-arch.arm_pmu was introduced by commit 46b187821472 ("KVM: arm64: Keep a per-VM pointer to the default PMU"). Until that commit, this error would be triggered instead: [ 73.388140] ------------[ cut here ]------------ [ 73.388189] Unknown PMU version 0 [ 73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.399821] Modules linked in: [ 73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114 [ 73.409132] Hardware name: Hardkernel ODROID-C4 (DT) [ 73.414048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.430779] sp : ffff80000a8db9b0 [ 73.434055] x29: ffff80000a8db9b0 x28: ffff000000dbaac0 x27: 0000000000000000 [ 73.441131] x26: ffff000000dbaac0 x25: 00000000c600000d x24: 0000000000180720 [ 73.448203] x23: ffff800009ffbe10 x22: ffff00000b612000 x21: 0000000000000000 [ 73.455276] x20: 000000000000001f x19: 0000000000000000 x18: ffffffffffffffff [ 73.462348] x17: 000000008003fe98 x16: 0000000000000000 x15: 0720072007200720 [ 73.469420] x14: 0720072007200720 x13: ffff800009d32488 x12: 00000000000004e6 [ 73.476493] x11: 00000000000001a2 x10: ffff800009d32488 x9 : ffff800009d32488 [ 73.483565] x8 : 00000000ffffefff x7 : ffff800009d8a488 x6 : ffff800009d8a488 [ 73.490638] x5 : ffff0000f461a9d8 x4 : 0000000000000000 x3 : 0000000000000001 [ 73.497710] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000dbaac0 [ 73.504784] Call trace: [ 73.507195] kvm_pmu_event_mask.isra.0+0x6c/0x74 [ 73.511768] kvm_pmu_set_counter_event_type+0x2c/0x80 [ 73.516770] access_pmu_evtyper+0x128/0x16c [ 73.520910] perform_access+0x34/0x80 [ 73.524532] kvm_handle_cp_32+0x13c/0x160 [ 73.528500] kvm_handle_cp15_32+0x1c/0x30 [ 73.532467] handle_exit+0x70/0x180 [ 73.535917] kvm_arch_vcpu_ioctl_run+0x20c/0x6e0 [ 73.540489] kvm_vcpu_ioctl+0x2b8/0x9e0 [ 73.544283] __arm64_sys_ioctl+0xa8/0xf0 [ 73.548165] invoke_syscall+0x48/0x114 [ 73.551874] el0_svc_common.constprop.0+0xd4/0xfc [ 73.556531] do_el0_svc+0x28/0x90 [ 73.559808] el0_svc+0x28/0x80 [ 73.562826] el0t_64_sync_handler+0xa4/0x130 [ 73.567054] el0t_64_sync+0x1a0/0x1a4 [ 73.570676] ---[ end trace 0000000000000000 ]--- [ 73.575382] kvm: pmu event creation failed -2 The root cause remains the same: kvm->arch.pmuver was never set to something sensible because the VCPU feature itself was never set. The odroid-c4 is somewhat of a special case, because Linux doesn't probe the PMU. But the above errors can easily be reproduced on any hardware, with or without a PMU driver, as long as userspace doesn't set the PMU feature. Work around the fact that KVM advertises a PMU even when the VCPU feature is not set by gating all PMU emulation on the feature. The guest can still access the registers without KVM injecting an undefined exception. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425145530.723858-1-alexandru.elisei@arm.com
2022-04-27KVM: arm64: Handle host stage-2 faults from 32-bit EL0Will Deacon1-9/+9
When pKVM is enabled, host memory accesses are translated by an identity mapping at stage-2, which is populated lazily in response to synchronous exceptions from 64-bit EL1 and EL0. Extend this handling to cover exceptions originating from 32-bit EL0 as well. Although these are very unlikely to occur in practice, as the kernel typically ensures that user pages are initialised before mapping them in, drivers could still map previously untouched device pages into userspace and expect things to work rather than panic the system. Cc: Quentin Perret <qperret@google.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220427171332.13635-1-will@kernel.org
2022-04-27s390: disable -Warray-boundsSven Schnelle1-0/+10
gcc-12 shows a lot of array bound warnings on s390. This is caused by the S390_lowcore macro which uses a hardcoded address of 0. Wrapping that with absolute_pointer() works, but gcc no longer knows that a 12 bit displacement is sufficient to access lowcore. So it emits instructions like 'lghi %r1,0; l %rx,xxx(%r1)' instead of a single load/store instruction. As s390 stores variables often read/written in lowcore, this is considered problematic. Therefore disable -Warray-bounds on s390 for gcc-12 for the time being, until there is a better solution. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://lore.kernel.org/r/yt9dzgkelelc.fsf@linux.ibm.com Link: https://lore.kernel.org/r/20220422134308.1613610-1-svens@linux.ibm.com Link: https://lore.kernel.org/r/20220425121742.3222133-1-svens@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-04-27Merge tag 'pinctrl-v5.18-2' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix some register offsets on Intel Alderlake - Fix the order the UFS and SDC pins on Qualcomm SM6350 - Fix a build error in Mediatek Moore. - Fix a pin function table in the Sunplus SP7021. - Fix some Kconfig and static keywords on the Samsung Tesla FSD SoC. - Fix up the EOI function for edge triggered IRQs and keep the block clock enabled for level IRQs in the STM32 driver. - Fix some bits and order in the Rockchip RK3308 driver. - Handle the errorpath in the Pistachio driver probe() properly. * tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: pistachio: fix use of irq_of_parse_and_map() pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested pinctrl: rockchip: sort the rk3308_mux_recalced_data entries pinctrl: rockchip: fix RK3308 pinmux bits pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI pinctrl: Fix an error in pin-function table of SP7021 pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config pinctrl: mediatek: moore: Fix build error pinctrl: qcom: sm6350: fix order of UFS & SDC pins pinctrl: alderlake: Fix register offsets for ADL-N variant pinctrl: samsung: staticize fsd_pin_ctrl
2022-04-26RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRLArnaud Pouliquen2-0/+2
In the commit 617d32938d1b ("rpmsg: Move the rpmsg control device from rpmsg_char to rpmsg_ctrl"), we split the rpmsg_char driver in two. By default give everyone who had the old driver enabled the rpmsg_ctrl driver too. Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20220404090527.582217-1-arnaud.pouliquen@foss.st.com Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-25Revert "arm64: dts: tegra: Fix boolean properties with values"Arnd Bergmann8-27/+27
This reverts commit 1a67653de0dd, which caused a boot regression. The behavior of the "drive-push-pull" in the kernel does not match what the binding document describes. Revert Rob's patch to make the DT match the kernel again, rather than the binding. Link: https://lore.kernel.org/lkml/YlVAy95eF%2F9b1nmu@orome/ Reported-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-24Merge tag 'powerpc-5.18-3' of ↵Linus Torvalds6-66/+66
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Partly revert a change to our timer_interrupt() that caused lockups with high res timers disabled. - Fix a bug in KVM TCE handling that could corrupt kernel memory. - Two commits fixing Power9/Power10 perf alternative event selection. Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin. * tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf: Fix 32bit compile powerpc/perf: Fix power10 event alternatives powerpc/perf: Fix power9 event alternatives KVM: PPC: Fix TCE handling for VFIO powerpc/time: Always set decrementer in timer_interrupt()
2022-04-24Merge tag 'perf_urgent_for_v5.18_rc4' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Add Sapphire Rapids CPU support - Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use) * tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
2022-04-24arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clockFabio Estevam1-0/+4
The ROHM BD71847 PMIC has a 32.768 kHz clock. Describe the PMIC clock to fix the following boot errors: bd718xx-clk bd71847-clk.1.auto: No parent clk found bd718xx-clk: probe of bd71847-clk.1.auto failed with error -22 Based on the same fix done for imx8mm-evk as per commit a6a355ede574 ("arm64: dts: imx8mm-evk: Add 32.768 kHz clock to PMIC") Fixes: 3e44dd09736d ("arm64: dts: imx8mn-ddr4-evk: Add rohm,bd71847 PMIC support") Signed-off-by: Fabio Estevam <festevam@denx.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2022-04-24ARM: dts: imx6ull-colibri: fix vqmmc regulatorMax Krummenacher1-1/+1
The correct spelling for the property is gpios. Otherwise, the regulator will neither reserve nor control any GPIOs. Thus, any SD/MMC card which can use UHS-I modes will fail. Fixes: c2e4987e0e02 ("ARM: dts: imx6ull: add Toradex Colibri iMX6ULL support") Signed-off-by: Max Krummenacher <max.krummenacher@toradex.com> Signed-off-by: Denys Drozdov <denys.drozdov@toradex.com> Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2022-04-24Merge tag 'arc-5.18-rc4' of ↵Linus Torvalds9-27/+24
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Assorted fixes * tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: remove redundant READ_ONCE() in cmpxchg loop ARC: atomic: cleanup atomic-llsc definitions arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirely ARC: dts: align SPI NOR node name with dtschema ARC: Remove a redundant memset() ARC: fix typos in comments ARC: entry: fix syscall_trace_exit argument
2022-04-23Merge tag 'for-linus-5.18-rc4-tag' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "A simple cleanup patch and a refcount fix for Xen on Arm" * tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: arm/xen: Fix some refcount leaks xen: Convert kmap() to kmap_local_page()
2022-04-23sparc: cacheflush_32.h needs struct pageRandy Dunlap1-0/+1
Add a struct page forward declaration to cacheflush_32.h. Fixes this build warning: CC drivers/crypto/xilinx/zynqmp-sha.o In file included from arch/sparc/include/asm/cacheflush.h:11, from include/linux/cacheflush.h:5, from drivers/crypto/xilinx/zynqmp-sha.c:6: arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration 38 | void sparc_flush_page_to_ram(struct page *page); Exposed by commit 0e03b8fd2936 ("crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST") but not Fixes: that commit because the underlying problem is older. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: sparclinux@vger.kernel.org Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>