summaryrefslogtreecommitdiff
path: root/arch/s390/kvm/kvm-s390.c
AgeCommit message (Collapse)AuthorFilesLines
2021-02-13s390/kvm: use union tod_clockHeiko Carstens1-15/+9
Use union tod_clock and get rid of the kvm specific struct kvm_s390_tod_clock_ext which apparently was introduced for the same purpose. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-01-19s390: convert to generic entrySven Schnelle1-0/+3
This patch converts s390 to use the generic entry infrastructure from kernel/entry/*. There are a few special things on s390: - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't know about our PIF flags in exit_to_user_mode_loop(). - The old code had several ways to restart syscalls: a) PIF_SYSCALL_RESTART, which was only set during execve to force a restart after upgrading a process (usually qemu-kvm) to pgste page table extensions. b) PIF_SYSCALL, which is set by do_signal() to indicate that the current syscall should be restarted. This is changed so that do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use PIF_SYSCALL doesn't work with the generic code, and changing it to PIF_SYSCALL_RESTART makes PIF_SYSCALL and PIF_SYSCALL_RESTART more unique. - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by executing a svc instruction on the process stack which causes a fault. While handling that fault the fault code sets PIF_SYSCALL to hand over processing to the syscall code on exit to usermode. The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets a return value for a syscall. The s390x ptrace ABI uses r2 both for the syscall number and return value, so ptrace cannot set the syscall number + return value at the same time. The flag makes handling that a bit easier. do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET is set. CONFIG_DEBUG_ASCE was removd in favour of the generic CONFIG_DEBUG_ENTRY. CR1/7/13 will be checked both on kernel entry and exit to contain the correct asces. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-12-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-10/+12
Pull KVM updates from Paolo Bonzini: "Much x86 work was pushed out to 5.12, but ARM more than made up for it. ARM: - PSCI relay at EL2 when "protected KVM" is enabled - New exception injection code - Simplification of AArch32 system register handling - Fix PMU accesses when no PMU is enabled - Expose CSV3 on non-Meltdown hosts - Cache hierarchy discovery fixes - PV steal-time cleanups - Allow function pointers at EL2 - Various host EL2 entry cleanups - Simplification of the EL2 vector allocation s390: - memcg accouting for s390 specific parts of kvm and gmap - selftest for diag318 - new kvm_stat for when async_pf falls back to sync x86: - Tracepoints for the new pagetable code from 5.10 - Catch VFIO and KVM irqfd events before userspace - Reporting dirty pages to userspace with a ring buffer - SEV-ES host support - Nested VMX support for wait-for-SIPI activity state - New feature flag (AVX512 FP16) - New system ioctl to report Hyper-V-compatible paravirtualization features Generic: - Selftest improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits) KVM: SVM: fix 32-bit compilation KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting KVM: SVM: Provide support to launch and run an SEV-ES guest KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests KVM: SVM: Provide support for SEV-ES vCPU loading KVM: SVM: Provide support for SEV-ES vCPU creation/loading KVM: SVM: Update ASID allocation to support SEV-ES guests KVM: SVM: Set the encryption mask for the SVM host save area KVM: SVM: Add NMI support for an SEV-ES guest KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest KVM: SVM: Do not report support for SMM for an SEV-ES guest KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES KVM: SVM: Add support for CR8 write traps for an SEV-ES guest KVM: SVM: Add support for CR4 write traps for an SEV-ES guest KVM: SVM: Add support for CR0 write traps for an SEV-ES guest KVM: SVM: Add support for EFER write traps for an SEV-ES guest KVM: SVM: Support string IO operations for an SEV-ES guest KVM: SVM: Support MMIO for an SEV-ES guest KVM: SVM: Create trace events for VMGEXIT MSR protocol processing KVM: SVM: Create trace events for VMGEXIT processing ...
2020-12-10KVM: s390: track synchronous pfault events in kvm_statChristian Borntraeger1-0/+2
Right now we do count pfault (pseudo page faults aka async page faults start and completion events). What we do not count is, if an async page fault would have been possible by the host, but it was disabled by the guest (e.g. interrupts off, pfault disabled, secure execution....). Let us count those as well in the pfault_sync counter. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20201125090658.38463-1-borntraeger@de.ibm.com
2020-12-10KVM: s390: Add memcg accounting to KVM allocationsChristian Borntraeger1-10/+10
Almost all kvm allocations in the s390x KVM code can be attributed to the process that triggers the allocation (in other words, no global allocation for other guests). This will help the memcg controller to make the right decisions. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
2020-11-11KVM: s390: remove diag318 reset codeCollin Walling1-2/+0
The diag318 data must be set to 0 by VM-wide reset events triggered by diag308. As such, KVM should not handle resetting this data via the VCPU ioctls. Fixes: 23a60f834406 ("s390/kvm: diagnose 0x318 sync and reset") Signed-off-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20201104181032.109800-1-walling@linux.ibm.com
2020-11-11KVM: s390: pv: Mark mm as protected after the set secure parameters and ↵Janosch Frank1-1/+1
improve cleanup We can only have protected guest pages after a successful set secure parameters call as only then the UV allows imports and unpacks. By moving the test we can now also check for it in s390_reset_acc() and do an early return if it is 0. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Fixes: 29b40f105ec8 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling") Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-08-12mm/gup: remove task_struct pointer for all gup codePeter Xu1-1/+1
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-03Merge tag 'kvm-s390-next-5.9-1' of ↵Paolo Bonzini1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next-5.6 KVM: s390: Enhancement for 5.9 - implement diagnose 318
2020-07-10KVM: s390: clean up redundant 'kvm_run' parametersTianjia Zhang1-8/+15
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200623131418.31473-2-tianjia.zhang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08KVM: async_pf: change kvm_setup_async_pf()/kvm_arch_setup_async_pf() return ↵Vitaly Kuznetsov1-11/+9
type to bool Unlike normal 'int' functions returning '0' on success, kvm_setup_async_pf()/ kvm_arch_setup_async_pf() return '1' when a job to handle page fault asynchronously was scheduled and '0' otherwise. To avoid the confusion change return type to 'bool'. No functional change intended. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200615121334.91300-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23s390/kvm: diagnose 0x318 sync and resetCollin Walling1-1/+10
DIAGNOSE 0x318 (diag318) sets information regarding the environment the VM is running in (Linux, z/VM, etc) and is observed via firmware/service events. This is a privileged s390x instruction that must be intercepted by SIE. Userspace handles the instruction as well as migration. Data is communicated via VCPU register synchronization. The Control Program Name Code (CPNC) is stored in the SIE block. The CPNC along with the Control Program Version Code (CPVC) are stored in the kvm_vcpu_arch struct. This data is reset on load normal and clear resets. Signed-off-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200622154636.5499-3-walling@linux.ibm.com [borntraeger@de.ibm.com: fix sync_reg position] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-06-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+3
Pull more KVM updates from Paolo Bonzini: "The guest side of the asynchronous page fault work has been delayed to 5.9 in order to sync with Thomas's interrupt entry rework, but here's the rest of the KVM updates for this merge window. MIPS: - Loongson port PPC: - Fixes ARM: - Fixes x86: - KVM_SET_USER_MEMORY_REGION optimizations - Fixes - Selftest fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits) KVM: x86: do not pass poisoned hva to __kvm_set_memory_region KVM: selftests: fix sync_with_host() in smm_test KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected KVM: async_pf: Cleanup kvm_setup_async_pf() kvm: i8254: remove redundant assignment to pointer s KVM: x86: respect singlestep when emulating instruction KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check KVM: nVMX: Consult only the "basic" exit reason when routing nested exit KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts KVM: arm64: Remove host_cpu_context member from vcpu structure KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr KVM: arm64: Handle PtrAuth traps early KVM: x86: Unexport x86_fpu_cache and make it static KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K KVM: arm64: Save the host's PtrAuth keys in non-preemptible context KVM: arm64: Stop save/restoring ACTLR_EL1 KVM: arm64: Add emulation for 32bit guests accessing ACTLR2 ...
2020-06-11KVM: async_pf: Inject 'page ready' event only if 'page not present' was ↵Vitaly Kuznetsov1-1/+3
previously injected 'Page not present' event may or may not get injected depending on guest's state. If the event wasn't injected, there is no need to inject the corresponding 'page ready' event as the guest may get confused. E.g. Linux thinks that the corresponding 'page not present' event wasn't delivered *yet* and allocates a 'dummy entry' for it. This entry is never freed. Note, 'wakeup all' events have no corresponding 'page not present' event and always get injected. s390 seems to always be able to inject 'page not present', the change is effectively a nop. Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610175532.779793-2-vkuznets@redhat.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse1-14/+14
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: reorder includes after introduction of linux/pgtable.hMike Rapoport1-1/+1
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include of the latter in the middle of asm includes. Fix this up with the aid of the below script and manual adjustments here and there. import sys import re if len(sys.argv) is not 3: print "USAGE: %s <file> <header>" % (sys.argv[0]) sys.exit(1) hdr_to_move="#include <linux/%s>" % sys.argv[2] moved = False in_hdrs = False with open(sys.argv[1], "r") as f: lines = f.readlines() for _line in lines: line = _line.rstrip(' ') if line == hdr_to_move: continue if line.startswith("#include <linux/"): in_hdrs = True elif not moved and in_hdrs: moved = True print hdr_to_move print line Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mm: introduce include/linux/pgtable.hMike Rapoport1-1/+1
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08Merge tag 's390-5.8-1' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for multi-function devices in pci code. - Enable PF-VF linking for architectures using the pdev->no_vf_scan flag (currently just s390). - Add reipl from NVMe support. - Get rid of critical section cleanup in entry.S. - Refactor PNSO CHSC (perform network subchannel operation) in cio and qeth. - QDIO interrupts and error handling fixes and improvements, more refactoring changes. - Align ioremap() with generic code. - Accept requests without the prefetch bit set in vfio-ccw. - Enable path handling via two new regions in vfio-ccw. - Other small fixes and improvements all over the code. * tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits) vfio-ccw: make vfio_ccw_regops variables declarations static vfio-ccw: Add trace for CRW event vfio-ccw: Wire up the CRW irq and CRW region vfio-ccw: Introduce a new CRW region vfio-ccw: Refactor IRQ handlers vfio-ccw: Introduce a new schib region vfio-ccw: Refactor the unregister of the async regions vfio-ccw: Register a chp_event callback for vfio-ccw vfio-ccw: Introduce new helper functions to free/destroy regions vfio-ccw: document possible errors vfio-ccw: Enable transparent CCW IPL from DASD s390/pci: Log new handle in clp_disable_fh() s390/cio, s390/qeth: cleanup PNSO CHSC s390/qdio: remove q->first_to_kick s390/qdio: fix up qdio_start_irq() kerneldoc s390: remove critical section cleanup from entry.S s390: add machine check SIGP s390/pci: ioremap() align with generic code s390/ap: introduce new ap function ap_get_qdev() Documentation/s390: Update / remove developerWorks web links ...
2020-06-01KVM: rename kvm_arch_can_inject_async_page_present() to ↵Vitaly Kuznetsov1-1/+1
kvm_arch_can_dequeue_async_page_present() An innocent reader of the following x86 KVM code: bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu) { if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) return true; ... may get very confused: if APF mechanism is not enabled, why do we report that we 'can inject async page present'? In reality, upon injection kvm_arch_async_page_present() will check the same condition again and, in case APF is disabled, will just drop the item. This is fine as the guest which deliberately disabled APF doesn't expect to get any APF notifications. Rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() to make it clear what we are checking: if the item can be dequeued (meaning either injected or just dropped). On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so the rename doesn't matter much. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-28s390: remove critical section cleanup from entry.SSven Schnelle1-3/+0
The current code is rather complex and caused a lot of subtle and hard to debug bugs in the past. Simplify the code by calling the system_call handler with interrupts disabled, save machine state, and re-enable them later. This requires significant changes to the machine check handling code as well. When the machine check interrupt arrived while being in kernel mode the new code will signal pending machine checks with a SIGP external call. When userspace was interrupted, the handler will switch to the kernel stack and directly execute s390_handle_mcck(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-05-15kvm: add halt-polling cpu usage statsDavid Matlack1-0/+2
Two new stats for exposing halt-polling cpu usage: halt_poll_success_ns halt_poll_fail_ns Thus sum of these 2 stats is the total cpu time spent polling. "success" means the VCPU polled until a virtual interrupt was delivered. "fail" means the VCPU had to schedule out (either because the maximum poll time was reached or it needed to yield the CPU). To avoid touching every arch's kvm_vcpu_stat struct, only update and export halt-polling cpu usage stats if we're on x86. Exporting cpu usage as a u64 and in nanoseconds means we will overflow at ~500 years, which seems reasonably large. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200508182240.68440-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13Merge branch 'kvm-amd-fixes' into HEADPaolo Bonzini1-0/+1
2020-05-07KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu1-0/+1
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21KVM: Remove redundant argument to kvm_arch_vcpu_ioctl_runTianjia Zhang1-1/+2
In earlier versions of kvm, 'kvm_run' was an independent structure and was not included in the vcpu structure. At present, 'kvm_run' is already included in the vcpu structure, so the parameter 'kvm_run' is redundant. This patch simplifies the function definition, removes the extra 'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure if necessary. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Message-Id: <20200416051057.26526-1-tianjia.zhang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-21kvm_host: unify VM_STAT and VCPU_STAT definitions in a single placeEmanuele Giuseppe Esposito1-103/+100
The macros VM_STAT and VCPU_STAT are redundantly implemented in multiple files, each used by a different architecure to initialize the debugfs entries for statistics. Since they all have the same purpose, they can be unified in a single common definition in include/linux/kvm_host.h Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20200414155625.20559-1-eesposit@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-14KVM: s390: Return last valid slot if approx index is out-of-boundsSean Christopherson1-0/+3
Return the index of the last valid slot from gfn_to_memslot_approx() if its binary search loop yielded an out-of-bounds index. The index can be out-of-bounds if the specified gfn is less than the base of the lowest memslot (which is also the last valid memslot). Note, the sole caller, kvm_s390_get_cmma(), ensures used_slots is non-zero. Fixes: afdad61615cc3 ("KVM: s390: Fix storage attributes migration with memory slots") Cc: stable@vger.kernel.org # 4.19.x: 0774a964ef56: KVM: Fix out of range accesses to memslots Cc: stable@vger.kernel.org # 4.19.x Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200408064059.8957-3-sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-04-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-89/+508
Pull kvm updates from Paolo Bonzini: "ARM: - GICv4.1 support - 32bit host removal PPC: - secure (encrypted) using under the Protected Execution Framework ultravisor s390: - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. - Initial work on making nested SVM event injection more similar to VMX, and less buggy. - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". - A large refactoring of CPUID features, which now use an array that parallels the core x86_features. - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. - New Tigerlake CPUID features. - More bugfixes, optimizations and cleanups. Generic: - selftests: cleanups, new MMU notifier stress test, steal-time test - CSV output for kvm_stat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits) x86/kvm: fix a missing-prototypes "vmread_error" KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y KVM: VMX: Add a trampoline to fix VMREAD error handling KVM: SVM: Annotate svm_x86_ops as __initdata KVM: VMX: Annotate vmx_x86_ops as __initdata KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup() KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes KVM: VMX: Configure runtime hooks using vmx_x86_ops KVM: VMX: Move hardware_setup() definition below vmx_x86_ops KVM: x86: Move init-only kvm_x86_ops to separate struct KVM: Pass kvm_init()'s opaque param to additional arch funcs s390/gmap: return proper error code on ksm unsharing KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move() KVM: Fix out of range accesses to memslots KVM: X86: Micro-optimize IPI fastpath delay KVM: X86: Delay read msr data iff writes ICR MSR KVM: PPC: Book3S HV: Add a capability for enabling secure guests KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs ...
2020-03-31KVM: Pass kvm_init()'s opaque param to additional arch funcsSean Christopherson1-2/+2
Pass @opaque to kvm_arch_hardware_setup() and kvm_arch_check_processor_compat() to allow architecture specific code to reference @opaque without having to stash it away in a temporary global variable. This will enable x86 to separate its vendor specific callback ops, which are passed via @opaque, into "init" and "runtime" ops without having to stash away the "init" ops. No functional change intended. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Cornelia Huck <cohuck@redhat.com> #s390 Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-26Merge tag 'kvm-s390-next-5.7-2' of ↵Paolo Bonzini1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: cleanups for 5.7 - mark sie control block as 512 byte aligned - use fallthrough;
2020-03-26KVM: Fix out of range accesses to memslotsSean Christopherson1-0/+3
Reset the LRU slot if it becomes invalid when deleting a memslot to fix an out-of-bounds/use-after-free access when searching through memslots. Explicitly check for there being no used slots in search_memslots(), and in the caller of s390's approximation variant. Fixes: 36947254e5f9 ("KVM: Dynamically size memslot array based on number of used slots") Reported-by: Qian Cai <cai@lca.pw> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320205546.2396-2-sean.j.christopherson@intel.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-23KVM: s390: Use fallthrough;Joe Perches1-2/+2
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/d63c86429f3e5aa806aa3e185c97d213904924a5.1583896348.git.joe@perches.com [borntrager@de.ibm.com: Fix link to tool and subject] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-03-16Merge tag 'kvm-s390-next-5.7-1' of ↵Paolo Bonzini1-66/+497
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features and Enhancements for 5.7 part1 1. Allow to disable gisa 2. protected virtual machines Protected VMs (PVM) are KVM VMs, where KVM can't access the VM's state like guest memory and guest registers anymore. Instead the PVMs are mostly managed by a new entity called Ultravisor (UV), which provides an API, so KVM and the PV can request management actions. PVMs are encrypted at rest and protected from hypervisor access while running. They switch from a normal operation into protected mode, so we can still use the standard boot process to load a encrypted blob and then move it into protected mode. Rebooting is only possible by passing through the unprotected/normal mode and switching to protected again. One mm related patch will go via Andrews mm tree ( mm/gup/writeback: add callbacks for inaccessible pages)
2020-03-16KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()Sean Christopherson1-10/+2
Rework kvm_get_dirty_log() so that it "returns" the associated memslot on success. A future patch will rework memslot handling such that id_to_memslot() can return NULL, returning the memslot makes it more obvious that the validity of the memslot has been verified, i.e. precludes the need to add validity checks in the arch code that are technically unnecessary. To maintain ordering in s390, move the call to kvm_arch_sync_dirty_log() from s390's kvm_vm_ioctl_get_dirty_log() to the new kvm_get_dirty_log(). This is a nop for PPC, the only other arch that doesn't select KVM_GENERIC_DIRTYLOG_READ_PROTECT, as its sync_dirty_log() is empty. Ideally, moving the sync_dirty_log() call would be done in a separate patch, but it can't be done in a follow-on patch because that would temporarily break s390's ordering. Making the move in a preparatory patch would be functionally correct, but would create an odd scenario where the moved sync_dirty_log() would operate on a "different" memslot due to consuming the result of a different id_to_memslot(). The memslot couldn't actually be different as slots_lock is held, but the code is confusing enough as it is, i.e. moving sync_dirty_log() in this patch is the lesser of all evils. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Provide common implementation for generic dirty log functionsSean Christopherson1-3/+2
Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code. The arch specific implemenations are extremely similar, differing only in whether the dirty log needs to be sync'd from hardware (x86) and how the TLBs are flushed. Add new arch hooks to handle sync and TLB flush; the sync will also be used for non-generic dirty log support in a future patch (s390). The ulterior motive for providing a common implementation is to eliminate the dependency between arch and common code with respect to the memslot referenced by the dirty log, i.e. to make it obvious in the code that the validity of the memslot is guaranteed, as a future patch will rework memslot handling such that id_to_memslot() can return NULL. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Drop "const" attribute from old memslot in commit_memory_region()Sean Christopherson1-1/+1
Drop the "const" attribute from @old in kvm_arch_commit_memory_region() to allow arch specific code to free arch specific resources in the old memslot without having to cast away the attribute. Freeing resources in kvm_arch_commit_memory_region() paves the way for simplifying kvm_free_memslot() by eliminating the last usage of its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Drop kvm_arch_create_memslot()Sean Christopherson1-6/+0
Remove kvm_arch_create_memslot() now that all arch implementations are effectively nops. Removing kvm_arch_create_memslot() eliminates the possibility for arch specific code to allocate memory prior to setting a memslot, which sets the stage for simplifying kvm_free_memslot(). Cc: Janosch Frank <frankja@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-11KVM: s390: Also reset registers in sync regs for initial cpu resetChristian Borntraeger1-1/+17
When we do the initial CPU reset we must not only clear the registers in the internal data structures but also in kvm_run sync_regs. For modern userspace sync_regs is the only place that it looks at. Fixes: 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API") Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: introduce module parameter kvm.use_gisaMichael Mueller1-1/+7
The boolean module parameter "kvm.use_gisa" controls if newly created guests will use the GISA facility if provided by the host system. The default is yes. # cat /sys/module/kvm/parameters/use_gisa Y The parameter can be changed on the fly. # echo N > /sys/module/kvm/parameters/use_gisa Already running guests are not affected by this change. The kvm s390 debug feature shows if a guest is running with GISA. # grep gisa /sys/kernel/debug/s390dbf/kvm-$pid/sprintf 00 01582725059:843303 3 - 08 00000000e119bc01 gisa 0x00000000c9ac2642 initialized 00 01582725059:903840 3 - 11 000000004391ee22 00[0000000000000000-0000000000000000]: AIV gisa format-1 enabled for cpu 000 ... 00 01582725059:916847 3 - 08 0000000094fff572 gisa 0x00000000c9ac2642 cleared In general, that value should not be changed as the GISA facility enhances interruption delivery performance. A reason to switch the GISA facility off might be a performance comparison run or debugging. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20200227091031.102993-1-mimu@linux.ibm.com Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTEDChristian Borntraeger1-0/+3
Now that everything is in place, we can announce the feature. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2020-02-27KVM: s390: protvirt: Add UV cpu reset callsJanosch Frank1-0/+20
For protected VMs, the VCPU resets are done by the Ultravisor, as KVM has no access to the VCPU registers. Note that the ultravisor will only accept a call for the exact reset that has been requested. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: do not inject interrupts after startChristian Borntraeger1-0/+7
As PSW restart is handled by the ultravisor (and we only get a start notification) we must re-check the PSW after a start before injecting interrupts. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2020-02-27KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112Janosch Frank1-0/+11
We're not allowed to inject interrupts on intercepts that leave the guest state in an "in-between" state where the next SIE entry will do a continuation, namely secure instruction interception (104) and secure prefix interception (112). As our PSW is just a copy of the real one that will be replaced on the next exit, we can mask out the interrupt bits in the PSW to make sure that we do not inject anything. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: Support cmd 5 operation stateJanosch Frank1-0/+6
Code 5 for the set cpu state UV call tells the UV to load a PSW from the SE header (first IPL) or from guest location 0x0 (diag 308 subcode 0/1). Also it sets the cpu into operating state afterwards, so we can start it. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: Report CPU state to UltravisorJanosch Frank1-10/+34
VCPU states have to be reported to the ultravisor for SIGP interpretation, kdump, kexec and reboot. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: UV calls in support of diag308 0, 1Janosch Frank1-0/+22
diag 308 subcode 0 and 1 require several KVM and Ultravisor interactions. Specific to these "soft" reboots are * The "unshare all" UVC * The "prepare for reset" UVC Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: Only sync fmt4 registersJanosch Frank1-40/+70
A lot of the registers are controlled by the Ultravisor and never visible to KVM. Also some registers are overlayed, like gbea is with sidad, which might leak data to userspace. Hence we sync a minimal set of registers for both SIE formats and then check and sync format 2 registers if necessary. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: Do only reset registers that are accessibleJanosch Frank1-4/+11
For protected VMs the hypervisor can not access guest breaking event address, program parameter, bpbc and todpr. Do not reset those fields as the control block does not provide access to these fields. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: disallow one_regJanosch Frank1-0/+3
A lot of the registers are controlled by the Ultravisor and never visible to KVM. Some fields in the sie control block are overlayed, like gbea. As no known userspace uses the ONE_REG interface on s390 if sync regs are available, no functionality is lost if it is disabled for protected guests. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: S390: protvirt: Introduce instruction data area bounce bufferJanosch Frank1-8/+58
Now that we can't access guest memory anymore, we have a dedicated satellite block that's a bounce buffer for instruction data. We re-use the memop interface to copy the instruction data to / from userspace. This lets us re-use a lot of QEMU code which used that interface to make logical guest memory accesses which are not possible anymore in protected mode anyway. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27KVM: s390: protvirt: Add new gprs location handlingJanosch Frank1-0/+11
Guest registers for protected guests are stored at offset 0x380. We will copy those to the usual places. Long term we could refactor this or use register access functions. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>