summaryrefslogtreecommitdiff
path: root/Documentation/virt/kvm/api.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/virt/kvm/api.rst')
-rw-r--r--Documentation/virt/kvm/api.rst412
1 files changed, 393 insertions, 19 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..a6729c8cf063 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -688,9 +688,14 @@ MSRs that have been set successfully.
Defines the vcpu responses to the cpuid instruction. Applications
should use the KVM_SET_CPUID2 ioctl if available.
-Note, when this IOCTL fails, KVM gives no guarantees that previous valid CPUID
-configuration (if there is) is not corrupted. Userspace can get a copy of the
-resulting CPUID configuration through KVM_GET_CPUID2 in case.
+Caveat emptor:
+ - If this IOCTL fails, KVM gives no guarantees that previous valid CPUID
+ configuration (if there is) is not corrupted. Userspace can get a copy
+ of the resulting CPUID configuration through KVM_GET_CPUID2 in case.
+ - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model
+ after running the guest, may cause guest instability.
+ - Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc...
+ may cause guest instability.
::
@@ -850,7 +855,7 @@ in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus. The irq field is interpreted
like this::
-  bits: | 31 ... 28 | 27 ... 24 | 23 ... 16 | 15 ... 0 |
+ bits: | 31 ... 28 | 27 ... 24 | 23 ... 16 | 15 ... 0 |
field: | vcpu2_index | irq_type | vcpu_index | irq_id |
The irq_type field has the following values:
@@ -2144,10 +2149,10 @@ prior to calling the KVM_RUN ioctl.
Errors:
====== ============================================================
-  ENOENT   no such register
-  EINVAL   invalid register ID, or no such register or used with VMs in
+ ENOENT no such register
+ EINVAL invalid register ID, or no such register or used with VMs in
protected virtualization mode on s390
-  EPERM    (arm64) register access not allowed before vcpu finalization
+ EPERM (arm64) register access not allowed before vcpu finalization
====== ============================================================
(These error codes are indicative only: do not rely on a specific error
@@ -2585,10 +2590,10 @@ following id bit patterns::
Errors include:
======== ============================================================
-  ENOENT   no such register
-  EINVAL   invalid register ID, or no such register or used with VMs in
+ ENOENT no such register
+ EINVAL invalid register ID, or no such register or used with VMs in
protected virtualization mode on s390
-  EPERM    (arm64) register access not allowed before vcpu finalization
+ EPERM (arm64) register access not allowed before vcpu finalization
======== ============================================================
(These error codes are indicative only: do not rely on a specific error
@@ -3107,13 +3112,13 @@ current state. "addr" is ignored.
Errors:
====== =================================================================
-  EINVAL    the target is unknown, or the combination of features is invalid.
-  ENOENT    a features bit specified is unknown.
+ EINVAL the target is unknown, or the combination of features is invalid.
+ ENOENT a features bit specified is unknown.
====== =================================================================
This tells KVM what type of CPU to present to the guest, and what
-optional features it should have.  This will cause a reset of the cpu
-registers to their initial values.  If this is not called, KVM_RUN will
+optional features it should have. This will cause a reset of the cpu
+registers to their initial values. If this is not called, KVM_RUN will
return ENOEXEC for that vcpu.
The initial values are defined as:
@@ -3234,8 +3239,8 @@ VCPU matching underlying host.
Errors:
===== ==============================================================
-  E2BIG     the reg index list is too big to fit in the array specified by
-             the user (the number required will be written into n).
+ E2BIG the reg index list is too big to fit in the array specified by
+ the user (the number required will be written into n).
===== ==============================================================
::
@@ -3283,7 +3288,7 @@ specific device.
ARM/arm64 divides the id field into two parts, a device id and an
address type id specific to the individual device::
-  bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 |
+ bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 |
field: | 0x00000000 | device id | addr type id |
ARM/arm64 currently only require this when using the in-kernel GIC
@@ -3352,6 +3357,7 @@ flags which can include the following:
- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
- KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
+ - KVM_GUESTDBG_BLOCKIRQ: avoid injecting interrupts/NMI/SMI [x86]
For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
are enabled in memory so we need to ensure breakpoint exceptions are
@@ -5034,6 +5040,283 @@ see KVM_XEN_VCPU_SET_ATTR above.
The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
with the KVM_XEN_VCPU_GET_ATTR ioctl.
+4.130 KVM_ARM_MTE_COPY_TAGS
+---------------------------
+
+:Capability: KVM_CAP_ARM_MTE
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_copy_mte_tags
+:Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect
+ arguments, -EFAULT if memory cannot be accessed).
+
+::
+
+ struct kvm_arm_copy_mte_tags {
+ __u64 guest_ipa;
+ __u64 length;
+ void __user *addr;
+ __u64 flags;
+ __u64 reserved[2];
+ };
+
+Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
+``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
+field must point to a buffer which the tags will be copied to or from.
+
+``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
+``KVM_ARM_TAGS_FROM_GUEST``.
+
+The size of the buffer to store the tags is ``(length / 16)`` bytes
+(granules in MTE are 16 bytes long). Each byte contains a single tag
+value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
+``PTRACE_POKEMTETAGS``.
+
+If an error occurs before any data is copied then a negative error code is
+returned. If some tags have been copied before an error occurs then the number
+of bytes successfully copied is returned. If the call completes successfully
+then ``length`` is returned.
+
+4.131 KVM_GET_SREGS2
+--------------------
+
+:Capability: KVM_CAP_SREGS2
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_sregs2 (out)
+:Returns: 0 on success, -1 on error
+
+Reads special registers from the vcpu.
+This ioctl (when supported) replaces the KVM_GET_SREGS.
+
+::
+
+ struct kvm_sregs2 {
+ /* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */
+ struct kvm_segment cs, ds, es, fs, gs, ss;
+ struct kvm_segment tr, ldt;
+ struct kvm_dtable gdt, idt;
+ __u64 cr0, cr2, cr3, cr4, cr8;
+ __u64 efer;
+ __u64 apic_base;
+ __u64 flags;
+ __u64 pdptrs[4];
+ };
+
+flags values for ``kvm_sregs2``:
+
+``KVM_SREGS2_FLAGS_PDPTRS_VALID``
+
+ Indicates thats the struct contain valid PDPTR values.
+
+
+4.132 KVM_SET_SREGS2
+--------------------
+
+:Capability: KVM_CAP_SREGS2
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_sregs2 (in)
+:Returns: 0 on success, -1 on error
+
+Writes special registers into the vcpu.
+See KVM_GET_SREGS2 for the data structures.
+This ioctl (when supported) replaces the KVM_SET_SREGS.
+
+4.133 KVM_GET_STATS_FD
+----------------------
+
+:Capability: KVM_CAP_STATS_BINARY_FD
+:Architectures: all
+:Type: vm ioctl, vcpu ioctl
+:Parameters: none
+:Returns: statistics file descriptor on success, < 0 on error
+
+Errors:
+
+ ====== ======================================================
+ ENOMEM if the fd could not be created due to lack of memory
+ EMFILE if the number of opened files exceeds the limit
+ ====== ======================================================
+
+The returned file descriptor can be used to read VM/vCPU statistics data in
+binary format. The data in the file descriptor consists of four blocks
+organized as follows:
+
++-------------+
+| Header |
++-------------+
+| id string |
++-------------+
+| Descriptors |
++-------------+
+| Stats Data |
++-------------+
+
+Apart from the header starting at offset 0, please be aware that it is
+not guaranteed that the four blocks are adjacent or in the above order;
+the offsets of the id, descriptors and data blocks are found in the
+header. However, all four blocks are aligned to 64 bit offsets in the
+file and they do not overlap.
+
+All blocks except the data block are immutable. Userspace can read them
+only one time after retrieving the file descriptor, and then use ``pread`` or
+``lseek`` to read the statistics repeatedly.
+
+All data is in system endianness.
+
+The format of the header is as follows::
+
+ struct kvm_stats_header {
+ __u32 flags;
+ __u32 name_size;
+ __u32 num_desc;
+ __u32 id_offset;
+ __u32 desc_offset;
+ __u32 data_offset;
+ };
+
+The ``flags`` field is not used at the moment. It is always read as 0.
+
+The ``name_size`` field is the size (in byte) of the statistics name string
+(including trailing '\0') which is contained in the "id string" block and
+appended at the end of every descriptor.
+
+The ``num_desc`` field is the number of descriptors that are included in the
+descriptor block. (The actual number of values in the data block may be
+larger, since each descriptor may comprise more than one value).
+
+The ``id_offset`` field is the offset of the id string from the start of the
+file indicated by the file descriptor. It is a multiple of 8.
+
+The ``desc_offset`` field is the offset of the Descriptors block from the start
+of the file indicated by the file descriptor. It is a multiple of 8.
+
+The ``data_offset`` field is the offset of the Stats Data block from the start
+of the file indicated by the file descriptor. It is a multiple of 8.
+
+The id string block contains a string which identifies the file descriptor on
+which KVM_GET_STATS_FD was invoked. The size of the block, including the
+trailing ``'\0'``, is indicated by the ``name_size`` field in the header.
+
+The descriptors block is only needed to be read once for the lifetime of the
+file descriptor contains a sequence of ``struct kvm_stats_desc``, each followed
+by a string of size ``name_size``.
+::
+
+ #define KVM_STATS_TYPE_SHIFT 0
+ #define KVM_STATS_TYPE_MASK (0xF << KVM_STATS_TYPE_SHIFT)
+ #define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT)
+ #define KVM_STATS_TYPE_INSTANT (0x1 << KVM_STATS_TYPE_SHIFT)
+ #define KVM_STATS_TYPE_PEAK (0x2 << KVM_STATS_TYPE_SHIFT)
+ #define KVM_STATS_TYPE_LINEAR_HIST (0x3 << KVM_STATS_TYPE_SHIFT)
+ #define KVM_STATS_TYPE_LOG_HIST (0x4 << KVM_STATS_TYPE_SHIFT)
+ #define KVM_STATS_TYPE_MAX KVM_STATS_TYPE_LOG_HIST
+
+ #define KVM_STATS_UNIT_SHIFT 4
+ #define KVM_STATS_UNIT_MASK (0xF << KVM_STATS_UNIT_SHIFT)
+ #define KVM_STATS_UNIT_NONE (0x0 << KVM_STATS_UNIT_SHIFT)
+ #define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT)
+ #define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT)
+ #define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT)
+ #define KVM_STATS_UNIT_MAX KVM_STATS_UNIT_CYCLES
+
+ #define KVM_STATS_BASE_SHIFT 8
+ #define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT)
+ #define KVM_STATS_BASE_POW10 (0x0 << KVM_STATS_BASE_SHIFT)
+ #define KVM_STATS_BASE_POW2 (0x1 << KVM_STATS_BASE_SHIFT)
+ #define KVM_STATS_BASE_MAX KVM_STATS_BASE_POW2
+
+ struct kvm_stats_desc {
+ __u32 flags;
+ __s16 exponent;
+ __u16 size;
+ __u32 offset;
+ __u32 bucket_size;
+ char name[];
+ };
+
+The ``flags`` field contains the type and unit of the statistics data described
+by this descriptor. Its endianness is CPU native.
+The following flags are supported:
+
+Bits 0-3 of ``flags`` encode the type:
+
+ * ``KVM_STATS_TYPE_CUMULATIVE``
+ The statistics reports a cumulative count. The value of data can only be increased.
+ Most of the counters used in KVM are of this type.
+ The corresponding ``size`` field for this type is always 1.
+ All cumulative statistics data are read/write.
+ * ``KVM_STATS_TYPE_INSTANT``
+ The statistics reports an instantaneous value. Its value can be increased or
+ decreased. This type is usually used as a measurement of some resources,
+ like the number of dirty pages, the number of large pages, etc.
+ All instant statistics are read only.
+ The corresponding ``size`` field for this type is always 1.
+ * ``KVM_STATS_TYPE_PEAK``
+ The statistics data reports a peak value, for example the maximum number
+ of items in a hash table bucket, the longest time waited and so on.
+ The value of data can only be increased.
+ The corresponding ``size`` field for this type is always 1.
+ * ``KVM_STATS_TYPE_LINEAR_HIST``
+ The statistic is reported as a linear histogram. The number of
+ buckets is specified by the ``size`` field. The size of buckets is specified
+ by the ``hist_param`` field. The range of the Nth bucket (1 <= N < ``size``)
+ is [``hist_param``*(N-1), ``hist_param``*N), while the range of the last
+ bucket is [``hist_param``*(``size``-1), +INF). (+INF means positive infinity
+ value.) The bucket value indicates how many samples fell in the bucket's range.
+ * ``KVM_STATS_TYPE_LOG_HIST``
+ The statistic is reported as a logarithmic histogram. The number of
+ buckets is specified by the ``size`` field. The range of the first bucket is
+ [0, 1), while the range of the last bucket is [pow(2, ``size``-2), +INF).
+ Otherwise, The Nth bucket (1 < N < ``size``) covers
+ [pow(2, N-2), pow(2, N-1)). The bucket value indicates how many samples fell
+ in the bucket's range.
+
+Bits 4-7 of ``flags`` encode the unit:
+
+ * ``KVM_STATS_UNIT_NONE``
+ There is no unit for the value of statistics data. This usually means that
+ the value is a simple counter of an event.
+ * ``KVM_STATS_UNIT_BYTES``
+ It indicates that the statistics data is used to measure memory size, in the
+ unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
+ determined by the ``exponent`` field in the descriptor.
+ * ``KVM_STATS_UNIT_SECONDS``
+ It indicates that the statistics data is used to measure time or latency.
+ * ``KVM_STATS_UNIT_CYCLES``
+ It indicates that the statistics data is used to measure CPU clock cycles.
+
+Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the
+unit:
+
+ * ``KVM_STATS_BASE_POW10``
+ The scale is based on power of 10. It is used for measurement of time and
+ CPU clock cycles. For example, an exponent of -9 can be used with
+ ``KVM_STATS_UNIT_SECONDS`` to express that the unit is nanoseconds.
+ * ``KVM_STATS_BASE_POW2``
+ The scale is based on power of 2. It is used for measurement of memory size.
+ For example, an exponent of 20 can be used with ``KVM_STATS_UNIT_BYTES`` to
+ express that the unit is MiB.
+
+The ``size`` field is the number of values of this statistics data. Its
+value is usually 1 for most of simple statistics. 1 means it contains an
+unsigned 64bit data.
+
+The ``offset`` field is the offset from the start of Data Block to the start of
+the corresponding statistics data.
+
+The ``bucket_size`` field is used as a parameter for histogram statistics data.
+It is only used by linear histogram statistics data, specifying the size of a
+bucket.
+
+The ``name`` field is the name string of the statistics data. The name string
+starts at the end of ``struct kvm_stats_desc``. The maximum length including
+the trailing ``'\0'``, is indicated by ``name_size`` in the header.
+
+The Stats Data block contains an array of 64-bit values in the same order
+as the descriptors in Descriptors block.
+
5. The kvm_run structure
========================
@@ -6323,6 +6606,7 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
This capability can be used to check / enable 2nd DAWR feature provided
by POWER10 processor.
+
7.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM
-------------------------------------
@@ -6360,7 +6644,67 @@ system fingerprint. To prevent userspace from circumventing such restrictions
by running an enclave in a VM, KVM prevents access to privileged attributes by
default.
-See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
+See Documentation/x86/sgx.rst for more details.
+
+7.26 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is enabled for hypervisors on platforms like POWER9
+that support radix MMU.
+
+7.27 KVM_CAP_EXIT_ON_EMULATION_FAILURE
+--------------------------------------
+
+:Architectures: x86
+:Parameters: args[0] whether the feature should be enabled or not
+
+When this capability is enabled, an emulation failure will result in an exit
+to userspace with KVM_INTERNAL_ERROR (except when the emulator was invoked
+to handle a VMware backdoor instruction). Furthermore, KVM will now provide up
+to 15 instruction bytes for any exit to userspace resulting from an emulation
+failure. When these exits to userspace occur use the emulation_failure struct
+instead of the internal struct. They both have the same layout, but the
+emulation_failure struct matches the content better. It also explicitly
+defines the 'flags' field which is used to describe the fields in the struct
+that are valid (ie: if KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is
+set in the 'flags' field then both 'insn_size' and 'insn_bytes' have valid data
+in them.)
+
+7.28 KVM_CAP_ARM_MTE
+--------------------
+
+:Architectures: arm64
+:Parameters: none
+
+This capability indicates that KVM (and the hardware) supports exposing the
+Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
+VMM before creating any VCPUs to allow the guest access. Note that MTE is only
+available to a guest running in AArch64 mode and enabling this capability will
+cause attempts to create AArch32 VCPUs to fail.
+
+When enabled the guest is able to access tags associated with any memory given
+to the guest. KVM will ensure that the tags are maintained during swap or
+hibernation of the host; however the VMM needs to manually save/restore the
+tags as appropriate if the VM is migrated.
+
+When this capability is enabled all memory in memslots must be mapped as
+not-shareable (no MAP_SHARED), attempts to create a memslot with a
+MAP_SHARED mmap will result in an -EINVAL return.
+
+When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
+perform a bulk copy of tags to/from the guest.
8. Other capabilities.
======================
@@ -6729,7 +7073,7 @@ In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
trap and emulate MSRs that are outside of the scope of KVM as well as
limit the attack surface on KVM's MSR emulation code.
-8.28 KVM_CAP_ENFORCE_PV_CPUID
+8.28 KVM_CAP_ENFORCE_PV_FEATURE_CPUID
-----------------------------
Architectures: x86
@@ -6891,3 +7235,33 @@ This capability is always enabled.
This capability indicates that the KVM virtual PTP service is
supported in the host. A VMM can check whether the service is
available to the guest on migration.
+
+8.33 KVM_CAP_HYPERV_ENFORCE_CPUID
+---------------------------------
+
+Architectures: x86
+
+When enabled, KVM will disable emulated Hyper-V features provided to the
+guest according to the bits Hyper-V CPUID feature leaves. Otherwise, all
+currently implmented Hyper-V features are provided unconditionally when
+Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001)
+leaf.
+
+8.34 KVM_CAP_EXIT_HYPERCALL
+---------------------------
+
+:Capability: KVM_CAP_EXIT_HYPERCALL
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace
+with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
+of hypercalls that can be configured to exit to userspace.
+Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
+
+The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
+of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace
+the hypercalls whose corresponding bit is in the argument, and return
+ENOSYS for the others.