summaryrefslogtreecommitdiff
path: root/include/uapi/linux/virtio_mem.h
AgeCommit message (Collapse)AuthorFilesLines
2024-05-22virtio-mem: support suspend+resumeDavid Hildenbrand1-0/+2
With virtio-mem, primarily hibernation is problematic: as the machine shuts down, the virtio-mem device loses its state. Powering the machine back up is like losing a bunch of DIMMs. While there would be ways to add limited support, suspend+resume is more commonly used for VMs and "easier" to support cleanly. s2idle can be supported without any device dependencies. Similarly, one would expect suspend-to-ram (i.e., S3) to work out of the box. However, QEMU currently unplugs all device memory when resuming the VM, using a cold reset on the "wakeup" path. In order to support S3, we need a feature flag for the device to tell us if memory remains plugged when waking up. In the future, QEMU will implement this feature. So let's always support s2idle and support S3 with plugged memory only if the device indicates support. Block hibernation early using the PM notifier. Trying to hibernate now fails early: # echo disk > /sys/power/state [ 26.455369] PM: hibernation: hibernation entry [ 26.458271] virtio_mem virtio0: hibernation is not supported. [ 26.462498] PM: hibernation: hibernation exit -bash: echo: write error: Operation not permitted s2idle works even without the new feature bit: # echo s2idle > /sys/power/mem_sleep # echo mem > /sys/power/state [ 52.083725] PM: suspend entry (s2idle) [ 52.095950] Filesystems sync: 0.010 seconds [ 52.101493] Freezing user space processes [ 52.104213] Freezing user space processes completed (elapsed 0.001 seconds) [ 52.106520] OOM killer disabled. [ 52.107655] Freezing remaining freezable tasks [ 52.110880] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 52.113296] printk: Suspending console(s) (use no_console_suspend to debug) S3 does not work without the feature bit when memory is plugged: # echo deep > /sys/power/mem_sleep # echo mem > /sys/power/state [ 32.788281] PM: suspend entry (deep) [ 32.816630] Filesystems sync: 0.027 seconds [ 32.820029] Freezing user space processes [ 32.823870] Freezing user space processes completed (elapsed 0.001 seconds) [ 32.827756] OOM killer disabled. [ 32.829608] Freezing remaining freezable tasks [ 32.833842] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 32.837953] printk: Suspending console(s) (use no_console_suspend to debug) [ 32.916172] virtio_mem virtio0: suspend+resume with plugged memory is not supported [ 32.916181] virtio-pci 0000:00:02.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -1 [ 32.916197] virtio-pci 0000:00:02.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -1 [ 32.916210] virtio-pci 0000:00:02.0: PM: failed to suspend async: error -1 But S3 works with the new feature bit when memory is plugged (patched QEMU): # echo deep > /sys/power/mem_sleep # echo mem > /sys/power/state [ 33.983694] PM: suspend entry (deep) [ 34.009828] Filesystems sync: 0.024 seconds [ 34.013589] Freezing user space processes [ 34.016722] Freezing user space processes completed (elapsed 0.001 seconds) [ 34.019092] OOM killer disabled. [ 34.020291] Freezing remaining freezable tasks [ 34.023549] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 34.026090] printk: Suspending console(s) (use no_console_suspend to debug) Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240318120645.105664-1-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-10virtio-mem: support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLEDavid Hildenbrand1-3/+6
The initial virtio-mem spec states that while unplugged memory should not be read, the device still has to allow for reading unplugged memory inside the usable region. The primary motivation for this default handling was to simplify bringup of virtio-mem, because there were corner cases where Linux might have accidentially read unplugged memory inside added Linux memory blocks. In the meantime, we: 1. Removed /dev/kmem in commit bbcd53c96071 ("drivers/char: remove /dev/kmem for good") 2. Disallowed access to virtio-mem device memory via /dev/mem in commit 2128f4e21aa2 ("virtio-mem: disallow mapping virtio-mem memory via /dev/mem") 3. Sanitized access to virtio-mem device memory via /proc/kcore in commit 0daa322b8ff9 ("fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages") 4. Sanitized access to virtio-mem device memory via /proc/vmcore in commit ce2814622e84 ("virtio-mem: kdump mode to sanitize /proc/vmcore access") "Accidential" access to unplugged memory is no longer possible; we can support the new VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature that will be required by some hypervisors implementing virtio-mem in the near future. Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: Hui Zhu <teawater@gmail.com> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2020-08-05virtio_mem: correct tags for config space fieldsMichael S. Tsirkin1-7/+7
Since this is a modern-only device, tag config space fields as having little endian-ness. TODO: check other uses of __virtioXX types in this header, should probably be __leXX. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2020-06-09virtio_mem: convert device block size into 64bitMichael S. Tsirkin1-2/+2
If subblock size is large (e.g. 1G) 32 bit math involving it can overflow. Rather than try to catch all instances of that, let's tweak block size to 64 bit. It ripples through UAPI which is an ABI change, but it's not too late to make it, and it will allow supporting >4Gbyte blocks while might become necessary down the road. Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
2020-06-04virtio-mem: Don't rely on implicit compiler padding for requestsDavid Hildenbrand1-0/+3
The compiler will add padding after the last member, make that explicit. The size of a request is always 24 bytes. The size of a response always 10 bytes. Add compile-time checks. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200515101402.16597-1-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Allow to specify an ACPI PXM as nidDavid Hildenbrand1-1/+9
We want to allow to specify (similar as for a DIMM), to which node a virtio-mem device (and, therefore, its memory) belongs. Add a new virtio-mem feature flag and export pxm_to_node, so it can be used in kernel module context. Acked-by: Michal Hocko <mhocko@suse.com> # for the export Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotplugDavid Hildenbrand1-0/+200
Each virtio-mem device owns exactly one memory region. It is responsible for adding/removing memory from that memory region on request. When the device driver starts up, the requested amount of memory is queried and then plugged to Linux. On request, further memory can be plugged or unplugged. This patch only implements the plugging part. On x86-64, memory can currently be plugged in 4MB ("subblock") granularity. When required, a new memory block will be added (e.g., usually 128MB on x86-64) in order to plug more subblocks. Only x86-64 was tested for now. The online_page callback is used to keep unplugged subblocks offline when onlining memory - similar to the Hyper-V balloon driver. Unplugged pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to skip them. User space is usually responsible for onlining the added memory. The memory hotplug notifier is used to synchronize virtio-mem activity against memory onlining/offlining. Each virtio-mem device can belong to a NUMA node, which allows us to easily add/remove small chunks of memory to/from a specific NUMA node by using multiple virtio-mem devices. Something that works even when the guest has no idea about the NUMA topology. One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many "sub-DIMMS". This patch directly introduces the basic infrastructure to implement memory unplug. Especially the memory block states and subblock bitmaps will be heavily used there. Notes: - In case memory is to be onlined by user space, we limit the amount of offline memory blocks, to not run out of memory. This is esp. an issue if memory is added faster than it is getting onlined. - Suspend/Hibernate is not supported due to the way virtio-mem devices behave. Limited support might be possible in the future. - Reloading the device driver is not supported. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>