summaryrefslogtreecommitdiff
path: root/drivers/virtio/virtio_mem.c
AgeCommit message (Collapse)AuthorFilesLines
2020-12-19virtio-mem: generalize virtio_mem_overlaps_range()David Hildenbrand1-7/+3
Avoid using memory block ids. While at it, use uint64_t for address/size. This is a preparation for Big Block Mode (BBM). Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-14-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: generalize virtio_mem_owned_mb()David Hildenbrand1-4/+5
Avoid using memory block ids. Rename it to virtio_mem_contains_range(). This is a preparation for Big Block Mode (BBM). Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-13-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: generalize check for added memoryDavid Hildenbrand1-4/+15
Let's check by traversing busy system RAM resources instead, to avoid relying on memory block states. Don't use walk_system_ram_range(), as that works on pages and we want to use the bare addresses we have easily at hand. This is a preparation for Big Block Mode (BBM), which won't have memory block states. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-12-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLEDavid Hildenbrand1-11/+26
ZONE_MOVABLE is supposed to give some guarantees, yet, alloc_contig_range() isn't prepared to properly deal with some racy cases properly (e.g., temporary page pinning when exiting processed, PCP). Retry 5 times for now. There is certainly room for improvement in the future. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-11-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: factor out handling of fake-offline pages in memory notifierDavid Hildenbrand1-23/+50
Let's factor out the core pieces and place the implementation next to virtio_mem_fake_offline(). We'll reuse this functionality soon. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-10-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()David Hildenbrand1-10/+24
... which now matches virtio_mem_fake_online(). We'll reuse this functionality soon. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-9-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: print debug messages from virtio_mem_send_*_request()David Hildenbrand1-15/+35
Let's move the existing dev_dbg() into the functions, print if something went wrong, and also print for virtio_mem_send_unplug_all_request(). Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-8-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: factor out calculation of the bit number within the subblock bitmapDavid Hildenbrand1-5/+15
The calculation is already complicated enough, let's limit it to one location. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-7-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: use "unsigned long" for nr_pages when fake onlining/offliningDavid Hildenbrand1-4/+4
No harm done, but let's be consistent. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-6-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()David Hildenbrand1-3/+2
We can drop rc2, we don't actually need the value. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-5-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handlingDavid Hildenbrand1-16/+19
Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where possible to simplify. Add a comment why we have that restriction for now. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-4-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: more precise calculation in virtio_mem_mb_state_prepare_next_mb()David Hildenbrand1-4/+2
We actually need one byte less (next_mb_id is exclusive, first_mb_id is inclusive). While at it, compact the code. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-19virtio-mem: determine nid only once using memory_add_physaddr_to_nid()David Hildenbrand1-17/+11
Let's determine the target nid only once in case we have none specified - usually, we'll end up with node 0 either way. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-10-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+1
Pull virtio updates from Michael Tsirkin: "vhost, vdpa, and virtio cleanups and fixes A very quiet cycle, no new features" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: MAINTAINERS: add URL for virtio-mem vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call vringh: fix __vringh_iov() when riov and wiov are different vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK s390: virtio: PV needs VIRTIO I/O device protection virtio: let arch advertise guest's memory access restrictions vhost_vdpa: Fix duplicate included kernel.h vhost: reduce stack usage in log_used virtio-mem: Constify mem_id_table virtio_input: Constify id_table virtio-balloon: Constify id_table vdpa/mlx5: Fix failure to bring link up vdpa/mlx5: Make use of a specific 16 bit endianness API
2020-10-21virtio-mem: Constify mem_id_tableRikard Falkeborn1-1/+1
mem_id_table is not modified, so make it const to allow the compiler to put it in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Link: https://lore.kernel.org/r/20200911203509.26505-4-rikard.falkeborn@gmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
2020-10-16virtio-mem: try to merge system ram resourcesDavid Hildenbrand1-1/+2
virtio-mem adds memory in memory block granularity, to be able to remove it in the same granularity again later, and to grow slowly on demand. This, however, results in quite a lot of resources when adding a lot of memory. Resources are effectively stored in a list-based tree. Having a lot of resources not only wastes memory, it also makes traversing that tree more expensive, and makes /proc/iomem explode in size (e.g., requiring kexec-tools to manually merge resources later when e.g., trying to create a kdump header). Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem on x86-64: [...] 100000000-13fffffff : System RAM 140000000-33fffffff : virtio0 140000000-147ffffff : System RAM (virtio_mem) 148000000-14fffffff : System RAM (virtio_mem) 150000000-157ffffff : System RAM (virtio_mem) 158000000-15fffffff : System RAM (virtio_mem) 160000000-167ffffff : System RAM (virtio_mem) 168000000-16fffffff : System RAM (virtio_mem) 170000000-177ffffff : System RAM (virtio_mem) 178000000-17fffffff : System RAM (virtio_mem) 180000000-187ffffff : System RAM (virtio_mem) 188000000-18fffffff : System RAM (virtio_mem) 190000000-197ffffff : System RAM (virtio_mem) 198000000-19fffffff : System RAM (virtio_mem) 1a0000000-1a7ffffff : System RAM (virtio_mem) 1a8000000-1afffffff : System RAM (virtio_mem) 1b0000000-1b7ffffff : System RAM (virtio_mem) 1b8000000-1bfffffff : System RAM (virtio_mem) 3280000000-32ffffffff : PCI Bus 0000:00 With this patch, we get (/proc/iomem): [...] fffc0000-ffffffff : Reserved 100000000-13fffffff : System RAM 140000000-33fffffff : virtio0 140000000-1bfffffff : System RAM (virtio_mem) 3280000000-32ffffffff : PCI Bus 0000:00 Of course, with more hotplugged memory, it gets worse. When unplugging memory blocks again, try_remove_memory() (via offline_and_remove_memory()) will properly split the resource up again. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Juergen Gross <jgross@suse.com> Cc: Julien Grall <julien@xen.org> Cc: Kees Cook <keescook@chromium.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <lenb@kernel.org> Cc: Leonardo Bras <leobras.c@gmail.com> Cc: Libor Pechacek <lpechacek@suse.cz> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Liu <wei.liu@kernel.org> Link: https://lkml.kernel.org/r/20200911103459.10306-7-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16mm/memory_hotplug: prepare passing flags to add_memory() and friendsDavid Hildenbrand1-1/+1
We soon want to pass flags, e.g., to mark added System RAM resources. mergeable. Prepare for that. This patch is based on a similar patch by Oscar Salvador: https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Juergen Gross <jgross@suse.com> # Xen related part Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Libor Pechacek <lpechacek@suse.cz> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Leonardo Bras <leobras.c@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julien Grall <julien@xen.org> Cc: Kees Cook <keescook@chromium.org> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richardw.yang@linux.intel.com> Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-14virtio-mem: don't special-case ZONE_MOVABLEDavid Hildenbrand1-39/+8
When introducing virtio-mem, the semantics of ZONE_MOVABLE were rather unclear, which is why we special-cased ZONE_MOVABLE such that partially plugged blocks would never end up in ZONE_MOVABLE. Now that the semantics are much clearer (and will be documented in a follow-up patch including the new virtio-mem behavior), let's allow to online partially plugged memory blocks to ZONE_MOVABLE and also consider memory blocks that were onlined to ZONE_MOVABLE when unplugging memory. While unplugged memory pages are, in general, unmovable, they can be skipped when offlining memory. virtio-mem only unplugs fairly big chunks (in the megabyte range) and rather tries to shrink the memory region than randomly choosing memory. In theory, if all other pages in the movable zone would be movable, virtio-mem would only shrink that zone and not create any kind of fragmentation. In the future, we might want to remember the zone again and use the information when (un)plugging memory. For now, let's keep it simple. Note: Support for defragmentation is planned, to deal with fragmentation after unplug due to memory chunks within memory blocks that could not get unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE for a longer time). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200816125333.7434-6-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-05virtio_mem: convert to LE accessorsMichael S. Tsirkin1-15/+15
Virtio mem is modern-only. Use LE accessors for config space. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-22virtio-mem: add memory via add_memory_driver_managed()David Hildenbrand1-3/+22
Virtio-mem managed memory is always detected and added by the virtio-mem driver, never using something like the firmware-provided memory map. This is the case after an ordinary system reboot, and has to be guaranteed after kexec. Especially, virtio-mem added memory resources can contain inaccessible parts ("unblocked memory blocks"), blindly forwarding them to a kexec kernel is dangerous, as unplugged memory will get accessed (esp. written). Let's use the new way of adding special driver-managed memory introduced in commit 7b7b27214bba ("mm/memory_hotplug: introduce add_memory_driver_managed()"). This will result in no entries in /sys/firmware/memmap ("raw firmware- provided memory map"), the memory resource will be flagged IORESOURCE_MEM_DRIVER_MANAGED (esp., kexec_file_load() will not place kexec images on this memory), and it is exposed as "System RAM (virtio_mem)" in /proc/iomem, so esp. kexec-tools can properly handle it. Example /proc/iomem before this change: [...] 140000000-333ffffff : virtio0 140000000-147ffffff : System RAM 334000000-533ffffff : virtio1 338000000-33fffffff : System RAM 340000000-347ffffff : System RAM 348000000-34fffffff : System RAM [...] Example /proc/iomem after this change: [...] 140000000-333ffffff : virtio0 140000000-147ffffff : System RAM (virtio_mem) 334000000-533ffffff : virtio1 338000000-33fffffff : System RAM (virtio_mem) 340000000-347ffffff : System RAM (virtio_mem) 348000000-34fffffff : System RAM (virtio_mem) [...] Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200611093518.5737-1-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
2020-06-22virtio-mem: silence a static checker warningDan Carpenter1-1/+1
Smatch complains that "rc" can be uninitialized if we hit the "break;" statement on the first iteration through the loop. I suspect that this can't happen in real life, but returning a zero literal is cleaner and silence the static checker warning. Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20200610085911.GC5439@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-09virtio_mem: convert device block size into 64bitMichael S. Tsirkin1-9/+9
If subblock size is large (e.g. 1G) 32 bit math involving it can overflow. Rather than try to catch all instances of that, let's tweak block size to 64 bit. It ripples through UAPI which is an ABI change, but it's not too late to make it, and it will allow supporting >4Gbyte blocks while might become necessary down the road. Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
2020-06-08virtio-mem: drop unnecessary initializationMichael S. Tsirkin1-1/+1
rc is initialized to -ENIVAL but that's never used. Drop it. Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
2020-06-04virtio-mem: Don't rely on implicit compiler padding for requestsDavid Hildenbrand1-0/+3
The compiler will add padding after the last member, make that explicit. The size of a request is always 24 bytes. The size of a response always 10 bytes. Add compile-time checks. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: teawater <teawaterz@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200515101402.16597-1-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Try to unplug the complete online memory block firstDavid Hildenbrand1-31/+57
Right now, we always try to unplug single subblocks when processing an online memory block. Let's try to unplug the complete online memory block first, in case it is fully plugged and the unplug request is large enough. Fallback to single subblocks in case the memory block cannot get unplugged as a whole. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-16-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Use -ETXTBSY as error code if the device is busyDavid Hildenbrand1-6/+10
Let's be able to distinguish if the device or if memory is busy. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-15-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Unplug subblocks right-to-leftDavid Hildenbrand1-22/+16
We unplug blocks right-to-left, let's also unplug subblocks within a block right-to-left. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-14-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Drop manual check for already present memoryDavid Hildenbrand1-43/+12
Registering our parent resource will fail if any memory is still present (e.g., because somebody unloaded the driver and tries to reload it). No need for the manual check. Move our "unplug all" handling to after registering the resource. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-13-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Add parent resource for all added "System RAM"David Hildenbrand1-1/+51
Let's add a parent resource, named after the virtio device (inspired by drivers/dax/kmem.c). This allows user space to identify which memory belongs to which virtio-mem device. With this change and two virtio-mem devices: :/# cat /proc/iomem 00000000-00000fff : Reserved 00001000-0009fbff : System RAM [...] 140000000-333ffffff : virtio0 140000000-147ffffff : System RAM 148000000-14fffffff : System RAM 150000000-157ffffff : System RAM [...] 334000000-3033ffffff : virtio1 338000000-33fffffff : System RAM 340000000-347ffffff : System RAM 348000000-34fffffff : System RAM [...] Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-12-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
2020-06-04virtio-mem: Better retry handlingDavid Hildenbrand1-3/+8
Let's start with a retry interval of 5 seconds and double the time until we reach 5 minutes, in case we keep getting errors. Reset the retry interval in case we succeeded. The two main reasons for having to retry are - The hypervisor is busy and cannot process our request - We cannot reach the desired requested_size (esp., not enough memory can get unplugged because we can't allocate any subblocks). Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-11-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Offline and remove completely unplugged memory blocksDavid Hildenbrand1-4/+43
Let's offline+remove memory blocks once all subblocks are unplugged. We can use the new Linux MM interface for that. As no memory is in use anymore, this shouldn't take a long time and shouldn't fail. There might be corner cases where the offlining could still fail (especially, if another notifier NACKs the offlining request). Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-10-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Allow to offline partially unplugged memory blocksDavid Hildenbrand1-1/+67
Dropping the reference count of PageOffline() pages during MEM_GOING_ONLINE allows offlining code to skip them. However, we also have to clear PG_reserved, because PG_reserved pages get detected as unmovable right away. Take care of restoring the reference count when offlining is canceled. Clarify why we don't have to perform any action when unloading the driver. Also, let's add a warning if anybody is still holding a reference to unplugged pages when offlining. Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-8-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotunplug part 2David Hildenbrand1-14/+143
We also want to unplug online memory (contained in online memory blocks and, therefore, managed by the buddy), and eventually replug it later. When requested to unplug memory, we use alloc_contig_range() to allocate subblocks in online memory blocks (so we are the owner) and send them to our hypervisor. When requested to plug memory, we can replug such memory using free_contig_range() after asking our hypervisor. We also want to mark all allocated pages PG_offline, so nobody will touch them. To differentiate pages that were never onlined when onlining the memory block from pages allocated via alloc_contig_range(), we use PageDirty(). Based on this flag, virtio_mem_fake_online() can either online the pages for the first time or use free_contig_range(). It is worth noting that there are no guarantees on how much memory can actually get unplugged again. All device memory might completely be fragmented with unmovable data, such that no subblock can get unplugged. We are not touching the ZONE_MOVABLE. If memory is onlined to the ZONE_MOVABLE, it can only get unplugged after that memory was offlined manually by user space. In normal operation, virtio-mem memory is suggested to be onlined to ZONE_NORMAL. In the future, we will try to make unplug more likely to succeed. Add a module parameter to control if online memory shall be touched. As we want to access alloc_contig_range()/free_contig_range() from kernel module context, export the symbols. Note: Whenever virtio-mem uses alloc_contig_range(), all affected pages are on the same node, in the same zone, and contain no holes. Acked-by: Michal Hocko <mhocko@suse.com> # to export contig range allocator API Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-6-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotunplug part 1David Hildenbrand1-2/+114
Unplugging subblocks of memory blocks that are offline is easy. All we have to do is watch out for concurrent onlining activity. Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-5-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Allow to specify an ACPI PXM as nidDavid Hildenbrand1-2/+37
We want to allow to specify (similar as for a DIMM), to which node a virtio-mem device (and, therefore, its memory) belongs. Add a new virtio-mem feature flag and export pxm_to_node, so it can be used in kernel module context. Acked-by: Michal Hocko <mhocko@suse.com> # for the export Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-06-04virtio-mem: Paravirtualized memory hotplugDavid Hildenbrand1-0/+1533
Each virtio-mem device owns exactly one memory region. It is responsible for adding/removing memory from that memory region on request. When the device driver starts up, the requested amount of memory is queried and then plugged to Linux. On request, further memory can be plugged or unplugged. This patch only implements the plugging part. On x86-64, memory can currently be plugged in 4MB ("subblock") granularity. When required, a new memory block will be added (e.g., usually 128MB on x86-64) in order to plug more subblocks. Only x86-64 was tested for now. The online_page callback is used to keep unplugged subblocks offline when onlining memory - similar to the Hyper-V balloon driver. Unplugged pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to skip them. User space is usually responsible for onlining the added memory. The memory hotplug notifier is used to synchronize virtio-mem activity against memory onlining/offlining. Each virtio-mem device can belong to a NUMA node, which allows us to easily add/remove small chunks of memory to/from a specific NUMA node by using multiple virtio-mem devices. Something that works even when the guest has no idea about the NUMA topology. One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many "sub-DIMMS". This patch directly introduces the basic infrastructure to implement memory unplug. Especially the memory block states and subblock bitmaps will be heavily used there. Notes: - In case memory is to be onlined by user space, we limit the amount of offline memory blocks, to not run out of memory. This is esp. an issue if memory is added faster than it is getting onlined. - Suspend/Hibernate is not supported due to the way virtio-mem devices behave. Limited support might be possible in the future. - Reloading the device driver is not supported. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200507140139.17083-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>