summaryrefslogtreecommitdiff
path: root/drivers/nvdimm
AgeCommit message (Collapse)AuthorFilesLines
2019-07-26libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fieldsDan Williams3-4/+17
commit 7e3e888dfc138089f4c15a81b418e88f0978f744 upstream. At namespace creation time there is the potential for the "expected to be zero" fields of a 'pfn' info-block to be filled with indeterminate data. While the kernel buffer is zeroed on allocation it is immediately overwritten by nd_pfn_validate() filling it with the current contents of the on-media info-block location. For fields like, 'flags' and the 'padding' it potentially means that future implementations can not rely on those fields being zero. In preparation to stop using the 'start_pad' and 'end_trunc' fields for section alignment, arrange for fields that are not explicitly initialized to be guaranteed zero. Bump the minor version to indicate it is safe to assume the 'padding' and 'flags' are zero. Otherwise, this corruption is expected to benign since all other critical fields are explicitly initialized. Note The cc: stable is about spreading this new policy to as many kernels as possible not fixing an issue in those kernels. It is not until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" where this improper initialization becomes a problem. So if someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces to section alignment" (which is not tagged for stable), make sure this pre-requisite is flagged. Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: <stable@vger.kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19libnvdimm: Fix compilation warnings with W=1Qian Cai3-4/+4
[ Upstream commit c01dafad77fea8d64c4fdca0a6031c980842ad65 ] Several places (dimm_devs.c, core.c etc) include label.h but only label.c uses NSINDEX_SIGNATURE, so move its definition to label.c instead. In file included from drivers/nvdimm/dimm_devs.c:23: drivers/nvdimm/label.h:41:19: warning: 'NSINDEX_SIGNATURE' defined but not used [-Wunused-const-variable=] Also, some places abuse "/**" which is only reserved for the kernel-doc. drivers/nvdimm/bus.c:648: warning: cannot understand function prototype: 'struct attribute_group nd_device_attribute_group = ' drivers/nvdimm/bus.c:677: warning: cannot understand function prototype: 'struct attribute_group nd_numa_attribute_group = ' Those are just some member assignments for the "struct attribute_group" instances and it can't be expressed in the kernel-doc. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-31libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overheadDan Williams1-2/+8
commit 52f476a323f9efc959be1c890d0cdcf12e1582e0 upstream. Jeff discovered that performance improves from ~375K iops to ~519K iops on a simple psync-write fio workload when moving the location of 'struct page' from the default PMEM location to DRAM. This result is surprising because the expectation is that 'struct page' for dax is only needed for third party references to dax mappings. For example, a dax-mapped buffer passed to another system call for direct-I/O requires 'struct page' for sending the request down the driver stack and pinning the page. There is no usage of 'struct page' for first party access to a file via read(2)/write(2) and friends. However, this "no page needed" expectation is violated by CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The check_heap_object() helper routine assumes the buffer is backed by a slab allocator (DRAM) page and applies some checks. Those checks are invalid, dax pages do not originate from the slab, and redundant, dax_iomap_actor() has already validated that the I/O is within bounds. Specifically that routine validates that the logical file offset is within bounds of the file, then it does a sector-to-pfn translation which validates that the physical mapping is within bounds of the block device. Bypass additional hardened usercopy overhead and call the 'no check' versions of the copy_{to,from}_iter operations directly. Fixes: 0aed55af8834 ("x86, uaccess: introduce copy_from_iter_flushcache...") Cc: <stable@vger.kernel.org> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <willy@infradead.org> Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31dax: Arrange for dax_supported check to span multiple devicesDan Williams1-0/+1
commit 7bf7eac8d648057519adb6fce1e31458c902212c upstream. Pankaj reports that starting with commit ad428cdb525a "dax: Check the end of the block-device capacity with dax_direct_access()" device-mapper no longer allows dax operation. This results from the stricter checks in __bdev_dax_supported() that validate that the start and end of a block-device map to the same 'pagemap' instance. Teach the dax-core and device-mapper to validate the 'pagemap' on a per-target basis. This is accomplished by refactoring the bdev_dax_supported() internals into generic_fsdax_supported() which takes a sector range to validate. Consequently generic_fsdax_supported() is suitable to be used in a device-mapper ->iterate_devices() callback. A new ->dax_supported() operation is added to allow composite devices to split and route upper-level bdev_dax_supported() requests. Fixes: ad428cdb525a ("dax: Check the end of the block-device...") Cc: <stable@vger.kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22libnvdimm/namespace: Fix label tracking errorDan Williams3-13/+35
commit c4703ce11c23423d4b46e3d59aef7979814fd608 upstream. Users have reported intermittent occurrences of DIMM initialization failures due to duplicate allocations of address capacity detected in the labels, or errors of the form below, both have the same root cause. nd namespace1.4: failed to track label: 0 WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863 RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm] Call Trace: ? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm] nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm] uuid_store+0x17e/0x190 [libnvdimm] kernfs_fop_write+0xf0/0x1a0 vfs_write+0xb7/0x1b0 ksys_write+0x57/0xd0 do_syscall_64+0x60/0x210 Unfortunately those reports were typically with a busy parallel namespace creation / destruction loop making it difficult to see the components of the bug. However, Jane provided a simple reproducer using the work-in-progress sub-section implementation. When ndctl is reconfiguring a namespace it may take an existing defunct / disabled namespace and reconfigure it with a new uuid and other parameters. Critically namespace_update_uuid() takes existing address resources and renames them for the new namespace to use / reconfigure as it sees fit. The bug is that this rename only happens in the resource tracking tree. Existing labels with the old uuid are not reaped leading to a scenario where multiple active labels reference the same span of address range. Teach namespace_update_uuid() to flag any references to the old uuid for reaping at the next label update attempt. Cc: <stable@vger.kernel.org> Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation") Link: https://github.com/pmem/ndctl/issues/91 Reported-by: Jane Chu <jane.chu@oracle.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-08libnvdimm/pmem: fix a possible OOB access when read and write pmemLi RongQing1-4/+4
If offset is not zero and length is bigger than PAGE_SIZE, this will cause to out of boundary access to a page memory Fixes: 98cc093cba1e ("block, THP: make block_device_operations.rw_page support THP") Co-developed-by: Liang ZhiCheng <liangzhicheng@baidu.com> Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-30libnvdimm/security, acpi/nfit: unify zero-key for all security commandsDave Jiang1-48/+69
With zero-key defined, we can remove previous detection of key id 0 or null key in order to deal with a zero-key situation. Syncing all security commands to use the zero-key. Helper functions are introduced to return the data that points to the actual key payload or the zero_key. This helps uniformly handle the key material even with zero_key. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-30libnvdimm/security: provide fix for secure-erase to use zero-keyDave Jiang1-5/+12
Add a zero key in order to standardize hardware that want a key of 0's to be passed. Some platforms defaults to a zero-key with security enabled rather than allow the OS to enable the security. The zero key would allow us to manage those platform as well. This also adds a fix to secure erase so it can use the zero key to do crypto erase. Some other security commands already use zero keys. This introduces a standard zero-key to allow unification of semantics cross nvdimm security commands. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-27libnvdimm/btt: Fix a kmemdup failure checkAditya Pakki1-5/+13
In case kmemdup fails, the fix releases resources and returns to avoid the NULL pointer dereference. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-23libnvdimm/namespace: Fix a potential NULL pointer dereferenceKangjie Lu1-1/+4
In case kmemdup fails, the fix goes to blk_err to avoid NULL pointer dereference. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-16Merge tag 'devdax-for-5.1' of ↵Linus Torvalds4-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull device-dax updates from Dan Williams: "New device-dax infrastructure to allow persistent memory and other "reserved" / performance differentiated memories, to be assigned to the core-mm as "System RAM". Some users want to use persistent memory as additional volatile memory. They are willing to cope with potential performance differences, for example between DRAM and 3D Xpoint, and want to use typical Linux memory management apis rather than a userspace memory allocator layered over an mmap() of a dax file. The administration model is to decide how much Persistent Memory (pmem) to use as System RAM, create a device-dax-mode namespace of that size, and then assign it to the core-mm. The rationale for device-dax is that it is a generic memory-mapping driver that can be layered over any "special purpose" memory, not just pmem. On subsequent boots udev rules can be used to restore the memory assignment. One implication of using pmem as RAM is that mlock() no longer keeps data off persistent media. For this reason it is recommended to enable NVDIMM Security (previously merged for 5.0) to encrypt pmem contents at rest. We considered making this recommendation an actively enforced requirement, but in the end decided to leave it as a distribution / administrator policy to allow for emulation and test environments that lack security capable NVDIMMs. Summary: - Replace the /sys/class/dax device model with /sys/bus/dax, and include a compat driver so distributions can opt-in to the new ABI. - Allow for an alternative driver for the device-dax address-range - Introduce the 'kmem' driver to hotplug / assign a device-dax address-range to the core-mm. - Arrange for the device-dax target-node to be onlined so that the newly added memory range can be uniquely referenced by numa apis" NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because we currently have special - and very annoying rules in the kernel about accessing PMEM only with the "MC safe" accessors, because machine checks inside the regular repeat string copy functions can be fatal in some (not described) circumstances. And apparently the PMEM modules can cause that a lot more than regular RAM. The argument is that this happens because PMEM doesn't necessarily get scrubbed at boot like RAM does, but that is planned to be added for the user space tooling. Quoting Dan from another email: "The exposure can be reduced in the volatile-RAM case by scanning for and clearing errors before it is onlined as RAM. The userspace tooling for that can be in place before v5.1-final. There's also runtime notifications of errors via acpi_nfit_uc_error_notify() from background scrubbers on the DIMM devices. With that mechanism the kernel could proactively clear newly discovered poison in the volatile case, but that would be additional development more suitable for v5.2. I understand the concern, and the need to highlight this issue by tapping the brakes on feature development, but I don't see PMEM as RAM making the situation worse when the exposure is also there via DAX in the PMEM case. Volatile-RAM is arguably a safer use case since it's possible to repair pages where the persistent case needs active application coordination" * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: "Hotplug" persistent memory for use like normal RAM mm/resource: Let walk_system_ram_range() search child resources mm/memory-hotplug: Allow memory resources to be children mm/resource: Move HMM pr_debug() deeper into resource code mm/resource: Return real error codes from walk failures device-dax: Add a 'modalias' attribute to DAX 'bus' devices device-dax: Add a 'target_node' attribute device-dax: Auto-bind device after successful new_id acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node device-dax: Add /sys/class/dax backwards compatibility device-dax: Add support for a dax override driver device-dax: Move resource pinning+mapping into the common driver device-dax: Introduce bus + driver model device-dax: Start defining a dax bus model device-dax: Remove multi-resource infrastructure device-dax: Kill dax_region base device-dax: Kill dax_region ida
2019-03-13Merge tag 'libnvdimm-for-5.1' of ↵Linus Torvalds9-32/+94
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this has been in -next since before the merge window opened, with no known collisions / issues reported. The only detail worth noting, outside the summary below, is that the "libnvdimm-start-pad" topic has been truncated to just cleanups and small fixes. The full topic branch would have doubled down on hacks around the "section alignment" limitation of the core-mm, instead effort is now being spent to address that root issue in the memory hotplug implementation for v5.2. - Fix nfit-bus command submission regression - Support retrieval of short-ARS results if the ARS state is "requires continuation", and even if the "no_init_ars" module parameter is specified - Allow busy-polling of the kernel ARS state by allowing root to reset the exponential back-off timer - Filter potentially stale ARS results by tracking query-ARS relative to the previous start-ARS - Enhance dax_device alignment checks - Add support for the Hyper-V family of device-specific-methods (DSMs) - Add several fixes and workarounds for Hyper-V compatibility - Fix support to cache the dirty-shutdown-count at init" * tag 'libnvdimm-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (25 commits) libnvdimm/namespace: Clean up holder_class_store() libnvdimm/of_pmem: Fix platform_no_drv_owner.cocci warnings acpi/nfit: Update NFIT flags error message libnvdimm/btt: Fix LBA masking during 'free list' population libnvdimm/btt: Remove unnecessary code in btt_freelist_init libnvdimm/pfn: Remove dax_label_reserve dax: Check the end of the block-device capacity with dax_direct_access() nfit/ars: Avoid stale ARS results nfit/ars: Allow root to busy-poll the ARS state machine nfit/ars: Introduce scrub_flags nfit/ars: Remove ars_start_flags nfit/ars: Attempt short-ARS even in the no_init_ars case nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot acpi/nfit: Require opt-in for read-only label configurations libnvdimm/pmem: Honor force_raw for legacy pmem regions libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init() libnvdimm: Fix altmap reservation size calculation libnvdimm, pfn: Fix over-trim in trim_pfn_device() acpi/nfit: Fix bus command validation libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family ...
2019-03-11Merge branch 'for-5.1/libnvdimm-start-pad' into libnvdimm-for-nextDan Williams2-10/+18
Merge the initial lead-in cleanups and fixes that resulted from the effort to resolve bugs in the section-alignment padding implementation in the nvdimm core. The back half of this topic is abandoned in favor of implementing sub-section hotplug support.
2019-03-11Merge branch 'for-5.1/libnvdimm' into libnvdimm-for-nextDan Williams8-22/+76
Merge miscellaneous libnvdimm sub-system updates for v5.1. Highlights include: * Support for the Hyper-V family of device-specific-methods (DSMs) * Several fixes and workarounds for Hyper-V compatibility. * Fix for the support to cache the dirty-shutdown-count at init.
2019-03-04libnvdimm/namespace: Clean up holder_class_store()Dan Williams1-4/+4
Use sysfs_streq() in place of open-coded strcmp()'s that check for an optional "\n" at the end of the input. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-03-02libnvdimm/of_pmem: Fix platform_no_drv_owner.cocci warningsYueHaibing1-1/+0
Remove .owner field if calls are used which set it automatically Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28libnvdimm/btt: Fix LBA masking during 'free list' populationVishal Verma3-6/+29
The Linux BTT implementation assumes that log entries will never have the 'zero' flag set, and indeed it never sets that flag for log entries itself. However, the UEFI spec is ambiguous on the exact format of the LBA field of a log entry, specifically as to whether it should include the additional flag bits or not. While a zero bit doesn't make sense in the context of a log entry, other BTT implementations might still have it set. If an implementation does happen to have it set, we would happily read it in as the next block to write to for writes. Since a high bit is set, it pushes the block number out of the range of an 'arena', and we fail such a write with an EIO. Follow the robustness principle, and tolerate such implementations by stripping out the zero flag when populating the free list during initialization. Additionally, use the same stripped out entries for detection of incomplete writes and map restoration that happens at this stage. Add a sysfs file 'log_zero_flags' that indicates the ability to accept such a layout to userspace applications. This enables 'ndctl check-namespace' to recognize whether the kernel is able to handle zero flags, or whether it should attempt a fix-up under the --repair option. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Dexuan Cui <decui@microsoft.com> Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28libnvdimm/btt: Remove unnecessary code in btt_freelist_initVishal Verma1-6/+2
We call btt_log_read() twice, once to get the 'old' log entry, and again to get the 'new' entry. However, we have no use for the 'old' entry, so remove it. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-22libnvdimm/pfn: Remove dax_label_reserveDan Williams1-4/+2
The reserve was for an abandoned effort to add label (partitioning support) to device-dax instances. Remove it. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm/pmem: Honor force_raw for legacy pmem regionsDan Williams1-0/+4
For recovery, where non-dax access is needed to a given physical address range, and testing, allow the 'force_raw' attribute to override the default establishment of a dev_pagemap. Otherwise without this capability it is possible to end up with a namespace that can not be activated due to corrupted info-block, and one that can not be repaired due to a section collision. Cc: <stable@vger.kernel.org> Fixes: 004f1afbe199 ("libnvdimm, pmem: direct map legacy pmem by default") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init()Dan Williams1-7/+13
Similar to "libnvdimm: Fix altmap reservation size calculation" provide for a reservation of a full page worth of info block space at info-block establishment time. Typically there is already slack in the padding from honoring the default 2MB alignment, but provide for a reservation for corner case configurations that would otherwise fit. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm: Fix altmap reservation size calculationOliver O'Halloran1-1/+1
Libnvdimm reserves the first 8K of pfn and devicedax namespaces to store a superblock describing the namespace. This 8K reservation is contained within the altmap area which the kernel uses for the vmemmap backing for the pages within the namespace. The altmap allows for some pages at the start of the altmap area to be reserved and that mechanism is used to protect the superblock from being re-used as vmemmap backing. The number of PFNs to reserve is calculated using: PHYS_PFN(SZ_8K) Which is implemented as: #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT)) So on systems where PAGE_SIZE is greater than 8K the reservation size is truncated to zero and the superblock area is re-used as vmemmap backing. As a result all the namespace information stored in the superblock (i.e. if it's a PFN or DAX namespace) is lost and the namespace needs to be re-created to get access to the contents. This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure that at least one page is reserved. On systems with a 4K pages size this patch should have no effect. Cc: stable@vger.kernel.org Cc: Dan Williams <dan.j.williams@intel.com> Fixes: ac515c084be9 ("libnvdimm, pmem, pfn: move pfn setup to the core") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm, pfn: Fix over-trim in trim_pfn_device()Wei Yang1-1/+1
When trying to see whether current nd_region intersects with others, trim_pfn_device() has already calculated the *size* to be expanded to SECTION size. Do not double append 'adjust' to 'size' when calculating whether the end of a region collides with the next pmem region. Fixes: ae86cbfef381 "libnvdimm, pfn: Pad pfn namespaces relative to other regions" Cc: <stable@vger.kernel.org> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-11Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman4-7/+26
We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-03libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM familyDan Williams4-0/+23
As Dexuan reports the NVDIMM_FAMILY_HYPERV platform is incompatible with the existing Linux namespace implementation because it uses NSLABEL_FLAG_LOCAL for x1-width PMEM interleave sets. Quirk it as an platform / DIMM that does not provide BLK-aperture access. Allow the libnvdimm core to assume no potential for aliasing. In case other implementations make the same mistake, provide a "noblk" module parameter to force-enable the quirk. Link: https://lkml.kernel.org/r/PU1P153MB0169977604493B82B662A01CBF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-31libnvdimm: Schedule device registration on node local to the deviceAlexander Duyck1-3/+8
Force the device registration for nvdimm devices to be closer to the actual device. This is achieved by using either the NUMA node ID of the region, or of the parent. By doing this we can have everything above the region based on the region, and everything below the region based on the nvdimm bus. By guaranteeing NUMA locality I see an improvement of as high as 25% for per-node init of a system with 12TB of persistent memory. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-21libnvdimm/security: Require nvdimm_security_setup_events() to succeedDan Williams3-5/+24
The following warning: ACPI0012:00: security event setup failed: -19 ...is meant to capture exceptional failures of sysfs_get_dirent(), however it will also fail in the common case when security support is disabled. A few issues: 1/ A dev_warn() report for a common case is too chatty 2/ The setup of this notifier is generic, no need for it to be driven from the nfit driver, it can exist completely in the core. 3/ If it fails for any reason besides security support being disabled, that's fatal and should abort DIMM activation. Userspace may hang if it never gets overwrite notifications. 4/ The dirent needs to be released. Move the call to the core 'dimm' driver, make it conditional on security support being active, make it fatal for the exceptional case, add the missing sysfs_put() at device disable time. Fixes: 7d988097c546 ("...Add security DSM overwrite support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-16libnvdimm/security: Fix nvdimm_security_state() state request selectionDave Jiang1-2/+2
The input parameter should be enum nvdimm_passphrase_type instead of bool for selection of master/user for selection of extended master passphrase state or the regular user passphrase state. Fixes: 89fa9d8ea7bdf ("...add Intel DSM 1.8 master passphrase support") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-15libnvdimm/label: Clear 'updating' flag after label-set updateDan Williams1-5/+18
The UEFI 2.7 specification sets expectations that the 'updating' flag is eventually cleared. To date, the libnvdimm core has never adhered to that protocol. The policy of the core matches the policy of other multi-device info-block formats like MD-Software-RAID that expect administrator intervention on inconsistent info-blocks, not automatic invalidation. However, some pre-boot environments may unfortunately attempt to "clean up" the labels and invalidate a set when it fails to find at least one "non-updating" label in the set. Clear the updating flag after set updates to minimize the window of vulnerability to aggressive pre-boot environments. Ideally implementations would not write to the label area outside of creating namespaces. Note that this only minimizes the window, it does not close it as the system can still crash while clearing the flag and the set can be subsequently deleted / invalidated by the pre-boot environment. Fixes: f524bf271a5c ("libnvdimm: write pmem label set") Cc: <stable@vger.kernel.org> Cc: Kelly Couch <kelly.j.couch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-07acpi/nfit, device-dax: Identify differentiated memory with a unique numa-nodeDan Williams4-1/+4
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind process *on* the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-8/+5
Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ...
2018-12-29Merge tag 'libnvdimm-for-4.21' of ↵Linus Torvalds11-18/+781
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The vast bulk of this update is the new support for the security capabilities of some nvdimms. The userspace tooling for this capability is still a work in progress, but the changes survive the existing libnvdimm unit tests. The changes also pass manual checkout on hardware and the new nfit_test emulation of the security capability. The touches of the security/keys/ files have received the necessary acks from Mimi and David. Those changes were necessary to allow for a new generic encrypted-key type, and allow the nvdimm sub-system to lookup key material referenced by the libnvdimm-sysfs interface. Summary: - Add support for the security features of nvdimm devices that implement a security model similar to ATA hard drive security. The security model supports locking access to the media at device-power-loss, to be unlocked with a passphrase, and secure-erase (crypto-scramble). Unlike the ATA security case where the kernel expects device security to be managed in a pre-OS environment, the libnvdimm security implementation allows key provisioning and key-operations at OS runtime. Keys are managed with the kernel's encrypted-keys facility to provide data-at-rest security for the libnvdimm key material. The usage model mirrors fscrypt key management, but is driven via libnvdimm sysfs. - Miscellaneous updates for api usage and comment fixes" * tag 'libnvdimm-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) libnvdimm/security: Quiet security operations libnvdimm/security: Add documentation for nvdimm security support tools/testing/nvdimm: add Intel DSM 1.8 support for nfit_test tools/testing/nvdimm: Add overwrite support for nfit_test tools/testing/nvdimm: Add test support for Intel nvdimm security DSMs acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase support acpi/nfit, libnvdimm/security: Add security DSM overwrite support acpi/nfit, libnvdimm: Add support for issue secure erase DSM to Intel nvdimm acpi/nfit, libnvdimm: Add enable/update passphrase support for Intel nvdimms acpi/nfit, libnvdimm: Add disable passphrase support to Intel nvdimm. acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs acpi/nfit, libnvdimm: Add freeze security support to Intel nvdimm acpi/nfit, libnvdimm: Introduce nvdimm_security_ops keys-encrypted: add nvdimm key format type to encrypted keys keys: Export lookup_user_key to external users acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm libnvdimm, namespace: Replace kmemdup() with kstrndup() libnvdimm, label: Switch to bitmap_zalloc() ACPI/nfit: Adjust annotation for why return 0 if fail to find NFIT at start libnvdimm, bus: Check id immediately following ida_simple_get ...
2018-12-28mm, devm_memremap_pages: fix shutdown handlingDan Williams1-8/+5
The last step before devm_memremap_pages() returns success is to allocate a release action, devm_memremap_pages_release(), to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error from devm_add_action() is not enough. The api currently relies on the fact that the percpu_ref it is using is killed by the time the devm_memremap_pages_release() is run. Rather than continue this awkward situation, offload the responsibility of killing the percpu_ref to devm_memremap_pages_release() directly. This allows devm_memremap_pages() to do the right thing relative to init failures and shutdown. Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range. An argument could be made to require that the ->kill() operation be set in the @pgmap arg rather than passed in separately. However, it helps code readability, tracking the lifetime of a given instance, to be able to grep the kill routine directly at the devm_memremap_pages() call site. Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com> Reported-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28Merge miscellaneous libnvdimm updates for 4.21Dan Williams6-35/+86
* Use common helpers, bitmap_zalloc() and kstrndup(), to replace open coded versions. * Clarify the comments around hotplug vs initial init case for the nfit driver. * Cleanup the libnvdimm init path.
2018-12-22libnvdimm/security: Quiet security operationsDan Williams2-16/+16
The security implementation is too chatty. For example, the common case is that security is not enabled / setup, and booting a qemu configuration currently yields: nvdimm nmem0: request_key() found no key nvdimm nmem0: failed to unlock dimm: -126 nvdimm nmem1: request_key() found no key nvdimm nmem1: failed to unlock dimm: -126 Convert all security related log messages to debug level. Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21tools/testing/nvdimm: Add test support for Intel nvdimm security DSMsDave Jiang1-1/+1
Add nfit_test support for DSM functions "Get Security State", "Set Passphrase", "Disable Passphrase", "Unlock Unit", "Freeze Lock", and "Secure Erase" for the fake DIMMs. Also adding a sysfs knob in order to put the DIMMs in "locked" state. The order of testing DIMM unlocking would be. 1a. Disable DIMM X. 1b. Set Passphrase to DIMM X. 2. Write to /sys/devices/platform/nfit_test.0/nfit_test_dimm/test_dimmX/lock_dimm 3. Renable DIMM X 4. Check DIMM X state via sysfs "security" attribute for nmemX. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase supportDave Jiang3-29/+69
With Intel DSM 1.8 [1] two new security DSMs are introduced. Enable/update master passphrase and master secure erase. The master passphrase allows a secure erase to be performed without the user passphrase that is set on the NVDIMM. The commands of master_update and master_erase are added to the sysfs knob in order to initiate the DSMs. They are similar in opeartion mechanism compare to update and erase. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm/security: Add security DSM overwrite supportDave Jiang5-5/+200
Add support for the NVDIMM_FAMILY_INTEL "ovewrite" capability as described by the Intel DSM spec v1.7. This will allow triggering of overwrite on Intel NVDIMMs. The overwrite operation can take tens of minutes. When the overwrite DSM is issued successfully, the NVDIMMs will be unaccessible. The kernel will do backoff polling to detect when the overwrite process is completed. According to the DSM spec v1.7, the 128G NVDIMMs can take up to 15mins to perform overwrite and larger DIMMs will take longer. Given that overwrite puts the DIMM in an indeterminate state until it completes introduce the NDD_SECURITY_OVERWRITE flag to prevent other operations from executing when overwrite is happening. The NDD_WORK_PENDING flag is added to denote that there is a device reference on the nvdimm device for an async workqueue thread context. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add support for issue secure erase DSM to Intel nvdimmDave Jiang3-2/+53
Add support to issue a secure erase DSM to the Intel nvdimm. The required passphrase is acquired from an encrypted key in the kernel user keyring. To trigger the action, "erase <keyid>" is written to the "security" sysfs attribute. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add enable/update passphrase support for Intel nvdimmsDave Jiang3-7/+69
Add support for enabling and updating passphrase on the Intel nvdimms. The passphrase is the an encrypted key in the kernel user keyring. We trigger the update via writing "update <old_keyid> <new_keyid>" to the sysfs attribute "security". If no <old_keyid> exists (for enabling security) then a 0 should be used. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add disable passphrase support to Intel nvdimm.Dave Jiang3-3/+116
Add support to disable passphrase (security) for the Intel nvdimm. The passphrase used for disabling is pulled from an encrypted-key in the kernel user keyring. The action is triggered by writing "disable <keyid>" to the sysfs attribute "security". Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-14acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMsDave Jiang5-1/+177
Add support to unlock the dimm via the kernel key management APIs. The passphrase is expected to be pulled from userspace through keyutils. The key management and sysfs attributes are libnvdimm generic. Encrypted keys are used to protect the nvdimm passphrase at rest. The master key can be a trusted-key sealed in a TPM, preferred, or an encrypted-key, more flexible, but more exposure to a potential attacker. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-14acpi/nfit, libnvdimm: Add freeze security support to Intel nvdimmDave Jiang2-2/+65
Add support for freeze security on Intel nvdimm. This locks out any changes to security for the DIMM until a hard reset of the DIMM is performed. This is triggered by writing "freeze" to the generic nvdimm/nmemX "security" sysfs attribute. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-14acpi/nfit, libnvdimm: Introduce nvdimm_security_opsDave Jiang3-1/+63
Some NVDIMMs, like the ones defined by the NVDIMM_FAMILY_INTEL command set, expose a security capability to lock the DIMMs at poweroff and require a passphrase to unlock them. The security model is derived from ATA security. In anticipation of other DIMMs implementing a similar scheme, and to abstract the core security implementation away from the device-specific details, introduce nvdimm_security_ops. Initially only a status retrieval operation, ->state(), is defined, along with the base infrastructure and definitions for future operations. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-14acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimmDave Jiang2-5/+8
The generated dimm id is needed for the sysfs attribute as well as being used as the identifier/description for the security key. Since it's constant and should never change, store it as a member of struct nvdimm. As nvdimm_create() continues to grow parameters relative to NFIT driver requirements, do not require other implementations to keep pace. Introduce __nvdimm_create() to carry the new parameters and keep nvdimm_create() with the long standing default api. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-11libnvdimm, namespace: Replace kmemdup() with kstrndup()Andy Shevchenko1-2/+1
kstrndup() takes care of '\0' terminator for the strings. Use it here instead of kmemdup() + explicit terminating the input string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-11libnvdimm, label: Switch to bitmap_zalloc()Andy Shevchenko1-4/+3
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-11libnvdimm, bus: Check id immediately following ida_simple_getOcean He1-2/+2
The id check was not executed immediately following ida_simple_get. Just change the codes position, without function change. Signed-off-by: Ocean He <hehy1@lenovo.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-10Merge tag 'v4.20-rc6' into for-4.21/blockJens Axboe3-27/+80
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the two corruption fixes. We're going to be overhauling the direct dispatch path, and we need to do that on top of the changes we made for that in mainline. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-06libnvdimm, pfn: Pad pfn namespaces relative to other regionsDan Williams3-27/+80
Commit cfe30b872058 "libnvdimm, pmem: adjust for section collisions with 'System RAM'" enabled Linux to workaround occasions where platform firmware arranges for "System RAM" and "Persistent Memory" to collide within a single section boundary. Unfortunately, as reported in this issue [1], platform firmware can inflict the same collision between persistent memory regions. The approach of interrogating iomem_resource does not work in this case because platform firmware may merge multiple regions into a single iomem_resource range. Instead provide a method to interrogate regions that share the same parent bus. This is a stop-gap until the core-MM can grow support for hotplug on sub-section boundaries. [1]: https://github.com/pmem/ndctl/issues/76 Fixes: cfe30b872058 ("libnvdimm, pmem: adjust for section collisions with...") Cc: <stable@vger.kernel.org> Reported-by: Patrick Geary <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>