summaryrefslogtreecommitdiff
path: root/mm/memory-failure.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-28mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISONNaoya Horiguchi1-4/+4
After a memory error happens on a clean folio, a process unexpectedly receives SIGBUS when it accesses the error page. This SIGBUS killing is pointless and simply degrades the level of RAS of the system, because the clean folio can be dropped without any data lost on memory error handling as we do for a clean pagecache. When memory_failure() is called on a clean folio, try_to_unmap() is called twice (one from split_huge_page() and one from hwpoison_user_mappings()). The root cause of the issue is that pte conversion to hwpoisoned entry is now done in the first call of try_to_unmap() because PageHWPoison is already set at this point, while it's actually expected to be done in the second call. This behavior disturbs the error handling operation like removing pagecache, which results in the malfunction described above. So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only when we really intend to convert pte to hwpoison entry. This can prevent other callers of try_to_unmap() from accidentally converting to hwpoison entries. Link: https://lkml.kernel.org/r/20230221085905.1465385-1-naoya.horiguchi@linux.dev Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-20mm: change to return bool for isolate_movable_page()Baolin Wang1-2/+2
Now the isolate_movable_page() can only return 0 or -EBUSY, and no users will care about the negative return value, thus we can convert the isolate_movable_page() to return a boolean value to make the code more clear when checking the movable page isolation state. No functional changes intended. [akpm@linux-foundation.org: remove unneeded comment, per Matthew] Link: https://lkml.kernel.org/r/cb877f73f4fff8d309611082ec740a7065b1ade0.1676424378.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-20mm: hugetlb: change to return bool for isolate_hugetlb()Baolin Wang1-1/+1
Now the isolate_hugetlb() only returns 0 or -EBUSY, and most users did not care about the negative value, thus we can convert the isolate_hugetlb() to return a boolean value to make code more clear when checking the hugetlb isolation state. Moreover converts 2 users which will consider the negative value returned by isolate_hugetlb(). No functional changes intended. [akpm@linux-foundation.org: shorten locked section, per SeongJae Park] Link: https://lkml.kernel.org/r/12a287c5bebc13df304387087bbecc6421510849.1676424378.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-20mm: change to return bool for isolate_lru_page()Baolin Wang1-2/+2
The isolate_lru_page() can only return 0 or -EBUSY, and most users did not care about the negative error of isolate_lru_page(), except one user in add_page_for_migration(). So we can convert the isolate_lru_page() to return a boolean value, which can help to make the code more clear when checking the return value of isolate_lru_page(). Also convert all users' logic of checking the isolation state. No functional changes intended. Link: https://lkml.kernel.org/r/3074c1ab628d9dbf139b33f248a8bc253a3f95f0.1676424378.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-14mm/hugetlb: convert isolate_hugetlb to foliosSidhartha Kumar1-1/+1
Patch series "continue hugetlb folio conversion", v3. This series continues the conversion of core hugetlb functions to use folios. This series converts many helper funtions in the hugetlb fault path. This is in preparation for another series to convert the hugetlb fault code paths to operate on folios. This patch (of 8): Convert isolate_hugetlb() to take in a folio and convert its callers to pass a folio. Use page_folio() to convert the callers to use a folio is safe as isolate_hugetlb() operates on a head page. Link: https://lkml.kernel.org/r/20230113223057.173292-1-sidhartha.kumar@oracle.com Link: https://lkml.kernel.org/r/20230113223057.173292-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm: memory-failure: bump memory failure stats to pglist_dataJiaqi Yan1-0/+36
Right before memory_failure finishes its handling, accumulate poisoned page's resolution counters to pglist_data's memory_failure_stats, so as to update the corresponding sysfs entries. Tested: 1) Start an application to allocate memory buffer chunks 2) Convert random memory buffer addresses to physical addresses 3) Inject memory errors using EINJ at chosen physical addresses 4) Access poisoned memory buffer and recover from SIGBUS 5) Check counter values under /sys/devices/system/node/node*/memory_failure/* Link: https://lkml.kernel.org/r/20230120034622.2698268-3-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm: memory-failure: add memory failure stats to sysfsJiaqi Yan1-0/+35
Patch series "Introduce per NUMA node memory error statistics", v2. Background ========== In the RFC for Kernel Support of Memory Error Detection [1], one advantage of software-based scanning over hardware patrol scrubber is the ability to make statistics visible to system administrators. The statistics include 2 categories: * Memory error statistics, for example, how many memory error are encountered, how many of them are recovered by the kernel. Note these memory errors are non-fatal to kernel: during the machine check exception (MCE) handling kernel already classified MCE's severity to be unnecessary to panic (but either action required or optional). * Scanner statistics, for example how many times the scanner have fully scanned a NUMA node, how many errors are first detected by the scanner. The memory error statistics are useful to userspace and actually not specific to scanner detected memory errors, and are the focus of this patchset. Motivation ========== Memory error stats are important to userspace but insufficient in kernel today. Datacenter administrators can better monitor a machine's memory health with the visible stats. For example, while memory errors are inevitable on servers with 10+ TB memory, starting server maintenance when there are only 1~2 recovered memory errors could be overreacting; in cloud production environment maintenance usually means live migrate all the workload running on the server and this usually causes nontrivial disruption to the customer. Providing insight into the scope of memory errors on a system helps to determine the appropriate follow-up action. In addition, the kernel's existing memory error stats need to be standardized so that userspace can reliably count on their usefulness. Today kernel provides following memory error info to userspace, but they are not sufficient or have disadvantages: * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, not per NUMA node stats though * ras:memory_failure_event: only available after explicitly enabled * /dev/mcelog provides many useful info about the MCEs, but doesn't capture how memory_failure recovered memory MCEs * kernel logs: userspace needs to process log text Exposing memory error stats is also a good start for the in-kernel memory error detector. Today the data source of memory error stats are either direct memory error consumption, or hardware patrol scrubber detection (either signaled as UCNA or SRAO). Once in-kernel memory scanner is implemented, it will be the main source as it is usually configured to scan memory DIMMs constantly and faster than hardware patrol scrubber. How Implemented =============== As Naoya pointed out [2], exposing memory error statistics to userspace is useful independent of software or hardware scanner. Therefore we implement the memory error statistics independent of the in-kernel memory error detector. It exposes the following per NUMA node memory error counters: /sys/devices/system/node/node${X}/memory_failure/total /sys/devices/system/node/node${X}/memory_failure/recovered /sys/devices/system/node/node${X}/memory_failure/ignored /sys/devices/system/node/node${X}/memory_failure/failed /sys/devices/system/node/node${X}/memory_failure/delayed These counters describe how many raw pages are poisoned and after the attempted recoveries by the kernel, their resolutions: how many are recovered, ignored, failed, or delayed respectively. This approach can be easier to extend for future use cases than /proc/meminfo, trace event, and log. The following math holds for the statistics: * total = recovered + ignored + failed + delayed These memory error stats are reset during machine boot. The 1st commit introduces these sysfs entries. The 2nd commit populates memory error stats every time memory_failure attempts memory error recovery. The 3rd commit adds documentations for introduced stats. [1] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#mc22959244f5388891c523882e61163c6e4d703af [2] https://lore.kernel.org/linux-mm/7E670362-C29E-4626-B546-26530D54F937@gmail.com/T/#m52d8d7a333d8536bd7ce74253298858b1c0c0ac6 This patch (of 3): Today kernel provides following memory error info to userspace, but each has its own disadvantage * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, not per NUMA node stats though * ras:memory_failure_event: only available after explicitly enabled * /dev/mcelog provides many useful info about the MCEs, but doesn't capture how memory_failure recovered memory MCEs * kernel logs: userspace needs to process log text Exposes per NUMA node memory error stats as sysfs entries: /sys/devices/system/node/node${X}/memory_failure/total /sys/devices/system/node/node${X}/memory_failure/recovered /sys/devices/system/node/node${X}/memory_failure/ignored /sys/devices/system/node/node${X}/memory_failure/failed /sys/devices/system/node/node${X}/memory_failure/delayed These counters describe how many raw pages are poisoned and after the attempted recoveries by the kernel, their resolutions: how many are recovered, ignored, failed, or delayed respectively. The following math holds for the statistics: * total = recovered + ignored + failed + delayed Link: https://lkml.kernel.org/r/20230120034622.2698268-1-jiaqiyan@google.com Link: https://lkml.kernel.org/r/20230120034622.2698268-2-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/hugetlb: convert get_hwpoison_huge_page() to foliosSidhartha Kumar1-11/+11
Straightforward conversion of get_hwpoison_huge_page() to get_hwpoison_hugetlb_folio(). Reduces two references to a head page in memory-failure.c [arnd@arndb.de: fix get_hwpoison_hugetlb_folio() stub] Link: https://lkml.kernel.org/r/20230119111920.635260-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20230118174039.14247-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm: clean up mlock_page / munlock_page references in commentsMatthew Wilcox (Oracle)1-1/+1
Change documentation and comments that refer to now-renamed functions. Link: https://lkml.kernel.org/r/20230116192827.2146732-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert unpoison_memory() to foliosSidhartha Kumar1-11/+9
Use a folio inside unpoison_memory which replaces a compound_head() call with a call to page_folio(). Link: https://lkml.kernel.org/r/20230112204608.80136-9-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert hugetlb_set_page_hwpoison() to foliosSidhartha Kumar1-4/+3
Change hugetlb_set_page_hwpoison() to folio_set_hugetlb_hwpoison() and use a folio internally. Link: https://lkml.kernel.org/r/20230112204608.80136-8-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert __free_raw_hwp_pages() to foliosSidhartha Kumar1-4/+3
Change __free_raw_hwp_pages() to __folio_free_raw_hwp() and modify its callers to pass in a folio. Link: https://lkml.kernel.org/r/20230112204608.80136-7-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert raw_hwp_list_head() to foliosSidhartha Kumar1-7/+9
Change raw_hwp_list_head() to take in a folio and modify its callers to pass in a folio. Also converts two users of hugetlb specific page macro users to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-6-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert free_raw_hwp_pages() to foliosSidhartha Kumar1-10/+12
Change free_raw_hwp_pages() to folio_free_raw_hwp(), converts two users of hugetlb specific page macro users to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-5-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert hugetlb_clear_page_hwpoison to foliosSidhartha Kumar1-5/+5
Change hugetlb_clear_page_hwpoison() to folio_clear_hugetlb_hwpoison() by changing the function to take in a folio. This converts one use of ClearPageHWPoison and HPageRawHwpUnreliable to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-4-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert try_memory_failure_hugetlb() to foliosSidhartha Kumar1-13/+13
Use a struct folio rather than a head page in try_memory_failure_hugetlb. This converts one user of SetHPageMigratable to the folio equivalent. Link: https://lkml.kernel.org/r/20230112204608.80136-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/memory-failure: convert __get_huge_page_for_hwpoison() to foliosSidhartha Kumar1-10/+10
Patch series "convert hugepage memory failure functions to folios". This series contains a 1:1 straightforward page to folio conversion for memory failure functions which deal with huge pages. I renamed a few functions to fit with how other folio operating functions are named. These include: hugetlb_clear_page_hwpoison -> folio_clear_hugetlb_hwpoison free_raw_hwp_pages -> folio_free_raw_hwp __free_raw_hwp_pages -> __folio_free_raw_hwp hugetlb_set_page_hwpoison -> folio_set_hugetlb_hwpoison The goal of this series was to reduce users of the hugetlb specific page flag macros which take in a page so users are protected by the compiler to make sure they are operating on a head page. This patch (of 8): Use a folio throughout the function rather than using a head page. This also reduces the users of the page version of hugetlb specific page flags. Link: https://lkml.kernel.org/r/20230112204608.80136-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-19tools/vm: rename tools/vm to tools/mmSeongJae Park1-1/+1
Rename tools/vm to tools/mm for being more consistent with the code and documentation directories, and won't be confused with virtual machines. Link: https://lkml.kernel.org/r/20230103180754.129637-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01mm/memory-failure.c: cleanup in unpoison_memoryMa Wupeng1-4/+2
If freeit is true, the value of ret must be zero, there is no need to check the value of freeit after label unlock_mutex. We can drop variable freeit to do this cleanup. Link: https://lkml.kernel.org/r/20221125065444.3462681-1-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01memory-failure: convert truncate_error_page() to use folioVishal Moola (Oracle)1-2/+3
Replace try_to_release_page() with filemap_release_folio(). This change is in preparation for the removal of the try_to_release_page() wrapper. Link: https://lkml.kernel.org/r/20221118073055.55694-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01mm,hugetlb: use folio fields in second tail pageHugh Dickins1-3/+2
Patch series "mm,huge,rmap: unify and speed up compound mapcounts". This patch (of 3): We want to declare one more int in the first tail of a compound page: that first tail page being valuable property, since every compound page has a first tail, but perhaps no more than that. No problem on 64-bit: there is already space for it. No problem with 32-bit THPs: 5.18 commit 5232c63f46fd ("mm: Make compound_pincount always available") kindly cleared the space for it, apparently not realizing that only 64-bit architectures enable CONFIG_THP_SWAP (whose use of tail page->private might conflict) - but make sure of that in its Kconfig. But hugetlb pages use tail page->private of the first tail page for a subpool pointer, which will conflict; and they also use page->private of the 2nd, 3rd and 4th tails. Undo "mm: add private field of first tail to struct page and struct folio"'s recent addition of private_1 to the folio tail: instead add hugetlb_subpool, hugetlb_cgroup, hugetlb_cgroup_rsvd, hugetlb_hwpoison to a second tail page of the folio: THP has long been using several fields of that tail, so make better use of it for hugetlb too. This is not how a generic folio should be declared in future, but it is an effective transitional way to make use of it. Delete the SUBPAGE_INDEX stuff, but keep __NR_USED_SUBPAGE: now 3. [hughd@google.com: prefix folio's page_1 and page_2 with double underscore, give folio's _flags_2 and _head_2 a line documentation each] Link: https://lkml.kernel.org/r/9e2cb6b-5b58-d3f2-b5ee-5f8a14e8f10@google.com Link: https://lkml.kernel.org/r/5f52de70-975-e94f-f141-543765736181@google.com Link: https://lkml.kernel.org/r/3818cc9a-9999-d064-d778-9c94c5911e6@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01Merge branch 'mm-hotfixes-stable' into mm-stableAndrew Morton1-1/+4
2022-11-09mm/hwpoison: introduce per-memory_block hwpoison counterNaoya Horiguchi1-25/+11
Currently PageHWPoison flag does not behave well when experiencing memory hotremove/hotplug. Any data field in struct page is unreliable when the associated memory is offlined, and the current mechanism can't tell whether a memory block is onlined because a new memory devices is installed or because previous failed offline operations are undone. Especially if there's a hwpoisoned memory, it's unclear what the best option is. So introduce a new mechanism to make struct memory_block remember that a memory block has hwpoisoned memory inside it. And make any online event fail if the onlining memory block contains hwpoison. struct memory_block is freed and reallocated over ACPI-based hotremove/hotplug, but not over sysfs-based hotremove/hotplug. So the new counter can distinguish these cases. Link: https://lkml.kernel.org/r/20221024062012.1520887-5-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09mm/hwpoison: pass pfn to num_poisoned_pages_*()Naoya Horiguchi1-7/+7
No functional change. Link: https://lkml.kernel.org/r/20221024062012.1520887-4-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.cNaoya Horiguchi1-0/+10
These interfaces will be used by drivers/base/memory.c by later patch, so as a preparatory work move them to more common header file visible to the file. Link: https://lkml.kernel.org/r/20221024062012.1520887-3-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned ↵Naoya Horiguchi1-4/+17
hugepage Patch series "mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug", v7. This patchset tries to solve the issue among memory_hotplug, hugetlb and hwpoison. In this patchset, memory hotplug handles hwpoison pages like below: - hwpoison pages should not prevent memory hotremove, - memory block with hwpoison pages should not be onlined. This patch (of 4): HWPoisoned page is not supposed to be accessed once marked, but currently such accesses can happen during memory hotremove because do_migrate_range() can be called before dissolve_free_huge_pages() is called. Clear HPageMigratable for hwpoisoned hugepages to prevent them from being migrated. This should be done in hugetlb_lock to avoid race against isolate_hugetlb(). get_hwpoison_huge_page() needs to have a flag to show it's called from unpoison to take refcount of hwpoisoned hugepages, so add it. [naoya.horiguchi@linux.dev: remove TestClearHPageMigratable and reduce to test and clear separately] Link: https://lkml.kernel.org/r/20221025053559.GA2104800@ik1-406-35019.vs.sakura.ne.jp Link: https://lkml.kernel.org/r/20221024062012.1520887-1-naoya.horiguchi@linux.dev Link: https://lkml.kernel.org/r/20221024062012.1520887-2-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09mm: memory-failure: make action_result() return intKefeng Wang1-29/+16
Check mf_result in action_result(), only return 0 when MF_RECOVERED, or return -EBUSY, which will simplify code a bit. [wangkefeng.wang@huawei.com: v2] Link: https://lkml.kernel.org/r/20221024035138.99119-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20221021084611.53765-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09mm: memory-failure: avoid pfn_valid() twice in soft_offline_page()Kefeng Wang1-3/+3
Simplify WARN_ON_ONCE(flags & MF_COUNT_INCREASED) under !pfn_valid(). Link: https://lkml.kernel.org/r/20221021084611.53765-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09mm: memory-failure: make put_ref_page() more usefulKefeng Wang1-17/+17
Pass pfn/flags to put_ref_page(), then check MF_COUNT_INCREASED and drop refcount to make the code look cleaner. Link: https://lkml.kernel.org/r/20221021084611.53765-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09hugetlbfs: don't delete error page from pagecacheJames Houghton1-1/+4
This change is very similar to the change that was made for shmem [1], and it solves the same problem but for HugeTLBFS instead. Currently, when poison is found in a HugeTLB page, the page is removed from the page cache. That means that attempting to map or read that hugepage in the future will result in a new hugepage being allocated instead of notifying the user that the page was poisoned. As [1] states, this is effectively memory corruption. The fix is to leave the page in the page cache. If the user attempts to use a poisoned HugeTLB page with a syscall, the syscall will fail with EIO, the same error code that shmem uses. For attempts to map the page, the thread will get a BUS_MCEERR_AR SIGBUS. [1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens") Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com Signed-off-by: James Houghton <jthoughton@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-04rmap: remove page_unlock_anon_vma_read()Matthew Wilcox (Oracle)1-1/+1
This was simply an alias for anon_vma_unlock_read() since 2011. Link: https://lkml.kernel.org/r/20220902194653.1739778-56-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm/swap: add swp_offset_pfn() to fetch PFN from swap entryPeter Xu1-1/+1
We've got a bunch of special swap entries that stores PFN inside the swap offset fields. To fetch the PFN, normally the user just calls swp_offset() assuming that'll be the PFN. Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the max possible length of a PFN on the host, meanwhile doing proper check with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry(). One reason to do so is we never tried to sanitize whether swap offset can really fit for storing PFN. At the meantime, this patch also prepares us with the future possibility to store more information inside the swp offset field, so assuming "swp_offset(entry)" to be the PFN will not stand any more very soon. Replace many of the swp_offset() callers to use swp_offset_pfn() where proper. Note that many of the existing users are not candidates for the replacement, e.g.: (1) When the swap entry is not a pfn swap entry at all, or, (2) when we wanna keep the whole swp_offset but only change the swp type. For the latter, it can happen when fork() triggered on a write-migration swap entry pte, we may want to only change the migration type from write->read but keep the rest, so it's not "fetching PFN" but "changing swap type only". They're left aside so that when there're more information within the swp offset they'll be carried over naturally in those cases. Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what the new swp_offset_pfn() is about. Link: https://lkml.kernel.org/r/20220811161331.37055-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Minchan Kim <minchan@kernel.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm, hwpoison: cleanup some obsolete commentsMiaohe Lin1-5/+5
1.Remove meaningless comment in kill_proc(). That doesn't tell anything. 2.Fix the wrong function name get_hwpoison_unless_zero(). It should be get_page_unless_zero(). 3.The gate keeper for free hwpoison page has moved to check_new_page(). Update the corresponding comment. Link: https://lkml.kernel.org/r/20220830123604.25763-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm, hwpoison: check PageTable() explicitly in hwpoison_user_mappings()Miaohe Lin1-1/+1
PageTable can't be handled by memory_failure(). Filter it out explicitly in hwpoison_user_mappings(). This will also make code more consistent with the relevant check in unpoison_memory(). Link: https://lkml.kernel.org/r/20220830123604.25763-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm, hwpoison: avoid unneeded page_mapped_in_vma() overhead in ↵Miaohe Lin1-3/+3
collect_procs_anon() If vma->vm_mm != t->mm, there's no need to call page_mapped_in_vma() as add_to_kill() won't be called in this case. Move up the mm check to avoid possible unneeded calling to page_mapped_in_vma(). Link: https://lkml.kernel.org/r/20220830123604.25763-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm, hwpoison: use num_poisoned_pages_sub() to decrease num_poisoned_pagesMiaohe Lin1-2/+4
Use num_poisoned_pages_sub() to combine multiple atomic ops into one. Also num_poisoned_pages_dec() can be killed as there's no caller now. Link: https://lkml.kernel.org/r/20220830123604.25763-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm, hwpoison: use __PageMovable() to detect non-lru movable pagesMiaohe Lin1-7/+9
It's more recommended to use __PageMovable() to detect non-lru movable pages. We can avoid bumping page refcnt via isolate_movable_page() for the isolated lru pages. Also if pages become PageLRU just after they're checked but before trying to isolate them, isolate_lru_page() will be called to do the right work. [linmiaohe@huawei.com: fixes per Naoya Horiguchi] Link: https://lkml.kernel.org/r/1f7ee86e-7d28-0d8c-e0de-b7a5a94519e8@huawei.com Link: https://lkml.kernel.org/r/20220830123604.25763-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm, hwpoison: use ClearPageHWPoison() in memory_failure()Miaohe Lin1-1/+1
Patch series "A few cleanup patches for memory-failure". his series contains a few cleanup patches to use __PageMovable() to detect non-lru movable pages, use num_poisoned_pages_sub() to reduce multiple atomic ops overheads and so on. More details can be found in the respective changelogs. This patch (of 6): Use ClearPageHWPoison() instead of TestClearPageHWPoison() to clear page hwpoison flags to avoid unneeded full memory barrier overhead. Link: https://lkml.kernel.org/r/20220830123604.25763-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220830123604.25763-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26Merge branch 'mm-hotfixes-stable' into mm-stableAndrew Morton1-11/+16
2022-09-26mm,hwpoison: check mm when killing accessing processShuai Xue1-0/+3
The GHES code calls memory_failure_queue() from IRQ context to queue work into workqueue and schedule it on the current CPU. Then the work is processed in memory_failure_work_func() by kworker and calls memory_failure(). When a page is already poisoned, commit a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") make memory_failure() call kill_accessing_process() that: - holds mmap locking of current->mm - does pagetable walk to find the error virtual address - and sends SIGBUS to the current process with error info. However, the mm of kworker is not valid, resulting in a null-pointer dereference. So check mm when killing the accessing process. [akpm@linux-foundation.org: remove unrelated whitespace alteration] Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Bixuan Cui <cuibixuan@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm: memory-failure: kill __soft_offline_page()Kefeng Wang1-15/+10
Squash the __soft_offline_page() into soft_offline_in_use_page() and kill __soft_offline_page(). [wangkefeng.wang@huawei.com: update hpage when try_to_split_thp_page() succeeds] Link: https://lkml.kernel.org/r/20220830104654.28234-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20220819033402.156519-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm: memory-failure: kill soft_offline_free_page()Kefeng Wang1-11/+1
Open-code the page_handle_poison() into soft_offline_page() and kill unneeded soft_offline_free_page(). Link: https://lkml.kernel.org/r/20220819033402.156519-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm, hwpoison: avoid trying to unpoison reserved pageMiaohe Lin1-2/+2
For reserved pages, HWPoison flag will be set without increasing the page refcnt. So we shouldn't even try to unpoison these pages and thus decrease the page refcnt unexpectly. Add a PageReserved() check to filter this case out and remove the below unneeded zero page (zero page is reserved) check. Link: https://lkml.kernel.org/r/20220818130016.45313-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm, hwpoison: kill procs if unmap failsMiaohe Lin1-8/+4
If try_to_unmap() fails, the hwpoisoned page still resides in the address space of some processes. We should kill these processes or the hwpoisoned page might be consumed later. collect_procs() is always called to collect relevant processes now so they can be killed later if unmap fails. [linmiaohe@huawei.com: v2] Link: https://lkml.kernel.org/r/20220823032346.4260-6-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220818130016.45313-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm, hwpoison: fix possible use-after-free in mf_dax_kill_procs()Miaohe Lin1-1/+2
After kill_procs(), tk will be freed without being removed from the to_kill list. In the next iteration, the freed list entry in the to_kill list will be accessed, thus leading to use-after-free issue. Adding list_del() in kill_procs() to fix the issue. Link: https://lkml.kernel.org/r/20220823032346.4260-5-linmiaohe@huawei.com Fixes: c36e20249571 ("mm: introduce mf_dax_kill_procs() for fsdax case") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm, hwpoison: fix extra put_page() in soft_offline_page()Miaohe Lin1-2/+0
When hwpoison_filter() refuses to soft offline a page, the page refcnt incremented previously by MF_COUNT_INCREASED would have been consumed via get_hwpoison_page() if ret <= 0. So the put_ref_page() here will put the extra one. Remove it to fix the issue. Link: https://lkml.kernel.org/r/20220818130016.45313-4-linmiaohe@huawei.com Fixes: 9113eaf331bf ("mm/memory-failure.c: add hwpoison_filter for soft offline") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm, hwpoison: fix page refcnt leaking in unpoison_memory()Miaohe Lin1-0/+1
When free_raw_hwp_pages() fails its work, the refcnt of the hugetlb page would have been incremented if ret > 0. Using put_page() to fix refcnt leaking in this case. Link: https://lkml.kernel.org/r/20220818130016.45313-3-linmiaohe@huawei.com Fixes: debb6b9c3fdd ("mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm, hwpoison: fix page refcnt leaking in try_memory_failure_hugetlb()Miaohe Lin1-2/+4
Patch series "A few fixup patches for memory-failure", v2. This series contains a few fixup patches to fix incorrect update of page refcnt, fix possible use-after-free issue and so on. More details can be found in the respective changelogs. This patch (of 6): When hwpoison_filter() refuses to hwpoison a hugetlb page, the refcnt of the page would have been incremented if res == 1. Using put_page() to fix the refcnt leaking in this case. Link: https://lkml.kernel.org/r/20220823032346.4260-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220818130016.45313-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220818130016.45313-2-linmiaohe@huawei.com Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm: memory-failure: cleanup try_to_split_thp_page()Kefeng Wang1-11/+12
Since commit 5d1fd5dc877b ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP"), the action_result(,MF_MSG_UNSPLIT_THP,) called to show memory error event in memory_failure(), so the pr_info() in try_to_split_thp_page() is only needed in soft_offline_in_use_page(). Meanwhile this could also fix the unexpected prefix for "thp split failed" due to commit 96f96763de26 ("mm: memory-failure: convert to pr_fmt()"). Link: https://lkml.kernel.org/r/20220809111813.139690-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12mm/memory-failure: fall back to vma_address() when ->notify_failure() failsDan Williams1-10/+12
In the case where a filesystem is polled to take over the memory failure and receives -EOPNOTSUPP it indicates that page->index and page->mapping are valid for reverse mapping the failure address. Introduce FSDAX_INVALID_PGOFF to distinguish when add_to_kill() is being called from mf_dax_kill_procs() by a filesytem vs the typical memory_failure() path. Otherwise, vma_pgoff_address() is called with an invalid fsdax_pgoff which then trips this failing signature: kernel BUG at mm/memory-failure.c:319! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 13 PID: 1262 Comm: dax-pmd Tainted: G OE N 6.0.0-rc2+ #62 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:add_to_kill.cold+0x19d/0x209 [..] Call Trace: <TASK> collect_procs.part.0+0x2c4/0x460 memory_failure+0x71b/0xba0 ? _printk+0x58/0x73 do_madvise.part.0.cold+0xaf/0xc5 Link: https://lkml.kernel.org/r/166153429427.2758201.14605968329933175594.stgit@dwillia2-xfh.jf.intel.com Fixes: c36e20249571 ("mm: introduce mf_dax_kill_procs() for fsdax case") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Jane Chu <jane.chu@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>