From e1073d1e7920946ac4776a619cc40668b9e1401b Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 6 Jul 2017 15:39:17 -0700 Subject: mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE This moves the #ifdef in C code to a Kconfig dependency. Also we move the gigantic_page_supported() function to be arch specific. This allows architectures to conditionally enable runtime allocation of gigantic huge page. Architectures like ppc64 supports different gigantic huge page size (16G and 1G) based on the translation mode selected. This provides an opportunity for ppc64 to enable runtime allocation only w.r.t 1G hugepage. No functional change in this patch. Link: http://lkml.kernel.org/r/1494995292-4443-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V Cc: Michael Ellerman (powerpc) Cc: Anshuman Khandual Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm64/Kconfig | 2 +- arch/arm64/include/asm/hugetlb.h | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) (limited to 'arch/arm64') diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 9f7a934ff707..ff925ece82d6 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -13,7 +13,7 @@ config ARM64 select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_GCOV_PROFILE_ALL - select ARCH_HAS_GIGANTIC_PAGE + select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA select ARCH_HAS_KCOV select ARCH_HAS_SET_MEMORY select ARCH_HAS_SG_CHAIN diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h index bbc1e35aa601..793bd73b0d07 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -83,4 +83,8 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm, extern void huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE +static inline bool gigantic_page_supported(void) { return true; } +#endif + #endif /* __ASM_HUGETLB_H */ -- cgit v1.2.3 From bb9dd3df8ee9a0995da4c35251e6a8e2eefe0b41 Mon Sep 17 00:00:00 2001 From: Steve Capper Date: Thu, 6 Jul 2017 15:39:29 -0700 Subject: arm64: hugetlb: refactor find_num_contig() Patch series "Support for contiguous pte hugepages", v4. This patchset updates the hugetlb code to fix issues arising from contiguous pte hugepages (such as on arm64). Compared to v3, This version addresses a build failure on arm64 by including two cleanup patches. Other than the arm64 cleanups, the rest are generic code changes. The remaining arm64 support based on these patches will be posted separately. The patches are based on v4.12-rc2. Previous related postings can be found at [0], [1], [2], and [3]. The patches fall into three categories - * Patch 1-2 - arm64 cleanups required to greatly simplify changing huge_pte_offset() prototype in Patch 5. Catalin, Will - are you happy for these patches to go via mm? * Patches 3-4 address issues with gup * Patches 5-8 relate to passing a size argument to hugepage helpers to disambiguate the size of the referred page. These changes are required to enable arch code to properly handle swap entries for contiguous pte hugepages. The changes to huge_pte_offset() (patch 5) touch multiple architectures but I've managed to minimise these changes for the other affected functions - huge_pte_clear() and set_huge_pte_at(). These patches gate the enabling of contiguous hugepages support on arm64 which has been requested for systems using !4k page granule. The ARM64 architecture supports two flavours of hugepages - * Block mappings at the pud/pmd level These are regular hugepages where a pmd or a pud page table entry points to a block of memory. Depending on the PAGE_SIZE in use the following size of block mappings are supported - PMD PUD --- --- 4K: 2M 1G 16K: 32M 64K: 512M For certain applications/usecases such as HPC and large enterprise workloads, folks are using 64k page size but the minimum hugepage size of 512MB isn't very practical. To overcome this ... * Using the Contiguous bit The architecture provides a contiguous bit in the translation table entry which acts as a hint to the mmu to indicate that it is one of a contiguous set of entries that can be cached in a single TLB entry. We use the contiguous bit in Linux to increase the mapping size at the pmd and pte (last) level. The number of supported contiguous entries varies by page size and level of the page table. Using the contiguous bit allows additional hugepage sizes - CONT PTE PMD CONT PMD PUD -------- --- -------- --- 4K: 64K 2M 32M 1G 16K: 2M 32M 1G 64K: 2M 512M 16G Of these, 64K with 4K and 2M with 64K pages have been explicitly requested by a few different users. Entries with the contiguous bit set are required to be modified all together - which makes things like memory poisoning and migration impossible to do correctly without knowing the size of hugepage being dealt with - the reason for adding size parameter to a few of the hugepage helpers in this series. This patch (of 8): As we regularly check for contiguous pte's in the huge accessors, remove this extra check from find_num_contig. [punit.agrawal@arm.com: resolve rebase conflicts due to patch re-ordering] Link: http://lkml.kernel.org/r/20170524115409.31309-2-punit.agrawal@arm.com Signed-off-by: Steve Capper Signed-off-by: Punit Agrawal Cc: David Woods Cc: Kirill A. Shutemov Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Naoya Horiguchi Cc: Mark Rutland Cc: Hillf Danton Cc: Michal Hocko Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm64/mm/hugetlbpage.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'arch/arm64') diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 69b8200b1cfd..710bf935a473 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -42,15 +42,13 @@ int pud_huge(pud_t pud) } static int find_num_contig(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte, size_t *pgsize) + pte_t *ptep, size_t *pgsize) { pgd_t *pgd = pgd_offset(mm, addr); pud_t *pud; pmd_t *pmd; *pgsize = PAGE_SIZE; - if (!pte_cont(pte)) - return 1; pud = pud_offset(pgd, addr); pmd = pmd_offset(pud, addr); if ((pte_t *)pmd == ptep) { @@ -65,15 +63,16 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, { size_t pgsize; int i; - int ncontig = find_num_contig(mm, addr, ptep, pte, &pgsize); + int ncontig; unsigned long pfn; pgprot_t hugeprot; - if (ncontig == 1) { + if (!pte_cont(pte)) { set_pte_at(mm, addr, ptep, pte); return; } + ncontig = find_num_contig(mm, addr, ptep, &pgsize); pfn = pte_pfn(pte); hugeprot = __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte)); for (i = 0; i < ncontig; i++) { @@ -188,7 +187,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, bool is_dirty = false; cpte = huge_pte_offset(mm, addr); - ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize); + ncontig = find_num_contig(mm, addr, cpte, &pgsize); /* save the 1st pte to return */ pte = ptep_get_and_clear(mm, addr, cpte); for (i = 1, addr += pgsize; i < ncontig; ++i, addr += pgsize) { @@ -228,7 +227,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, cpte = huge_pte_offset(vma->vm_mm, addr); pfn = pte_pfn(*cpte); ncontig = find_num_contig(vma->vm_mm, addr, cpte, - *cpte, &pgsize); + &pgsize); for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) { changed |= ptep_set_access_flags(vma, addr, cpte, pfn_pte(pfn, @@ -251,7 +250,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, size_t pgsize = 0; cpte = huge_pte_offset(mm, addr); - ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize); + ncontig = find_num_contig(mm, addr, cpte, &pgsize); for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) ptep_set_wrprotect(mm, addr, cpte); } else { @@ -269,7 +268,7 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma, cpte = huge_pte_offset(vma->vm_mm, addr); ncontig = find_num_contig(vma->vm_mm, addr, cpte, - *cpte, &pgsize); + &pgsize); for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) ptep_clear_flush(vma, addr, cpte); } else { -- cgit v1.2.3 From f0b38d65c9d0b42f3e6d861a18906d49441bf78e Mon Sep 17 00:00:00 2001 From: Steve Capper Date: Thu, 6 Jul 2017 15:39:33 -0700 Subject: arm64: hugetlb: remove spurious calls to huge_ptep_offset() We don't need to call huge_ptep_offset as our accessors are already supplied with the pte_t *. This patch removes those spurious calls. [punit.agrawal@arm.com: resolve rebase conflicts due to patch re-ordering] Link: http://lkml.kernel.org/r/20170524115409.31309-3-punit.agrawal@arm.com Signed-off-by: Steve Capper Signed-off-by: Punit Agrawal Cc: David Woods Cc: Kirill A. Shutemov Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Naoya Horiguchi Cc: Mark Rutland Cc: Hillf Danton Cc: Michal Hocko Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm64/mm/hugetlbpage.c | 37 ++++++++++++++----------------------- 1 file changed, 14 insertions(+), 23 deletions(-) (limited to 'arch/arm64') diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 710bf935a473..f89aa8fa5855 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -183,21 +183,19 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, if (pte_cont(*ptep)) { int ncontig, i; size_t pgsize; - pte_t *cpte; bool is_dirty = false; - cpte = huge_pte_offset(mm, addr); - ncontig = find_num_contig(mm, addr, cpte, &pgsize); + ncontig = find_num_contig(mm, addr, ptep, &pgsize); /* save the 1st pte to return */ - pte = ptep_get_and_clear(mm, addr, cpte); + pte = ptep_get_and_clear(mm, addr, ptep); for (i = 1, addr += pgsize; i < ncontig; ++i, addr += pgsize) { /* * If HW_AFDBM is enabled, then the HW could * turn on the dirty bit for any of the page * in the set, so check them all. */ - ++cpte; - if (pte_dirty(ptep_get_and_clear(mm, addr, cpte))) + ++ptep; + if (pte_dirty(ptep_get_and_clear(mm, addr, ptep))) is_dirty = true; } if (is_dirty) @@ -213,8 +211,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t pte, int dirty) { - pte_t *cpte; - if (pte_cont(pte)) { int ncontig, i, changed = 0; size_t pgsize = 0; @@ -224,12 +220,11 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte)); - cpte = huge_pte_offset(vma->vm_mm, addr); - pfn = pte_pfn(*cpte); - ncontig = find_num_contig(vma->vm_mm, addr, cpte, + pfn = pte_pfn(pte); + ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) { - changed |= ptep_set_access_flags(vma, addr, cpte, + for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize) { + changed |= ptep_set_access_flags(vma, addr, ptep, pfn_pte(pfn, hugeprot), dirty); @@ -246,13 +241,11 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, { if (pte_cont(*ptep)) { int ncontig, i; - pte_t *cpte; size_t pgsize = 0; - cpte = huge_pte_offset(mm, addr); - ncontig = find_num_contig(mm, addr, cpte, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) - ptep_set_wrprotect(mm, addr, cpte); + ncontig = find_num_contig(mm, addr, ptep, &pgsize); + for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize) + ptep_set_wrprotect(mm, addr, ptep); } else { ptep_set_wrprotect(mm, addr, ptep); } @@ -263,14 +256,12 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma, { if (pte_cont(*ptep)) { int ncontig, i; - pte_t *cpte; size_t pgsize = 0; - cpte = huge_pte_offset(vma->vm_mm, addr); - ncontig = find_num_contig(vma->vm_mm, addr, cpte, + ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize); - for (i = 0; i < ncontig; ++i, ++cpte, addr += pgsize) - ptep_clear_flush(vma, addr, cpte); + for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize) + ptep_clear_flush(vma, addr, ptep); } else { ptep_clear_flush(vma, addr, ptep); } -- cgit v1.2.3 From 7868a2087ec13ec4a5df0c5e00999863be132ba8 Mon Sep 17 00:00:00 2001 From: Punit Agrawal Date: Thu, 6 Jul 2017 15:39:42 -0700 Subject: mm/hugetlb: add size parameter to huge_pte_offset() A poisoned or migrated hugepage is stored as a swap entry in the page tables. On architectures that support hugepages consisting of contiguous page table entries (such as on arm64) this leads to ambiguity in determining the page table entry to return in huge_pte_offset() when a poisoned entry is encountered. Let's remove the ambiguity by adding a size parameter to convey additional information about the requested address. Also fixup the definition/usage of huge_pte_offset() throughout the tree. Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com Signed-off-by: Punit Agrawal Acked-by: Steve Capper Cc: Catalin Marinas Cc: Will Deacon Cc: Tony Luck Cc: Fenghua Yu Cc: James Hogan (odd fixer:METAG ARCHITECTURE) Cc: Ralf Baechle (supporter:MIPS) Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Yoshinori Sato Cc: Rich Felker Cc: "David S. Miller" Cc: Chris Metcalf Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Alexander Viro Cc: Michal Hocko Cc: Mike Kravetz Cc: Naoya Horiguchi Cc: "Aneesh Kumar K.V" Cc: "Kirill A. Shutemov" Cc: Hillf Danton Cc: Mark Rutland Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/arm64/mm/hugetlbpage.c | 3 ++- arch/ia64/mm/hugetlbpage.c | 4 ++-- arch/metag/mm/hugetlbpage.c | 3 ++- arch/mips/mm/hugetlbpage.c | 3 ++- arch/parisc/mm/hugetlbpage.c | 3 ++- arch/powerpc/mm/hugetlbpage.c | 2 +- arch/s390/mm/hugetlbpage.c | 3 ++- arch/sh/mm/hugetlbpage.c | 3 ++- arch/sparc/mm/hugetlbpage.c | 3 ++- arch/tile/mm/hugetlbpage.c | 3 ++- arch/x86/mm/hugetlbpage.c | 2 +- fs/userfaultfd.c | 7 +++++-- include/linux/hugetlb.h | 5 +++-- mm/hugetlb.c | 23 ++++++++++++++--------- mm/page_vma_mapped.c | 3 ++- mm/pagewalk.c | 3 ++- 16 files changed, 46 insertions(+), 27 deletions(-) (limited to 'arch/arm64') diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index f89aa8fa5855..656e0ece2289 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -131,7 +131,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c index 85de86d36fdf..ae35140332f7 100644 --- a/arch/ia64/mm/hugetlbpage.c +++ b/arch/ia64/mm/hugetlbpage.c @@ -44,7 +44,7 @@ huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz) } pte_t * -huge_pte_offset (struct mm_struct *mm, unsigned long addr) +huge_pte_offset (struct mm_struct *mm, unsigned long addr, unsigned long sz) { unsigned long taddr = htlbpage_to_page(addr); pgd_t *pgd; @@ -92,7 +92,7 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int writ if (REGION_NUMBER(addr) != RGN_HPAGE) return ERR_PTR(-EINVAL); - ptep = huge_pte_offset(mm, addr); + ptep = huge_pte_offset(mm, addr, HPAGE_SIZE); if (!ptep || pte_none(*ptep)) return NULL; page = pte_page(*ptep); diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c index db1b7da91e4f..67fd53e2935a 100644 --- a/arch/metag/mm/hugetlbpage.c +++ b/arch/metag/mm/hugetlbpage.c @@ -74,7 +74,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c index 74aa6f62468f..cef152234312 100644 --- a/arch/mips/mm/hugetlbpage.c +++ b/arch/mips/mm/hugetlbpage.c @@ -36,7 +36,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, + unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c index aa50ac090e9b..5eb8f633b282 100644 --- a/arch/parisc/mm/hugetlbpage.c +++ b/arch/parisc/mm/hugetlbpage.c @@ -69,7 +69,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 1816b965a142..c41dc44472c5 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -57,7 +57,7 @@ static unsigned nr_gpages; #define hugepd_none(hpd) (hpd_val(hpd) == 0) -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz) { /* Only called for hugetlbfs pages, hence can ignore THP */ return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL); diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index d3a5e39756f6..44a8e6f0391e 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -180,7 +180,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return (pte_t *) pmdp; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgdp; p4d_t *p4dp; diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c index cc948db74878..d2412d2d6462 100644 --- a/arch/sh/mm/hugetlbpage.c +++ b/arch/sh/mm/hugetlbpage.c @@ -42,7 +42,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index 88855e383b34..28ee8d8ffa07 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -277,7 +277,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c index 03e5cc4e76e4..0986d426a413 100644 --- a/arch/tile/mm/hugetlbpage.c +++ b/arch/tile/mm/hugetlbpage.c @@ -102,7 +102,8 @@ static pte_t *get_pte(pte_t *base, int index, int level) return ptep; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; pud_t *pud; diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index adad702b39cd..2824607df108 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -33,7 +33,7 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write) if (!vma || !is_vm_hugetlb_page(vma)) return ERR_PTR(-EINVAL); - pte = huge_pte_offset(mm, address); + pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); /* hugetlb should be locked, and hence, prefaulted */ WARN_ON(!pte || pte_none(*pte)); diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3d0dd082337a..cadcd12a3d35 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -214,6 +214,7 @@ static inline struct uffd_msg userfault_msg(unsigned long address, * hugepmd ranges. */ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, unsigned long address, unsigned long flags, unsigned long reason) @@ -224,7 +225,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem)); - pte = huge_pte_offset(mm, address); + pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); if (!pte) goto out; @@ -243,6 +244,7 @@ out: } #else static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, + struct vm_area_struct *vma, unsigned long address, unsigned long flags, unsigned long reason) @@ -448,7 +450,8 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason) must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags, reason); else - must_wait = userfaultfd_huge_must_wait(ctx, vmf->address, + must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma, + vmf->address, vmf->flags, reason); up_read(&mm->mmap_sem); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index c92a1f0c7240..31e665fbcf76 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -137,7 +137,8 @@ extern struct list_head huge_boot_pages; pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz); -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr); +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz); int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep); struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, int write); @@ -190,7 +191,7 @@ static inline void hugetlb_show_meminfo(void) #define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; }) #define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \ src_addr, pagep) ({ BUG(); 0; }) -#define huge_pte_offset(mm, address) 0 +#define huge_pte_offset(mm, address, sz) 0 static inline int dequeue_hwpoisoned_huge_page(struct page *page) { return 0; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c73828e43100..075345532396 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3246,7 +3246,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) { spinlock_t *src_ptl, *dst_ptl; - src_pte = huge_pte_offset(src, addr); + src_pte = huge_pte_offset(src, addr, sz); if (!src_pte) continue; dst_pte = huge_pte_alloc(dst, addr, sz); @@ -3330,7 +3330,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); address = start; for (; address < end; address += sz) { - ptep = huge_pte_offset(mm, address); + ptep = huge_pte_offset(mm, address, sz); if (!ptep) continue; @@ -3548,7 +3548,8 @@ retry_avoidcopy: unmap_ref_private(mm, vma, old_page, address); BUG_ON(huge_pte_none(pte)); spin_lock(ptl); - ptep = huge_pte_offset(mm, address & huge_page_mask(h)); + ptep = huge_pte_offset(mm, address & huge_page_mask(h), + huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) goto retry_avoidcopy; @@ -3587,7 +3588,8 @@ retry_avoidcopy: * before the page tables are altered */ spin_lock(ptl); - ptep = huge_pte_offset(mm, address & huge_page_mask(h)); + ptep = huge_pte_offset(mm, address & huge_page_mask(h), + huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) { ClearPagePrivate(new_page); @@ -3874,7 +3876,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, address &= huge_page_mask(h); - ptep = huge_pte_offset(mm, address); + ptep = huge_pte_offset(mm, address, huge_page_size(h)); if (ptep) { entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { @@ -4131,7 +4133,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, * * Note that page table lock is not held when pte is null. */ - pte = huge_pte_offset(mm, vaddr & huge_page_mask(h)); + pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), + huge_page_size(h)); if (pte) ptl = huge_pte_lock(h, mm, pte); absent = !pte || huge_pte_none(huge_ptep_get(pte)); @@ -4270,7 +4273,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, i_mmap_lock_write(vma->vm_file->f_mapping); for (; address < end; address += huge_page_size(h)) { spinlock_t *ptl; - ptep = huge_pte_offset(mm, address); + ptep = huge_pte_offset(mm, address, huge_page_size(h)); if (!ptep) continue; ptl = huge_pte_lock(h, mm, ptep); @@ -4534,7 +4537,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) saddr = page_table_shareable(svma, vma, addr, idx); if (saddr) { - spte = huge_pte_offset(svma->vm_mm, saddr); + spte = huge_pte_offset(svma->vm_mm, saddr, + vma_mmu_pagesize(svma)); if (spte) { get_page(virt_to_page(spte)); break; @@ -4630,7 +4634,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, return pte; } -pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr) +pte_t *huge_pte_offset(struct mm_struct *mm, + unsigned long addr, unsigned long sz) { pgd_t *pgd; p4d_t *p4d; diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index de9c40d7304a..8ec6ba230bb9 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -116,7 +116,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (unlikely(PageHuge(pvmw->page))) { /* when pud is not present, pte will be NULL */ - pvmw->pte = huge_pte_offset(mm, pvmw->address); + pvmw->pte = huge_pte_offset(mm, pvmw->address, + PAGE_SIZE << compound_order(page)); if (!pvmw->pte) return false; diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 60f7856e508f..1a4197965415 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -180,12 +180,13 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, struct hstate *h = hstate_vma(vma); unsigned long next; unsigned long hmask = huge_page_mask(h); + unsigned long sz = huge_page_size(h); pte_t *pte; int err = 0; do { next = hugetlb_entry_end(h, addr, end); - pte = huge_pte_offset(walk->mm, addr & hmask); + pte = huge_pte_offset(walk->mm, addr & hmask, sz); if (pte && walk->hugetlb_entry) err = walk->hugetlb_entry(pte, hmask, addr, next, walk); if (err) -- cgit v1.2.3