path: root/arch/riscv/mm
2022-03-08  riscv: Fix config KASAN && DEBUG_VIRTUAL  (Alexandre Ghiti; 1 file, -0/+3)

commit c648c4bb7d02ceb53ee40172fdc4433b37cee9c6 upstream.

The __virt_to_phys function is called very early in the boot process (i.e. in kasan_early_init), so it must not be instrumented by KASAN, otherwise it crashes. Fix this by declaring phys_addr.c as not instrumentable by KASAN.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2022-03-08  riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP  (Alexandre Ghiti; 1 file, -2/+1)

commit a3d328037846d013bb4c7f3777241e190e4c75e1 upstream.

In order to get the pfn of a struct page* when sparsemem is enabled without vmemmap, the mem_section structures need to be initialized which happens in sparse_init. But kasan_early_init calls pfn_to_page way before sparse_init is called, which then tries to dereference a null mem_section pointer.

Fix this by removing the usage of this function in kasan_early_init.

Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2022-03-08  riscv/mm: Add XIP_FIXUP for phys_ram_base  (Palmer Dabbelt; 1 file, -0/+1)

[ Upstream commit 4b1c70aa8ed8249608bb991380cb8ff423edf49e ]

This manifests as a crash early in boot on VexRiscv.

Signed-off-by: Myrtle Shah <gatecat@ds0.me>
[Palmer: split commit]
Fixes: 6d7f91d914bc ("riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

2022-01-27  riscv: mm: fix wrong phys_ram_base value for RV64  (Jisheng Zhang; 1 file, -1/+1)

commit b0fd4b1bf995172b9efcee23600d4f69571c321c upstream.

Currently, if 64BIT and !XIP_KERNEL, phys_ram_base is always 0, no matter what the real start of DRAM reported by memblock is.

Fixes: 6d7f91d914bc ("riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2022-01-27  riscv: try to allocate crashkern region from 32bit addressible memory  (Nick Kossifidis; 1 file, -4/+13)

commit decf89f86ecd3c3c3de81c562010d5797bea3de1 upstream.

When allocating the crash kernel region without explicitly specifying its base address/size, memblock_phys_alloc_range will attempt to allocate memory top to bottom (memblock.bottom_up is false), so the crash kernel region will end up in highmem on 64-bit systems. This way swiotlb can't work on the crash kernel, since there won't be any 32-bit addressable memory available for the bounce buffers.

Try to allocate 32-bit addressable memory for the crash kernel, if available, by restricting the top search address to be less than SZ_4G. If that fails, fall back to the previous behavior.

I tested this on HiFive Unmatched where the PCIe controller needs swiotlb to work; with this patch it's possible to access the PCIe controller on the crash kernel and mount the rootfs from the NVMe.

Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Fixes: e53d28180d4d ("RISC-V: Add kdump support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

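For illustration, a minimal sketch of the allocation strategy described above (not the verbatim patch; crash_size, crash_base, search_start and search_end are hypothetical local names, and PMD_SIZE alignment is an assumption):

    /*
     * Sketch: prefer memory below 4 GiB so swiotlb bounce buffers can
     * work in the crash kernel, and fall back to the full (top-down)
     * search range if that allocation fails.
     */
    crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
                                           search_start, SZ_4G);
    if (crash_base == 0) {
            /* Fall back to the previous behaviour. */
            crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
                                                   search_start, search_end);
            if (crash_base == 0) {
                    pr_warn("crashkernel: couldn't allocate %lldKB\n",
                            crash_size >> 10);
                    return;
            }
    }
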
2021-10-29  riscv: Fix asan-stack clang build  (Alexandre Ghiti; 1 file, -0/+3)

Nathan reported that, because KASAN_SHADOW_OFFSET was not defined in Kconfig, asan-stack cannot be disabled with clang even when CONFIG_KASAN_STACK is disabled: fix this by defining the corresponding config option.

Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-10-29  riscv: Do not re-populate shadow memory with kasan_populate_early_shadow  (Alexandre Ghiti; 1 file, -11/+0)

When calling this function, all the shadow memory is already populated with kasan_early_shadow_pte which has PAGE_KERNEL protection. kasan_populate_early_shadow write-protects the mapping of the range of addresses passed as argument in zero_pte_populate, which actually write-protects all the shadow memory mapping since kasan_early_shadow_pte is used for all the shadow memory at this point. And then, when using the memblock API to populate the shadow memory, the first write access to the kernel stack triggers a trap. This becomes visible with the next commit, which contains a fix for asan-stack.

We already manually populate all the shadow memory in kasan_early_init and we write-protect kasan_early_shadow_pte at the end of kasan_init, which makes the calls to kasan_populate_early_shadow superfluous, so we can remove them.

Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-10-05  riscv: Flush current cpu icache before other cpus  (Alexandre Ghiti; 1 file, -0/+2)

On SiFive Unmatched, I recently hit the following BUG when booting:

[    0.000000] ftrace: allocating 36610 entries in 144 pages
[    0.000000] Oops - illegal instruction [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.13.1+ #5
[    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
[    0.000000] epc : riscv_cpuid_to_hartid_mask+0x6/0xae
[    0.000000]  ra : __sbi_rfence_v02+0xc8/0x10a
[    0.000000] epc : ffffffff80007240 ra : ffffffff80009964 sp : ffffffff81803e10
[    0.000000]  gp : ffffffff81a1ea70 tp : ffffffff8180f500 t0 : ffffffe07fe30000
[    0.000000]  t1 : 0000000000000004 t2 : 0000000000000000 s0 : ffffffff81803e60
[    0.000000]  s1 : 0000000000000000 a0 : ffffffff81a22238 a1 : ffffffff81803e10
[    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[    0.000000]  a5 : 0000000000000000 a6 : ffffffff8000989c a7 : 0000000052464e43
[    0.000000]  s2 : ffffffff81a220c8 s3 : 0000000000000000 s4 : 0000000000000000
[    0.000000]  s5 : 0000000000000000 s6 : 0000000200000100 s7 : 0000000000000001
[    0.000000]  s8 : ffffffe07fe04040 s9 : ffffffff81a22c80 s10: 0000000000001000
[    0.000000]  s11: 0000000000000004 t3 : 0000000000000001 t4 : 0000000000000008
[    0.000000]  t5 : ffffffcf04000808 t6 : ffffffe3ffddf188
[    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000002
[    0.000000] [<ffffffff80007240>] riscv_cpuid_to_hartid_mask+0x6/0xae
[    0.000000] [<ffffffff80009474>] sbi_remote_fence_i+0x1e/0x26
[    0.000000] [<ffffffff8000b8f4>] flush_icache_all+0x12/0x1a
[    0.000000] [<ffffffff8000666c>] patch_text_nosync+0x26/0x32
[    0.000000] [<ffffffff8000884e>] ftrace_init_nop+0x52/0x8c
[    0.000000] [<ffffffff800f051e>] ftrace_process_locs.isra.0+0x29c/0x360
[    0.000000] [<ffffffff80a0e3c6>] ftrace_init+0x80/0x130
[    0.000000] [<ffffffff80a00f8c>] start_kernel+0x5c4/0x8f6
[    0.000000] ---[ end trace f67eb9af4d8d492b ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

While ftrace was looping over a list of addresses to patch, it always failed when patching the same function: riscv_cpuid_to_hartid_mask. Looking at the backtrace, the illegal instruction is encountered in this same function. However, patch_text_nosync, after patching the instructions, calls flush_icache_range. But looking at what happens in this function:

  flush_icache_range -> flush_icache_all -> sbi_remote_fence_i -> __sbi_rfence_v02 -> riscv_cpuid_to_hartid_mask

The icache and dcache of the current cpu are never synchronized between the patching of riscv_cpuid_to_hartid_mask and calling this same function. So fix this by flushing the current cpu's icache before asking the other cpus to do the same.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

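A simplified sketch of the shape of the fix (illustrative, not the verbatim patch; it omits the non-SBI IPI fallback path that the real flush_icache_all() also handles):

    /*
     * Make the local hart's icache coherent with its dcache first,
     * then ask the remote harts to do the same via the SBI.
     */
    void flush_icache_all(void)
    {
            local_flush_icache_all();       /* fence.i on the current cpu */
            sbi_remote_fence_i(NULL);       /* remote fence.i on all other harts */
    }
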
2021-09-05  Merge tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux  (Linus Torvalds; 1 file, -72/+58)

Pull RISC-V updates from Palmer Dabbelt:

 - support PC-relative instructions (auipc and branches) in kprobes
 - support for forced IRQ threading
 - support for the hlt/nohlt kernel command line options, via the generic idle loop
 - show the edge/level triggered behavior of interrupts in /proc/interrupts
 - a handful of cleanups to our address mapping mechanisms
 - support for allocating gigantic hugepages via CMA
 - support for the undefined behavior sanitizer (UBSAN)
 - a handful of cleanups to the VDSO that allow the kernel to build with LLD
 - support for hugepage migration

* tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
  riscv: add support for hugepage migration
  RISC-V: Fix VDSO build for !MMU
  riscv: use strscpy to replace strlcpy
  riscv: explicitly use symbol offsets for VDSO
  riscv: Enable Undefined Behavior Sanitizer UBSAN
  riscv: Keep the riscv Kconfig selects sorted
  riscv: Support allocating gigantic hugepages using CMA
  riscv: fix the global name pfn_base confliction error
  riscv: Move early fdt mapping creation in its own function
  riscv: Simplify BUILTIN_DTB device tree mapping handling
  riscv: Use __maybe_unused instead of #ifdefs around variable declarations
  riscv: Get rid of map_size parameter to create_kernel_page_table
  riscv: Introduce va_kernel_pa_offset for 32-bit kernel
  riscv: Optimize kernel virtual address conversion macro
  dt-bindings: riscv: add starfive jh7100 bindings
  riscv: Enable GENERIC_IRQ_SHOW_LEVEL
  riscv: Enable idle generic idle loop
  riscv: Allow forced irq threading
  riscv: Implement thread_struct whitelist for hardened usercopy
  riscv: kprobes: implement the branch instructions
  ...

2021-09-03  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds; 1 file, -30/+14)

Merge misc updates from Andrew Morton:
"173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...

2021-09-03  memblock: make memblock_find_in_range method private  (Mike Rapoport; 1 file, -30/+14)

There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times the memblock allocation APIs did not exist.

memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock.

Replace the calls to memblock_find_in_range() with equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() a private method of memblock.

This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range().

Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

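A generic before/after sketch of the conversion pattern described above (variable names are illustrative only):

    /* Before: find + reserve as two steps, error handling easy to miss */
    addr = memblock_find_in_range(0, limit, size, align);
    if (!addr)
            return -ENOMEM;
    memblock_reserve(addr, size);

    /* After: one call that both finds and reserves the region */
    addr = memblock_phys_alloc_range(size, align, 0, limit);
    if (!addr)
            return -ENOMEM;
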
2021-09-02  Merge tag 'devicetree-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux  (Linus Torvalds; 1 file, -20/+0)

Pull devicetree updates from Rob Herring:

 - Refactor arch kdump DT related code to a common implementation
 - Add fw_devlink tracking for 'phy-handle', 'leds', 'backlight', 'resets', and 'pwm' properties
 - Various clean-ups to DT FDT code
 - Fix a runtime error for !CONFIG_SYSFS
 - Convert Synopsys DW PCI and derivative binding docs to schemas. Add Toshiba Visconti PCIe binding.
 - Convert a bunch of memory controller bindings to schemas
 - Convert eeprom-93xx46, Samsung Exynos TRNG, Samsung Exynos IRQ combiner, arm-charlcd, img-ascii-lcd, UniPhier eFuse, Xilinx Zynq MPSoC FPGA, Xilinx Zynq MPSoC reset, Mediatek mmsys, Gemini boards, brcm,iproc-i2c, faraday,ftpci100, and ks8851 net to DT schema.
 - Extend nvmem bindings to handle bit offsets in unit-addresses
 - Add DT schemas for HiKey 970 PCIe PHY
 - Remove unused ZTE, energymicro,efm32-timer, and Exynos SATA bindings
 - Enable dtc pci_device_reg warning by default
 - Fixes for handling 'unevaluatedProperties' in preparation to enable pending support in the tooling for jsonschema 2020-12 draft

* tag 'devicetree-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits)
  dt-bindings: display: remove zte,vou.txt binding doc
  dt-bindings: hwmon: merge max1619 into trivial devices
  dt-bindings: mtd-physmap: Add 'arm,vexpress-flash' compatible
  dt-bindings: PCI: imx6: convert the imx pcie controller to dtschema
  dt-bindings: Use 'enum' instead of 'oneOf' plus 'const' entries
  dt-bindings: Add vendor prefix for Topic Embedded Systems
  of: fdt: Rename reserve_elfcorehdr() to fdt_reserve_elfcorehdr()
  arm64: kdump: Remove custom linux,usable-memory-range handling
  arm64: kdump: Remove custom linux,elfcorehdr handling
  riscv: Remove non-standard linux,elfcorehdr handling
  of: fdt: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD) instead of #ifdef
  of: fdt: Add generic support for handling usable memory range property
  of: fdt: Add generic support for handling elf core headers property
  crash_dump: Make elfcorehdr address/size symbols always visible
  dt-bindings: memory: convert Samsung Exynos DMC to dtschema
  dt-bindings: devfreq: event: convert Samsung Exynos PPMU to dtschema
  dt-bindings: devfreq: event: convert Samsung Exynos NoCP to dtschema
  kbuild: Enable dtc 'pci_device_reg' warning by default
  dt-bindings: soc: remove obsolete zte zx header
  dt-bindings: clock: remove obsolete zte zx header
  ...

2021-08-25  riscv: Remove non-standard linux,elfcorehdr handling  (Geert Uytterhoeven; 1 file, -20/+0)

RISC-V uses platform-specific code to locate the elf core header in memory. However, this does not conform to the standard "linux,elfcorehdr" DT bindings, as it relies on a reserved memory node with the "linux,elfcorehdr" compatible value, instead of on a "linux,elfcorehdr" property under the "/chosen" node.

The non-compliant code can just be removed, as the standard behavior is already implemented by platform-agnostic handling in the FDT core code.

Fixes: 5640975003d0234d ("RISC-V: Add crash kernel support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/41c75d6ee3114ae6304f8afe0051895af91200ee.1628670468.git.geert+renesas@glider.be

2021-08-14  riscv: Support allocating gigantic hugepages using CMA  (Kefeng Wang; 1 file, -0/+3)

This patch adds support to allocate gigantic hugepages using CMA by specifying the hugetlb_cma= kernel parameter. This is only supported on RV64.

Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

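A sketch of the hook this kind of patch adds, modeled on how other 64-bit architectures wire up hugetlb_cma= (the exact placement in arch/riscv/mm/init.c is an assumption here): reserve the requested CMA area during early memory setup, sized in units of gigantic (PUD-sized on RV64) pages.

    /* In early bootmem setup, after memblock is usable (illustrative): */
    if (IS_ENABLED(CONFIG_64BIT))
            hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);

With hugetlb_cma=4G on the kernel command line, gigantic pages can then be allocated at runtime from that CMA area instead of requiring boot-time reservation.
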
2021-08-14  riscv: fix the global name pfn_base confliction error  (Kenneth Lee; 1 file, -3/+3)

RISC-V uses a global variable pfn_base for page/pfn translation. But this is a common name and will be used elsewhere. In those cases, the page-pfn macros which refer to this name will refer to the local/input variable instead (such as in vfio_pin_pages_remote). This makes everything wrong.

This patch changes the name from pfn_base to riscv_pfn_base to fix this problem.

Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-12  riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE  (Alexandre Ghiti; 1 file, -1/+1)

The current comment states that we check if the 64-bit kernel mapping overlaps with the last 4K of the address space that is reserved to error values in create_kernel_page_table, which is not the case since it is done in setup_vm. But anyway, remove the reference to any function and simply note that in a 64-bit kernel, the check should be done as soon as the kernel mapping base address is known.

Fixes: db6b84a368b4 ("riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE")
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-12  riscv: Move early fdt mapping creation in its own function  (Alexandre Ghiti; 1 file, -36/+40)

The code that handles the early fdt mapping is hard to read and does not create the same mapping size depending on the kernel:

- for 64-bit, 2 PMD entries are used, which amounts to a 4MB mapping
- for 32-bit, 2 PGDIR entries are used, which amounts to an 8MB mapping

So keep using 2 PMD entries for 64-bit and use only one PGD entry for 32-bit, which is all that is needed to cover 4MB. Move that into a new function called create_fdt_early_page_table, which follows the same naming as create_kernel_page_table.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-12  riscv: Simplify BUILTIN_DTB device tree mapping handling  (Alexandre Ghiti; 1 file, -10/+2)

__PAGETABLE_PMD_FOLDED defines a 2-level page table that is only used in the 32-bit kernel, so there is no need to check for CONFIG_64BIT in #ifndef __PAGETABLE_PMD_FOLDED and vice-versa.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-12  riscv: Use __maybe_unused instead of #ifdefs around variable declarations  (Alexandre Ghiti; 1 file, -3/+1)

This simplifies the code and makes it more readable.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-12  riscv: Get rid of map_size parameter to create_kernel_page_table  (Alexandre Ghiti; 1 file, -19/+11)

The kernel must always be mapped using PMD_SIZE, and this is already the case; this patch just simplifies create_kernel_page_table.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-12  riscv: Introduce va_kernel_pa_offset for 32-bit kernel  (Alexandre Ghiti; 1 file, -3/+0)

va_kernel_pa_offset was only used for 64-bit, as the kernel mapping lies in the linear mapping for the 32-bit kernel and then only the offset between PAGE_OFFSET and the kernel load address is needed. But this distinction complicates the code with #ifdefs and especially with a separate definition of the address conversion macros.

Simplify the code by defining this variable for both 32-bit and 64-bit.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-08-07  riscv: Get rid of CONFIG_PHYS_RAM_BASE in kernel physical address conversion  (Alexandre Ghiti; 1 file, -5/+12)

The usage of CONFIG_PHYS_RAM_BASE for all kernel types was a mistake: this value is implementation-specific and this breaks the genericity of the RISC-V kernel.

Fix this by introducing a new variable phys_ram_base that holds this value at runtime and use it in the kernel physical address conversion macro. Since this value is used only for XIP kernels, evaluate it only if CONFIG_XIP_KERNEL is set, which in addition optimizes this macro for standard kernels at compile-time.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 44c922572952 ("RISC-V: enable XIP")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-23  riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE  (Alexandre Ghiti; 1 file, -2/+16)

The check that is done in setup_bootmem currently only works for the 32-bit kernel, since the kernel mapping has been moved outside of the linear mapping for the 64-bit kernel. So make sure that for the 64-bit kernel, the kernel mapping does not overlap with the last 4K of the addressable memory.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-23  riscv: Make sure the linear mapping does not use the kernel mapping  (Alexandre Ghiti; 1 file, -0/+2)

For the 64-bit kernel, the end of the address space is occupied by the kernel mapping, and currently the functions that populate the kernel page tables (i.e. create_p*d_mapping) do not override existing mappings, so we must make sure the linear mapping does not map memory in the kernel mapping by clipping the memory above the memory limit.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Fixes: c9811e379b21 ("riscv: Add mem kernel parameter support")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-23  riscv: Fix memory_limit for 64-bit kernel  (Alexandre Ghiti; 1 file, -2/+9)

As described in Documentation/riscv/vm-layout.rst, the end of the virtual address space for the 64-bit kernel is occupied by the modules/BPF/kernel mappings, so this actually reduces the amount of memory we are able to map and then use in the linear mapping. So make sure this limit is correctly set.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-22  Merge remote-tracking branch 'riscv/riscv-fix-32bit' into fixes  (Palmer Dabbelt; 1 file, -0/+1)

This contains a single fix for 32-bit boot. It happens this was already fixed by c9811e379b21 ("riscv: Add mem kernel parameter support"), but the bug existed before that feature addition, so I've applied the patch earlier and then merged it in (which results in a conflict, which is fixed via not changing the resulting tree).

* riscv/riscv-fix-32bit:
  riscv: Fix 32-bit RISC-V boot failure

2021-07-22  riscv: Fix 32-bit RISC-V boot failure  (Bin Meng; 1 file, -1/+3)

Commit dd2d082b5760 ("riscv: Cleanup setup_bootmem()") adjusted the calling sequence in setup_bootmem(), which invalidates the fix that commit de043da0b9e7 ("RISC-V: Fix usage of memblock_enforce_memory_limit") did for 32-bit RISC-V, unfortunately. So now 32-bit RISC-V does not boot again when testing booting the kernel on QEMU 'virt' with '-m 2G', which was exactly what the original commit de043da0b9e7 ("RISC-V: Fix usage of memblock_enforce_memory_limit") tried to fix.

Fixes: dd2d082b5760 ("riscv: Cleanup setup_bootmem()")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-09  Merge tag 'riscv-for-linus-5.14-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux  (Linus Torvalds; 6 files, -179/+206)

Pull RISC-V updates from Palmer Dabbelt:
"We have a handful of new features for 5.14:

 - Support for transparent huge pages.
 - Support for generic PCI resources mapping.
 - Support for the mem= kernel parameter.
 - Support for KFENCE.
 - A handful of fixes to avoid W+X mappings in the kernel.
 - Support for VMAP_STACK based overflow detection.
 - An optimized copy_{to,from}_user"

* tag 'riscv-for-linus-5.14-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (37 commits)
  riscv: xip: Fix duplicate included asm/pgtable.h
  riscv: Fix PTDUMP output now BPF region moved back to module region
  riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall
  riscv: add VMAP_STACK overflow detection
  riscv: ptrace: add argn syntax
  riscv: mm: fix build errors caused by mk_pmd()
  riscv: Introduce structure that group all variables regarding kernel mapping
  riscv: Map the kernel with correct permissions the first time
  riscv: Introduce set_kernel_memory helper
  riscv: Enable KFENCE for riscv64
  RISC-V: Use asm-generic for {in,out}{bwlq}
  riscv: add ASID-based tlbflushing methods
  riscv: pass the mm_struct to __sbi_tlb_flush_range
  riscv: Add mem kernel parameter support
  riscv: Simplify xip and !xip kernel address conversion macros
  riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED
  riscv: Only initialize swiotlb when necessary
  riscv: fix typo in init.c
  riscv: Cleanup unused functions
  riscv: mm: Use better bitmap_zalloc()
  ...

2021-07-07  riscv: Fix PTDUMP output now BPF region moved back to module region  (Alexandre Ghiti; 1 file, -2/+2)

The BPF region was moved back to the region below the kernel, at the end of the module region, by 3a02764c372c ("riscv: Ensure BPF_JIT_REGION_START aligned with PMD size"), so reflect this change in the kernel page table output.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 3a02764c372c ("riscv: Ensure BPF_JIT_REGION_START aligned with PMD size")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-06  riscv: Introduce structure that group all variables regarding kernel mapping  (Alexandre Ghiti; 3 files, -64/+38)

We have a lot of variables that are used to hold kernel mapping addresses, offsets between physical and virtual mappings, and some others used for XIP kernels: they are all defined at different places in mm/init.c, so group them into a single structure with, for some of them, more explicit and concise names.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-01  Merge branch 'riscv-wx-mappings' into for-next  (Palmer Dabbelt; 1 file, -59/+49)

This contains both the short-term fix for the W+X boot mappings and the larger cleanup.

* riscv-wx-mappings:
  riscv: Map the kernel with correct permissions the first time
  riscv: Introduce set_kernel_memory helper
  riscv: Simplify xip and !xip kernel address conversion macros
  riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED
  riscv: mm: Fix W+X mappings at boot

2021-07-01  riscv: Map the kernel with correct permissions the first time  (Alexandre Ghiti; 1 file, -63/+50)

For 64-bit kernels, we map all the kernel with write and execute permissions and afterwards remove writability from text and executability from data.

For 32-bit kernels, the kernel mapping resides in the linear mapping, so we map all the linear mapping as writable and executable and afterwards we remove those properties for unused memory and kernel mapping as described above.

Change this behavior to directly map the kernel with correct permissions and avoid going through the whole mapping to fix the permissions.

At the same time, this fixes an issue introduced by commit 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping") as reported here https://github.com/starfive-tech/linux/issues/17.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-07-01  riscv: Enable KFENCE for riscv64  (Liu Shixin; 1 file, -1/+10)

Add architecture specific implementation details for KFENCE and enable KFENCE for the riscv64 architecture. In particular, this implements the required interface in <asm/kfence.h>.

KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the kfence pool to be mapped at page granularity.

This patch was tested using the testcases in kfence_test.c, and all of them passed.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

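The arch-side interface KFENCE needs is small; below is a hedged sketch of what an <asm/kfence.h> implementation typically provides, assuming the pool is already mapped at page granularity (details such as the TLB-flush call are illustrative, not the exact riscv code):

    /* Nothing extra to do if the pool is already PTE-mapped. */
    static inline bool arch_kfence_init_pool(void)
    {
            return true;
    }

    /* Toggle the present bit of one page so stray accesses fault. */
    static inline bool kfence_protect_page(unsigned long addr, bool protect)
    {
            pte_t *pte = virt_to_kpte(addr);

            if (protect)
                    set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
            else
                    set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));

            flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
            return true;
    }
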
2021-07-01  riscv: add ASID-based tlbflushing methods  (Guo Ren; 2 files, -8/+41)

Implement optimized versions of the tlb flushing routines for systems using ASIDs. These are behind the use_asid_allocator static branch to not affect existing systems not using ASIDs.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
[hch: rebased on top of previous cleanups, use the same algorithm as the non-ASID based code for local vs global flushes, keep functions as local as possible]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

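A simplified, hypothetical sketch of how the ASID path is gated (local-hart case only; real code also masks the ASID bits and handles remote harts via the SBI):

    /*
     * use_asid_allocator is a static key, so systems that never enable
     * the ASID allocator keep executing the old, unconditional path.
     */
    static inline void local_flush_one_page(struct mm_struct *mm, unsigned long addr)
    {
            if (static_branch_unlikely(&use_asid_allocator)) {
                    unsigned long asid = atomic_long_read(&mm->context.id);

                    /* sfence.vma with both a virtual address and an ASID */
                    __asm__ __volatile__ ("sfence.vma %0, %1"
                                          : : "r" (addr), "r" (asid) : "memory");
            } else {
                    /* address-only flush, hits all ASIDs */
                    __asm__ __volatile__ ("sfence.vma %0"
                                          : : "r" (addr) : "memory");
            }
    }
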
2021-07-01  riscv: pass the mm_struct to __sbi_tlb_flush_range  (Christoph Hellwig; 1 file, -9/+6)

Move the mm_cpumask() call from the callers into __sbi_tlb_flush_range to reduce a bit of duplicate code and prepare for future changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-06-19  riscv: Ensure BPF_JIT_REGION_START aligned with PMD size  (Jisheng Zhang; 1 file, -1/+1)

Andreas reported that commit fc8504765ec5 ("riscv: bpf: Avoid breaking W^X") breaks booting with one kind of defconfig; I reproduced a kernel panic with the defconfig:

[    0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
[    0.139159] Oops [#1]
[    0.139303] Modules linked in:
[    0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
[    0.139934] Hardware name: riscv-virtio,qemu (DT)
[    0.140193] epc : __memset+0xc4/0xfc
[    0.140416]  ra : skb_flow_dissector_init+0x1e/0x82
[    0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
[    0.140878]  gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
[    0.141156]  t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
[    0.141424]  s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
[    0.141654]  a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
[    0.141893]  a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
[    0.142126]  s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
[    0.142353]  s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
[    0.142584]  s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
[    0.142810]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
[    0.143042]  t5 : 0000000000000155 t6 : 00000000000003ff
[    0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
[    0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc
[    0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
[    0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168
[    0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
[    0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110
[    0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc
[    0.145124] ---[ end trace f1e9643daa46d591 ]---

After some investigation, I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping outside of linear mapping") moves the BPF JIT region after the kernel:

| #define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)

The &_end is unlikely to be aligned with PMD size, so the front of the BPF JIT region sits with part of the kernel .data section in one PMD-size mapping. But the kernel is mapped at PMD size, so when bpf_jit_binary_lock_ro() is called to make the first BPF JIT prog ROX, we also make part of the kernel .data section RO; when we then write to it, for example memset the .data section, the MMU will trigger a store page fault.

To fix the issue, we need to ensure the BPF JIT region is PMD-size aligned. This patch achieves this goal by restoring the BPF JIT region to its original position, i.e. the 128MB before the kernel .text section. The modification to kasan_init.c is inspired by Alexandre.

Fixes: fc8504765ec5 ("riscv: bpf: Avoid breaking W^X")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

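One way to express the layout the message describes, as an illustrative macro sketch (the anchor macro below is an assumption, not the verbatim patch): anchoring the JIT region to a PMD-aligned boundary below the kernel image means it can never share a PMD-sized mapping with kernel .data.

    #define BPF_JIT_REGION_SIZE	(SZ_128M)
    /* end of the JIT region = start of the kernel image mapping (assumed
     * PMD-aligned); the region then occupies the 128MB directly below it */
    #define BPF_JIT_REGION_END	(MODULES_END)
    #define BPF_JIT_REGION_START	(BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)
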
2021-06-19  riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name  (Jisheng Zhang; 1 file, -4/+4)

commit 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping") makes use of MODULES_VADDR to populate the kernel, BPF, and modules mapping. Currently, MODULES_VADDR is defined as below for RV64:

| #define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)

But kasan_init() has two local variables which are also named _start and _end, so MODULES_VADDR is evaluated with the local variable _end rather than the global "_end" as we expected. Fix this issue by renaming the two local variables.

Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-06-15  riscv: Add mem kernel parameter support  (Kefeng Wang; 1 file, -3/+25)

The memblock_enforce_memory_limit() call could change the memblock range, so move the dram_end assignment after it in bootmem_init() to support the mem= cmdline.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

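A generic sketch of how a mem= handler is usually wired up (the function and variable names mirror common arch code and are assumptions here, not the riscv patch verbatim): parse the limit early, then let memblock clamp the usable range before the bootmem bookkeeping reads memblock's end of DRAM.

    static phys_addr_t memory_limit = PHYS_ADDR_MAX;

    static int __init early_mem(char *p)
    {
            if (!p)
                    return 1;

            memory_limit = memparse(p, &p) & PAGE_MASK;
            pr_notice("Memory limited to %lldMB\n", memory_limit >> 20);

            return 0;
    }
    early_param("mem", early_mem);

    /* later, during bootmem setup: */
    memblock_enforce_memory_limit(memory_limit);

For example, booting with mem=1G would cap the memblock memory to the first gigabyte before the linear mapping is sized.
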
2021-06-11  riscv: Only initialize swiotlb when necessary  (Kefeng Wang; 1 file, -0/+8)

The SWIOTLB buffer is not needed unless the physical address space is beyond the limit of DMA, so only initialize swiotlb when swiotlb_force is true or not all system memory is DMA-able. Also move the swiotlb_init() into mem_init().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

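A sketch of the condition described, as a fragment placed in mem_init() (shape based on how other architectures of that era gated swiotlb; dma32_phys_limit is assumed to hold the 32-bit DMA boundary):

    #ifdef CONFIG_SWIOTLB
            /* Only spend memory on bounce buffers when they can be needed. */
            if (swiotlb_force == SWIOTLB_FORCE ||
                max_pfn > PFN_DOWN(dma32_phys_limit))
                    swiotlb_init(1);
            else
                    swiotlb_force = SWIOTLB_NO_FORCE;
    #endif
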
2021-06-09  riscv: fix typo in init.c  (Vitaly Wool; 1 file, -1/+1)

Commit 010623568222 introduced a typo in the "__initdata" spelling which led to build breakage for XIP. Fix that.

Fixes: 010623568222 ("riscv: mm: init: Consolidate vars, functions")
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-06-09  riscv: mm: Use better bitmap_zalloc()  (Kefeng Wang; 1 file, -2/+1)

Use the better bitmap_zalloc() helper to allocate the bitmap.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

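The general shape of this kind of conversion (names are illustrative, loosely modeled on the ASID allocator's bitmap in mm/context.c):

    /* Before: size and zeroing handled by hand */
    context_asid_map = kcalloc(BITS_TO_LONGS(num_asids),
                               sizeof(*context_asid_map), GFP_KERNEL);

    /* After: the helper sizes and zeroes the bitmap itself */
    context_asid_map = bitmap_zalloc(num_asids, GFP_KERNEL);
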
2021-06-02  riscv: mm: Fix W+X mappings at boot  (Jisheng Zhang; 1 file, -2/+6)

When the kernel mapping was moved to the last 2GB of the address space, (__va(PFN_PHYS(max_low_pfn))) became much smaller than the .data section start address, so the last set_memory_nx() in protect_kernel_text_data() fails and the .data section is still mapped as W+X. This results in the W+X mapping warning below at boot. Fix it by passing the correct .data section page count to set_memory_nx().

[    0.396516] ------------[ cut here ]------------
[    0.396889] riscv/mm: Found insecure W+X mapping at address (____ptrval____)/0xffffffff80c00000
[    0.398347] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/ptdump.c:258 note_page+0x244/0x24a
[    0.398964] Modules linked in:
[    0.399459] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1+ #14
[    0.400003] Hardware name: riscv-virtio,qemu (DT)
[    0.400591] epc : note_page+0x244/0x24a
[    0.401368]  ra : note_page+0x244/0x24a
[    0.401772] epc : ffffffff80007c86 ra : ffffffff80007c86 sp : ffffffe000e7bc30
[    0.402304]  gp : ffffffff80caae88 tp : ffffffe000e70000 t0 : ffffffff80cb80cf
[    0.402800]  t1 : ffffffff80cb80c0 t2 : 0000000000000000 s0 : ffffffe000e7bc80
[    0.403310]  s1 : ffffffe000e7bde8 a0 : 0000000000000053 a1 : ffffffff80c83ff0
[    0.403805]  a2 : 0000000000000010 a3 : 0000000000000000 a4 : 6c7e7a5137233100
[    0.404298]  a5 : 6c7e7a5137233100 a6 : 0000000000000030 a7 : ffffffffffffffff
[    0.404849]  s2 : ffffffff80e00000 s3 : 0000000040000000 s4 : 0000000000000000
[    0.405393]  s5 : 0000000000000000 s6 : 0000000000000003 s7 : ffffffe000e7bd48
[    0.405935]  s8 : ffffffff81000000 s9 : ffffffffc0000000 s10: ffffffe000e7bd48
[    0.406476]  s11: 0000000000001000 t3 : 0000000000000072 t4 : ffffffffffffffff
[    0.407016]  t5 : 0000000000000002 t6 : ffffffe000e7b978
[    0.407435] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
[    0.408052] Call Trace:
[    0.408343] [<ffffffff80007c86>] note_page+0x244/0x24a
[    0.408855] [<ffffffff8010c5a6>] ptdump_hole+0x14/0x1e
[    0.409263] [<ffffffff800f65c6>] walk_pgd_range+0x2a0/0x376
[    0.409690] [<ffffffff800f6828>] walk_page_range_novma+0x4e/0x6e
[    0.410146] [<ffffffff8010c5f8>] ptdump_walk_pgd+0x48/0x78
[    0.410570] [<ffffffff80007d66>] ptdump_check_wx+0xb4/0xf8
[    0.410990] [<ffffffff80006738>] mark_rodata_ro+0x26/0x2e
[    0.411407] [<ffffffff8031961e>] kernel_init+0x44/0x108
[    0.411814] [<ffffffff80002312>] ret_from_exception+0x0/0xc
[    0.412309] ---[ end trace 7ec3459f2547ea83 ]---
[    0.413141] Checked W+X mappings: failed, 512 W+X pages found

Fixes: 2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-29  riscv: mm: init: Consolidate vars, functions  (Jisheng Zhang; 1 file, -17/+19)

Consolidate the following items in init.c:

- Staticize global vars as much as possible
- Add the __initdata mark if the global var isn't needed after init
- Add the __init mark if the func isn't needed after init
- Add __ro_after_init if the global var is read only after init

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-29  riscv: Add __init section marker to some functions again  (Jisheng Zhang; 1 file, -1/+1)

These functions are not needed after booting, so mark them as __init to move them to the __init section.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-26  riscv: Optimize switch_mm by passing "cpu" to flush_icache_deferred()  (Jisheng Zhang; 1 file, -3/+4)

Directly pass the cpu to flush_icache_deferred() rather than calling smp_processor_id() again.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Palmer: drop the QEMU performance numbers, and update the comment]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-26  riscv: mm: Drop redundant _sdata and _edata declaration  (Kefeng Wang; 1 file, -7/+1)

_sdata/_edata are already declared in sections.h, so drop the redundant declarations. Also move the _xiprom/_exiprom declarations to the beginning of the file, cleaning up one CONFIG_XIP_KERNEL block.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-26  riscv: Move setup_bootmem into paging_init  (Kefeng Wang; 1 file, -1/+2)

Make setup_bootmem() static.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-26  riscv: mm: Remove setup_zero_page()  (Jisheng Zhang; 1 file, -6/+0)

The empty_zero_page sits in the .bss..page_aligned section, so it will already be cleared to zero when the bss is cleared; we don't need to clear it again.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-22  riscv: mm: add THP support on 64-bit  (Nanyong Sun; 1 file, -0/+7)

Bring Transparent HugePage support to riscv. A transparent huge page is always represented as a pmd.

Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>

2021-05-22  riscv: mm: add param stride for __sbi_tlb_flush_range  (Nanyong Sun; 1 file, -5/+5)

Add a parameter, stride, to __sbi_tlb_flush_range() that represents the page stride between the start and end addresses. Normally the stride is PAGE_SIZE, but when flushing a huge page address the stride can be the huge page size, such as PMD_SIZE; then only one TLB entry needs to be flushed if the address range is within PMD_SIZE.

Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
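
An illustrative use of the stride parameter for PMD-mapped transparent huge pages (the wrapper and the __sbi_tlb_flush_range signature shown here are a sketch consistent with the description above, not a quoted hunk): flushing one 2MB huge page becomes a single PMD_SIZE flush instead of 512 PAGE_SIZE flushes.

    void flush_pmd_tlb_range(struct vm_area_struct *vma,
                             unsigned long start, unsigned long end)
    {
            /* stride = PMD_SIZE: one sfence per huge page in the range */
            __sbi_tlb_flush_range(mm_cpumask(vma->vm_mm),
                                  start, end - start, PMD_SIZE);
    }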