path: root/arch/arm64/lib
Age | Commit message | Author | Files | Lines
2021-06-24 | Merge branch 'for-next/mte' into for-next/core | Will Deacon | 1 | -0/+20
KASAN optimisations for the hardware tagging (MTE) implementation.

* for-next/mte:
  kasan: disable freed user page poisoning with HW tags
  arm64: mte: handle tags zeroing at page allocation time
  kasan: use separate (un)poison implementation for integrated init
  mm: arch: remove indirection level in alloc_zeroed_user_highpage_movable()
  kasan: speed up mte_set_mem_tag_range
2021-06-24 | Merge branch 'for-next/kasan' into for-next/core | Will Deacon | 2 | -0/+78
Optimise out-of-line KASAN checking when using software tagging.

* for-next/kasan:
  kasan: arm64: support specialized outlined tag mismatch checks
2021-06-24 | Merge branch 'for-next/insn' into for-next/core | Will Deacon | 2 | -1/+1459
Refactoring of our instruction decoding routines and addition of some missing encodings.

* for-next/insn:
  arm64: insn: avoid circular include dependency
  arm64: insn: move AARCH64_INSN_SIZE into <asm/insn.h>
  arm64: insn: decouple patching from insn code
  arm64: insn: Add load/store decoding helpers
  arm64: insn: Add some opcodes to instruction decoder
  arm64: insn: Add barrier encodings
  arm64: insn: Add SVE instruction class
  arm64: Move instruction encoder/decoder under lib/
  arm64: Move aarch32 condition check functions
  arm64: Move patching utilities out of instruction encoding/decoding
2021-06-24 | Merge branch 'for-next/cortex-strings' into for-next/core | Will Deacon | 9 | -967/+907
Update our kernel string routines to the latest Cortex Strings implementation.

* for-next/cortex-strings:
  arm64: update string routine copyrights and URLs
  arm64: Rewrite __arch_clear_user()
  arm64: Better optimised memchr()
  arm64: Import latest memcpy()/memmove() implementation
  arm64: Add assembly annotations for weak-PI-alias madness
  arm64: Import latest version of Cortex Strings' strncmp
  arm64: Import updated version of Cortex Strings' strlen
  arm64: Import latest version of Cortex Strings' strcmp
  arm64: Import latest version of Cortex Strings' memcmp
2021-06-04 | arm64: mte: handle tags zeroing at page allocation time | Peter Collingbourne | 1 | -0/+20
Currently, on an anonymous page fault, the kernel allocates a zeroed page and maps it in user space. If the mapping is tagged (PROT_MTE), set_pte_at() additionally clears the tags. It is, however, more efficient to clear the tags at the same time as zeroing the data on allocation. To avoid clearing the tags on any page (which may not be mapped as tagged), only do this if the vma flags contain VM_MTE. This requires introducing a new GFP flag that is used to determine whether to clear the tags. The DC GZVA instruction with a 0 top byte (and 0 tag) requires top-byte-ignore. Set the TCR_EL1.{TBI1,TBID1} bits irrespective of whether KASAN_HW is enabled. Signed-off-by: Peter Collingbourne <pcc@google.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://linux-review.googlesource.com/id/Id46dc94e30fe11474f7e54f5d65e7658dbdddb26 Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
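As an illustration of the allocation-side decision described above, the logic boils down to something like the sketch below; the GFP flag name and the helper are assumptions made for this example, not verified against the final patch.

  /*
   * Hedged sketch: choose GFP flags for a new anonymous user page.
   * __GFP_ZEROTAGS is the assumed name of the new flag; VM_MTE marks a
   * PROT_MTE mapping, so only tagged mappings pay for tag zeroing.
   */
  static inline gfp_t user_page_gfp_sketch(struct vm_area_struct *vma)
  {
          gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_ZERO;

          if (vma->vm_flags & VM_MTE)
                  gfp |= __GFP_ZEROTAGS;  /* zero the tags along with the data */
          return gfp;
  }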
2021-06-02 | arm64: update string routine copyrights and URLs | Mark Rutland | 5 | -11/+11
To make future archaeology easier, let's have the string routine comment blocks encode the specific upstream commit ID they were imported from. These are the same commit IDs as listed in the commits importing the code, expanded to 16 characters. Note that the routines have different commit IDs, each representing the latest upstream commit which changed the particular routine. At the same time, let's consistently include 2021 in the copyright dates. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210602151358.35571-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 | arm64: Rewrite __arch_clear_user() | Robin Murphy | 1 | -20/+27
Now that we're always using STTR variants rather than abstracting two different addressing modes, the user_ldst macro here is frankly more obfuscating than helpful. Rewrite __arch_clear_user() with regular USER() annotations so that it's clearer what's going on, and take the opportunity to minimise the branchiness in the most common paths, while also allowing the exception fixup to return an accurate result. Apparently some folks examine large reads from /dev/zero closely enough to notice the loop being hot, so align it per the other critical loops (presumably around a typical instruction fetch granularity). Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/1cbd78b12c076a8ad4656a345811cfb9425df0b3.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 | arm64: Better optimised memchr() | Robin Murphy | 1 | -12/+53
Although we implement our own assembly version of memchr(), it turns out to be barely any better than what GCC can generate for the generic C version (and would go wrong if the size_t argument were ever large enough to be interpreted as negative). Unfortunately we can't import the tuned implementation from the Arm optimized-routines library, since that has some Advanced SIMD parts which are not really viable for general kernel library code. What we can do, however, is pep things up with some relatively straightforward word-at-a-time logic for larger calls. Adding some timing to optimized-routines' memchr() test for a simple benchmark, overall this version comes in around half as fast as the SIMD code, but still nearly 4x faster than our existing implementation. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/58471b42f9287e039dafa9e5e7035077152438fd.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
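The word-at-a-time logic mentioned above is essentially the classic "has-zero-byte" bit trick. A portable C sketch of the idea (not the kernel's assembly, and ignoring the alignment handling a real implementation needs) looks roughly like this:

  #include <stddef.h>

  #define ONES  0x0101010101010101UL
  #define HIGHS 0x8080808080808080UL

  /* non-zero iff some byte of x is zero */
  static unsigned long has_zero(unsigned long x)
  {
          return (x - ONES) & ~x & HIGHS;
  }

  void *memchr_sketch(const void *s, int c, size_t n)
  {
          unsigned long mask = (unsigned char)c * ONES;   /* c replicated into every byte */
          const unsigned long *w = s;

          while (n >= sizeof(*w) && !has_zero(*w ^ mask)) {
                  w++;                            /* whole word holds no match */
                  n -= sizeof(*w);
          }

          for (const unsigned char *p = (const unsigned char *)w; n; n--, p++)
                  if (*p == (unsigned char)c)
                          return (void *)p;       /* resolves which byte hit */
          return NULL;
  }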
2021-06-01 | arm64: Import latest memcpy()/memmove() implementation | Robin Murphy | 3 | -233/+230
Import the latest implementation of memcpy(), based on the upstream code of string/aarch64/memcpy.S at commit afd6244 from https://github.com/ARM-software/optimized-routines, and subsuming memmove() in the process. Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT license. Note also that the needs of the usercopy routines vs. regular memcpy() have now diverged so far that we abandon the shared template idea and the damage which that incurred to the tuning of LDP/STP loops. We'll be back to tackle those routines separately in future. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/3c953af43506581b2422f61952261e76949ba711.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 | arm64: Import latest version of Cortex Strings' strncmp | Sam Tebbs | 1 | -222/+184
Import the latest version of the former Cortex Strings - now Arm Optimized Routines - strncmp function based on the upstream code of string/aarch64/strncmp.S at commit e823e3a from https://github.com/ARM-software/optimized-routines Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT license. Signed-off-by: Sam Tebbs <sam.tebbs@arm.com> [ rm: update attribution and commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/26110bee02ad360596c9a7536af7eaaf6890d0e8.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 | arm64: Import updated version of Cortex Strings' strlen | Sam Tebbs | 1 | -85/+173
Import an updated version of the former Cortex Strings - now Arm Optimized Routines - strlen function. The latest version introduces Advanced SIMD usage which rules it out for our purposes, but we can still pick an intermediate improvement from the previous version, namely string/aarch64/strlen.S at commit 98e4d6a from https://github.com/ARM-software/optimized-routines Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT license. Signed-off-by: Sam Tebbs <sam.tebbs@arm.com> [ rm: update attribution and commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/32e3489398a24b23ae6e996935ac4818f8fd9dfd.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 | arm64: Import latest version of Cortex Strings' strcmp | Sam Tebbs | 1 | -168/+121
Import the latest version of the former Cortex Strings - now Arm Optimized Routines - strcmp function based on the upstream code of string/aarch64/strcmp.S at commit afd6244 from https://github.com/ARM-software/optimized-routines Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT license. Signed-off-by: Sam Tebbs <sam.tebbs@arm.com> [ rm: update attribution and commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/0fe90c90b96b569fbdfd46e47bd1298abb02079e.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-01 | arm64: Import latest version of Cortex Strings' memcmp | Sam Tebbs | 1 | -227/+119
Import the latest version of the former Cortex Strings - now Arm Optimized Routines - memcmp function based on the upstream code of string/aarch64/memcmp.S at commit e823e3a from https://github.com/ARM-software/optimized-routines Note that for simplicity Arm have chosen to contribute this code to Linux under GPLv2 rather than the original MIT license. Signed-off-by: Sam Tebbs <sam.tebbs@arm.com> [ rm: update attribution and commit message ] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/2889de2d41054f3f508fb3addad784a3606ef383.1622128527.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27 | arm64: insn: Add SVE instruction class | Julien Thierry | 1 | -1/+1
SVE has been public for some time now. Let the decoder acknowledge its existence. Signed-off-by: Julien Thierry <jthierry@redhat.com> Link: https://lore.kernel.org/r/20210303170536.1838032-6-jthierry@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27 | arm64: Move instruction encoder/decoder under lib/ | Julien Thierry | 2 | -3/+1461
AArch64 instruction set encoding and decoding logic can prove useful for some features/tools both part of the kernel and outside the kernel. Isolate the functions dealing only with encoding/decoding instructions, with minimal dependency on kernel utilities, in order to be able to reuse that code. Code was only moved; no code should have been added, removed nor modified. Signed-off-by: Julien Thierry <jthierry@redhat.com> Link: https://lore.kernel.org/r/20210303170536.1838032-5-jthierry@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27 | kasan: arm64: support specialized outlined tag mismatch checks | Peter Collingbourne | 2 | -0/+78
By using outlined checks we can achieve a significant code size improvement by moving the tag-based ASAN checks into separate functions. Unlike the existing CONFIG_KASAN_OUTLINE mode these functions have a custom calling convention that preserves most registers and is specialized to the register containing the address and the type of access, and as a result we can eliminate the code size and performance overhead of a standard calling convention such as AAPCS for these functions. This change depends on a separate series of changes to Clang [1] to support outlined checks in the kernel, although the change works fine without them (we just don't get outlined checks). This is because the flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang changes land. The flag was introduced in the Clang 9.0 timeframe as part of the support for outlined checks in userspace and because our minimum Clang version is 10.0 we can pass it unconditionally. Outlined checks require a new runtime function with a custom calling convention. Add this function to arch/arm64/lib. I measured the code size of defconfig + tag-based KASAN, as well as boot time (i.e. time to init launch) on a DragonBoard 845c with an Android arm64 GKI kernel. The results are below:

                                  code size    boot time
  CONFIG_KASAN_INLINE=y before     92824064        6.18s
  CONFIG_KASAN_INLINE=y after      38822400        6.65s
  CONFIG_KASAN_OUTLINE=y           39215616       11.48s

We can see straight away that specialized outlined checks beat the existing CONFIG_KASAN_OUTLINE=y on both code size and boot time for tag-based ASAN. As for the comparison between CONFIG_KASAN_INLINE=y before and after, we saw similar performance numbers in userspace [2] and decided that since the performance overhead is minimal compared to the overhead of tag-based ASAN itself, as well as compared to the code size improvements, we would just replace the inlined checks with the specialized outlined checks without the option to select between them, and that is what I have implemented in this patch. Signed-off-by: Peter Collingbourne <pcc@google.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 Link: [1] https://reviews.llvm.org/D90426 Link: [2] https://reviews.llvm.org/D56954 Link: https://lore.kernel.org/r/20210526174927.2477847-3-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
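Conceptually (and only conceptually; this is not the kernel's code), a software tag-based KASAN check compares the logical tag in the pointer's top byte against the tag recorded for the granule being accessed, and the change above moves that comparison into tiny out-of-line helpers with a register-preserving calling convention so each call site stays small:

  /* Illustration only; kasan_mem_to_shadow() is borrowed here as the shadow lookup. */
  static inline bool tag_matches_sketch(unsigned long addr)
  {
          unsigned char ptr_tag = addr >> 56;                 /* top-byte tag */
          unsigned char mem_tag = *(unsigned char *)kasan_mem_to_shadow((void *)addr);

          return ptr_tag == mem_tag;
  }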
2021-05-25 | arm64: Rename arm64-internal cache maintenance functions | Fuad Tabba | 1 | -2/+2
Although naming across the codebase isn't that consistent, it tends to follow certain patterns. Moreover, the term "flush" isn't defined in the Arm Architecture reference manual, and might be interpreted to mean clean, invalidate, or both for a cache. Rename arm64-internal functions to make the naming internally consistent, as well as making it consistent with the Arm ARM, by specifying whether it applies to the instruction, data, or both caches, and whether the operation is a clean, invalidate, or both. Also specify which point the operation applies to, i.e., to the point of unification (PoU), coherency (PoC), or persistence (PoP). This commit applies the following sed transformation to all files under arch/arm64:

  "s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\
  "s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\
  "s/\binvalidate_icache_range\b/icache_inval_pou/g;"\
  "s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\
  "s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\
  "s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\
  "s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\
  "s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\
  "s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\
  "s/\b__flush_icache_all\b/icache_inval_all_pou/g;"

Note that __clean_dcache_area_poc is deliberately missing a word boundary check at the beginning in order to match the efistub symbols in image-vars.h. Also note that, despite its name, __flush_icache_range operates on both instruction and data caches. The name change here reflects that. No functional change intended. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 | arm64: __clean_dcache_area_pop to take end parameter instead of size | Fuad Tabba | 1 | -2/+2
To be consistent with other functions with similar names and functionality in cacheflush.h, cache.S, and cachetlb.rst, change to specify the range in terms of start and end, as opposed to start and size. No functional change intended. Reported-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210524083001.2586635-15-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2021-03-19 | arm64: lib: Annotate {clear, copy}_page() as position-independent | Will Deacon | 2 | -4/+4
clear_page() and copy_page() are suitable for use outside of the kernel address space, so annotate them as position-independent code. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-2-qperret@google.com
2021-02-26 | arm64: kasan: simplify and inline MTE functions | Andrey Konovalov | 1 | -16/+0
This change provides a simpler implementation of mte_get_mem_tag(), mte_get_random_tag(), and mte_set_mem_tag_range(). Simplifications include removing system_supports_mte() checks as these functions are only called from the KASAN runtime, which has already checked system_supports_mte(). Besides that, size and address alignment checks are removed from mte_set_mem_tag_range(), as KASAN now does those. This change also moves these functions into the asm/mte-kasan.h header and implements mte_set_mem_tag_range() via inline assembly to avoid unnecessary function calls. [vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()] Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
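The inline-assembly flavour of mte_set_mem_tag_range() amounts to walking the range one MTE granule at a time and storing the pointer's tag with STG. A hedged sketch of that idea (granule size hard-coded, preamble and alignment checks omitted, names not taken from the patch):

  /* Requires an MTE-capable toolchain/target; illustration only. */
  static inline void set_tag_range_sketch(void *addr, size_t size)
  {
          unsigned long curr = (unsigned long)addr;   /* tag already in the top byte */
          unsigned long end = curr + size;            /* assumed 16-byte aligned */

          do {
                  asm volatile("stg %0, [%0]" : : "r"(curr) : "memory");
                  curr += 16;                         /* MTE_GRANULE_SIZE */
          } while (curr != end);
  }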
2020-12-22 | arm64: mte: add in-kernel MTE helpers | Vincenzo Frascino | 1 | -0/+16
Provide helper functions to manipulate allocation and pointer tags for kernel addresses. Low-level helper functions (mte_assign_*, written in assembly) operate tag values from the [0x0, 0xF] range. High-level helper functions (mte_get/set_*) use the [0xF0, 0xFF] range to preserve compatibility with normal kernel pointers that have 0xFF in their top byte. MTE_GRANULE_SIZE and related definitions are moved to mte-def.h header that doesn't have any dependencies and is safe to include into any low-level header. Link: https://lkml.kernel.org/r/c31bf759b4411b2d98cdd801eb928e241584fd1f.1606161801.git.andreyknvl@google.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-02 | arm64: uaccess cleanup macro naming | Mark Rutland | 5 | -22/+22
Now that the uaccess primitives use LDTR/STTR unconditionally, the uao_{ldp,stp,user_alternative} asm macros are misnamed, and have a redundant argument. Let's remove the redundant argument and rename these to user_{ldp,stp,ldst} respectively to clean this up. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201202131558.39270-9-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02 | arm64: uaccess: simplify __copy_user_flushcache() | Mark Rutland | 1 | -3/+1
Currently __copy_user_flushcache() open-codes raw_copy_from_user(), and doesn't use uaccess_mask_ptr() on the user address. Let's have it call raw_copy_from_user(), which is both a simplification and ensures that user pointers are masked under speculation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201202131558.39270-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
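The simplified shape described above is roughly the following; the cache-maintenance helper name here is a placeholder, not the real symbol:

  unsigned long __copy_user_flushcache(void *to, const void __user *from,
                                       unsigned long n)
  {
          unsigned long rc = raw_copy_from_user(to, from, n);  /* masks the user pointer */

          /* clean to the point of persistence only what was actually copied */
          clean_dcache_to_pop(to, n - rc);   /* hypothetical helper name */
          return rc;
  }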
2020-11-10 | arm64: uaccess: move uao_* alternatives to asm-uaccess.h | Mark Rutland | 1 | -1/+1
The uao_* alternative asm macros are only used by the uaccess assembly routines in arch/arm64/lib/, where they are included indirectly via asm-uaccess.h. Since they're specific to the uaccess assembly (and will lose the alternatives in subsequent patches), let's move them into asm-uaccess.h. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> [will: update #include in mte.S to pull in uao asm macros] Signed-off-by: Will Deacon <will@kernel.org>
2020-10-30 | arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.S | Fangrui Song | 3 | -6/+3
Commit 39d114ddc682 ("arm64: add KASAN support") added .weak directives to arch/arm64/lib/mem*.S instead of changing the existing SYM_FUNC_START_PI macros. This can lead to the assembly snippet `.weak memcpy ... .globl memcpy` which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Use the appropriate SYM_FUNC_START_WEAK_PI instead. Fixes: 39d114ddc682 ("arm64: add KASAN support") Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201029181951.1866093-1-maskray@google.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-04 | arm64: mte: Enable swap of tagged pages | Steven Price | 1 | -0/+45
When swapping pages out to disk it is necessary to save any tags that have been set, and restore when swapping back in. Make use of the new page flag (PG_ARCH_2, locally named PG_mte_tagged) to identify pages with tags. When swapping out these pages the tags are stored in memory and later restored when the pages are brought back in. Because shmem can swap pages back in without restoring the userspace PTE it is also necessary to add a hook for shmem. Signed-off-by: Steven Price <steven.price@arm.com> [catalin.marinas@arm.com: move function prototypes to mte.h] [catalin.marinas@arm.com: drop '_tags' from arch_swap_restore_tags()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will@kernel.org>
2020-09-04 | arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support | Catalin Marinas | 1 | -0/+53
Add support for bulk setting/getting of the MTE tags in a tracee's address space at 'addr' in the ptrace() syscall prototype. 'data' points to a struct iovec in the tracer's address space with iov_base representing the address of a tracer's buffer of length iov_len. The tags to be copied to/from the tracer's buffer are stored as one tag per byte. On successfully copying at least one tag, ptrace() returns 0 and updates the tracer's iov_len with the number of tags copied. In case of error, either -EIO or -EFAULT is returned, trying to follow the ptrace() man page. Note that the tag copying functions are not performance critical, therefore they lack optimisations found in typical memory copy routines. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alan Hayward <Alan.Hayward@arm.com> Cc: Luis Machado <luis.machado@linaro.org> Cc: Omair Javaid <omair.javaid@linaro.org>
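From the tracer's side, usage looks roughly like the sketch below (the request value is an assumption; the real one comes from asm/ptrace.h):

  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  #ifndef PTRACE_PEEKMTETAGS
  #define PTRACE_PEEKMTETAGS 33   /* assumed value; take it from asm/ptrace.h */
  #endif

  /* Read up to len tags (one tag per byte of the buffer) from the tracee. */
  static ssize_t peek_tags_sketch(pid_t pid, void *addr,
                                  unsigned char *tags, size_t len)
  {
          struct iovec iov = { .iov_base = tags, .iov_len = len };

          if (ptrace(PTRACE_PEEKMTETAGS, pid, addr, &iov))
                  return -1;          /* -EIO or -EFAULT, per the description above */
          return iov.iov_len;         /* number of tags actually copied */
  }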
2020-09-04 | arm64: mte: Tags-aware copy_{user_,}highpage() implementations | Vincenzo Frascino | 1 | -0/+19
When the Memory Tagging Extension is enabled, the tags need to be preserved across page copy (e.g. for copy-on-write, page migration). Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to the destination if the source page has the PG_mte_tagged flag set. copy_user_page() does not need to handle tag copying since, with this patch, it is only called by the DAX code where there is no source page structure (and no source tags). Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04 | arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE | Catalin Marinas | 2 | -0/+36
Pages allocated by the kernel are not guaranteed to have the tags zeroed, especially as the kernel does not (yet) use MTE itself. To ensure the user can still access such pages when mapped into its address space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged (PG_arch_2) - is used to track pages with valid allocation tags. Since the zero page is mapped as pte_special(), it won't be covered by the above set_pte_at() mechanism. Clear its tags during early MTE initialisation. Co-developed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
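A hedged sketch of the set_pte_at() hook this describes: the first time a page is mapped with a tagged pte, its allocation tags are zeroed and the new page flag records that. Helper names follow the commit text but are not verified against the final code.

  static void mte_sync_page_sketch(struct page *page, pte_t pte)
  {
          if (!pte_tagged(pte))
                  return;                         /* mapping is not PROT_MTE */

          if (!test_and_set_bit(PG_mte_tagged, &page->flags))
                  mte_clear_page_tags(page_address(page));
  }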
2020-06-11 | Merge branch 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux | Linus Torvalds | 1 | -8/+12
Pull READ/WRITE_ONCE rework from Will Deacon: "This is the READ_ONCE rework I've been working on for a while, which bumps the minimum GCC version and improves code-gen on arm64 when stack protector is enabled" [ Side note: I'm _really_ tempted to raise the minimum gcc version to 4.9, so that we can just say that we require _Generic() support. That would allow us to more cleanly handle a lot of the cases where we depend on very complex macros with 'sizeof' or __builtin_choose_expr() with __builtin_types_compatible_p() etc. This branch has a workaround for sparse not handling _Generic(), either, but that was already fixed in the sparse development branch, so it's really just gcc-4.9 that we'd require. - Linus ]

* 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
  compiler_types.h: Optimize __unqual_scalar_typeof compilation time
  compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
  compiler-types.h: Include naked type in __pick_integer_type() match
  READ_ONCE: Fix comment describing 2x32-bit atomicity
  gcov: Remove old GCC 3.4 support
  arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
  locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
  READ_ONCE: Drop pointer qualifiers when reading from scalar types
  READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
  arm64: csum: Disable KASAN for do_csum()
  fault_inject: Don't rely on "return value" from WRITE_ONCE()
  net: tls: Avoid assigning 'const' pointer to non-const pointer
  netfilter: Avoid assigning 'const' pointer to non-const pointer
  compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-04-29 | arm64: Reorder the macro arguments in the copy routines | Catalin Marinas | 4 | -64/+64
The current argument order is obviously buggy (memcpy.S):

  .macro strb1 ptr, regB, val
  strb \ptr, [\regB], \val
  .endm

However, it cancels out as the calling sites in copy_template.S pass the address as the regB argument. Mechanically reorder the arguments to match the instruction mnemonics. There is no difference in objdump before and after this patch. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200429183702.28445-1-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-04-28 | arm64: lib: Consistently enable crc32 extension | Mark Brown | 1 | -1/+1
Currently most of the assembly files that use architecture extensions enable them using the .arch directive but crc32.S uses .cpu instead. Move that over to .arch for consistency. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20200414182843.31664-1-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-04-15 | arm64: csum: Disable KASAN for do_csum() | Will Deacon | 1 | -8/+12
do_csum() over-reads the source buffer and therefore abuses READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for READ_ONCE_NOCHECK() becoming a macro, and therefore losing its '__no_sanitize_address' annotation, just annotate do_csum() explicitly and fall back to normal loads. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-17 | arm64: fix spelling mistake "ca not" -> "cannot" | 韩科才 (hankecai) | 1 | -1/+1
There is a spelling mistake in the comment; fix it. Signed-off-by: hankecai <hankecai@bbktel.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-09 | arm64: csum: Optimise IPv6 header checksum | Robin Murphy | 1 | -0/+27
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it about 1.3x-2x faster across a range of microarchitecture/compiler combinations. Not much in absolute terms, but every little helps. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
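The __uint128_t idiom referred to above is simply letting the compiler keep the carries in the upper 64 bits for free, then folding with end-around carries. A rough sketch of the accumulation step (not the kernel's exact csum_ipv6_magic(), which also mixes in the payload checksum):

  #include <stdint.h>

  static inline uint64_t ipv6_pseudo_sum_sketch(const uint64_t saddr[2],
                                                const uint64_t daddr[2],
                                                uint64_t len_proto)
  {
          __uint128_t sum = (__uint128_t)saddr[0] + saddr[1]
                          + daddr[0] + daddr[1] + len_proto;
          uint64_t folded = (uint64_t)sum + (uint64_t)(sum >> 64);

          if (folded < (uint64_t)sum)
                  folded++;           /* end-around carry */
          return folded;              /* 64->32->16 folds continue the same way */
  }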
2020-01-22 | Merge branch 'for-next/asm-annotations' into for-next/core | Will Deacon | 19 | -50/+50
* for-next/asm-annotations: (6 commits)
  arm64: kernel: Correct annotation of end of el0_sync
  ...
2020-01-22 | Merge branches 'for-next/acpi', 'for-next/cpufeatures', 'for-next/csum', 'for-next/e0pd', 'for-next/entry', 'for-next/kbuild', 'for-next/kexec/cleanup', 'for-next/kexec/file-kdump', 'for-next/misc', 'for-next/nofpsimd', 'for-next/perf' and 'for-next/scs' into for-next/core | Will Deacon | 3 | -22/+148

* for-next/acpi:
  ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()
* for-next/cpufeatures: (2 commits)
  arm64: Introduce ID_ISAR6 CPU register
  ...
* for-next/csum: (2 commits)
  arm64: csum: Fix pathological zero-length calls
  ...
* for-next/e0pd: (7 commits)
  arm64: kconfig: Fix alignment of E0PD help text
  ...
* for-next/entry: (5 commits)
  arm64: entry: cleanup sp_el0 manipulation
  ...
* for-next/kbuild: (4 commits)
  arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
  ...
* for-next/kexec/cleanup: (11 commits)
  Revert "arm64: kexec: make dtb_mem always enabled"
  ...
* for-next/kexec/file-kdump: (2 commits)
  arm64: kexec_file: add crash dump support
  ...
* for-next/misc: (12 commits)
  arm64: entry: Avoid empty alternatives entries
  ...
* for-next/nofpsimd: (7 commits)
  arm64: nofpsmid: Handle TIF_FOREIGN_FPSTATE flag cleanly
  ...
* for-next/perf: (2 commits)
  perf/imx_ddr: Fix cpu hotplug state cleanup
  ...
* for-next/scs: (6 commits)
  arm64: kernel: avoid x18 in __cpu_soft_restart
  ...
2020-01-17 | arm64: csum: Fix pathological zero-length calls | Robin Murphy | 1 | -0/+3
In validating the checksumming results of the new routine, I sadly neglected to test its not-checksumming results. Thus it slipped through that the one case where @buff is already dword-aligned and @len = 0 manages to defeat the tail-masking logic and behave as if @len = 8. For a zero length it doesn't make much sense to dereference @buff anyway, so just add an early return (which has essentially zero impact on performance). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16 | arm64/lib: copy_page: avoid x18 register in assembler code | Ard Biesheuvel | 1 | -19/+19
Register x18 will no longer be used as a caller save register in the future, so stop using it in the copy_page() code. Link: https://patchwork.kernel.org/patch/9836869/ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Sami: changed the offset and bias to be explicit] Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16 | arm64: Implement optimised checksum routine | Robin Murphy | 2 | -3/+126
Apparently there exist certain workloads which rely heavily on software checksumming, for which the generic do_csum() implementation becomes a significant bottleneck. Therefore let's give arm64 its own optimised version - for ease of maintenance this foregoes assembly or intrinsics, and is thus not actually arm64-specific, but does rely heavily on C idioms that translate well to the A64 ISA and the typical load/store capabilities of most ARMv8 CPU cores. The resulting increase in checksum throughput scales nicely with buffer size, tending towards 4x for a small in-order core (Cortex-A53), and up to 6x or more for an aggressive big core (Ampere eMAG). Reported-by: Lingyan Huang <huanglingyan2@huawei.com> Tested-by: Lingyan Huang <huanglingyan2@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
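For a sense of the C idioms involved, the core of such a routine is a wide accumulation followed by ones'-complement folds. The sketch below assumes 8-byte-aligned input of a whole number of words and skips the head/tail handling (and the zero-length guard added by the later fix):

  #include <stdint.h>
  #include <stddef.h>

  static uint16_t csum_words_sketch(const uint64_t *p, size_t nwords)
  {
          __uint128_t sum = 0;

          for (size_t i = 0; i < nwords; i++)
                  sum += p[i];                    /* carries collect in the top half */

          uint64_t s64 = (uint64_t)sum + (uint64_t)(sum >> 64);
          if (s64 < (uint64_t)sum)
                  s64++;                          /* end-around carry: 128 -> 64 */

          uint32_t s32 = (uint32_t)s64 + (uint32_t)(s64 >> 32);
          if (s32 < (uint32_t)s64)
                  s32++;                          /* 64 -> 32 */

          s32 = (s32 & 0xffff) + (s32 >> 16);
          s32 = (s32 & 0xffff) + (s32 >> 16);     /* 32 -> 16 */
          return (uint16_t)s32;
  }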
2020-01-08 | arm64: lib: Use modern annotations for assembly functions | Mark Brown | 19 | -50/+50
In an effort to clarify and simplify the annotation of assembly functions in the kernel new macros have been introduced. These replace ENTRY and ENDPROC and also add a new annotation for static functions which previously had no ENTRY equivalent. Update the annotations in the library code to the new macros. Signed-off-by: Mark Brown <broonie@kernel.org> [will: Use SYM_FUNC_START_WEAK_PI] Signed-off-by: Will Deacon <will@kernel.org>
2019-11-20 | arm64: uaccess: Remove uaccess_*_not_uao asm macros | Pavel Tatashin | 5 | -13/+5
It is safer and simpler to drop the uaccess assembly macros in favour of inline C functions. Although this bloats the Image size slightly, it aligns our user copy routines with '{get,put}_user()' and generally makes the code a lot easier to reason about. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> [will: tweaked commit message and changed temporary variable names] Signed-off-by: Will Deacon <will@kernel.org>
2019-11-20 | arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault | Pavel Tatashin | 4 | -0/+4
A number of our uaccess routines ('__arch_clear_user()' and '__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they encounter an unhandled fault whilst accessing userspace. For CPUs implementing both hardware PAN and UAO, this bug has no effect when both extensions are in use by the kernel. For CPUs implementing hardware PAN but not UAO, this means that a kernel using hardware PAN may execute portions of code with PAN inadvertently disabled, opening us up to potential security vulnerabilities that rely on userspace access from within the kernel which would usually be prevented by this mechanism. In other words, parts of the kernel run the same way as they would on a CPU without PAN implemented/emulated at all. For CPUs not implementing hardware PAN and instead relying on software emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately much worse. Calling 'schedule()' with software PAN disabled means that the next task will execute in the kernel using the page-table and ASID of the previous process even after 'switch_mm()', since the actual hardware switch is deferred until return to userspace. At this point, or if there is an intermediate call to 'uaccess_enable()', the page-table and ASID of the new process are installed. Sadly, due to the changes introduced by KPTI, this is not an atomic operation and there is a very small window (two instructions) where the CPU is configured with the page-table of the old task and the ASID of the new task; a speculative access in this state is disastrous because it would corrupt the TLB entries for the new task with mappings from the previous address space. As Pavel explains:

  | I was able to reproduce memory corruption problem on Broadcom's SoC
  | ARMv8-A like this:
  |
  | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
  | stack is accessed and copied.
  |
  | The test program performed the following on every CPU and forking
  | many processes:
  |
  |   unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
  |                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  |   map[0] = getpid();
  |   sched_yield();
  |   if (map[0] != getpid()) {
  |           fprintf(stderr, "Corruption detected!");
  |   }
  |   munmap(map, PAGE_SIZE);
  |
  | From time to time I was getting map[0] to contain pid for a
  | different process.

Ensure that PAN is re-enabled when returning after an unhandled user fault from our uaccess routines. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never") Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> [will: rewrote commit message] Signed-off-by: Will Deacon <will@kernel.org>
2019-08-30 | Merge branch 'for-next/atomics' into for-next/core | Will Deacon | 2 | -22/+0
* for-next/atomics: (10 commits)
  Rework LSE instruction selection to use static keys instead of alternatives
2019-08-29 | arm64: atomics: Remove atomic_ll_sc compilation unit | Andrew Murray | 2 | -22/+0
We no longer fall back to out-of-line atomics on systems with CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set. Remove the unused compilation unit which provided these symbols. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07 | arm64: Add support for function error injection | Leo Yan | 2 | -0/+20
Inspired by the commit 7cd01b08d35f ("powerpc: Add support for function error injection"), this patch supports function error injection for Arm64. This patch mainly supports two functions: one is regs_set_return_value(), which is used to overwrite the return value; the other is override_function_with_return(), which overrides the probed function's return and jumps back to its caller. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
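On arm64 the two hooks described above reduce to manipulating the saved register state; a hedged sketch of the idea (simplified, not the exact kernel helpers):

  /* Force the "return value" of the probed function: x0 carries it. */
  static inline void set_return_value_sketch(struct pt_regs *regs, unsigned long rc)
  {
          regs->regs[0] = rc;
  }

  /* Skip the probed function body entirely: resume at its caller via x30 (lr). */
  static inline void return_to_caller_sketch(struct pt_regs *regs)
  {
          regs->pc = regs->regs[30];
  }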
2019-06-19 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 | Thomas Gleixner | 2 | -8/+2
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 | Thomas Gleixner | 20 | -249/+20
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-09 | arm64: Makefile: Replace -pg with CC_FLAGS_FTRACE | Torsten Duwe | 1 | -1/+1
In preparation for arm64 supporting ftrace built on other compiler options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE) flags, whatever these may be, rather than assuming '-pg'. There should be no functional change as a result of this patch. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 | arm64: string: use asm EXPORT_SYMBOL() | Mark Rutland | 11 | -0/+14
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the string routine exports to the assembly files the functions are defined in. Routines which should only be exported for !KASAN builds are exported using the EXPORT_SYMBOL_NOKASAN() helper. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>