summaryrefslogtreecommitdiff
path: root/arch/arm64/include/asm/checksum.h
AgeCommit message (Collapse)AuthorFilesLines
2020-07-30arm64: csum: Fix handling of bad packetsRobin Murphy1-2/+3
Although iph is expected to point to at least 20 bytes of valid memory, ihl may be bogus, for example on reception of a corrupt packet. If it happens to be less than 5, we really don't want to run away and dereference 16GB worth of memory until it wraps back to exactly zero... Fixes: 0e455d8e80aa ("arm64: Implement optimised IP checksum helpers") Reported-by: guodeqing <geffrey.guo@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-09arm64: csum: Optimise IPv6 header checksumRobin Murphy1-1/+6
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it about 1.3x-2x faster across a range of microarchitecture/compiler combinations. Not much in absolute terms, but every little helps. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-01-16arm64: Implement optimised checksum routineRobin Murphy1-0/+3
Apparently there exist certain workloads which rely heavily on software checksumming, for which the generic do_csum() implementation becomes a significant bottleneck. Therefore let's give arm64 its own optimised version - for ease of maintenance this foregoes assembly or intrisics, and is thus not actually arm64-specific, but does rely heavily on C idioms that translate well to the A64 ISA and the typical load/store capabilities of most ARMv8 CPU cores. The resulting increase in checksum throughput scales nicely with buffer size, tending towards 4x for a small in-order core (Cortex-A53), and up to 6x or more for an aggressive big core (Ampere eMAG). Reported-by: Lingyan Huang <huanglingyan2@huawei.com> Tested-by: Lingyan Huang <huanglingyan2@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner1-12/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-29arm64: add missing conversion to __wsum in ip_fast_csum()Luc Van Oostenryck1-1/+1
ARM64 implementation of ip_fast_csum() do most of the work in 128 or 64 bit and call csum_fold() to finalize. csum_fold() itself take a __wsum argument, to insure that this value is always a 32bit native-order value. Fix this by adding the sadly needed '__force' to cast the native 'sum' to the type '__wsum'. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-21arm64: Implement optimised IP checksum helpersRobin Murphy1-0/+51
AArch64 is capable of 128-bit memory accesses without alignment restrictions, which makes it both possible and highly practical to slurp up a typical 20-byte IP header in just 2 loads. Implement our own version of ip_fast_checksum() to take advantage of that, resulting in considerably fewer instructions and memory accesses than the generic version. We can also get more optimal code generation for csum_fold() by defining it a slightly different way round from the generic version, so throw that into the mix too. Suggested-by: Luke Starrett <luke.starrett@broadcom.com> Acked-by: Luke Starrett <luke.starrett@broadcom.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>