summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2017-03-01crypto: arm/crc32 - add build time test for CRC instruction supportArd Biesheuvel1-1/+11
The accelerated CRC32 module for ARM may use either the scalar CRC32 instructions, the NEON 64x64 to 128 bit polynomial multiplication (vmull.p64) instruction, or both, depending on what the current CPU supports. However, this also requires support in binutils, and as it turns out, versions of binutils exist that support the vmull.p64 instruction but not the crc32 instructions. So refactor the Makefile logic so that this module only gets built if binutils has support for both. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-01crypto: arm/crc32 - fix build error with outdated binutilsArd Biesheuvel1-1/+1
Annotate a vmov instruction with an explicit element size of 32 bits. This is inferred by recent toolchains, but apparently, older versions need some help figuring this out. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driverArd Biesheuvel2-2/+267
On ARMv8 implementations that do not support the Crypto Extensions, such as the Raspberry Pi 3, the CCM driver falls back to the generic table based AES implementation to perform the MAC part of the algorithm, which is slow and not time invariant. So add a CBCMAC implementation to the shared glue code between NEON AES and Crypto Extensions AES, so that it can be used instead now that the CCM driver has been updated to look for CBCMAC implementations other than the one it supplies itself. Also, given how these algorithms mostly only differ in the way the key handling and the final encryption are implemented, expose CMAC and XCBC algorithms as well based on the same core update code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: sha512-mb - Protect sha512 mb ctx mgr accessTim Chen1-22/+42
The flusher and regular multi-buffer computation via mcryptd may race with another. Add here a lock and turn off interrupt to to access multi-buffer computation state cstate->mgr before a round of computation. This should prevent the flusher code jumping in. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11crypto: arm64/crc32 - merge CRC32 and PMULL instruction based driversArd Biesheuvel5-312/+41
The PMULL based CRC32 implementation already contains code based on the separate, optional CRC32 instructions to fallback to when operating on small quantities of data. We can expose these routines directly on systems that lack the 64x64 PMULL instructions but do implement the CRC32 ones, which makes the driver that is based solely on those CRC32 instructions redundant. So remove it. Note that this aligns arm64 with ARM, whose accelerated CRC32 driver also combines the CRC32 extension based and the PMULL based versions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Matthias Brugger <mbrugger@suse.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm/aes - don't use IV buffer to return final keystream blockArd Biesheuvel2-11/+14
The ARM bit sliced AES core code uses the IV buffer to pass the final keystream block back to the glue code if the input is not a multiple of the block size, so that the asm code does not have to deal with anything except 16 byte blocks. This is done under the assumption that the outgoing IV is meaningless anyway in this case, given that chaining is no longer possible under these circumstances. However, as it turns out, the CCM driver does expect the IV to retain a value that is equal to the original IV except for the counter value, and even interprets byte zero as a length indicator, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes - don't use IV buffer to return final keystream blockArd Biesheuvel2-18/+28
The arm64 bit sliced AES core code uses the IV buffer to pass the final keystream block back to the glue code if the input is not a multiple of the block size, so that the asm code does not have to deal with anything except 16 byte blocks. This is done under the assumption that the outgoing IV is meaningless anyway in this case, given that chaining is no longer possible under these circumstances. However, as it turns out, the CCM driver does expect the IV to retain a value that is equal to the original IV except for the counter value, and even interprets byte zero as a length indicator, which may result in memory corruption if the IV is overwritten with something else. So use a separate buffer to return the final keystream block. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes - replace scalar fallback with plain NEON fallbackArd Biesheuvel2-11/+29
The new bitsliced NEON implementation of AES uses a fallback in two places: CBC encryption (which is strictly sequential, whereas this driver can only operate efficiently on 8 blocks at a time), and the XTS tweak generation, which involves encrypting a single AES block with a different key schedule. The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback, given that it is faster than scalar on low end cores (which is what the NEON implementations target, since high end cores have dedicated instructions for AES), and shows similar behavior in terms of D-cache footprint and sensitivity to cache timing attacks. So switch the fallback handling to the plain NEON driver. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes-neon-blk - tweak performance for low end coresArd Biesheuvel2-135/+102
The non-bitsliced AES implementation using the NEON is highly sensitive to micro-architectural details, and, as it turns out, the Cortex-A53 on the Raspberry Pi 3 is a core that can benefit from this code, given that its scalar AES performance is abysmal (32.9 cycles per byte). The new bitsliced AES code manages 19.8 cycles per byte on this core, but can only operate on 8 blocks at a time, which is not supported by all chaining modes. With a bit of tweaking, we can get the plain NEON code to run at 22.0 cycles per byte, making it useful for sequential modes like CBC encryption. (Like bitsliced NEON, the plain NEON implementation does not use any lookup tables, which makes it easy on the D-cache, and invulnerable to cache timing attacks) So tweak the plain NEON AES code to use tbl instructions rather than shl/sri pairs, and to avoid the need to reload permutation vectors or other constants from memory in every round. Also, improve the decryption performance by switching to 16x8 pmul instructions for the performing the multiplications in GF(2^8). To allow the ECB and CBC encrypt routines to be reused by the bitsliced NEON code in a subsequent patch, export them from the module. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes - performance tweakArd Biesheuvel1-33/+19
Shuffle some instructions around in the __hround macro to shave off 0.1 cycles per byte on Cortex-A57. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes - avoid literals for cross-module symbol referencesArd Biesheuvel1-5/+2
Using simple adrp/add pairs to refer to the AES lookup tables exposed by the generic AES driver (which could be loaded far away from this driver when KASLR is in effect) was unreliable at module load time before commit 41c066f2c4d4 ("arm64: assembler: make adr_l work in modules under KASLR"), which is why the AES code used literals instead. So now we can get rid of the literals, and switch to the adr_l macro. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/chacha20 - remove cra_alignmaskArd Biesheuvel1-1/+0
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes-blk - remove cra_alignmaskArd Biesheuvel2-15/+9
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm64/aes-ce-ccm - remove cra_alignmaskArd Biesheuvel1-1/+0
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm/chacha20 - remove cra_alignmaskArd Biesheuvel1-1/+0
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03crypto: arm/aes-ce - remove cra_alignmaskArd Biesheuvel2-52/+47
Remove the unnecessary alignmask: it is much more efficient to deal with the misalignment in the core algorithm than relying on the crypto API to copy the data to a suitably aligned buffer. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Herbert Xu2-51/+48
Merge the crypto tree to pick up arm64 output IV patch.
2017-02-03crypto: aesni - Fix failure when pcbc module is absentHerbert Xu1-4/+4
When aesni is built as a module together with pcbc, the pcbc module must be present for aesni to load. However, the pcbc module may not be present for reasons such as its absence on initramfs. This patch allows the aesni to function even if the pcbc module is enabled but not present. Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: x86 - make constants readonly, allow linker to merge themDenys Vlasenko33-74/+229
A lot of asm-optimized routines in arch/x86/crypto/ keep its constants in .data. This is wrong, they should be on .rodata. Mnay of these constants are the same in different modules. For example, 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F exists in at least half a dozen places. There is a way to let linker merge them and use just one copy. The rules are as follows: mergeable objects of different sizes should not share sections. You can't put them all in one .rodata section, they will lose "mergeability". GCC puts its mergeable constants in ".rodata.cstSIZE" sections, or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used. This patch does the same: .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16 It is important that all data in such section consists of 16-byte elements, not larger ones, and there are no implicit use of one element from another. When this is not the case, use non-mergeable section: .section .rodata[.VAR_NAME], "a", @progbits This reduces .data by ~15 kbytes: text data bss dec hex filename 11097415 2705840 2630712 16433967 fac32f vmlinux-prev.o 11112095 2690672 2630712 16433479 fac147 vmlinux.o Merged objects are visible in System.map: ffffffff81a28810 r POLY ffffffff81a28810 r POLY ffffffff81a28820 r TWOONE ffffffff81a28820 r TWOONE ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of ffffffff81a28830 r SHUF_MASK <------------- the name difference ffffffff81a28830 r SHUF_MASK ffffffff81a28830 r SHUF_MASK .. ffffffff81a28d00 r K512 <- merged three identical 640-byte tables ffffffff81a28d00 r K512 ffffffff81a28d00 r K512 Use of object names in section name suffixes is not strictly necessary, but might help if someday link stage will use garbage collection to eliminate unused sections (ld --gc-sections). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Josh Poimboeuf <jpoimboe@redhat.com> CC: Xiaodong Liu <xiaodong.liu@intel.com> CC: Megha Dey <megha.dey@intel.com> CC: linux-crypto@vger.kernel.org CC: x86@kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: x86/crc32c - fix %progbits -> @progbitsDenys Vlasenko1-1/+1
%progbits form is used on ARM (where @ is a comment char). x86 consistently uses @progbits everywhere else. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Josh Poimboeuf <jpoimboe@redhat.com> CC: Xiaodong Liu <xiaodong.liu@intel.com> CC: Megha Dey <megha.dey@intel.com> CC: George Spelvin <linux@horizon.com> CC: linux-crypto@vger.kernel.org CC: x86@kernel.org CC: linux-kernel@vger.kernel.org Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: arm/aes-neonbs - fix issue with v2.22 and older assemblerArd Biesheuvel1-4/+4
The GNU assembler for ARM version 2.22 or older fails to infer the element size from the vmov instructions, and aborts the build in the following way; .../aes-neonbs-core.S: Assembler messages: .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10' .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9' .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8' .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8' .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7' Fix this by setting the element size explicitly, by replacing vmov with vmov.32. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-23crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modesArd Biesheuvel1-46/+42
Update the ARMv8 Crypto Extensions and the plain NEON AES implementations in CBC and CTR modes to return the next IV back to the skcipher API client. This is necessary for chaining to work correctly. Note that for CTR, this is only done if the request is a round multiple of the block size, since otherwise, chaining is impossible anyway. Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13crypto: arm/aes - avoid reserved 'tt' mnemonic in asm codeArd Biesheuvel1-5/+5
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions, which means we can no longer use 'tt' as a register alias on recent versions of binutils for ARM. So replace the alias with 'ttab'. Fixes: 81edb4262975 ("crypto: arm/aes - replace scalar AES cipher") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-13crypto: arm/aes - replace bit-sliced OpenSSL NEON codeArd Biesheuvel9-6499/+1429
This replaces the unwieldy generated implementation of bit-sliced AES in CBC/CTR/XTS modes that originated in the OpenSSL project with a new version that is heavily based on the OpenSSL implementation, but has a number of advantages over the old version: - it does not rely on the scalar AES cipher that also originated in the OpenSSL project and contains redundant lookup tables and key schedule generation routines (which we already have in crypto/aes_generic.) - it uses the same expanded key schedule for encryption and decryption, reducing the size of the per-key data structure by 1696 bytes - it adds an implementation of AES in ECB mode, which can be wrapped by other generic chaining mode implementations - it moves the handling of corner cases that are non critical to performance to the glue layer written in C - it was written directly in assembler rather than generated from a Perl script Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64Ard Biesheuvel4-0/+1393
This is a reimplementation of the NEON version of the bit-sliced AES algorithm. This code is heavily based on Andy Polyakov's OpenSSL version for ARM, which is also available in the kernel. This is an alternative for the existing NEON implementation for arm64 authored by me, which suffers from poor performance due to its reliance on the pathologically slow four register variant of the tbl/tbx NEON instruction. This version is about ~30% (*) faster than the generic C code, but only in cases where the input can be 8x interleaved (this is a fundamental property of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR are implemented. (The significance of ECB is that it could potentially be used by other chaining modes) * Measured on Cortex-A57. Note that this is still an order of magnitude slower than the implementations that use the dedicated AES instructions introduced in ARMv8, but those are part of an optional extension, and so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: arm/aes - replace scalar AES cipherArd Biesheuvel5-119/+256
This replaces the scalar AES cipher that originates in the OpenSSL project with a new implementation that is ~15% (*) faster (on modern cores), and reuses the lookup tables and the key schedule generation routines from the generic C implementation (which is usually compiled in anyway due to networking and other subsystems depending on it). Note that the bit sliced NEON code for AES still depends on the scalar cipher that this patch replaces, so it is not removed entirely yet. * On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte for 128-bit keys. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: arm64/aes - add scalar implementationArd Biesheuvel4-0/+203
This adds a scalar implementation of AES, based on the precomputed tables that are exposed by the generic AES code. Since rotates are cheap on arm64, this implementation only uses the 4 core tables (of 1 KB each), and avoids the prerotated ones, reducing the D-cache footprint by 75%. On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per byte.) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as wellArd Biesheuvel1-2/+23
In addition to wrapping the AES-CTR cipher into the async SIMD wrapper, which exposes it as an async skcipher that defers processing to process context, expose our AES-CTR implementation directly as a synchronous cipher as well, but with a lower priority. This makes the AES-CTR transform usable in places where synchronous transforms are required, such as the MAC802.11 encryption code, which executes in sotfirq context, where SIMD processing is allowed on arm64. Users of the async transform will keep the existing behavior. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: arm/chacha20 - implement NEON version based on SSE3 codeArd Biesheuvel4-0/+659
This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: arm64/chacha20 - implement NEON version based on SSE3 codeArd Biesheuvel4-0/+586
This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-01-12crypto: x86/chacha20 - Manually align stack bufferHerbert Xu1-1/+4
The kernel on x86-64 cannot use gcc attribute align to align to a 16-byte boundary. This patch reverts to the old way of aligning it by hand. Fixes: 9ae433bc79f9 ("crypto: chacha20 - convert generic and...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2017-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linuxHerbert Xu80-373/+477
Merging 4.10-rc3 so that the cryptodev tree builds on ARM64.
2017-01-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-15/+17
Pull KVM fixes from Radim Krčmář: "MIPS: - fix host kernel crashes when receiving a signal with 64-bit userspace - flush instruction cache on all vcpus after generating entry code (both for stable) x86: - fix NULL dereference in MMU caused by SMM transitions (for stable) - correct guest instruction pointer after emulating some VMX errors - minor cleanup" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: remove duplicated declaration KVM: MIPS: Flush KVM entry code from icache globally KVM: MIPS: Don't clobber CP0_Status.UX KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS KVM: nVMX: fix instruction skipping during emulated vm-entry
2017-01-07Merge tag 'arm64-fixes' of ↵Linus Torvalds2-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - re-introduce the arm64 get_current() optimisation - KERN_CONT fallout fix in show_pte() * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: restore get_current() optimisation arm64: mm: fix show_pte KERN_CONT fallout
2017-01-06Merge branch 'stable/for-linus-4.10' of ↵Linus Torvalds4-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fixes from Konrad Rzeszutek Wilk: "This has one fix to make i915 work when using Xen SWIOTLB, and a feature from Geert to aid in debugging of devices that can't do DMA outside the 32-bit address space. The feature from Geert is on top of v4.10 merge window commit (specifically you pulling my previous branch), as his changes were dependent on the Documentation/ movement patches. I figured it would just easier than me trying than to cherry-pick the Documentation patches to satisfy git. The patches have been soaking since 12/20, albeit I updated the last patch due to linux-next catching an compiler error and adding an Tested-and-Reported-by tag" * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Export swiotlb_max_segment to users swiotlb: Add swiotlb=noforce debug option swiotlb: Convert swiotlb_force from int to enum x86, swiotlb: Simplify pci_swiotlb_detect_override()
2017-01-05Merge tag 'armsoc-fixes' of ↵Linus Torvalds52-263/+314
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "This is a rather large set of bugfixes, as we just returned from the Christmas break. Most of these are relatively unimportant fixes for regressions introduced during the merge window, and about half of the changes are for mach-omap2. A couple of patches are just cleanups and dead code removal that I would not normally have considered for merging after -rc2, but I decided to take them along with the fixes this time. Notable fixes include: - removing the skeleton.dtsi include broke a number of machines, and we have to put empty /chosen nodes back to be able to pass kernel command lines as before - enabling Samsung platforms no longer hardwires CONFIG_HZ to 200, as it had been for no good reason for a long time" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits) MAINTAINERS: extend PSCI entry to cover the newly add PSCI checker code drivers: psci: annotate timer on stack to silence odebug messages ARM64: defconfig: enable DRM_MESON as module ARM64: dts: meson-gx: Add Graphic Controller nodes ARM64: dts: meson-gxl: fix GPIO include ARM: dts: imx6: Disable "weim" node in the dtsi files ARM: dts: qcom: apq8064: Add missing scm clock ARM: davinci: da8xx: Fix sleeping function called from invalid context ARM: davinci: Make __clk_{enable,disable} functions public ARM: davinci: da850: don't add emac clock to lookup table twice ARM: davinci: da850: fix infinite loop in clk_set_rate() ARM: i.MX: remove map_io callback ARM: dts: vf610-zii-dev-rev-b: Add missing newline ARM: dts: imx6qdl-nitrogen6x: remove duplicate iomux entry ARM: dts: imx31: fix AVIC base address ARM: dts: am572x-idk: Add gpios property to control PCIE_RESETn arm64: dts: vexpress: Support GICC_DIR operations ARM: dts: vexpress: Support GICC_DIR operations firmware: arm_scpi: fix reading sensor values on pre-1.0 SCPI firmwares arm64: dts: msm8996: Add required memory carveouts ...
2017-01-05Merge tag 'for-linus-4.10-rc2-tag' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes and cleanups from Juergen Gross: - small fixes for xenbus driver - one fix for xen dom0 boot on huge system - small cleanups * tag 'for-linus-4.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: Xen: ARM: Zero reserved fields of xatp before making hypervisor call xen: events: Replace BUG() with BUG_ON() xen: remove stale xs_input_avail() from header xen: return xenstore command failures via response instead of rc xen: xenbus driver must not accept invalid transaction ids xen/evtchn: use rb_entry() xen/setup: Don't relocate p2m over existing one
2017-01-05KVM: VMX: remove duplicated declarationJan Dakinevich1-6/+0
Declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occures twice in the code. Probably, it was happened after unsuccessful merge. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-05KVM: MIPS: Flush KVM entry code from icache globallyJames Hogan1-2/+2
Flush the KVM entry code from the icache on all CPUs, not just the one that built the entry code. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.16.x- Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-05KVM: MIPS: Don't clobber CP0_Status.UXJames Hogan1-1/+4
On 64-bit kernels, MIPS KVM will clear CP0_Status.UX to prevent the guest (running in user mode) from accessing the 64-bit memory segments. However the previous value of CP0_Status.UX is never restored when exiting from the guest. If the user process uses 64-bit addressing (the n64 ABI) this can result in address error exceptions from the kernel if it needs to deliver a signal before returning to user mode, as the kernel will need to write a sigframe to high user addresses on the user stack which are disallowed by CP0_Status.UX=0. This is fixed by explicitly setting SX and UX again when exiting from the guest, and explicitly clearing those bits when returning to the guest. Having the SX and UX bits set when handling guest exits (rather than only when exiting to userland) will be helpful when we support VZ, since we shouldn't need to directly read or write guest memory, so it will be valid for cache management IPIs to access host user addresses. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 4.8.x- Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-01-04arm64: restore get_current() optimisationMark Rutland1-1/+9
Commit c02433dd6de32f04 ("arm64: split thread_info from task stack") inverted the relationship between get_current() and current_thread_info(), with sp_el0 now holding the current task_struct rather than the current thead_info. The new implementation of get_current() prevents the compiler from being able to optimize repeated calls to either, resulting in a noticeable penalty in some microbenchmarks. This patch restores the previous optimisation by implementing get_current() in the same way as our old current_thread_info(), using a non-volatile asm statement. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-01-04arm64: mm: fix show_pte KERN_CONT falloutMark Rutland1-4/+4
Recent changes made KERN_CONT mandatory for continued lines. In the absence of KERN_CONT, a newline may be implicit inserted by the core printk code. In show_pte, we (erroneously) use printk without KERN_CONT for continued prints, resulting in output being split across a number of lines, and not matching the intended output, e.g. [ff000000000000] *pgd=00000009f511b003 , *pud=00000009f4a80003 , *pmd=0000000000000000 Fix this by using pr_cont() for all the continuations. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-01-04Merge tag 'davinci-fixes-for-v4.10' of ↵Arnd Bergmann4-27/+53
git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes Pull "DaVinci fixes for v4.10" from Sekhar Nori: This pull request contains fixes for the following issues 1) Fix two instances of infinite loop occurring in clock list for DA850. This fixes kernel hangs in some instances and so have been marked for stable kernel. 2) Fix for sleeping function called from atomic context with USB 2.0 clock management code introduced in v4.10 merge window. * tag 'davinci-fixes-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: davinci: da8xx: Fix sleeping function called from invalid context ARM: davinci: Make __clk_{enable,disable} functions public ARM: davinci: da850: don't add emac clock to lookup table twice ARM: davinci: da850: fix infinite loop in clk_set_rate()
2017-01-04Merge tag 'amlogic-fixes' of ↵Arnd Bergmann9-1/+94
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into fixes Pull "Amlogic fixes for v4.10" from Kevin Hilman: - DT: GXL: fix GPIO include - add DT and defconfig for newly merged DRM driver This pull has one real fix, as a couple non-critical ones. The DRM DT/defconfig patches are coming now because I didn't expect the new driver to make it for the v4.10 merge window, but since it did[1], the DT and defconfig should go into the same release. [1] bbbe775ec5b5 drm: Add support for Amlogic Meson Graphic Controller * tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: ARM64: defconfig: enable DRM_MESON as module ARM64: dts: meson-gx: Add Graphic Controller nodes ARM64: dts: meson-gxl: fix GPIO include
2017-01-04Merge tag 'juno-fixes-4.10' of ↵Arnd Bergmann1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into fixes Pull "ARMv8 Juno/VExpress fixes for v4.10" from Sudeep Holla: A simple fix to extend GICv2 CPU interface registers from 4K to 8K on AEMv8 FVP/RTSM models in order to support split priority drop and interrupt deactivation. * tag 'juno-fixes-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: arm64: dts: vexpress: Support GICC_DIR operations
2017-01-04Merge tag 'vexpress-fixes-4.10' of ↵Arnd Bergmann2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into fixes Pull "ARMv7 VExpress fixes for v4.10" from Sudeep Holla: A simple fix to extend GICv2 CPU interface registers from 4K to 8K on VExpress TC1 and TC2 platforms in order to support split priority drop and interrupt deactivation. * tag 'vexpress-fixes-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: ARM: dts: vexpress: Support GICC_DIR operations
2017-01-04Merge tag 'imx-fixes-4.10' of ↵Arnd Bergmann7-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Pull "i.MX fixes for 4.10" from Shawn Guo: - A format fix for vf610-zii-dev-rev-b.dts, which has a very odd line due to misses a newline. - A fix to imx-weim bus error seen on board which doesn't actually use the bus. - A fix for imx6qdl-nitrogen6x board which has conflicting usage of pad NANDF_CS2. - A cleanup on i.MX1 machine to remove .map_io callback, which also fixes a compiling error for NOMMU build. - Fix AVIC base address in i.MX31 device tree source. The problem was shadowed by the AVIC driver, which takes the correct base address from a SoC specific header file. * tag 'imx-fixes-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6: Disable "weim" node in the dtsi files ARM: i.MX: remove map_io callback ARM: dts: vf610-zii-dev-rev-b: Add missing newline ARM: dts: imx6qdl-nitrogen6x: remove duplicate iomux entry ARM: dts: imx31: fix AVIC base address
2017-01-04Merge tag 'qcom-arm-fixes-for-4.10-rc2' of ↵Arnd Bergmann1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into fixes Pull "Qualcomm ARM DTS Fixes for v4.10-rc2" from Andy Gross: * Add SCM clock for APQ8064 to fix boot failures * tag 'qcom-arm-fixes-for-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: ARM: dts: qcom: apq8064: Add missing scm clock
2017-01-04Merge tag 'omap-for-v4.10/fixes-rc1' of ↵Arnd Bergmann23-193/+65
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Pull "omap fixes for v4.10-rc cycle" from Tony Lindgren: Fist set of fixes for omaps for v4.10-rc cycle, mostly to deal with various regressions noticed during the merge window and to fix various device tree configurations for boards. Also included is removal of mach-omap2/gpio.c that is now dead code with device tree based booting that should be OK for the early -rc cycle: - A series of fixes to add empty chosen node to fix regressions caused for bootloaders that don't create chosen node as the decompressor needs the chosen node to merge command line and ATAGs into it - Fix missing logicpd-som-lv-37xx-devkit.dtb entry in Makefile - Fix regression for am437x timers - Fix wrong strcat for non-NULL terminated string - A series of changes to fix tps65217 interrupts to not use defines as we don't do that for interrupts - Two patches to fix USB VBUS detection on am57xx-idk and force it to peripheral mode until dwc3 role detection is working - Add missing dra72-evm-tps65917 missing voltage supplies accidentally left out of an earlier patch - Fix n900 eMMC detection when booted on qemu - Remove unwanted pr_err on failed memory allocation for prm_common.c - Remove legacy mach-omap2/gpio.c that now is dead code since we boot mach-omap2 in device tree only mode - Fix am572x-idk pcie1 by adding the missing gpio reset pin * tag 'omap-for-v4.10/fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (23 commits) ARM: dts: am572x-idk: Add gpios property to control PCIE_RESETn ARM: OMAP2+: PRM: Delete an error message for a failed memory allocation ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags ARM: dts: dra72-evm-tps65917: Add voltage supplies to usb_phy, mmc, dss ARM: dts: am57xx-idk: Put USB2 port in peripheral mode ARM: dts: am57xx-idk: Support VBUS detection on USB2 port dt-bindings: input: Specify the interrupt number of TPS65217 power button dt-bindings: power/supply: Update TPS65217 properties dt-bindings: mfd: Remove TPS65217 interrupts ARM: dts: am335x: Fix the interrupt name of TPS65217 ARM: omap2+: fixing wrong strcat for Non-NULL terminated string ARM: omap2+: am437x: rollback to use omap3_gptimer_timer_init() ARM: dts: omap3: Add DTS for Logic PD SOM-LV 37xx Dev Kit ARM: dts: dra7: Add an empty chosen node to top level DTSI ARM: dts: dm816x: Add an empty chosen node to top level DTSI ARM: dts: dm814x: Add an empty chosen node to top level DTSI ARM: dts: am4372: Add an empty chosen node to top level DTSI ARM: dts: am33xx: Add an empty chosen node to top level DTSI ARM: dts: omap5: Add an empty chosen node to top level DTSI ARM: dts: omap4: Add an empty chosen node to top level DTSI ...
2017-01-04Merge tag 'samsung-soc-4.10-2' of ↵Arnd Bergmann3-33/+77
git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into fixes Samsung mach/soc update for v4.10: 1. Minor cleanup in smp_operations. 2. Another step in switching s3c24xx to new DMA API. 3. Drop fixed requirement for HZ=200 on Samsung platforms. * tag 'samsung-soc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: ARM: Drop fixed 200 Hz timer requirement from Samsung platforms ARM: S3C24XX: Add DMA slave maps for remaining s3c24xx SoCs ARM: EXYNOS: Remove smp_init_cpus hook from platsmp.c