summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2015-02-15vhost_net: fix wrong iter offset when setting number of buffersJason Wang1-5/+6
In commit ba7438aed924 ("vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()"), we advance iov iter fixup sizeof(struct virtio_net_hdr) bytes and fill the number of buffers after doing the socket recvmsg(). This work well but was broken after commit 6e03f896b52c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") which tries to advance sizeof(struct virtio_net_hdr_mrg_rxbuf). It will fill the number of buffers at the wrong place. This patch fixes this. Fixes 6e03f896b52c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Cc: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-15tpm, tpm_tis: fix TPM 2.0 probingJarkko Sakkinen3-15/+39
If during transmission system error was returned, the logic was to incorrectly deduce that chip is a TPM 1.x chip. This patch fixes this issue. Also, this patch changes probing so that message tag is used as the measure for TPM 2.x, which should be much more stable. A separate function called tpm2_probe() is encapsulated because it can be used with any chipset. Fixes: aec04cbdf723 ("tpm: TPM 2.0 FIFO Interface") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2015-02-15tpm: fix suspend/resume paths for TPM 2.0Jarkko Sakkinen5-39/+34
Fixed suspend/resume paths for TPM 2.0 and consolidated all the associated code to the tpm_pm_suspend() and tpm_pm_resume() functions. Resume path should be handled by the firmware, i.e. Startup(CLEAR) for hibernate and Startup(STATE) for suspend. There might be some non-PC embedded devices in the future where Startup() is not the handled by the FW but fixing the code for those IMHO should be postponed until there is hardware available to test the fixes although extra Startup in the driver code is essentially a NOP. Added Shutdown(CLEAR) to the remove paths of TIS and CRB drivers. Changed tpm2_shutdown() to a void function because there isn't much you can do except print an error message if this fails with a system error. Fixes: aec04cbdf723 ("tpm: TPM 2.0 FIFO Interface") Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface") [phuewe: both did send TPM_Shutdown on resume which 'disables' the TPM and did not send TPM2_Shutdown on teardown which leads some TPM2.0 to believe there was an attack (no TPM2_Shutdown = no orderly shutdown = attack)] Reported-by: Peter Hüwe <PeterHuewe@gmx.de> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Scot Doyle <lkml14@scotdoyle.com> Reviewed-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2015-02-15net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081Sylvain Rochet1-0/+28
NAND-tree is used to check wiring between MAC and PHY using NAND gates on the PHY side, hence the name. NAND-tree initial status is latched at reset by probing the IRQ pin. However some devices are sharing the PHY IRQ pin with other peripherals such as Atmel SAMA5D[34]x-EK boards when using the optional TM7000 display module, therefore they are switching the PHY in NAND-tree test mode depending on the current IRQ line status at reset. This patch ensure PHY is not in NAND-tree test mode for all Micrel PHYs using IRQ line as a NAND-tree toggle mode at reset. Signed-off-by: Sylvain Rochet <sylvain.rochet@finsecur.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-15r8152: restore hw settingshayeswang1-2/+57
There is a capability which let the hw could change the settings automatically when the power change to ON. However, the USB reset would reset the settings to the hw default, so the driver has to restore the relative settings. Otherwise, it would influence the functions of the hw, and the compatibility for the USB hub and USB host controller. The relative settings are as following. - set the power down scale to 96. - enable the power saving function of USB 2.0. - disable the ALDPS of ECM mode. - set burst mode depending on the burst size. - enable the flow control of endpoint full. - set fifo empty boundary to 32448 bytes. - enable the function of exiting LPM when Rx OK occurs. - set the connect timer to 1. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds34-432/+855
Pull crypto update from Herbert Xu: "Here is the crypto update for 3.20: - Added 192/256-bit key support to aesni GCM. - Added MIPS OCTEON MD5 support. - Fixed hwrng starvation and race conditions. - Added note that memzero_explicit is not a subsitute for memset. - Added user-space interface for crypto_rng. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits) crypto: tcrypt - do not allocate iv on stack for aead speed tests crypto: testmgr - limit IV copy length in aead tests crypto: tcrypt - fix buflen reminder calculation crypto: testmgr - mark rfc4106(gcm(aes)) as fips_allowed crypto: caam - fix resource clean-up on error path for caam_jr_init crypto: caam - pair irq map and dispose in the same function crypto: ccp - terminate ccp_support array with empty element crypto: caam - remove unused local variable crypto: caam - remove dead code crypto: caam - don't emit ICV check failures to dmesg hwrng: virtio - drop extra empty line crypto: replace scatterwalk_sg_next with sg_next crypto: atmel - Free memory in error path crypto: doc - remove colons in comments crypto: seqiv - Ensure that IV size is at least 8 bytes crypto: cts - Weed out non-CBC algorithms MAINTAINERS: add linux-crypto to hw random crypto: cts - Remove bogus use of seqiv crypto: qat - don't need qat_auth_state struct crypto: algif_rng - fix sparse non static symbol warning ...
2015-02-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds30-107/+1830
Merge fourth set of updates from Andrew Morton: - the rest of lib/ - checkpatch updates - a few misc things - kasan: kernel address sanitizer - the rtc tree * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits) ARM: mvebu: enable Armada 38x RTC driver in mvebu_v7_defconfig ARM: mvebu: add Device Tree description of RTC on Armada 38x MAINTAINERS: add the RTC driver for the Armada38x drivers/rtc/rtc-armada38x: add a new RTC driver for recent mvebu SoCs rtc: armada38x: add the device tree binding documentation rtc: rtc-ab-b5ze-s3: add sub-minute alarm support rtc: add support for Abracon AB-RTCMC-32.768kHz-B5ZE-S3 I2C RTC chip of: add vendor prefix for Abracon Corporation drivers/rtc/rtc-rk808.c: fix rtc time reading issue drivers/rtc/rtc-isl12057.c: constify struct regmap_config drivers/rtc/rtc-at91sam9.c: constify struct regmap_config drivers/rtc/rtc-imxdi.c: add more known register bits drivers/rtc/rtc-imxdi.c: trivial clean up code ARM: mvebu: ISL12057 rtc chip can now wake up RN102, RN102 and RN2120 rtc: rtc-isl12057: add isil,irq2-can-wakeup-machine property for in-tree users drivers/rtc/rtc-isl12057.c: add alarm support to Intersil ISL12057 RTC driver drivers/rtc/rtc-pcf2123.c: add support for devicetree kprobes: makes kprobes/enabled works correctly for optimized kprobes. kprobes: set kprobes_all_disarmed earlier to enable re-optimization. init: remove CONFIG_INIT_FALLBACK ...
2015-02-14drivers/rtc/rtc-armada38x: add a new RTC driver for recent mvebu SoCsGregory CLEMENT3-0/+331
The new mvebu SoCs come with a new RTC driver. This patch adds the support for this new IP which is currently found in the Armada 38x SoCs. This RTC provides two alarms, but only the first one is used in the driver. The RTC also allows using periodic interrupts. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Arnaud Ebalard <arno@natisbad.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: Boris BREZILLON <boris.brezillon@free-electrons.com> Cc: Lior Amsalem <alior@marvell.com> Cc: Tawfik Bayouk <tawfik@marvell.com> Cc: Nadav Haklai <nadavh@marvell.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14rtc: rtc-ab-b5ze-s3: add sub-minute alarm supportArnaud Ebalard1-29/+262
Abracon AB-RTCMC-32.768kHz-B5ZE-S3 alarm is only accurate to the minute. For that reason, UIE mode is currently not supported by the driver. But the device provides a watchdog timer which can be coupled with the alarm mechanism to extend support and provide sub-minute alarm capability. This patch implements that extension. More precisely, it makes use of the watchdog timer for alarms which are less that four minutes in the future (with second accuracy) and use standard alarm mechanism for other alarms (with minute accuracy). Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Peter Huewe <peter.huewe@infineon.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robherring2@gmail.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Ian Campbell <ijc+devicetree@hellion.org.uk> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Landley <rob@landley.net> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Kumar Gala <galak@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14rtc: add support for Abracon AB-RTCMC-32.768kHz-B5ZE-S3 I2C RTC chipArnaud Ebalard3-0/+814
This patch adds support for Abracon AB-RTCMC-32.768kHz-B5ZE-S3 RTC/Calendar module w/ I2C interface. This support includes RTC time reading and setting, Alarm (1 minute accuracy) reading and setting, and battery low detection. The device also supports frequency adjustment and two timers but those features are currently not implemented in this driver. Due to alarm accuracy limitation (and current lack of timer support in the driver), UIE mode is not supported. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Peter Huewe <peter.huewe@infineon.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robherring2@gmail.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Ian Campbell <ijc+devicetree@hellion.org.uk> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Landley <rob@landley.net> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Kumar Gala <galak@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-rk808.c: fix rtc time reading issueChris Zhong1-2/+8
After we set the GET_TIME bit, the rtc time can't be read immediately. We should wait up to 31.25 us, about one cycle of 32khz. Otherwise reading RTC time will return a old time. If we clear the GET_TIME bit after setting, the time of i2c transfer is certainly more than 31.25us. Doug said: : I think we are safe. At 400kHz (the max speed of this part) each bit can : be transferred no faster than 2.5us. In order to do a valid i2c : transaction we need to _at least_ write the address of the device and the : data onto the bus, which is 16 bits. 16 * 2.5us = 40us. That's above the : 31.25us [akpm@linux-foundation.org: tweak comment per review discussion] Signed-off-by: Chris Zhong <zyw@rock-chips.com> Reviewed-by: Doug Anderson <dianders@chromium.org> Cc: Sonny Rao <sonnyrao@chromium.org> Cc: Heiko Stübner <heiko@sntech.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-isl12057.c: constify struct regmap_configKrzysztof Kozlowski1-1/+1
The regmap_config struct may be const because it is not modified by the driver and regmap_init() accepts pointer to const. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-at91sam9.c: constify struct regmap_configKrzysztof Kozlowski1-1/+1
The regmap_config struct may be const because it is not modified by the driver and regmap_init() accepts pointer to const. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-imxdi.c: add more known register bitsJuergen Borleis1-4/+40
Intended for monitoring and controlling the security features. These bits are required to bring this unit back to live after a security violation event was detected. The code to bring it back to live will follow after a vendor clearance. Signed-off-by: Juergen Borleis <jbe@pengutronix.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-imxdi.c: trivial clean up codeJuergen Borleis1-3/+3
Signed-off-by: Juergen Borleis <jbe@pengutronix.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14rtc: rtc-isl12057: add isil,irq2-can-wakeup-machine property for in-tree usersArnaud Ebalard1-6/+39
Current in-tree users of ISL12057 RTC chip (NETGEAR ReadyNAS 102, 104 and 2120) do not have the IRQ#2 pin of the chip (associated w/ the Alarm1 mechanism) connected to their SoC, but to a PMIC (TPS65251 FWIW). This specific hardware configuration allows the NAS to wake up when the alarms rings. Recently introduced alarm support for ISL12057 relies on the provision of an "interrupts" property in system .dts file, which previous three users will never get. For that reason, alarm support on those devices is not function. To support this use case, this patch adds a new DT property for ISL12057 (isil,irq2-can-wakeup-machine) to indicate that the chip is capable of waking up the device using its IRQ#2 pin (even though it does not have its IRQ#2 pin connected directly to the SoC). This specific configuration was tested on a ReadyNAS 102 by setting an alarm, powering off the device and see it reboot as expected when the alarm rang w/: # echo `date '+%s' -d '+ 1 minutes'` > /sys/class/rtc/rtc0/wakealarm # shutdown -h now As a side note, the ISL12057 remains in the list of trivial devices, because the property is not per se required by the device to work but can help handle system w/ specific requirements. In exchange, the new feature is described in details in a specific documentation file. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Peter Huewe <peter.huewe@infineon.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Darshana Padmadas <darshanapadmadas@gmail.com> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Ian Campbell <ijc+devicetree@hellion.org.uk> Cc: Grant Likely <grant.likely@linaro.org> Cc: Rob Landley <rob@landley.net> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Kumar Gala <galak@codeaurora.org> Cc: Uwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-isl12057.c: add alarm support to Intersil ISL12057 RTC driverArnaud Ebalard1-8/+305
This patch adds alarm support to Intersil ISL12057 driver. This allows to configure the chip to generate an interrupt when the alarm matches current time value. Alarm can be programmed up to one month in the future and is accurate to the second. The patch was developed to support two different configurations: systems w/ and w/o RTC chip IRQ line connected to the main CPU. The latter is the one found on current 3 kernel users of the chip for which support was initially developed (Netgear ReadyNAS 102, 104 and 2120 NAS). On those devices, the IRQ#2 pin of the chip is not connected to the SoC but to a PMIC. This allows setting an alarm, powering off the device and have it wake up when the alarm rings. To support that configuration the driver does the following: 1. it has alarm_irq_enable() function returns -ENOTTY when no IRQ is passed to the driver. 2. it marks the device as a wakeup source in all cases (whether an IRQ is passed to the driver or not) to have 'wakealarm' sysfs entry created. 3. it marks the device has not supporting UIE mode when no IRQ is passed to the driver (see the commmit message of c9f5c7e7a84f) This specific configuration was tested on a ReadyNAS 102 by setting an alarm, powering off the device and see it reboot as expected when the alarm rang. The former configuration was tested on a Netgear ReadyNAS 102 after some soldering of the IRQ#2 pin of the RTC chip to a MPP line of the SoC (the one used usually handles the reset button). The test was performed using a modified .dts file reflecting this change (see below) and rtc-test.c program available in Documentation/rtc.txt. This test program ran as expected, which validates alarm supports, including interrupt support. As a side note, the ISL12057 remains in the list of trivial devices, i.e. no specific DT binding being added by this patch: i2c core automatically handles extraction of IRQ line info from .dts file. For instance, if one wants to reference the interrupt line for the alarm in its .dts file, adding interrupt and interrupt-parent properties works as expected: isl12057: isl12057@68 { compatible =3D "isil,isl12057"; interrupt-parent =3D <&gpio0>; interrupts =3D <6 IRQ_TYPE_EDGE_FALLING>; reg =3D <0x68>; }; FWIW, if someone is looking for a way to test alarm support on a system on which the chip IRQ line has the ability to boot the system (e.g. ReadyNAS 102, 104, etc): # echo 0 > /sys/class/rtc/rtc0/wakealarm # echo `date '+%s' -d '+ 1 minutes'` > /sys/class/rtc/rtc0/wakealarm # shutdown -h now With the commands above, after a minute, the system comes back to life. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Peter Huewe <peter.huewe@infineon.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Mark Brown <broonie@kernel.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Uwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/rtc/rtc-pcf2123.c: add support for devicetreeJoshua Clayton1-0/+10
Add compatible string "nxp,rtc-pcf2123" Document the binding Signed-off-by: Joshua Clayton <stillcompiling@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Grant Likely <grant.likely@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14x86_64: kasan: add interceptors for memset/memmove/memcpy functionsAndrey Ryabinin1-0/+4
Recently instrumentation of builtin functions calls was removed from GCC 5.0. To check the memory accessed by such functions, userspace asan always uses interceptors for them. So now we should do this as well. This patch declares memset/memmove/memcpy as weak symbols. In mm/kasan/kasan.c we have our own implementation of those functions which checks memory before accessing it. Default memset/memmove/memcpy now now always have aliases with '__' prefix. For files that built without kasan instrumentation (e.g. mm/slub.c) original mem* replaced (via #define) with prefixed variants, cause we don't want to check memory accesses there. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14kasan: add kernel address sanitizer infrastructureAndrey Ryabinin1-0/+1
Kernel Address sanitizer (KASan) is a dynamic memory error detector. It provides fast and comprehensive solution for finding use-after-free and out-of-bounds bugs. KASAN uses compile-time instrumentation for checking every memory access, therefore GCC > v4.9.2 required. v4.9.2 almost works, but has issues with putting symbol aliases into the wrong section, which breaks kasan instrumentation of globals. This patch only adds infrastructure for kernel address sanitizer. It's not available for use yet. The idea and some code was borrowed from [1]. Basic idea: The main idea of KASAN is to use shadow memory to record whether each byte of memory is safe to access or not, and use compiler's instrumentation to check the shadow memory on each memory access. Address sanitizer uses 1/8 of the memory addressable in kernel for shadow memory and uses direct mapping with a scale and offset to translate a memory address to its corresponding shadow address. Here is function to translate address to corresponding shadow address: unsigned long kasan_mem_to_shadow(unsigned long addr) { return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET; } where KASAN_SHADOW_SCALE_SHIFT = 3. So for every 8 bytes there is one corresponding byte of shadow memory. The following encoding used for each shadow byte: 0 means that all 8 bytes of the corresponding memory region are valid for access; k (1 <= k <= 7) means that the first k bytes are valid for access, and other (8 - k) bytes are not; Any negative value indicates that the entire 8-bytes are inaccessible. Different negative values used to distinguish between different kinds of inaccessible memory (redzones, freed memory) (see mm/kasan/kasan.h). To be able to detect accesses to bad memory we need a special compiler. Such compiler inserts a specific function calls (__asan_load*(addr), __asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16. These functions check whether memory region is valid to access or not by checking corresponding shadow memory. If access is not valid an error printed. Historical background of the address sanitizer from Dmitry Vyukov: "We've developed the set of tools, AddressSanitizer (Asan), ThreadSanitizer and MemorySanitizer, for user space. We actively use them for testing inside of Google (continuous testing, fuzzing, running prod services). To date the tools have found more than 10'000 scary bugs in Chromium, Google internal codebase and various open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and lots of others): [2] [3] [4]. The tools are part of both gcc and clang compilers. We have not yet done massive testing under the Kernel AddressSanitizer (it's kind of chicken and egg problem, you need it to be upstream to start applying it extensively). To date it has found about 50 bugs. Bugs that we've found in upstream kernel are listed in [5]. We've also found ~20 bugs in out internal version of the kernel. Also people from Samsung and Oracle have found some. [...] As others noted, the main feature of AddressSanitizer is its performance due to inline compiler instrumentation and simple linear shadow memory. User-space Asan has ~2x slowdown on computational programs and ~2x memory consumption increase. Taking into account that kernel usually consumes only small fraction of CPU and memory when running real user-space programs, I would expect that kernel Asan will have ~10-30% slowdown and similar memory consumption increase (when we finish all tuning). I agree that Asan can well replace kmemcheck. We have plans to start working on Kernel MemorySanitizer that finds uses of unitialized memory. Asan+Msan will provide feature-parity with kmemcheck. As others noted, Asan will unlikely replace debug slab and pagealloc that can be enabled at runtime. Asan uses compiler instrumentation, so even if it is disabled, it still incurs visible overheads. Asan technology is easily portable to other architectures. Compiler instrumentation is fully portable. Runtime has some arch-dependent parts like shadow mapping and atomic operation interception. They are relatively easy to port." Comparison with other debugging features: ======================================== KMEMCHECK: - KASan can do almost everything that kmemcheck can. KASan uses compile-time instrumentation, which makes it significantly faster than kmemcheck. The only advantage of kmemcheck over KASan is detection of uninitialized memory reads. Some brief performance testing showed that kasan could be x500-x600 times faster than kmemcheck: $ netperf -l 30 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec no debug: 87380 16384 16384 30.00 41624.72 kasan inline: 87380 16384 16384 30.00 12870.54 kasan outline: 87380 16384 16384 30.00 10586.39 kmemcheck: 87380 16384 16384 30.03 20.23 - Also kmemcheck couldn't work on several CPUs. It always sets number of CPUs to 1. KASan doesn't have such limitation. DEBUG_PAGEALLOC: - KASan is slower than DEBUG_PAGEALLOC, but KASan works on sub-page granularity level, so it able to find more bugs. SLUB_DEBUG (poisoning, redzones): - SLUB_DEBUG has lower overhead than KASan. - SLUB_DEBUG in most cases are not able to detect bad reads, KASan able to detect both reads and writes. - In some cases (e.g. redzone overwritten) SLUB_DEBUG detect bugs only on allocation/freeing of object. KASan catch bugs right before it will happen, so we always know exact place of first bad read/write. [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel [2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs [3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs [4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs [5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies Based on work by Andrey Konovalov. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14MODULE_DEVICE_TABLE: fix some callsitesAndrew Morton2-2/+0
The patch "module: fix types of device tables aliases" newly requires that invocations of MODULE_DEVICE_TABLE(type, name); come *after* the definition of `name'. That is reasonable, but some drivers weren't doing this. Fix them. Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: David Miller <davem@davemloft.net> Cc: Hans Verkuil <hverkuil@xs4all.nl> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14drivers/base: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo2-2/+3
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Line termination only requires one extra space at the end of the buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14usb: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo6-29/+13
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * drivers/uwb/drp.c::uwb_drp_handle_alien_drp() was formatting mas.bm into a buffer but never used it. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14scsi: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo1-3/+3
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * map_show()'s return value is too high by one and the function could modify beyond the end of the buffer when the formatted text is long enough. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14input: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo2-3/+3
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Line termination only requires one extra space at the end of the buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14wireless: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo2-35/+12
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14arm: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo1-2/+2
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Line termination only requires one extra space at the end of the buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14tile: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo2-6/+4
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14clk: convert clock name allocations to kstrdup_constAndrzej Hajda1-6/+6
Clock subsystem frequently performs duplication of strings located in read-only memory section. Replacing kstrdup by kstrdup_const allows to avoid such operations. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mike Turquette <mturquette@linaro.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-14PM / sleep: Re-implement suspend-to-idle handlingRafael J. Wysocki1-23/+26
In preparation for adding support for quiescing timers in the final stage of suspend-to-idle transitions, rework the freeze_enter() function making the system wait on a wakeup event, the freeze_wake() function terminating the suspend-to-idle loop and the mechanism by which deep idle states are entered during suspend-to-idle. First of all, introduce a simple state machine for suspend-to-idle and make the code in question use it. Second, prevent freeze_enter() from losing wakeup events due to race conditions and ensure that the number of online CPUs won't change while it is being executed. In addition to that, make it force all of the CPUs re-enter the idle loop in case they are in idle states already (so they can enter deeper idle states if possible). Next, drop cpuidle_use_deepest_state() and replace use_deepest_state checks in cpuidle_select() and cpuidle_reflect() with a single suspend-to-idle state check in cpuidle_idle_call(). Finally, introduce cpuidle_enter_freeze() that will simply find the deepest idle state available to the given CPU and enter it using cpuidle_enter(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-02-14Merge tag 'pm+acpi-3.20-rc1-2' of ↵Linus Torvalds8-119/+39
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: "These are two reverts related to system suspend breakage by one of a recent commits, a fix for a recently introduced bug in devfreq and a bunch of other things that didn't make it into my previous pull request, but otherwise are ready to go. Specifics: - Revert two ACPI EC driver commits, one that broke system suspend on Acer Aspire S5 and one that depends on it (Rafael J Wysocki). - Fix a typo leading to an incorrect check in the exynos-ppmu devfreq driver (Dan Carpenter). - Add support for one more Broadwell CPU model to intel_idle (Len Brown). - Fix an obscure problem with state transitions related to interrupts in the speedstep-smi cpufreq driver (Mikulas Patocka). - Remove some unnecessary messages related to the "out of memory" condition from the core PM code (Quentin Lambert). - Update turbostat parameters and documentation, add support for one more Broadwell CPU model to it and modify it to skip printing disabled package C-states (Len Brown)" * tag 'pm+acpi-3.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / devfreq: event: testing the wrong variable cpufreq: speedstep-smi: enable interrupts when waiting PM / OPP / clk: Remove unnecessary OOM message Revert "ACPI / EC: Add query flushing support" Revert "ACPI / EC: Add GPE reference counting debugging messages" tools/power turbostat: support additional Broadwell model intel_idle: support additional Broadwell model tools/power turbostat: update parameters, documentation tools/power turbostat: Skip printing disabled package C-states
2015-02-13Merge branches 'pm-cpufreq', 'pm-cpuidle', 'pm-devfreq', 'pm-opp' and 'pm-tools'Rafael J. Wysocki7-12/+21
* pm-cpufreq: cpufreq: speedstep-smi: enable interrupts when waiting * pm-cpuidle: intel_idle: support additional Broadwell model * pm-devfreq: PM / devfreq: event: testing the wrong variable * pm-opp: PM / OPP / clk: Remove unnecessary OOM message * pm-tools: tools/power turbostat: support additional Broadwell model tools/power turbostat: update parameters, documentation tools/power turbostat: Skip printing disabled package C-states
2015-02-13Merge branch 'acpi-ec'Rafael J. Wysocki1-107/+18
* acpi-ec: Revert "ACPI / EC: Add query flushing support" Revert "ACPI / EC: Add GPE reference counting debugging messages"
2015-02-13Merge branch 'for-next' of ↵Linus Torvalds7-8/+508
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED subsystem update from Bryan Wu: "The big change of LED subsystem is introducing a new LED class for Flash type LEDs which will be used for V4L2 subsystem. Also we got some cleanup and fixes" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: leds: leds-gpio: Pass on error codes unmodified DT: leds: Add led-sources property leds: Add LED Flash class extension to the LED subsystem leds: leds-mc13783: Use of_get_child_by_name() instead of refcount hack leds: Use setup_timer leds: Don't allow brightness values greater than max_brightness DT: leds: Add flash LED devices related properties
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-5/+17
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-13hso: fix rx parsing logic when skb allocation failsAleksander Morgado1-1/+1
If skb allocation fails once the IP header has been received, the rx state is being set to WAIT_SYNC. The logic, though, shouldn't directly return, as the buffer may contain a full packet, and therefore the WAIT_SYNC state needs to be processed (resetting state to WAIT_IP, clearing rx_buf_size and re-initializing rx_buf_missing). So, just let the while loop continue so that in the next iteration the WAIT_SYNC state cleanly stops the loop. The WAIT_SYNC processing will be done just after that, only if the end of packet is flagged. Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-13drm/radeon: fix voltage setup on hawaiiAlex Deucher1-0/+1
Missing parameter when fetching the real voltage values from atom. Fixes problems with dynamic clocking on certain boards. bug: https://bugs.freedesktop.org/show_bug.cgi?id=87457 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-02-13drm/radeon/dp: Set EDP_CONFIGURATION_SET for bridge chips if necessaryAlex Deucher1-3/+1
Don't restrict it to just eDP panels. Some LVDS bridge chips require this. Fixes blank panels on resume on certain laptops. Noticed by mrnuke on IRC. bug: https://bugs.freedesktop.org/show_bug.cgi?id=42960 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-02-13Merge branch 'exynos-drm-next' of ↵Dave Airlie19-226/+1102
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next Summary: - Add code cleanups and bug fixups. - Add a new display controller dirver, DECON which is a new display controller of Exynos7 SoC. This device is much different from FIMD of Exynos4 and Exynos4 SoC series. * 'exynos-drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos: drm/exynos: Add DECON driver drm/exynos: fix NULL pointer reference drm/exynos: remove exynos_plane_dpms drm/exynos: remove mode property of exynos crtc drm/exynos: Remove exynos_plane_dpms() call with no effect drm/exynos: fix DMA_ATTR_NO_KERNEL_MAPPING usage drm/exynos: hdmi: replace fb size with mode size from win commit drm/exynos: fix no hdmi output drm/exynos: use driver internal struct drm/exynos: fix wrong pipe calculation for crtc drm/exynos: remove to use unnecessary MODULE_xxx macro drm/exynos: remove DRM_EXYNOS_DMABUF config drm/exynos: IOMMU support should not be selectable by user drm/exynos: add support for 'hdmi' clock
2015-02-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-119/+135
Merge third set of updates from Andrew Morton: - the rest of MM [ This includes getting rid of the numa hinting bits, in favor of just generic protnone logic. Yay. - Linus ] - core kernel - procfs - some of lib/ (lots of lib/ material this time) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits) lib/lcm.c: replace include lib/percpu_ida.c: remove redundant includes lib/strncpy_from_user.c: replace module.h include lib/stmp_device.c: replace module.h include lib/sort.c: move include inside #if 0 lib/show_mem.c: remove redundant include lib/radix-tree.c: change to simpler include lib/plist.c: remove redundant include lib/nlattr.c: remove redundant include lib/kobject_uevent.c: remove redundant include lib/llist.c: remove redundant include lib/md5.c: simplify include lib/list_sort.c: rearrange includes lib/genalloc.c: remove redundant include lib/idr.c: remove redundant include lib/halfmd4.c: simplify includes lib/dynamic_queue_limits.c: simplify includes lib/sort.c: use simpler includes lib/interval_tree.c: simplify includes hexdump: make it return number of bytes placed in buffer ...
2015-02-13kernel.h: remove ancient __FUNCTION__ hackRasmus Villemoes2-3/+3
__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so this only helps people who only test-compile using 3.3 (compiler-gcc3.h barks at anything older than that). Besides, there are almost no occurrences of __FUNCTION__ left in the tree. [akpm@linux-foundation.org: convert remaining __FUNCTION__ references] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13mm/zpool: add name argument to create zpoolGanesh Mahendran1-3/+5
Currently the underlay of zpool: zsmalloc/zbud, do not know who creates them. There is not a method to let zsmalloc/zbud find which caller they belong to. Now we want to add statistics collection in zsmalloc. We need to name the debugfs dir for each pool created. The way suggested by Minchan Kim is to use a name passed by caller(such as zram) to create the zsmalloc pool. /sys/kernel/debug/zsmalloc/zram0 This patch adds an argument `name' to zs_create_pool() and other related functions. Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: remove request_queue from struct zramSergey Senozhatsky2-9/+8
`struct zram' contains both `struct gendisk' and `struct request_queue'. the latter can be deleted, because zram->disk carries ->queue pointer, and ->queue carries zram pointer: create_device() zram->queue->queuedata = zram zram->disk->queue = zram->queue zram->disk->private_data = zram so zram->queue is not needed, we can access all necessary data anyway. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: remove init_lock in zram_make_requestMinchan Kim2-32/+64
Admin could reset zram during I/O operation going on so we have used zram->init_lock as read-side lock in I/O path to prevent sudden zram meta freeing. However, the init_lock is really troublesome. We can't do call zram_meta_alloc under init_lock due to lockdep splat because zram_rw_page is one of the function under reclaim path and hold it as read_lock while other places in process context hold it as write_lock. So, we have used allocation out of the lock to avoid lockdep warn but it's not good for readability and fainally, I met another lockdep splat between init_lock and cpu_hotplug from kmem_cache_destroy during working zsmalloc compaction. :( Yes, the ideal is to remove horrible init_lock of zram in rw path. This patch removes it in rw path and instead, add atomic refcount for meta lifetime management and completion to free meta in process context. It's important to free meta in process context because some of resource destruction needs mutex lock, which could be held if we releases the resource in reclaim context so it's deadlock, again. As a bonus, we could remove init_done check in rw path because zram_meta_get will do a role for it, instead. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Ganesh Mahendran <opensource.ganesh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: check bd_openers instead of bd_holdersMinchan Kim1-1/+1
bd_holders is increased only when user open the device file as FMODE_EXCL so if something opens zram0 as !FMODE_EXCL and request I/O while another user reset zram0, we can see following warning. zram0: detected capacity change from 0 to 64424509440 Buffer I/O error on dev zram0, logical block 180823, lost async page write Buffer I/O error on dev zram0, logical block 180824, lost async page write Buffer I/O error on dev zram0, logical block 180825, lost async page write Buffer I/O error on dev zram0, logical block 180826, lost async page write Buffer I/O error on dev zram0, logical block 180827, lost async page write Buffer I/O error on dev zram0, logical block 180828, lost async page write Buffer I/O error on dev zram0, logical block 180829, lost async page write Buffer I/O error on dev zram0, logical block 180830, lost async page write Buffer I/O error on dev zram0, logical block 180831, lost async page write Buffer I/O error on dev zram0, logical block 180832, lost async page write ------------[ cut here ]------------ WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210() Modules linked in: CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x45/0x57 warn_slowpath_common+0x8a/0xc0 warn_slowpath_null+0x1a/0x20 __blkdev_put+0x1d7/0x210 blkdev_put+0x50/0x130 blkdev_close+0x25/0x30 __fput+0xdf/0x1e0 ____fput+0xe/0x10 task_work_run+0xa7/0xe0 do_notify_resume+0x49/0x60 int_signal+0x12/0x17 ---[ end trace 274fbbc5664827d2 ]--- The warning comes from bdev_write_node in blkdev_put path. static void bdev_write_inode(struct inode *inode) { spin_lock(&inode->i_lock); while (inode->i_state & I_DIRTY) { spin_unlock(&inode->i_lock); WARN_ON_ONCE(write_inode_now(inode, true)); <========= here. spin_lock(&inode->i_lock); } spin_unlock(&inode->i_lock); } The reason is dd process encounters I/O fails due to sudden block device disappear so in filemap_check_errors in __writeback_single_inode returns -EIO. If we check bd_openers instead of bd_holders, we could address the problem. When I see the brd, it already have used it rather than bd_holders so although I'm not a expert of block layer, it seems to be better. I can make following warning with below simple script. In addition, I added msleep(2000) below set_capacity(zram->disk, 0) after applying your patch to make window huge(Kudos to Ganesh!) script: echo $((60<<30)) > /sys/block/zram0/disksize setsid dd if=/dev/zero of=/dev/zram0 & sleep 1 setsid echo 1 > /sys/block/zram0/reset Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Ganesh Mahendran <opensource.ganesh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: rework reset and destroy pathSergey Senozhatsky1-42/+33
We need to return set_capacity(disk, 0) from reset_store() back to zram_reset_device(), a catch by Ganesh Mahendran. Potentially, we can race set_capacity() calls from init and reset paths. The problem is that zram_reset_device() is also getting called from zram_exit(), which performs operations in misleading reversed order -- we first create_device() and then init it, while zram_exit() perform destroy_device() first and then does zram_reset_device(). This is done to remove sysfs group before we reset device, so we can continue with device reset/destruction not being raced by sysfs attr write (f.e. disksize). Apart from that, destroy_device() releases zram->disk (but we still have ->disk pointer), so we cannot acces zram->disk in later zram_reset_device() call, which may cause additional errors in the future. So, this patch rework and cleanup destroy path. 1) remove several unneeded goto labels in zram_init() 2) factor out zram_init() error path and zram_exit() into destroy_devices() function, which takes the number of devices to destroy as its argument. 3) remove sysfs group in destroy_devices() first, so we can reorder operations -- reset device (as expected) goes before disk destroy and queue cleanup. So we can always access ->disk in zram_reset_device(). 4) and, finally, return set_capacity() back under ->init_lock. [akpm@linux-foundation.org: tweak comment] Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: fix umount-reset_store-mount race conditionSergey Senozhatsky1-14/+9
Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex to avoid ->bd_holders race condition: CPU0 CPU1 umount /* zram->init_done is true */ reset_store() bdev->bd_holders == 0 mount ... zram_make_request() zram_reset_device() However, his solution required some considerable amount of code movement, which we can avoid. Apart from using bdev->bd_mutex in reset_store(), this patch also simplifies zram_reset_device(). zram_reset_device() has a bool parameter reset_capacity which tells it whether disk capacity and itself disk should be reset. There are two zram_reset_device() callers: -- zram_exit() passes reset_capacity=false -- reset_store() passes reset_capacity=true So we can move reset_capacity-sensitive work out of zram_reset_device() and perform it unconditionally in reset_store(). This also lets us drop reset_capacity parameter from zram_reset_device() and pass zram pointer only. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: free meta table in zram_meta_freeGanesh Mahendran1-17/+16
zram_meta_alloc() and zram_meta_free() are a pair. In zram_meta_alloc(), meta table is allocated. So it it better to free it in zram_meta_free(). Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13zram: clean up zram_meta_alloc()Sergey Senozhatsky1-8/+6
A trivial cleanup of zram_meta_alloc() error handling. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13Merge tag 'dm-3.20-changes' of ↵Linus Torvalds15-193/+407
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper changes from Mike Snitzer: - The most significant change this cycle is request-based DM now supports stacking ontop of blk-mq devices. This blk-mq support changes the model request-based DM uses for cloning a request to relying on calling blk_get_request() directly from the underlying blk-mq device. An early consumer of this code is Intel's emerging NVMe hardware; thanks to Keith Busch for working on, and pushing for, these changes. - A few other small fixes and cleanups across other DM targets. * tag 'dm-3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: inherit QUEUE_FLAG_SG_GAPS flags from underlying queues dm snapshot: remove unnecessary NULL checks before vfree() calls dm mpath: simplify failure path of dm_multipath_init() dm thin metadata: remove unused dm_pool_get_data_block_size() dm ioctl: fix stale comment above dm_get_inactive_table() dm crypt: update url in CONFIG_DM_CRYPT help text dm bufio: fix time comparison to use time_after_eq() dm: use time_in_range() and time_after() dm raid: fix a couple integer overflows dm table: train hybrid target type detection to select blk-mq if appropriate dm: allocate requests in target when stacking on blk-mq devices dm: prepare for allocating blk-mq clone requests in target dm: submit stacked requests in irq enabled context dm: split request structure out from dm_rq_target_io structure dm: remove exports for request-based interfaces without external callers