summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-07-12Merge branch 'fortglx/4.19/time' of ↵Thomas Gleixner27-47/+4490
https://git.linaro.org/people/john.stultz/linux into timers/core Pull timekeeping updates from John Stultz: - Make the timekeeping update more precise when NTP frequency is set directly by updating the multiplier. - Adjust selftests
2018-07-12hrtimer: Improve kernel message printingGeert Uytterhoeven1-4/+3
- Join split message for easier grepping, - Use pr_*() instead of printk*(), - Use %u to format unsigned cpu numbers. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180712144118.8819-1-geert+renesas@glider.be
2018-07-10timekeeping: Update multiplier when NTP frequency is set directlyMiroslav Lichvar1-6/+30
When the NTP frequency is set directly from userspace using the ADJ_FREQUENCY or ADJ_TICK timex mode, immediately update the timekeeper's multiplier instead of waiting for the next tick. This removes a hidden non-deterministic delay in setting of the frequency and allows an extremely tight control of the system clock with update rates close to or even exceeding the kernel HZ. The update is limited to archs using modern timekeeping (!ARCH_USES_GETTIMEOFFSET). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2018-07-02alarmtimer: Prevent overflow for relative nanosleepThomas Gleixner1-1/+2
Air Icy reported: UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7 signed integer overflow: 1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int' Call Trace: alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811 __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline] __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline] __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213 do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290 alarm_timer_nsleep() uses ktime_add() to add the current time and the relative expiry value. ktime_add() has no sanity checks so the addition can overflow when the relative timeout is large enough. Use ktime_add_safe() which has the necessary sanity checks in place and limits the result to the valid range. Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers") Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
2018-07-02posix-timers: Sanitize overrun handlingThomas Gleixner2-12/+21
The posix timer overrun handling is broken because the forwarding functions can return a huge number of overruns which does not fit in an int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn into random number generators. The k_clock::timer_forward() callbacks return a 64 bit value now. Make k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal accounting is correct. 3Remove the temporary (int) casts. Add a helper function which clamps the overrun value returned to user space via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value between 0 and INT_MAX. INT_MAX is an indicator for user space that the overrun value has been clamped. Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
2018-07-02posix-timers: Make forward callback return s64Thomas Gleixner3-6/+6
The posix timer ti_overrun handling is broken because the forwarding functions can return a huge number of overruns which does not fit in an int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn into random number generators. As a first step to address that let the timer_forward() callbacks return the full 64 bit value. Cast it to (int) temporarily until k_itimer::ti_overrun is converted to 64bit and the conversion to user space visible values is sanitized. Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
2018-07-01Merge tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-0/+1
Pull dma mapping fixlet from Christoph Hellwig: "Add a missing export required by riscv and unicore" * tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: export swiotlb_dma_ops
2018-06-28swiotlb: export swiotlb_dma_opsChristoph Hellwig1-0/+1
For architectures that do not use per-device dma ops we need to export the dma_map_ops structure returned from get_arch_dma_ops(). Fixes: 10314e09 ("riscv: add swiotlb support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Andreas Schwab <schwab@suse.de>
2018-06-27perf/core: Move inline keyword at the beginning of declarationMathieu Malaterre1-1/+1
Fix non-fatal warning triggered during compilation with W=1: kernel/events/core.c:6106:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] static void __always_inline ^~~~~~ Signed-off-by: Mathieu Malaterre <malat@debian.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180626202301.20270-1-malat@debian.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-24time: Change types to new y2038 safe __kernel_itimerspecDeepa Dinamani1-5/+7
timer_set/gettime and timerfd_set/get apis use struct itimerspec at the user interface layer. struct itimerspec is not y2038-safe. Change these interfaces to use y2038-safe struct __kernel_itimerspec instead. This will help define new syscalls when 32bit architectures select CONFIG_64BIT_TIME. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: y2038@lists.linaro.org Link: https://lkml.kernel.org/r/20180617051144.29756-4-deepa.kernel@gmail.com
2018-06-24time: Enable get/put_compat_itimerspec64 alwaysDeepa Dinamani2-29/+21
This will aid in enabling the compat syscalls on 32-bit architectures later on. Also move compat_itimerspec and related defines to compat_time.h. The compat_time.h file will eventually be deleted. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: y2038@lists.linaro.org Link: https://lkml.kernel.org/r/20180617051144.29756-3-deepa.kernel@gmail.com
2018-06-24time: Introduce struct __kernel_itimerspecDeepa Dinamani1-2/+2
struct itimerspec is not y2038-safe. Introduce a new struct __kernel_itimerspec based on the kernel internal y2038-safe struct itimerspec64. The definition of struct __kernel_itimerspec includes two struct __kernel_timespec. Since struct __kernel_timespec has the same representation in native and compat modes, so does struct __kernel_itimerspec. This helps have a common entry point for syscalls using struct __kernel_itimerspec. New y2038-safe syscalls will use this new type. Since most of the new syscalls are just an update to the native syscalls with the type update, place the new definition under CONFIG_64BIT_TIME. This helps architectures that do not support the above config to keep using the old definition of struct itimerspec. Also change the get/put_itimerspec64 to use struct__kernel_itimerspec. This will help 32 bit architectures to use the new syscalls when architectures select CONFIG_64BIT_TIME. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: y2038@lists.linaro.org Link: https://lkml.kernel.org/r/20180617051144.29756-2-deepa.kernel@gmail.com
2018-06-24Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf updates: Kernel side: - Remove an incorrect warning in uprobe_init_insn() when insn_get_length() fails. The error return code is handled at the call site. - Move the inline keyword to the right place in the perf ringbuffer code to address a W=1 build warning. Tooling: perf stat: - Fix metric column header display alignment - Improve error messages for default attributes, providing better output for error in command line. - Add --interval-clear option, to provide a 'watch' like printing perf script: - Show hw-cache events too perf c2c: - Fix data dependency problem in layout of 'struct c2c_hist_entry' Core: - Do not blindly assume that 'struct perf_evsel' can be obtained via a straight forward container_of() as there are call sites which hand in a plain 'struct hist' which is not part of a container. - Fix error index in the PMU event parser, so that error messages can point to the problematic token" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Move the inline keyword at the beginning of the function declaration uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn() perf script: Show hw-cache events perf c2c: Keep struct hist_entry at the end of struct c2c_hist_entry perf stat: Add event parsing error handling to add_default_attributes perf stat: Allow to specify specific metric column len perf stat: Fix metric column header display alignment perf stat: Use only color_fprintf call in print_metric_only perf stat: Add --interval-clear option perf tools: Fix error index for pmu event parser perf hists: Reimplement hists__has_callchains() perf hists browser gtk: Use hist_entry__has_callchains() perf hists: Make hist_entry__has_callchains() work with 'perf c2c' perf hists: Save the callchain_size in struct hist_entry
2018-06-24Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq fixes from Thomas Gleixer: "A pile of rseq related fixups: - Prevent infinite recursion when delivering SIGSEGV - Remove the abort of rseq critical section on fork() as syscalls inside rseq critical sections are explicitely forbidden. So no point in doing the abort on the child. - Align the rseq structure on 32 bytes in the ARM selftest code. - Fix file permissions of the test script" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Avoid infinite recursion when delivering SIGSEGV rseq/cleanup: Do not abort rseq c.s. in child on fork() rseq/selftests/arm: Align 'struct rseq_cs' on 32 bytes rseq/selftests: Make run_param_test.sh executable
2018-06-24Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A set of fixes and updates for the locking code: - Prevent lockdep from updating irq state within its own code and thereby confusing itself. - Buid fix for older GCCs which mistreat anonymous unions - Add a missing lockdep annotation in down_read_non_onwer() which causes up_read_non_owner() to emit a lockdep splat - Remove the custom alpha dec_and_lock() implementation which is incorrect in terms of ordering and use the generic one. The remaining two commits are not strictly fixes. They provide irqsave variants of atomic_dec_and_lock() and refcount_dec_and_lock(). These are required to merge the relevant updates and cleanups into different maintainer trees for 4.19, so routing them into mainline without actual users is the sanest approach. They should have been in -rc1, but last weekend I took the liberty to just avoid computers in order to regain some mental sanity" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/qspinlock: Fix build for anonymous union in older GCC compilers locking/lockdep: Do not record IRQ state within lockdep code locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMS locking/refcounts: Implement refcount_dec_and_lock_irqsave() atomic: Add irqsave variant of atomic_dec_and_lock() alpha: Remove custom dec_and_lock() implementation
2018-06-24Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds3-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of fixes for time(r) related issues: - Fix a long standing conversion issue in jiffies_to_msecs() for odd HZ values like 1024 or 1200 which resulted in returning 0 for small jiffies values due to rounding down. - Use the proper CONFIG symbol in the new Y2038 safe compat code for posix-timers. Not yet a visible breakage, but this will immediately trigger when the architecture support for the new interfaces is merged. - Return an error code in the STM32 clocksource driver on failure instead of success. - Remove the redundant and stale irq disabled check in the posix cpu timer code. The check is at the wrong place anyway and lockdep already covers it via the sighand lock locking coverage" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Make sure jiffies_to_msecs() preserves non-zero time periods posix-timers: Fix nanosleep_copyout() for CONFIG_COMPAT_32BIT_TIME clocksource/drivers/stm32: Fix error return code posix-cpu-timers: Remove lockdep_assert_irqs_disabled()
2018-06-24Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes mostly for the ARM/GIC world: - Fix the MSI affinity handling in the ls-scfg irq chip driver so it updates and uses the effective affinity mask correctly - Prevent binding LPIs to offline CPUs and respect the Cavium erratum which requires that LPIs which belong to an offline NUMA node are not bound to a CPU on a different NUMA node. - Free only the amount of allocated interrupts in the GIC-V2M driver instead of trying to free log2(nrirqs). - Prevent emitting SYNC and VSYNC targetting non existing interrupt collections in the GIC-V3 ITS driver - Ensure that the GIV-V3 interrupt redistributor is correctly reprogrammed on CPU hotplug - Remove a stale unused helper function" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqdesc: Delete irq_desc_get_msi_desc() irqchip/gic-v3-its: Fix reprogramming of redistributors on CPU hotplug irqchip/gic-v3-its: Only emit VSYNC if targetting a valid collection irqchip/gic-v3-its: Only emit SYNC if targetting a valid collection irqchip/gic-v3-its: Don't bind LPI to unavailable NUMA node irqchip/gic-v2m: Fix SPI release on error path irqchip/ls-scfg-msi: Fix MSI affinity handling genirq/debugfs: Add missing IRQCHIP_SUPPORTS_LEVEL_MSI debug
2018-06-24Merge tag 'trace-v4.18-rc1' of ↵Linus Torvalds3-7/+15
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This contains a few fixes and a clean up. - a bad merge caused an "endif" to go in the wrong place in scripts/Makefile.build - softirq tracing fix for tracing that corrupts lockdep and causes a false splat - histogram documentation typo fixes - fix a bad memory reference when passing in no filter to the filter code - simplify code by using the swap macro instead of open coding the swap" * tag 'trace-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix SKIP_STACK_VALIDATION=1 build due to bad merge with -mrecord-mcount tracing: Fix some errors in histogram documentation tracing: Use swap macro in update_max_tr softirq: Reorder trace_softirqs_on to prevent lockdep splat tracing: Check for no filter when processing event filters
2018-06-22rseq: Avoid infinite recursion when delivering SIGSEGVWill Deacon1-3/+4
When delivering a signal to a task that is using rseq, we call into __rseq_handle_notify_resume() so that the registers pushed in the sigframe are updated to reflect the state of the restartable sequence (for example, ensuring that the signal returns to the abort handler if necessary). However, if the rseq management fails due to an unrecoverable fault when accessing userspace or certain combinations of RSEQ_CS_* flags, then we will attempt to deliver a SIGSEGV. This has the potential for infinite recursion if the rseq code continuously fails on signal delivery. Avoid this problem by using force_sigsegv() instead of force_sig(), which is explicitly designed to reset the SEGV handler to SIG_DFL in the case of a recursive fault. In doing so, remove rseq_signal_deliver() from the internal rseq API and have an optional struct ksignal * parameter to rseq_handle_notify_resume() instead. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: peterz@infradead.org Cc: paulmck@linux.vnet.ibm.com Cc: boqun.feng@gmail.com Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
2018-06-22time: Make sure jiffies_to_msecs() preserves non-zero time periodsGeert Uytterhoeven1-2/+4
For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of 1000, jiffies_to_msecs() never returns zero when passed a non-zero time period. However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or 1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero for small non-zero time periods. This may break code that relies on receiving back a non-zero value. jiffies_to_usecs() does not need such a fix: one jiffy can only be less than one µs if HZ > 1000000, and such large values of HZ are already rejected at build time, twice: - include/linux/jiffies.h does #error if HZ >= 12288, - kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC). Broken since forever. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org
2018-06-22genirq/debugfs: Add missing IRQCHIP_SUPPORTS_LEVEL_MSI debugMarc Zyngier1-0/+1
Debug is missing the IRQCHIP_SUPPORTS_LEVEL_MSI debug entry, making debugfs slightly less useful. Take this opportunity to also add a missing comment in the definition of IRQCHIP_SUPPORTS_LEVEL_MSI. Fixes: 6988e0e0d283 ("genirq/msi: Limit level-triggered MSI to platform devices") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Cc: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20180622095254.5906-2-marc.zyngier@arm.com
2018-06-22perf/core: Move the inline keyword at the beginning of the function declarationMathieu Malaterre1-3/+3
When building perf with W=1 the following warning triggers: CC kernel/events/ring_buffer.o kernel/events/ring_buffer.c:105:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] static bool __always_inline ^~~~~~ ... Move the inline keyword to the beginning of the function declaration. Signed-off-by: Mathieu Malaterre <malat@debian.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: trival@kernel.org Link: http://lkml.kernel.org/r/20180308202856.9378-1-malat@debian.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21tracing: Use swap macro in update_max_trGustavo A. R. Silva1-5/+1
Make use of the swap macro and remove unnecessary variable _buf_. This makes the code easier to read and maintain. Also, reduces the stack usage. This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20180209175316.GA18720@embeddedgus Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-06-21softirq: Reorder trace_softirqs_on to prevent lockdep splatJoel Fernandes (Google)1-1/+5
I'm able to reproduce a lockdep splat with config options: CONFIG_PROVE_LOCKING=y, CONFIG_DEBUG_LOCK_ALLOC=y and CONFIG_PREEMPTIRQ_EVENTS=y $ echo 1 > /d/tracing/events/preemptirq/preempt_enable/enable [ 26.112609] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled) [ 26.112636] WARNING: CPU: 0 PID: 118 at kernel/locking/lockdep.c:3854 [...] [ 26.144229] Call Trace: [ 26.144926] <IRQ> [ 26.145506] lock_acquire+0x55/0x1b0 [ 26.146499] ? __do_softirq+0x46f/0x4d9 [ 26.147571] ? __do_softirq+0x46f/0x4d9 [ 26.148646] trace_preempt_on+0x8f/0x240 [ 26.149744] ? trace_preempt_on+0x4d/0x240 [ 26.150862] ? __do_softirq+0x46f/0x4d9 [ 26.151930] preempt_count_sub+0x18a/0x1a0 [ 26.152985] __do_softirq+0x46f/0x4d9 [ 26.153937] irq_exit+0x68/0xe0 [ 26.154755] smp_apic_timer_interrupt+0x271/0x280 [ 26.156056] apic_timer_interrupt+0xf/0x20 [ 26.157105] </IRQ> The issue was this: preempt_count = 1 << SOFTIRQ_SHIFT __local_bh_enable(cnt = 1 << SOFTIRQ_SHIFT) { if (softirq_count() == (cnt && SOFTIRQ_MASK)) { trace_softirqs_on() { current->softirqs_enabled = 1; } } preempt_count_sub(cnt) { trace_preempt_on() { tracepoint() { rcu_read_lock_sched() { // jumps into lockdep Where preempt_count still has softirqs disabled, but current->softirqs_enabled is true, and we get a splat. Link: http://lkml.kernel.org/r/20180607201143.247775-1-joel@joelfernandes.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Glexiner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Todd Kjos <tkjos@google.com> Cc: Erick Reyes <erickreyes@google.com> Cc: Julia Cartwright <julia@ni.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: stable@vger.kernel.org Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable events") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-06-21tracing: Check for no filter when processing event filtersSteven Rostedt (VMware)1-1/+9
The syzkaller detected a out-of-bounds issue with the events filter code, specifically here: prog[N].pred = NULL; /* #13 */ prog[N].target = 1; /* TRUE */ prog[N+1].pred = NULL; prog[N+1].target = 0; /* FALSE */ -> prog[N-1].target = N; prog[N-1].when_to_branch = false; As that's the first reference to a "N-1" index, it appears that the code got here with N = 0, which means the filter parser found no filter to parse (which shouldn't ever happen, but apparently it did). Add a new error to the parsing code that will check to make sure that N is not zero before going into this part of the code. If N = 0, then -EINVAL is returned, and a error message is added to the filter. Cc: stable@vger.kernel.org Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Reported-by: air icy <icytxw@gmail.com> bugzilla url: https://bugzilla.kernel.org/show_bug.cgi?id=200019 Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-06-21locking/lockdep: Do not record IRQ state within lockdep codeSteven Rostedt (VMware)1-6/+6
While debugging where things were going wrong with mapping enabling/disabling interrupts with the lockdep state and actual real enabling and disabling interrupts, I had to silent the IRQ disabling/enabling in debug_check_no_locks_freed() because it was always showing up as it was called before the splat was. Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed() but for all internal lockdep functions, as they hide useful information about where interrupts were used incorrectly last. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-16/+79
Pull networking fixes from David Miller: 1) Fix crash on bpf_prog_load() errors, from Daniel Borkmann. 2) Fix ATM VCC memory accounting, from David Woodhouse. 3) fib6_info objects need RCU freeing, from Eric Dumazet. 4) Fix SO_BINDTODEVICE handling for TCP sockets, from David Ahern. 5) Fix clobbered error code in enic_open() failure path, from Govindarajulu Varadarajan. 6) Propagate dev_get_valid_name() error returns properly, from Li RongQing. 7) Fix suspend/resume in davinci_emac driver, from Bartosz Golaszewski. 8) Various act_ife fixes (recursive locking, IDR leaks, etc.) from Davide Caratti. 9) Fix buggy checksum handling in sungem driver, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits) ip: limit use of gso_size to udp stmmac: fix DMA channel hang in half-duplex mode net: stmmac: socfpga: add additional ocp reset line for Stratix10 net: sungem: fix rx checksum support bpfilter: ignore binary files bpfilter: fix build error net/usb/drivers: Remove useless hrtimer_active check net/sched: act_ife: preserve the action control in case of error net/sched: act_ife: fix recursive lock and idr leak net: ethernet: fix suspend/resume in davinci_emac net: propagate dev_get_valid_name return code enic: do not overwrite error code net/tcp: Fix socket lookups with SO_BINDTODEVICE ptp: replace getnstimeofday64() with ktime_get_real_ts64() net/ipv6: respect rcu grace period before freeing fib6_info net: net_failover: fix typo in net_failover_slave_register() ipvlan: use ETH_MAX_MTU as max mtu net: hamradio: use eth_broadcast_addr enic: initialize enic->rfs_h.lock in enic_probe MAINTAINERS: Add Sam as the maintainer for NCSI ...
2018-06-20locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMSWaiman Long1-0/+1
It was found that the use of up_read_non_owner() in NFS was causing the following warning when DEBUG_RWSEMS was configured. DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0))) Looking into the rwsem.c file, it was discovered that the corresponding down_read_non_owner() function was not setting the owner field properly. This is fixed now, and the warning should be gone. Fixes: 5149cbac4235 ("locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Gavin Schenk <g.schenk@eckelmann.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-nfs@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1527168398-4291-1-git-send-email-longman@redhat.com
2018-06-20Merge tag 'dma-rename-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds11-0/+4344
Pull dma-mapping rename from Christoph Hellwig: "Move all the dma-mapping code to kernel/dma and lose their dma-* prefixes" * tag 'dma-rename-4.18' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: move all DMA mapping code to kernel/dma dma-mapping: use obj-y instead of lib-y for generic dma ops
2018-06-19sysinfo: Remove get_monotonic_boottime()Arnd Bergmann1-2/+2
get_monotonic_boottime() is deprecated because it uses the old 'timespec' structure. This replaces one of the last callers with a call to ktime_get_boottime. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: y2038@lists.linaro.org Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Link: https://lkml.kernel.org/r/20180618150114.849216-1-arnd@arndb.de
2018-06-19posix-timers: Use new ktime_get_*_ts64() helpersArnd Bergmann2-6/+6
Some of the oddly named time accessor functions now have a more consistent naming, which should be used from now on so the aliases can be removed. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: y2038@lists.linaro.org Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/20180618143246.3865099-1-arnd@arndb.de
2018-06-19timekeeping: Use ktime_get_real_ts64() instead of getnstimeofday64()Arnd Bergmann2-4/+4
The two do the same, this moves all users to the newer name for consistency. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: y2038@lists.linaro.org Cc: Stephen Boyd <sboyd@kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Link: https://lkml.kernel.org/r/20180618140811.2998503-3-arnd@arndb.de
2018-06-19time: Use ktime_get_real_seconds() in time syscallArnd Bergmann1-4/+2
Both get_seconds() and do_gettimeofday() are deprecated. Change the time() implementation to use the replacement function instead. Obviously the system call will still overflow in 2038, but this gets us closer to removing the old helper functions. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: y2038@lists.linaro.org Cc: Stephen Boyd <sboyd@kernel.org> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20180618140811.2998503-2-arnd@arndb.de
2018-06-19posix-timers: Fix nanosleep_copyout() for CONFIG_COMPAT_32BIT_TIMEArnd Bergmann1-1/+1
Commit b5793b0d92c9 added support for building the nanosleep compat system call on 32-bit architectures, but missed one change in nanosleep_copyout(), which would trigger a BUG() as soon as any architecture is switched over to use it. Use the proper config symbol to enable the code path. Fixes: Commit b5793b0d92c9 ("posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: y2038@lists.linaro.org Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20180618140811.2998503-1-arnd@arndb.de
2018-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-16/+79
Daniel Borkmann says: ==================== pull-request: bpf 2018-06-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a panic in devmap handling in generic XDP where return type of __devmap_lookup_elem() got changed recently but generic XDP code missed the related update, from Toshiaki. 2) Fix a freeze when BPF progs are loaded that include BPF to BPF calls when JIT is enabled where we would later bail out via error path w/o dropping kallsyms, and another one to silence syzkaller splats from locking prog read-only, from Daniel. 3) Fix a bug in test_offloads.py BPF selftest which must not assume that the underlying system have no BPF progs loaded prior to test, and one in bpftool to fix accuracy of program load time, from Jakub. 4) Fix a bug in bpftool's probe for availability of the bpf(2) BPF_TASK_FD_QUERY subcommand, from Yonghong. 5) Fix a regression in AF_XDP's XDP_SKB receive path where queue id check got erroneously removed, from Björn. 6) Fix missing state cleanup in BPF's xfrm tunnel test, from William. 7) Check tunnel type more accurately in BPF's tunnel collect metadata kselftest, from Jian. 8) Fix missing Kconfig fragments for BPF kselftests, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-16Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimentalLinus Torvalds3-11/+12
Pull documentation fixes from Mauro Carvalho Chehab: "This solves a series of broken links for files under Documentation, and improves a script meant to detect such broken links (see scripts/documentation-file-ref-check). The changes on this series are: - can.rst: fix a footnote reference; - crypto_engine.rst: Fix two parsing warnings; - Fix a lot of broken references to Documentation/*; - improve the scripts/documentation-file-ref-check script, in order to help detecting/fixing broken references, preventing false-positives. After this patch series, only 33 broken references to doc files are detected by scripts/documentation-file-ref-check" * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits) fix a series of Documentation/ broken file name references Documentation: rstFlatTable.py: fix a broken reference ABI: sysfs-devices-system-cpu: remove a broken reference devicetree: fix a series of wrong file references devicetree: fix name of pinctrl-bindings.txt devicetree: fix some bindings file names MAINTAINERS: fix location of DT npcm files MAINTAINERS: fix location of some display DT bindings kernel-parameters.txt: fix pointers to sound parameters bindings: nvmem/zii: Fix location of nvmem.txt docs: Fix more broken references scripts/documentation-file-ref-check: check tools/*/Documentation scripts/documentation-file-ref-check: get rid of false-positives scripts/documentation-file-ref-check: hint: dash or underline scripts/documentation-file-ref-check: add a fix logic for DT scripts/documentation-file-ref-check: accept more wildcards at filenames scripts/documentation-file-ref-check: fix help message media: max2175: fix location of driver's companion documentation media: v4l: fix broken video4linux docs locations media: dvb: point to the location of the old README.dvb-usb file ...
2018-06-16Merge tag 'fsnotify_for_v4.18-rc1' of ↵Linus Torvalds3-13/+9
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify cleanups unifying handling of different watch types. This is the shortened fsnotify series from Amir with the last five patches pulled out. Amir has modified those patches to not change struct inode but obviously it's too late for those to go into this merge window" * tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: add fsnotify_add_inode_mark() wrappers fanotify: generalize fanotify_should_send_event() fsnotify: generalize send_to_group() fsnotify: generalize iteration of marks by object type fsnotify: introduce marks iteration helpers fsnotify: remove redundant arguments to handle_event() fsnotify: use type id to identify connector object type
2018-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-2/+12
Pull networking fixes from David Miller: 1) Various netfilter fixlets from Pablo and the netfilter team. 2) Fix regression in IPVS caused by lack of PMTU exceptions on local routes in ipv6, from Julian Anastasov. 3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia. 4) Don't crash on poll in TLS, from Daniel Borkmann. 5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things including Avahi mDNS. From Bart Van Assche. 6) Missing of_node_put in qcom/emac driver, from Yue Haibing. 7) We lack checking of the TCP checking in one special case during SYN receive, from Frank van der Linden. 8) Fix module init error paths of mac80211 hwsim, from Johannes Berg. 9) Handle 802.1ad properly in stmmac driver, from Elad Nachman. 10) Must grab HW caps before doing quirk checks in stmmac driver, from Jose Abreu. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) net: stmmac: Run HWIF Quirks after getting HW caps neighbour: skip NTF_EXT_LEARNED entries during forced gc net: cxgb3: add error handling for sysfs_create_group tls: fix waitall behavior in tls_sw_recvmsg tls: fix use-after-free in tls_push_record l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl() l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels mlxsw: spectrum_switchdev: Fix port_vlan refcounting mlxsw: spectrum_router: Align with new route replace logic mlxsw: spectrum_router: Allow appending to dev-only routes ipv6: Only emit append events for appended routes stmmac: added support for 802.1ad vlan stripping cfg80211: fix rcu in cfg80211_unregister_wdev mac80211: Move up init of TXQs mac80211_hwsim: fix module init error paths cfg80211: initialize sinfo in cfg80211_get_station nl80211: fix some kernel doc tag mistakes hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload rds: avoid unenecessary cong_update in loop transport l2tp: clean up stale tunnel or session in pppol2tp_connect's error path ...
2018-06-16Merge tag 'modules-for-v4.18' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "Minor code cleanup and also allow sig_enforce param to be shown in sysfs with CONFIG_MODULE_SIG_FORCE" * tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: Allow to always show the status of modsign module: Do not access sig_enforce directly
2018-06-16xdp: Fix handling of devmap in generic XDPToshiaki Makita1-0/+14
Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed the return value type of __devmap_lookup_elem() from struct net_device * to struct bpf_dtab_netdev * but forgot to modify generic XDP code accordingly. Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct net_device is expected, then skb->dev was set to invalid value. v2: - Fix compiler warning without CONFIG_BPF_SYSCALL. Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-16fix a series of Documentation/ broken file name referencesMauro Carvalho Chehab1-2/+3
As files move around, their previous links break. Fix the references for them. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-16docs: Fix some broken referencesMauro Carvalho Chehab2-9/+9
As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked if the produced result is valid, removing a few false-positives. Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-15bpf: reject any prog that failed read-only lockDaniel Borkmann2-10/+49
We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro() as well as the BPF image as read-only through bpf_prog_lock_ro(). In the case any of these would fail we throw a WARN_ON_ONCE() in order to yell loudly to the log. Perhaps, to some extend, this may be comparable to an allocation where __GFP_NOWARN is explicitly not set. Added via 65869a47f348 ("bpf: improve read-only handling"), this behavior is slightly different compared to any of the other in-kernel set_memory_ro() users who do not check the return code of set_memory_ro() and friends /at all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given in BPF this is mandatory hardening step, we want to know whether there are any issues that would leave both BPF data writable. So it happens that syzkaller enabled fault injection and it triggered memory allocation failure deep inside x86's change_page_attr_set_clr() which was triggered from set_memory_ro(). Now, there are two options: i) leaving everything as is, and ii) reworking the image locking code in order to have a final checkpoint out of the central bpf_prog_select_runtime() which probes whether any of the calls during prog setup weren't successful, and then bailing out with an error. Option ii) is a better approach since this additional paranoia avoids altogether leaving any potential W+X pages from BPF side in the system. Therefore, lets be strict about it, and reject programs in such unlikely occasion. While testing I noticed also that one bpf_prog_lock_ro() call was missing on the outer dummy prog in case of calls, e.g. in the destructor we call bpf_prog_free_deferred() on the main prog where we try to bpf_prog_unlock_free() the program, and since we go via bpf_prog_select_runtime() do that as well. Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-15bpf: fix panic in prog load calls cleanupDaniel Borkmann2-6/+16
While testing I found that when hitting error path in bpf_prog_load() where we jump to free_used_maps and prog contained BPF to BPF calls that were JITed earlier, then we never clean up the bpf_prog_kallsyms_add() done under jit_subprogs(). Add proper API to make BPF kallsyms deletion more clear and fix that. Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-3/+36
Merge more updates from Andrew Morton: - MM remainders - various misc things - kcov updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) lib/test_printf.c: call wait_for_random_bytes() before plain %p tests hexagon: drop the unused variable zero_page_mask hexagon: fix printk format warning in setup.c mm: fix oom_kill event handling treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX mm: use octal not symbolic permissions ipc: use new return type vm_fault_t sysvipc/sem: mitigate semnum index against spectre v1 fault-injection: reorder config entries arm: port KCOV to arm sched/core / kcov: avoid kcov_area during task switch kcov: prefault the kcov_area kcov: ensure irq code sees a valid area kernel/relay.c: change return type to vm_fault_t exofs: avoid VLA in structures coredump: fix spam with zero VMA process fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block() proc: skip branch in /proc/*/* lookup mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns mm/memblock: add missing include <linux/bootmem.h> ...
2018-06-15sched/core / kcov: avoid kcov_area during task switchMark Rutland2-1/+5
During a context switch, we first switch_mm() to the next task's mm, then switch_to() that new task. This means that vmalloc'd regions which had previously been faulted in can transiently disappear in the context of the prev task. Functions instrumented by KCOV may try to access a vmalloc'd kcov_area during this window, and as the fault handling code is instrumented, this results in a recursive fault. We must avoid accessing any kcov_area during this window. We can do so with a new flag in kcov_mode, set prior to switching the mm, and cleared once the new task is live. Since task_struct::kcov_mode isn't always a specific enum kcov_mode value, this is made an unsigned int. The manipulation is hidden behind kcov_{prepare,finish}_switch() helpers, which are empty for !CONFIG_KCOV kernels. The code uses macros because I can't use static inline functions without a circular include dependency between <linux/sched.h> and <linux/kcov.h>, since the definition of task_struct uses things defined in <linux/kcov.h> Link: http://lkml.kernel.org/r/20180504135535.53744-4-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15kcov: prefault the kcov_areaMark Rutland1-0/+16
On many architectures the vmalloc area is lazily faulted in upon first access. This is problematic for KCOV, as __sanitizer_cov_trace_pc accesses the (vmalloc'd) kcov_area, and fault handling code may be instrumented. If an access to kcov_area faults, this will result in mutual recursion through the fault handling code and __sanitizer_cov_trace_pc(), eventually leading to stack corruption and/or overflow. We can avoid this by faulting in the kcov_area before __sanitizer_cov_trace_pc() is permitted to access it. Once it has been faulted in, it will remain present in the process page tables, and will not fault again. [akpm@linux-foundation.org: code cleanup] [akpm@linux-foundation.org: add comment explaining kcov_fault_in_area()] [akpm@linux-foundation.org: fancier code comment from Mark] Link: http://lkml.kernel.org/r/20180504135535.53744-3-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15kcov: ensure irq code sees a valid areaMark Rutland1-1/+2
Patch series "kcov: fix unexpected faults". These patches fix a few issues where KCOV code could trigger recursive faults, discovered while debugging a patch enabling KCOV for arch/arm: * On CONFIG_PREEMPT kernels, there's a small race window where __sanitizer_cov_trace_pc() can see a bogus kcov_area. * Lazy faulting of the vmalloc area can cause mutual recursion between fault handling code and __sanitizer_cov_trace_pc(). * During the context switch, switching the mm can cause the kcov_area to be transiently unmapped. These are prerequisites for enabling KCOV on arm, but the issues themsevles are generic -- we just happen to avoid them by chance rather than design on x86-64 and arm64. This patch (of 3): For kernels built with CONFIG_PREEMPT, some C code may execute before or after the interrupt handler, while the hardirq count is zero. In these cases, in_task() can return true. A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it resets the task's kcov data via kcov_task_init(). Instrumented code executed during this period will call __sanitizer_cov_trace_pc(), and as in_task() returns true, will inspect t->kcov_mode before trying to write to t->kcov_area. In kcov_init_task() we update t->kcov_{mode,area,size} with plain stores, which may be re-ordered, torn, etc. Thus __sanitizer_cov_trace_pc() may see bogus values for any of these fields, and may attempt to write to memory which is not mapped. Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a barrier() to ensure this is ordered before we clear t->kov_{area,size}. This ensures that any code execute while kcov_init_task() is preempted will either see valid values for t->kcov_{area,size}, or will see that t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching t->kcov_area. Link: http://lkml.kernel.org/r/20180504135535.53744-2-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15kernel/relay.c: change return type to vm_fault_tSouptick Joarder1-1/+1
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Link: http://lkml.kernel.org/r/20180510140335.GA25363@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15mm: check for SIGKILL inside dup_mmap() loopTetsuo Handa1-0/+8
As a theoretical problem, dup_mmap() of an mm_struct with 60000+ vmas can loop while potentially allocating memory, with mm->mmap_sem held for write by current thread. This is bad if current thread was selected as an OOM victim, for current thread will continue allocations using memory reserves while OOM reaper is unable to reclaim memory. As an actually observable problem, it is not difficult to make OOM reaper unable to reclaim memory if the OOM victim is blocked at i_mmap_lock_write() in this loop. Unfortunately, since nobody can explain whether it is safe to use killable wait there, let's check for SIGKILL before trying to allocate memory. Even without an OOM event, there is no point with continuing the loop from the beginning if current thread is killed. I tested with debug printk(). This patch should be safe because we already fail if security_vm_enough_memory_mm() or kmem_cache_alloc(GFP_KERNEL) fails and exit_mmap() handles it. ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting exit_mmap() due to NULL mmap ***** [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/201804071938.CDE04681.SOFVQJFtMHOOLF@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Rik van Riel <riel@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>