summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-11-10cpufreq: Introduce governor flagsRafael J. Wysocki1-1/+1
A new cpufreq governor flag will be added subsequently, so replace the bool dynamic_switching fleid in struct cpufreq_governor with a flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for the "dynamic switching" governors instead of it. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-10arm64: smp: Tell RCU about CPUs that fail to come onlineWill Deacon1-1/+1
Commit ce3d31ad3cac ("arm64/smp: Move rcu_cpu_starting() earlier") ensured that RCU is informed early about incoming CPUs that might end up calling into printk() before they are online. However, if such a CPU fails the early CPU feature compatibility checks in check_local_cpu_capabilities(), then it will be powered off or parked without informing RCU, leading to an endless stream of stalls: | rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: | rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593 | (detected by 0, t=5252 jiffies, g=9317, q=136) | Task dump for CPU 2: | task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028 | Call trace: | ret_from_fork+0x0/0x30 Ensure that the dying CPU invokes rcu_report_dead() prior to being powered off or parked. Cc: Qian Cai <cai@redhat.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Suggested-by: Qian Cai <cai@redhat.com> Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-11-10bpf: Fix passing zero to PTR_ERR() in bpf_btf_printf_prepareWang Qing1-1/+1
There is a bug when passing zero to PTR_ERR() and return. Fix the smatch error. Fixes: c4d0bfb45068 ("bpf: Add bpf_snprintf_btf helper") Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1604735144-686-1-git-send-email-wangqing@vivo.com
2020-11-09perf: Tweak perf_event_attr::exclusive semanticsPeter Zijlstra1-1/+1
Currently perf_event_attr::exclusive can be used to ensure an event(group) is the sole group scheduled on the PMU. One consequence is that when you have a pinned event (say the watchdog) you can no longer have regular exclusive event(group)s. Inspired by the fact that !pinned events are considered less strict, allow !pinned,exclusive events to share the PMU with pinned,!exclusive events. Pinned,exclusive is still fully exclusive. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201029162902.105962225@infradead.org
2020-11-09perf: Fix event multiplexing for exclusive groupsPeter Zijlstra1-1/+1
Commit 9e6302056f80 ("perf: Use hrtimers for event multiplexing") placed the hrtimer (re)start call in the wrong place. Instead of capturing all scheduling failures, it only considered the PMU failure. The result is that groups using perf_event_attr::exclusive are no longer rotated. Fixes: 9e6302056f80 ("perf: Use hrtimers for event multiplexing") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201029162902.038667689@infradead.org
2020-11-09perf: Simplify group_sched_in()Peter Zijlstra1-7/+3
Collate the error paths. Code duplication only leads to divergence and extra bugs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201029162901.972161394@infradead.org
2020-11-09perf: Simplify group_sched_out()Peter Zijlstra1-3/+0
Since event_sched_out() clears cpuctx->exclusive upon removal of an exclusive event (and only group leaders can be exclusive), there is no point in group_sched_out() trying to do it too. It is impossible for cpuctx->exclusive to still be set here. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201029162901.904060564@infradead.org
2020-11-09perf/arch: Remove perf_sample_data::regs_user_copyPeter Zijlstra1-5/+3
struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
2020-11-09perf: Optimize get_recursion_context()Peter Zijlstra1-10/+6
"Look ma, no branches!" Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lkml.kernel.org/r/20201030151955.187580298@infradead.org
2020-11-09perf: Fix get_recursion_context()Peter Zijlstra1-1/+1
One should use in_serving_softirq() to detect SoftIRQ context. Fixes: 96f6d4444302 ("perf_counter: avoid recursion") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.120572175@infradead.org
2020-11-09perf: Reduce stack usage of perf_output_begin()Peter Zijlstra2-24/+28
__perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
2020-11-09futex: Don't enable IRQs unconditionally in put_pi_state()Dan Carpenter1-2/+3
The exit_pi_state_list() function calls put_pi_state() with IRQs disabled and is not expecting that IRQs will be enabled inside the function. Use the _irqsave() variant so that IRQs are restored to the original state instead of being enabled unconditionally. Fixes: 153fbd1226fb ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201106085205.GA1159983@mwanda
2020-11-08fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parentEddy Wu1-5/+5
current->group_leader->exit_signal may change during copy_process() if current->real_parent exits. Move the assignment inside tasklist_lock to avoid the race. Signed-off-by: Eddy Wu <eddy_wu@trendmicro.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-08Merge tag 'perf-urgent-2020-11-08' of ↵Linus Torvalds1-7/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Thomas Gleixner: "A single fix for the perf core plugging a memory leak in the address filter parser" * tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix a memory leak in perf_event_parse_addr_filter()
2020-11-08Merge tag 'locking-urgent-2020-11-08' of ↵Linus Torvalds1-2/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex fix from Thomas Gleixner: "A single fix for the futex code where an intermediate state in the underlying RT mutex was not handled correctly and triggering a BUG() instead of treating it as another variant of retry condition" * tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Handle transient "ownerless" rtmutex state correctly
2020-11-08Merge tag 'irq-urgent-2020-11-08' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A set of fixes for interrupt chip drivers: - Fix the fallout of the IPI as interrupt conversion in Kconfig and the BCM2836 interrupt chip driver - Fixes for interrupt affinity setting and the handling of hierarchical irq domains in the SiFive PLIC driver - Make the unmapped event handling in the TI SCI driver work correctly - A few minor fixes and cleanups in various chip drivers and Kconfig" * tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: irqchip: ti, sci-inta: Fix diagram indentation for unmapped events irqchip/ti-sci-inta: Add support for unmapped event handling dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm irqchip/sifive-plic: Fix chip_data access within a hierarchy irqchip/sifive-plic: Fix broken irq_set_affinity() callback irqchip/stm32-exti: Add all LP timer exti direct events support irqchip/bcm2836: Fix missing __init annotation irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY irqchip/mst: Make mst_intc_of_init static irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7 genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
2020-11-08Merge tag 'core-urgent-2020-11-08' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull entry code fix from Thomas Gleixner: "A single fix for the generic entry code to correct the wrong assumption that the lockdep interrupt state needs not to be established before calling the RCU check" * tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Fix the incorrect ordering of lockdep and RCU check
2020-11-08futex: Handle transient "ownerless" rtmutex state correctlyMike Galbraith1-2/+14
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner(). This is one possible chain of events leading to this: Task Prio Operation T1 120 lock(F) T2 120 lock(F) -> blocks (top waiter) T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter) XX timeout/ -> wakes T2 signal T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set) T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter and the lower priority T2 cannot steal the lock. -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON() The comment states that this is invalid and rt_mutex_real_owner() must return a non NULL owner when the trylock failed, but in case of a queued and woken up waiter rt_mutex_real_owner() == NULL is a valid transient state. The higher priority waiter has simply not yet managed to take over the rtmutex. The BUG_ON() is therefore wrong and this is just another retry condition in fixup_pi_state_owner(). Drop the locks, so that T3 can make progress, and then try the fixup again. Gratian provided a great analysis, traces and a reproducer. The analysis is to the point, but it confused the hell out of that tglx dude who had to page in all the futex horrors again. Condensed version is above. [ tglx: Wrote comment and changelog ] Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Reported-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
2020-11-07perf/core: Fix a memory leak in perf_event_parse_addr_filter()kiyin(尹亮)1-7/+5
As shown through runtime testing, the "filename" allocation is not always freed in perf_event_parse_addr_filter(). There are three possible ways that this could happen: - It could be allocated twice on subsequent iterations through the loop, - or leaked on the success path, - or on the failure path. Clean up the code flow to make it obvious that 'filename' is always freed in the reallocation path and in the two return paths as well. We rely on the fact that kfree(NULL) is NOP and filename is initialized with NULL. This fixes the leak. No other side effects expected. [ Dan Carpenter: cleaned up the code flow & added a changelog. ] [ Ingo Molnar: updated the changelog some more. ] Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Signed-off-by: "kiyin(尹亮)" <kiyin@tencent.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu> Cc: Anthony Liguori <aliguori@amazon.com> -- kernel/events/core.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
2020-11-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski5-7/+42
Alexei Starovoitov says: ==================== pull-request: bpf 2020-11-06 1) Pre-allocated per-cpu hashmap needs to zero-fill reused element, from David. 2) Tighten bpf_lsm function check, from KP. 3) Fix bpftool attaching to flow dissector, from Lorenz. 4) Use -fno-gcse for the whole kernel/bpf/core.c instead of function attribute, from Ard. * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Update verification logic for LSM programs bpf: Zero-fill re-used per-cpu map element bpf: BPF_PRELOAD depends on BPF_SYSCALL tools/bpftool: Fix attaching flow dissector libbpf: Fix possible use after free in xsk_socket__delete libbpf: Fix null dereference in xsk_socket__delete libbpf, hashmap: Fix undefined behavior in hash_bits bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSE tools, bpftool: Remove two unused variables. tools, bpftool: Avoid array index warnings. xsk: Fix possible memory leak at socket close bpf: Add struct bpf_redir_neigh forward declaration to BPF helper defs samples/bpf: Set rlimit for memlock to infinity in all samples bpf: Fix -Wshadow warnings selftest/bpf: Fix profiler test using CO-RE relocation for enums ==================== Link: https://lore.kernel.org/r/20201106221759.24143-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-07bpf: Update verification logic for LSM programsKP Singh1-3/+7
The current logic checks if the name of the BTF type passed in attach_btf_id starts with "bpf_lsm_", this is not sufficient as it also allows attachment to non-LSM hooks like the very function that performs this check, i.e. bpf_lsm_verify_prog. In order to ensure that this verification logic allows attachment to only LSM hooks, the LSM_HOOK definitions in lsm_hook_defs.h are used to generate a BTF_ID set. Upon verification, the attach_btf_id of the program being attached is checked for presence in this set. Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution") Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201105230651.2621917-1-kpsingh@chromium.org
2020-11-06printk: remove unneeded dead-store assignmentLukas Bulwahn1-2/+0
make clang-analyzer on x86_64 defconfig caught my attention with: kernel/printk/printk_ringbuffer.c:885:3: warning: Value stored to 'desc' is never read [clang-analyzer-deadcode.DeadStores] desc = to_desc(desc_ring, head_id); ^ Commit b6cf8b3f3312 ("printk: add lockless ringbuffer") introduced desc_reserve() with this unneeded dead-store assignment. As discussed with John Ogness privately, this is probably just some minor left-over from previous iterations of the ringbuffer implementation. So, simply remove this unneeded dead assignment to make clang-analyzer happy. As compilers will detect this unneeded assignment and optimize this anyway, the resulting object code is identical before and after this change. No functional change. No change to object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20201106034005.18822-1-lukas.bulwahn@gmail.com
2020-11-06bpf: Zero-fill re-used per-cpu map elementDavid Verbeiren1-2/+28
Zero-fill element values for all other cpus than current, just as when not using prealloc. This is the only way the bpf program can ensure known initial values for all cpus ('onallcpus' cannot be set when coming from the bpf program). The scenario is: bpf program inserts some elements in a per-cpu map, then deletes some (or userspace does). When later adding new elements using bpf_map_update_elem(), the bpf program can only set the value of the new elements for the current cpu. When prealloc is enabled, previously deleted elements are re-used. Without the fix, values for other cpus remain whatever they were when the re-used entry was previously freed. A selftest is added to validate correct operation in above scenario as well as in case of LRU per-cpu map element re-use. Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements") Signed-off-by: David Verbeiren <david.verbeiren@tessares.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201104112332.15191-1-david.verbeiren@tessares.net
2020-11-06bpf: BPF_PRELOAD depends on BPF_SYSCALLRandy Dunlap1-0/+1
Fix build error when BPF_SYSCALL is not set/enabled but BPF_PRELOAD is by making BPF_PRELOAD depend on BPF_SYSCALL. ERROR: modpost: "bpf_preload_ops" [kernel/bpf/preload/bpf_preload.ko] undefined! Reported-by: kernel test robot <lkp@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201105195109.26232-1-rdunlap@infradead.org
2020-11-05Merge tag 'trace-v5.10-rc2' of ↵Linus Torvalds6-34/+107
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Fix off-by-one error in retrieving the context buffer for trace_printk() - Fix off-by-one error in stack nesting limit - Fix recursion to not make all NMI code false positive as recursing - Stop losing events in function tracing when transitioning between irq context - Stop losing events in ring buffer when transitioning between irq context - Fix return code of error pointer in parse_synth_field() to prevent NULL pointer dereference. - Fix false positive of NMI recursion in kprobe event handling * tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: kprobes: Tell lockdep about kprobe nesting tracing: Make -ENOMEM the default error for parse_synth_field() ring-buffer: Fix recursion protection transitions between interrupt context tracing: Fix the checking of stackidx in __ftrace_trace_stack ftrace: Handle tracing when switching between context ftrace: Fix recursion check for NMI test tracing: Fix out of bounds write in get_trace_buf
2020-11-05Merge tag 'pm-5.10-rc3' of ↵Linus Torvalds1-12/+10
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the device links support in runtime PM, correct mistakes in the cpuidle documentation, fix the handling of policy limits changes in the schedutil cpufreq governor, fix assorted issues in the OPP (operating performance points) framework and make one janitorial change. Specifics: - Unify the handling of managed and stateless device links in the runtime PM framework and prevent runtime PM references to devices from being leaked after device link removal (Rafael Wysocki). - Fix two mistakes in the cpuidle documentation (Julia Lawall). - Prevent the schedutil cpufreq governor from missing policy limits updates in some cases (Viresh Kumar). - Prevent static OPPs from being dropped by mistake (Viresh Kumar). - Prevent helper function in the OPP framework from returning prematurely (Viresh Kumar). - Prevent opp_table_lock from being held too long during removal of OPP tables with no more active references (Viresh Kumar). - Drop redundant semicolon from the Intel RAPL power capping driver (Tom Rix)" * tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: runtime: Resume the device earlier in __device_release_driver() PM: runtime: Drop pm_runtime_clean_up_links() PM: runtime: Drop runtime PM references to supplier on link removal powercap/intel_rapl: remove unneeded semicolon Documentation: PM: cpuidle: correct path name Documentation: PM: cpuidle: correct typo cpufreq: schedutil: Don't skip freq update if need_freq_update is set opp: Reduce the size of critical section in _opp_table_kref_release() opp: Fix early exit from dev_pm_opp_register_set_opp_helper() opp: Don't always remove static OPPs in _of_add_opp_table_v1()
2020-11-04entry: Fix the incorrect ordering of lockdep and RCU checkThomas Gleixner1-2/+2
When an exception/interrupt hits kernel space and the kernel is not currently in the idle task then RCU must be watching. irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in turn invokes lockdep when taking a lock. But at that point lockdep does not yet know about the fact that interrupts have been disabled by the CPU, which triggers a lockdep splat complaining about inconsistent state. Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU. So use the same sequence as for the idle case and tell lockdep about the irq state change first, invoke the RCU check and then do the lockdep and tracer update. Fixes: a5497bab5f72 ("entry: Provide generic interrupt entry/exit code") Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87y2jhl19s.fsf@nanos.tec.linutronix.de
2020-11-04kprobes: Tell lockdep about kprobe nestingSteven Rostedt (VMware)1-4/+21
Since the kprobe handlers have protection that prohibits other handlers from executing in other contexts (like if an NMI comes in while processing a kprobe, and executes the same kprobe, it will get fail with a "busy" return). Lockdep is unaware of this protection. Use lockdep's nesting api to differentiate between locks taken in INT3 context and other context to suppress the false warnings. Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02tracing: Make -ENOMEM the default error for parse_synth_field()Steven Rostedt (VMware)1-10/+7
parse_synth_field() returns a pointer and requires that errors get surrounded by ERR_PTR(). The ret variable is initialized to zero, but should never be used as zero, and if it is, it could cause a false return code and produce a NULL pointer dereference. It makes no sense to set ret to zero. Set ret to -ENOMEM (the most common error case), and have any other errors set it to something else. This removes the need to initialize ret on *every* error branch. Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02ring-buffer: Fix recursion protection transitions between interrupt contextSteven Rostedt (VMware)1-12/+46
The recursion protection of the ring buffer depends on preempt_count() to be correct. But it is possible that the ring buffer gets called after an interrupt comes in but before it updates the preempt_count(). This will trigger a false positive in the recursion code. Use the same trick from the ftrace function callback recursion code which uses a "transition" bit that gets set, to allow for a single recursion for to handle transitions between contexts. Cc: stable@vger.kernel.org Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02kernel/hung_task.c: make type annotations consistentLukas Bulwahn1-2/+1
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") removed various __user annotations from function signatures as part of its refactoring. It also removed the __user annotation for proc_dohung_task_timeout_secs() at its declaration in sched/sysctl.h, but not at its definition in kernel/hung_task.c. Hence, sparse complains: kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces)) Adjust the annotation at the definition fitting to that refactoring to make sparse happy again, which also resolves this warning from sparse: kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces) kernel/hung_task.c:277:52: expected void * kernel/hung_task.c:277:52: got void [noderef] __user *buffer No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Ignatov <rdna@fb.com> Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-02kthread_worker: prevent queuing delayed work from timer_fn when it is being ↵Zqiang1-1/+2
canceled There is a small race window when a delayed work is being canceled and the work still might be queued from the timer_fn: CPU0 CPU1 kthread_cancel_delayed_work_sync() __kthread_cancel_work_sync() __kthread_cancel_work() work->canceling++; kthread_delayed_work_timer_fn() kthread_insert_work(); BUG: kthread_insert_work() should not get called when work->canceling is set. Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-02ptrace: fix task_join_group_stop() for the case when current is tracedOleg Nesterov1-9/+10
This testcase #include <stdio.h> #include <unistd.h> #include <signal.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <pthread.h> #include <assert.h> void *tf(void *arg) { return NULL; } int main(void) { int pid = fork(); if (!pid) { kill(getpid(), SIGSTOP); pthread_t th; pthread_create(&th, NULL, tf, NULL); return 0; } waitpid(pid, NULL, WSTOPPED); ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE); waitpid(pid, NULL, 0); ptrace(PTRACE_CONT, pid, 0,0); waitpid(pid, NULL, 0); int status; int thread = waitpid(-1, &status, 0); assert(thread > 0 && thread != pid); assert(status == 0x80137f); return 0; } fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap(). This is because task_join_group_stop() has 2 problems when current is traced: 1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee can be woken up by debugger and it can clone another thread which should join the group-stop. We need to check group_stop_count || SIGNAL_STOP_STOPPED. 2. If SIGNAL_STOP_STOPPED is already set, we should not increment sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread should stop without another do_notify_parent_cldstop() report. To clarify, the problem is very old and we should blame ptrace_init_task(). But now that we have task_join_group_stop() it makes more sense to fix this helper to avoid the code duplication. Reported-by: syzbot+3485e3773f7da290eecc@syzkaller.appspotmail.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christian Brauner <christian@brauner.io> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Zhiqiang Liu <liuzhiqiang26@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-02cpufreq: schedutil: Don't skip freq update if need_freq_update is setViresh Kumar1-12/+10
The cpufreq policy's frequency limits (min/max) can get changed at any point of time, while schedutil is trying to update the next frequency. Though the schedutil governor has necessary locking and support in place to make sure we don't miss any of those updates, there is a corner case where the governor will find that the CPU is already running at the desired frequency and so may skip an update. For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz and is running at 1 GHz currently. Schedutil tries to update the frequency to 1.2 GHz, during this time the policy limits get changed as policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the frequency at various instances, we will eventually set the frequency to 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq. Now lets say the policy limits get changed back at this time with policy->min as 1 GHz. The next time schedutil is invoked by the scheduler, we will reevaluate the next frequency (because need_freq_update will get set due to limits change event) and lets say we want to set the frequency to 1.2 GHz again. At this point sugov_update_next_freq() will find the next_freq == current_freq and will abort the update, while the CPU actually runs at 1.4 GHz. Until now need_freq_update was used as a flag to indicate that the policy's frequency limits have changed, and that we should consider the new limits while reevaluating the next frequency. This patch fixes the above mentioned issue by extending the purpose of the need_freq_update flag. If this flag is set now, the schedutil governor will not try to abort a frequency change even if next_freq == current_freq. As similar behavior is required in the case of CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be set to false if that flag is set for the driver. We also don't need to consider the need_freq_update flag in sugov_update_single() anymore to handle the special case of busy CPU, as we won't abort a frequency update anymore. Reported-by: zhuguangqing <zhuguangqing@xiaomi.com> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Rearrange code to avoid a branch ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-02tracing: Fix the checking of stackidx in __ftrace_trace_stackQiujun Huang1-2/+2
The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING is illegal too. And fix two typos by the way. Link: https://lkml.kernel.org/r/20201031085714.2147-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_singleChristoph Hellwig1-10/+6
The tbl_dma_addr argument is used to check the DMA boundary for the allocations, and thus needs to be a dma_addr_t. swiotlb-xen instead passed a physical address, which could lead to incorrect results for strange offsets. Fix this by removing the parameter entirely and hard code the DMA address for io_tlb_start instead. Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-11-02swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"Stefano Stabellini1-1/+5
kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to allocate a buffer for the swiotlb. It does so by calling memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE); If the allocation must fail, no_iotlb_memory is set. Later during initialization swiotlb-xen comes in (drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start is != 0, it thinks the memory is ready to use when actually it is not. When the swiotlb is actually needed, swiotlb_tbl_map_single gets called and since no_iotlb_memory is set the kernel panics. Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been initialized, it would do the initialization itself, which might still succeed. Fix the panic by setting io_tlb_start to 0 on swiotlb initialization failure, and also by setting no_iotlb_memory to false on swiotlb initialization success. Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb") Reported-by: Elliott Mitchell <ehem+xen@m5p.com> Tested-by: Elliott Mitchell <ehem+xen@m5p.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-11-02ftrace: Handle tracing when switching between contextSteven Rostedt (VMware)2-4/+28
When an interrupt or NMI comes in and switches the context, there's a delay from when the preempt_count() shows the update. As the preempt_count() is used to detect recursion having each context have its own bit get set when tracing starts, and if that bit is already set, it is considered a recursion and the function exits. But if this happens in that section where context has changed but preempt_count() has not been updated, this will be incorrectly flagged as a recursion. To handle this case, create another bit call TRANSITION and test it if the current context bit is already set. Flag the call as a recursion if the TRANSITION bit is already set, and if not, set it and continue. The TRANSITION bit will be cleared normally on the return of the function that set it, or if the current context bit is clear, set it and clear the TRANSITION bit to allow for another transition between the current context and an even higher one. Cc: stable@vger.kernel.org Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02ftrace: Fix recursion check for NMI testSteven Rostedt (VMware)1-1/+2
The code that checks recursion will work to only do the recursion check once if there's nested checks. The top one will do the check, the other nested checks will see recursion was already checked and return zero for its "bit". On the return side, nothing will be done if the "bit" is zero. The problem is that zero is returned for the "good" bit when in NMI context. This will set the bit for NMIs making it look like *all* NMI tracing is recursing, and prevent tracing of anything in NMI context! The simple fix is to return "bit + 1" and subtract that bit on the end to get the real bit. Cc: stable@vger.kernel.org Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-02tracing: Fix out of bounds write in get_trace_bufQiujun Huang1-1/+1
The nesting count of trace_printk allows for 4 levels of nesting. The nesting counter starts at zero and is incremented before being used to retrieve the current context's buffer. But the index to the buffer uses the nesting counter after it was incremented, and not its original number, which in needs to do. Link: https://lkml.kernel.org/r/20201029161905.4269-1-hqjagain@gmail.com Cc: stable@vger.kernel.org Fixes: 3d9622c12c887 ("tracing: Add barrier to trace_printk() buffer nesting modification") Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-01Merge tag 'timers-urgent-2020-11-01' of ↵Linus Torvalds4-16/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A few fixes for timers/timekeeping: - Prevent undefined behaviour in the timespec64_to_ns() conversion which is used for converting user supplied time input to nanoseconds. It lacked overflow protection. - Mark sched_clock_read_begin/retry() to prevent recursion in the tracer - Remove unused debug functions in the hrtimer and timerlist code" * tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Prevent undefined behaviour in timespec64_to_ns() timers: Remove unused inline funtion debug_timer_free() hrtimer: Remove unused inline function debug_hrtimer_free() time/sched_clock: Mark sched_clock_read_begin/retry() as notrace
2020-11-01Merge tag 'smp-urgent-2020-11-01' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp fix from Thomas Gleixner: "A single fix for stop machine. Mark functions no trace to prevent a crash caused by recursion when enabling or disabling a tracer on RISC-V (probably all architectures which patch through stop machine)" * tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine, rcu: Mark functions as notrace
2020-11-01Merge tag 'locking-urgent-2020-11-01' of ↵Linus Torvalds2-14/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A couple of locking fixes: - Fix incorrect failure injection handling in the fuxtex code - Prevent a preemption warning in lockdep when tracking local_irq_enable() and interrupts are already enabled - Remove more raw_cpu_read() usage from lockdep which causes state corruption on !X86 architectures. - Make the nr_unused_locks accounting in lockdep correct again" * tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Fix nr_unused_locks accounting locking/lockdep: Remove more raw_cpu_read() usage futex: Fix incorrect should_fail_futex() handling lockdep: Fix preemption WARN for spurious IRQ-enable
2020-11-01Merge tag 'flexible-array-conversions-5.10-rc2' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull more flexible-array member conversions from Gustavo A. R. Silva: "Replace zero-length arrays with flexible-array members" * tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: printk: ringbuffer: Replace zero-length array with flexible-array member net/smc: Replace zero-length array with flexible-array member net/mlx5: Replace zero-length array with flexible-array member mei: hw: Replace zero-length array with flexible-array member gve: Replace zero-length array with flexible-array member Bluetooth: btintel: Replace zero-length array with flexible-array member scsi: target: tcmu: Replace zero-length array with flexible-array member ima: Replace zero-length array with flexible-array member enetc: Replace zero-length array with flexible-array member fs: Replace zero-length array with flexible-array member Bluetooth: Replace zero-length array with flexible-array member params: Replace zero-length array with flexible-array member tracepoint: Replace zero-length array with flexible-array member platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
2020-10-31printk: ringbuffer: Replace zero-length array with flexible-array memberGustavo A. R. Silva1-1/+1
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-10-30Merge tag 'pm-5.10-rc2' of ↵Linus Torvalds2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a few issues related to running intel_pstate in the passive mode with HWP enabled, correct the handling of the max_cstate module parameter in intel_idle and make a few janitorial changes. Specifics: - Modify Kconfig to prevent configuring either the "conservative" or the "ondemand" governor as the default cpufreq governor if intel_pstate is selected, in which case "schedutil" is the default choice for the default governor setting (Rafael Wysocki). - Modify the cpufreq core, intel_pstate and the schedutil governor to avoid missing updates of the HWP max limit when intel_pstate operates in the passive mode with HWP enabled (Rafael Wysocki). - Fix max_cstate module parameter handling in intel_idle for processor models with C-state tables coming from ACPI (Chen Yu). - Clean up assorted pieces of power management code (Jackie Zamow, Tom Rix, Zhang Qilong)" * tag 'pm-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is set cpufreq: Introduce cpufreq_driver_test_flags() cpufreq: speedstep: remove unneeded semicolon PM: sleep: fix typo in kernel/power/process.c intel_idle: Fix max_cstate for processor models without C-state tables cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag cpufreq: Avoid configuring old governors as default with intel_pstate cpufreq: e_powersaver: remove unreachable break
2020-10-30lockdep: Fix nr_unused_locks accountingPeter Zijlstra1-10/+4
Chris reported that commit 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow") breaks the nr_unused_locks validation code triggered by /proc/lockdep_stats. By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad indicator for accounting nr_unused_locks; simplyfy by using any first bit. Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow") Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net
2020-10-30locking/lockdep: Remove more raw_cpu_read() usagePeter Zijlstra1-1/+1
I initially thought raw_cpu_read() was OK, since if it is !0 we have IRQs disabled and can't get migrated, so if we get migrated both CPUs must have 0 and it doesn't matter which 0 we read. And while that is true; it isn't the whole store, on pretty much all architectures (except x86) this can result in computing the address for one CPU, getting migrated, the old CPU continuing execution with another task (possibly setting recursion) and then the new CPU reading the value of the old CPU, which is no longer 0. Similer to: baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net
2020-10-30Merge branches 'pm-cpuidle' and 'pm-sleep'Rafael J. Wysocki1-1/+1
* pm-cpuidle: intel_idle: Fix max_cstate for processor models without C-state tables * pm-sleep: PM: sleep: fix typo in kernel/power/process.c
2020-10-30bpf: Don't rely on GCC __attribute__((optimize)) to disable GCSEArd Biesheuvel2-2/+6
Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a function scope __attribute__((optimize("-fno-gcse"))), to disable a GCC specific optimization that was causing trouble on x86 builds, and was not expected to have any positive effect in the first place. However, as the GCC manual documents, __attribute__((optimize)) is not for production use, and results in all other optimization options to be forgotten for the function in question. This can cause all kinds of trouble, but in one particular reported case, it causes -fno-asynchronous-unwind-tables to be disregarded, resulting in .eh_frame info to be emitted for the function. This reverts commit 3193c0836, and instead, it disables the -fgcse optimization for the entire source file, but only when building for X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the original commit states that CONFIG_RETPOLINE=n triggers the issue, whereas CONFIG_RETPOLINE=y performs better without the optimization, so it is kept disabled in both cases. Fixes: 3193c0836f20 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20201028171506.15682-2-ardb@kernel.org