summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-10-23Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds2-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Two main changes: - Cleanups, simplifications and CLOCK_TAI support (Thomas Gleixner) - Improve code generation (Andy Lutomirski)" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Rearrange do_hres() to improve code generation x86/vdso: Document vgtod_ts better x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks x66/vdso: Add CLOCK_TAI support x86/vdso: Move cycle_last handling into the caller x86/vdso: Simplify the invalid vclock case x86/vdso: Replace the clockid switch case x86/vdso: Collapse coarse functions x86/vdso: Collapse high resolution functions x86/vdso: Introduce and use vgtod_ts x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: Seq x86/vdso: Enforce 64bit clocksource x86/time: Implement clocksource_arch_init() clocksource: Provide clocksource_arch_init()
2018-10-23Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2-1/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Ingo Molnar: "The main changes: - Make the IBPB barrier more strict and add STIBP support (Jiri Kosina) - Micro-optimize and clean up the entry code (Andy Lutomirski) - ... plus misc other fixes" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Propagate information about RSB filling mitigation to sysfs x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation x86/speculation: Apply IBPB more strictly to avoid cross-process data leak x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION x86/pti/64: Remove the SYSCALL64 entry trampoline x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space x86/entry/64: Document idtentry
2018-10-23Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds6-102/+94
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "Lots of changes in this cycle: - Lots of CPA (change page attribute) optimizations and related cleanups (Thomas Gleixner, Peter Zijstra) - Make lazy TLB mode even lazier (Rik van Riel) - Fault handler cleanups and improvements (Dave Hansen) - kdump, vmcore: Enable kdumping encrypted memory with AMD SME enabled (Lianbo Jiang) - Clean up VM layout documentation (Baoquan He, Ingo Molnar) - ... plus misc other fixes and enhancements" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry() x86/mm: Kill stray kernel fault handling comment x86/mm: Do not warn about PCI BIOS W+X mappings resource: Clean it up a bit resource: Fix find_next_iomem_res() iteration issue resource: Include resource end in walk_*() interfaces x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error x86/mm: Remove spurious fault pkey check x86/mm/vsyscall: Consider vsyscall page part of user address space x86/mm: Add vsyscall address helper x86/mm: Fix exception table comments x86/mm: Add clarifying comments for user addr space x86/mm: Break out user address space handling x86/mm: Break out kernel address space handling x86/mm: Clarify hardware vs. software "error_code" x86/mm/tlb: Make lazy TLB mode lazier x86/mm/tlb: Add freed_tables element to flush_tlb_info x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range smp,cpumask: introduce on_each_cpu_cond_mask smp: use __cpumask_set_cpu in on_each_cpu_cond ...
2018-10-23Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds1-30/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "Improve the spreading of managed IRQs at allocation time" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq/matrix: Spread managed interrupts on allocation irq/matrix: Split out the CPU selection code into a helper
2018-10-23Merge branch 'sched-core-for-linus' of ↵Linus Torvalds8-84/+262
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes are: - Migrate CPU-intense 'misfit' tasks on asymmetric capacity systems, to better utilize (much) faster 'big core' CPUs. (Morten Rasmussen, Valentin Schneider) - Topology handling improvements, in particular when CPU capacity changes and related load-balancing fixes/improvements (Morten Rasmussen) - ... plus misc other improvements, fixes and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) sched/completions/Documentation: Add recommendation for dynamic and ONSTACK completions sched/completions/Documentation: Clean up the document some more sched/completions/Documentation: Fix a couple of punctuation nits cpu/SMT: State SMT is disabled even with nosmt and without "=force" sched/core: Fix comment regarding nr_iowait_cpu() and get_iowait_load() sched/fair: Remove setting task's se->runnable_weight during PELT update sched/fair: Disable LB_BIAS by default sched/pelt: Fix warning and clean up IRQ PELT config sched/topology: Make local variables static sched/debug: Use symbolic names for task state constants sched/numa: Remove unused numa_stats::nr_running field sched/numa: Remove unused code from update_numa_stats() sched/debug: Explicitly cast sched_feat() to bool sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains sched/fair: Don't move tasks to lower capacity CPUs unless necessary sched/fair: Set rq->rd->overload when misfit sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE() sched/core: Change root_domain->overload type to int sched/fair: Change 'prefer_sibling' type to bool sched/fair: Kick nohz balance if rq->misfit_task_load ...
2018-10-23Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2-14/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main updates in this cycle were: - Lots of perf tooling changes too voluminous to list (big perf trace and perf stat improvements, lots of libtraceevent reorganization, etc.), so I'll list the authors and refer to the changelog for details: Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa. ... with the bulk of the changes written by Jiri Olsa, Tzvetomir Stoyanov and Arnaldo Carvalho de Melo. - Continued intel_rdt work with a focus on playing well with perf events. This also imported some non-perf RDT work due to dependencies. (Reinette Chatre) - Implement counter freezing for Arch Perfmon v4 (Skylake and newer). This allows to speed up the PMI handler by avoiding unnecessary MSR writes and make it more accurate. (Andi Kleen) - kprobes cleanups and simplification (Masami Hiramatsu) - Intel Goldmont PMU updates (Kan Liang) - ... plus misc other fixes and updates" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits) kprobes/x86: Use preempt_enable() in optimized_callback() x86/intel_rdt: Prevent pseudo-locking from using stale pointers kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack perf/x86/intel: Export mem events only if there's PEBS support x86/cpu: Drop pointless static qualifier in punit_dev_state_show() x86/intel_rdt: Fix initial allocation to consider CDP x86/intel_rdt: CBM overlap should also check for overlap with CDP peer x86/intel_rdt: Introduce utility to obtain CDP peer tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file tools lib traceevent: Separate out tep_strerror() for strerror_r() issues perf python: More portable way to make CFLAGS work with clang perf python: Make clang_has_option() work on Python 3 perf tools: Free temporary 'sys' string in read_event_files() perf tools: Avoid double free in read_event_file() perf tools: Free 'printk' string in parse_ftrace_printk() perf tools: Cleanup trace-event-info 'tdata' leak perf strbuf: Match va_{add,copy} with va_end perf test: S390 does not support watchpoints in test 22 perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG tools include: Adopt linux/bits.h ...
2018-10-23Merge branch 'locking-core-for-linus' of ↵Linus Torvalds14-225/+342
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and misc x86 updates from Ingo Molnar: "Lots of changes in this cycle - in part because locking/core attracted a number of related x86 low level work which was easier to handle in a single tree: - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E. McKenney, Andrea Parri) - lockdep scalability improvements and micro-optimizations (Waiman Long) - rwsem improvements (Waiman Long) - spinlock micro-optimization (Matthew Wilcox) - qspinlocks: Provide a liveness guarantee (more fairness) on x86. (Peter Zijlstra) - Add support for relative references in jump tables on arm64, x86 and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens) - Be a lot less permissive on weird (kernel address) uaccess faults on x86: BUG() when uaccess helpers fault on kernel addresses (Jann Horn) - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav Amit) - ... and a handful of other smaller changes as well" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) locking/lockdep: Make global debug_locks* variables read-mostly locking/lockdep: Fix debug_locks off performance problem locking/pvqspinlock: Extend node size when pvqspinlock is configured locking/qspinlock_stat: Count instances of nested lock slowpaths locking/qspinlock, x86: Provide liveness guarantee x86/asm: 'Simplify' GEN_*_RMWcc() macros locking/qspinlock: Rework some comments locking/qspinlock: Re-order code locking/lockdep: Remove duplicated 'lock_class_ops' percpu array x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y futex: Replace spin_is_locked() with lockdep locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs x86/extable: Macrofy inline assembly code to work around GCC inlining bugs x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs x86/refcount: Work around GCC inlining bug x86/objtool: Use asm macros to work around GCC inlining bugs ...
2018-10-23Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds14-2387/+2008
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest change in this cycle is the conclusion of the big 'simplify RCU to two primary flavors' consolidation work - i.e. there's a single RCU flavor for any kernel variant (PREEMPT and !PREEMPT): - Consolidate the RCU-bh, RCU-preempt, and RCU-sched flavors into a single flavor similar to RCU-sched in !PREEMPT kernels and into a single flavor similar to RCU-preempt (but also waiting on preempt-disabled sequences of code) in PREEMPT kernels. This branch also includes a refactoring of rcu_{nmi,irq}_{enter,exit}() from Byungchul Park. - Now that there is only one RCU flavor in any given running kernel, the many "rsp" pointers are no longer required, and this cleanup series removes them. - This branch carries out additional cleanups made possible by the RCU flavor consolidation, including inlining now-trivial functions, updating comments and definitions, and removing now-unneeded rcutorture scenarios. - Now that there is only one flavor of RCU in any running kernel, there is also only on rcu_data structure per CPU. This means that the rcu_dynticks structure can be merged into the rcu_data structure, a task taken on by this branch. This branch also contains a -rt-related fix from Mike Galbraith. There were also other updates: - Documentation updates, including some good-eye catches from Joel Fernandes. - SRCU updates, most notably changes enabling call_srcu() to be invoked very early in the boot sequence. - Torture-test updates, including some preliminary work towards making rcutorture better able to find problems that result in insufficient grace-period forward progress. - Initial changes to RCU to better promote forward progress of grace periods, including fixing a bug found by Marius Hillenbrand and David Woodhouse, with the fix suggested by Peter Zijlstra" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) srcu: Make early-boot call_srcu() reuse workqueue lists rcutorture: Test early boot call_srcu() srcu: Make call_srcu() available during very early boot rcu: Convert rcu_state.ofl_lock to raw_spinlock_t rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks rcu: Switch dyntick nesting counters to rcu_data structure rcu: Switch urgent quiescent-state requests to rcu_data structure rcu: Switch lazy counts to rcu_data structure rcu: Switch last accelerate/advance to rcu_data structure rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure rcu: Merge rcu_dynticks structure into rcu_data structure rcu: Remove unused rcu_dynticks_snap() from Tiny RCU rcu: Convert "1UL << x" to "BIT(x)" rcu: Avoid resched_cpu() when rescheduling the current CPU rcu: More aggressively enlist scheduler aid for nohz_full CPUs rcu: Compute jiffies_till_sched_qs from other kernel parameters rcu: Provide functions for determining if call_rcu() has been invoked rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure rcu: Motivate Tiny RCU forward progress ...
2018-10-23Merge branch 'x86/cache' into perf/core, to pick up fixesIngo Molnar7-22/+114
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-23Merge tag 'pm-4.20-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These make hibernation on 32-bit x86 systems work in all of the cases in which it works on 64-bit x86 ones, update the menu cpuidle governor and the "polling" state to make them more efficient, add more hardware support to cpufreq drivers and fix issues with some of them, fix a bug in the conservative cpufreq governor, fix the operating performance points (OPP) framework and make it more stable, update the devfreq subsystem to take changes in the APIs used by into account and clean up some things all over. Specifics: - Backport hibernation bug fixes from x86-64 to x86-32 and consolidate hibernation handling on x86 to allow 32-bit systems to work in all of the cases in which 64-bit ones work (Zhimin Gu, Chen Yu). - Fix hibernation documentation (Vladimir D. Seleznev). - Update the menu cpuidle governor to fix a couple of issues with it, make it more efficient in some cases and clean it up (Rafael Wysocki). - Rework the cpuidle polling state implementation to make it more efficient (Rafael Wysocki). - Clean up the cpuidle core somewhat (Fieah Lim). - Fix the cpufreq conservative governor to take policy limits into account properly in some cases (Rafael Wysocki). - Add support for retrieving guaranteed performance information to the ACPI CPPC library and make the intel_pstate driver use it to expose the CPU base frequency via sysfs on systems with the hardware-managed P-states (HWP) feature enabled (Srinivas Pandruvada). - Fix clang warning in the CPPC cpufreq driver (Nathan Chancellor). - Get rid of device_node.name printing from cpufreq (Rob Herring). - Remove unnecessary unlikely() from the cpufreq core (Igor Stoppa). - Add support for the r8a7744 SoC to the cpufreq-dt driver (Biju Das). - Update the dt-platdev cpufreq driver to allow RK3399 to have separate tunables per cluster (Dmitry Torokhov). - Fix the dma_alloc_coherent() usage in the tegra186 cpufreq driver (Christoph Hellwig). - Make the imx6q cpufreq driver read OCOTP through nvmem for imx6ul/imx6ull (Anson Huang). - Fix several bugs in the operating performance points (OPP) framework and make it more stable (Viresh Kumar, Dave Gerlach). - Update the devfreq subsystem to take changes in the APIs used by into account, fix some issues with it and make it stop print device_node.name directly (Bjorn Andersson, Enric Balletbo i Serra, Matthias Kaehlcke, Rob Herring, Vincent Donnefort, zhong jiang). - Prepare the generic power domains (genpd) framework for dealing with domains containing CPUs (Ulf Hansson). - Prevent sysfs attributes representing low-power S0 residency counters from being exposed if low-power S0 support is not indicated in ACPI FADT (Rajneesh Bhardwaj). - Get rid of custom CPU features macros for Intel CPUs from the intel_idle and RAPL drivers (Andy Shevchenko). - Update the tasks freezer to list tasks that refused to freeze and caused a system transition to a sleep state to be aborted (Todd Brandt). - Update the pm-graph set of tools to v5.2 (Todd Brandt). - Fix some issues in the cpupower utility (Anders Roxell, Prarit Bhargava)" * tag 'pm-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits) PM / Domains: Document flags for genpd PM / Domains: Deal with multiple states but no governor in genpd PM / Domains: Don't treat zero found compatible idle states as an error cpuidle: menu: Avoid computations when result will be discarded cpuidle: menu: Drop redundant comparison cpufreq: tegra186: don't pass GFP_DMA32 to dma_alloc_coherent() cpufreq: conservative: Take limits changes into account properly Documentation: intel_pstate: Add base_frequency information cpufreq: intel_pstate: Add base_frequency attribute ACPI / CPPC: Add support for guaranteed performance cpuidle: menu: Simplify checks related to the polling state PM / tools: sleepgraph and bootgraph: upgrade to v5.2 PM / tools: sleepgraph: first batch of v5.2 changes cpupower: Fix coredump on VMWare cpupower: Fix AMD Family 0x17 msr_pstate size cpufreq: imx6q: read OCOTP through nvmem for imx6ul/imx6ull cpufreq: dt-platdev: allow RK3399 to have separate tunables per cluster cpuidle: poll_state: Revise loop termination condition cpuidle: menu: Move the latency_req == 0 special case check cpuidle: menu: Avoid computations for very close timers ...
2018-10-23Merge tag 'regulator-v5.0' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The biggest chunk of the regulator changes for this release outside of the new drivers is the conversion of the fixed regulator to use the GPIO descriptor API, there's a small addition to the GPIO API plus a bunch of updates to board files to implement it. This is some really welcome work from Linus Walleij that's had a bunch of review and has been sitting in -next for a while so I'm fairly happy there's no major issues. - Helpers for overlapping linear ranges. - Display opmode and consumer requested load in the regualtor_summary file in debugfs, plus a fix there. - Support for the fun and entertaining power off mechanism that the pfuze100 hardware implements. - Conversion of the fixed regulator API to use GPIO descriptors, including pulling in a bunch of patches to a bunch of board files. - New drivers for Cirrus Logic Lochnagar, Qualcomm PMS405, Rohm BD71847, ST PMIC1, and TI LM363x devices" * tag 'regulator-v5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (36 commits) regulator: lochnagar: Use a consisent comment style for SPDX header regulator: bd718x7: Remove struct bd718xx_pmic regulator: Fetch enable gpiods nonexclusive regulator/gpio: Allow nonexclusive GPIO access regulator: lochnagar: Add support for the Cirrus Logic Lochnagar regulator: stpmic1: Return REGULATOR_MODE_INVALID for invalid mode regulator: stpmic1: add stpmic1 regulator driver dt-bindings: regulator: document stpmic1 pmic regulators regulator: axp20x: Mark expected switch fall-throughs regulator: bd718xx: fix build warning on x86_64 regulator: fixed: Default enable high on DT regulators regulator: bd718xx: rename bd71837 to 718xx regulator: bd718XX use pickable ranges regulator/mfd: bd718xx: rename bd71837/bd71847 common instances regulator: Support regulators where voltage ranges are selectable mfd: dt bindings: add BD71847 device-tree binding documentation regulator: dt bindings: add BD71847 device-tree binding documentation regulator/mfd: Support ROHM BD71847 power management IC regulator: da905{2,5}: Remove unnecessary array check regulator: qcom: Add PMS405 regulators ...
2018-10-22Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds7-192/+246
Pull dma mapping updates from Christoph Hellwig: "First batch of dma-mapping changes for 4.20. There will be a second PR as some big changes were only applied just before the end of the merge window, and I want to give them a few more days in linux-next. Summary: - mostly more consolidation of the direct mapping code, including converting over hexagon, and merging the coherent and non-coherent code into a single dma_map_ops instance (me) - cleanups for the dma_configure/dma_unconfigure callchains (me) - better handling of dma_masks in odd setups (me, Alexander Duyck) - better debugging of passing vmalloc address to the DMA API (Stephen Boyd) - CMA command line parsing fix (He Zhe)" * tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping: (27 commits) dma-direct: respect DMA_ATTR_NO_WARN dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN dma-direct: document the zone selection logic dma-debug: Check for drivers mapping invalid addresses in dma_map_single() dma-direct: fix return value of dma_direct_supported dma-mapping: move dma_default_get_required_mask under ifdef dma-direct: always allow dma mask <= physiscal memory size dma-direct: implement complete bus_dma_mask handling dma-direct: refine dma_direct_alloc zone selection dma-direct: add an explicit dma_direct_get_required_mask dma-mapping: make the get_required_mask method available unconditionally unicore32: remove swiotlb support Revert "dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops" dma-mapping: support non-coherent devices in dma_common_get_sgtable dma-mapping: consolidate the dma mmap implementations dma-mapping: merge direct and noncoherent ops dma-mapping: move the dma_coherent flag to struct device MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration dma-mapping: fix panic caused by passing empty cma command line argument ...
2018-10-22Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-blockLinus Torvalds2-11/+41
Pull block layer updates from Jens Axboe: "This is the main pull request for block changes for 4.20. This contains: - Series enabling runtime PM for blk-mq (Bart). - Two pull requests from Christoph for NVMe, with items such as; - Better AEN tracking - Multipath improvements - RDMA fixes - Rework of FC for target removal - Fixes for issues identified by static checkers - Fabric cleanups, as prep for TCP transport - Various cleanups and bug fixes - Block merging cleanups (Christoph) - Conversion of drivers to generic DMA mapping API (Christoph) - Series fixing ref count issues with blkcg (Dennis) - Series improving BFQ heuristics (Paolo, et al) - Series improving heuristics for the Kyber IO scheduler (Omar) - Removal of dangerous bio_rewind_iter() API (Ming) - Apply single queue IPI redirection logic to blk-mq (Ming) - Set of fixes and improvements for bcache (Coly et al) - Series closing a hotplug race with sysfs group attributes (Hannes) - Set of patches for lightnvm: - pblk trace support (Hans) - SPDX license header update (Javier) - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0 specs behind a common core interface. (Javier, Matias) - Enable pblk to use a common interface to retrieve chunk metadata (Matias) - Bug fixes (Various) - Set of fixes and updates to the blk IO latency target (Josef) - blk-mq queue number updates fixes (Jianchao) - Convert a bunch of drivers from the old legacy IO interface to blk-mq. This will conclude with the removal of the legacy IO interface itself in 4.21, with the rest of the drivers (me, Omar) - Removal of the DAC960 driver. The SCSI tree will introduce two replacement drivers for this (Hannes)" * tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits) block: setup bounce bio_sets properly blkcg: reassociate bios when make_request() is called recursively blkcg: fix edge case for blk_get_rl() under memory pressure nvme-fabrics: move controller options matching to fabrics nvme-rdma: always have a valid trsvcid mtip32xx: fully switch to the generic DMA API rsxx: switch to the generic DMA API umem: switch to the generic DMA API sx8: switch to the generic DMA API sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code skd: switch to the generic DMA API ubd: remove use of blk_rq_map_sg nvme-pci: remove duplicate check drivers/block: Remove DAC960 driver nvme-pci: fix hot removal during error handling nvmet-fcloop: suppress a compiler warning nvme-core: make implicit seed truncation explicit nvmet-fc: fix kernel-doc headers nvme-fc: rework the request initialization code nvme-fc: introduce struct nvme_fcp_op_w_sgl ...
2018-10-22Merge tag 'arm64-upstream' of ↵Linus Torvalds1-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Apart from some new arm64 features and clean-ups, this also contains the core mmu_gather changes for tracking the levels of the page table being cleared and a minor update to the generic compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ. Summary: - Core mmu_gather changes which allow tracking the levels of page-table being cleared together with the arm64 low-level flushing routines - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to mitigate Spectre-v4 dynamically without trapping to EL3 firmware - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4 - Support for Common Not Private (CnP) translations allowing threads of the same CPU to share the TLB entries - Accelerated crc32 routines - Move swapper_pg_dir to the rodata section - Trap WFI instruction executed in user space - ARM erratum 1188874 workaround (arch_timer) - Miscellaneous fixes and clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits) arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work arm64: cpufeature: Trap CTR_EL0 access only where it is necessary arm64: cpufeature: Fix handling of CTR_EL0.IDC field arm64: cpufeature: ctr: Fix cpu capability check for late CPUs Documentation/arm64: HugeTLB page implementation arm64: mm: Use __pa_symbol() for set_swapper_pgd() arm64: Add silicon-errata.txt entry for ARM erratum 1188873 Revert "arm64: uaccess: implement unsafe accessors" arm64: mm: Drop the unused cpu parameter MAINTAINERS: fix bad sdei paths arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c arm64: xen: Use existing helper to check interrupt status arm64: Use daifflag_restore after bp_hardening arm64: daifflags: Use irqflags functions for daifflags arm64: arch_timer: avoid unused function warning arm64: Trap WFI executed in userspace arm64: docs: Document SSBS HWCAP arm64: docs: Fix typos in ELF hwcaps arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe ...
2018-10-22x86/stackprotector: Remove the call to boot_init_stack_canary() from ↵Christophe Leroy2-16/+0
cpu_startup_entry() The following commit: d7880812b359 ("idle: Add the stack canary init to cpu_startup_entry()") ... added an x86 specific boot_init_stack_canary() call to the generic cpu_startup_entry() as a temporary hack, with the intention to remove the #ifdef CONFIG_X86 later. More than 5 years later let's finally realize that plan! :-) While implementing stack protector support for PowerPC, we found that calling boot_init_stack_canary() is also needed for PowerPC which uses per task (TLS) stack canary like the X86. However, calling boot_init_stack_canary() would break architectures using a global stack canary (ARM, SH, MIPS and XTENSA). Instead of modifying the #ifdef CONFIG_X86 to an even messier: #if defined(CONFIG_X86) || defined(CONFIG_PPC) PowerPC implemented the call to boot_init_stack_canary() in the function calling cpu_startup_entry(). Let's try the same cleanup on the x86 side as well. On x86 we have two functions calling cpu_startup_entry(): - start_secondary() - cpu_bringup_and_idle() start_secondary() already calls boot_init_stack_canary(), so it's good, and this patch adds the call to boot_init_stack_canary() in cpu_bringup_and_idle(). I.e. now x86 catches up to the rest of the world and the ugly init sequence in init/main.c can be removed from cpu_startup_entry(). As a final benefit we can also remove the <linux/stackprotector.h> dependency from <linux/sched.h>. [ mingo: Improved the changelog a bit, added language explaining x86 borkage and sched.h change. ] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181020072649.5B59310483E@pc16082vm.idsi0.si.c-s.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-21Merge remote-tracking branches 'regulator/topic/bd718xx' and ↵Mark Brown1-0/+1
'regulator/topic/pfuze100' into regulator-next
2018-10-20Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman2-4/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fixes: Two fixes: a CFS-throttling bug fix, and an interactivity fix." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix the min_vruntime update logic in dequeue_entity() sched/fair: Fix throttle_list starvation with low CFS quota
2018-10-20Merge tag 'trace-v4.19-rc8-2' of ↵Greg Kroah-Hartman1-7/+25
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: A few small fixes to synthetic events Masami found some issues with the creation of synthetic events. The first two patches fix handling of unsigned type, and handling of a space before an ending semi-colon. The third patch adds a selftest to test the processing of synthetic events." * tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add synthetic event syntax testcase tracing: Fix synthetic event to allow semicolon at end tracing: Fix synthetic event to accept unsigned modifier
2018-10-20tracing: Fix synthetic event to allow semicolon at endMasami Hiramatsu1-1/+1
Fix synthetic event to allow independent semicolon at end. The synthetic_events interface accepts a semicolon after the last word if there is no space. # echo "myevent u64 var;" >> synthetic_events But if there is a space, it returns an error. # echo "myevent u64 var ;" > synthetic_events sh: write error: Invalid argument This behavior is difficult for users to understand. Let's allow the last independent semicolon too. Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-20tracing: Fix synthetic event to accept unsigned modifierMasami Hiramatsu1-6/+24
Fix synthetic event to accept unsigned modifier for its field type correctly. Currently, synthetic_events interface returns error for "unsigned" modifiers as below; # echo "myevent unsigned long var" >> synthetic_events sh: write error: Invalid argument This is because argv_split() breaks "unsigned long" into "unsigned" and "long", but parse_synth_field() doesn't expected it. With this fix, synthetic_events can handle the "unsigned long" correctly like as below; # echo "myevent unsigned long var" >> synthetic_events # cat synthetic_events myevent unsigned long var Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman1-8/+2
David writes: "Networking 1) Fix gro_cells leak in xfrm layer, from Li RongQing. 2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From Eric Dumazet. 3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn Töpel. 4) Out of bounds packet access in _decode_session6(), from Alexei Starovoitov. 5) Several ethtool bugs, where we copy a struct into the kernel twice and our validations of the values in the first copy can be invalidated by the second copy due to asynchronous updates to the memory by the user. From Wenwen Wang. 6) Missing netlink attribute validation in cls_api, from Davide Caratti. 7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang. 8) rxrpc operates on wrong kvec, from Yue Haibing. 9) A regression was introduced by the disassosciation of route neighbour references in rt6_probe(), causing probe for neighbourless routes to not be properly rate limited. Fix from Sabrina Dubroca. 10) Unsafe RCU locking in tipc, from Tung Nguyen. 11) Use after free in inet6_mc_check(), from Eric Dumazet. 12) PMTU from icmp packets should update the SCTP transport pathmtu, from Xin Long. 13) Missing peer put on error in rxrpc, from David Howells. 14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren. 15) Fix overflowing shift statement in qla3xxx driver, from Nathan Chancellor. 16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva. 17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return value in an inverted manner, fix from Paolo Abeni. 18) Fix missed unresolved entries in ipmr dumps, from Nikolay Aleksandrov. 19) Fix NAPI handling under high load, we can completely miss events when NAPI has to loop more than one time in a cycle. From Heiner Kallweit." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) ip6_tunnel: Fix encapsulation layout tipc: fix info leak from kernel tipc_event net: socket: fix a missing-check bug net: sched: Fix for duplicate class dump r8169: fix NAPI handling under high load net: ipmr: fix unresolved entry dumps net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion() sctp: fix the data size calculation in sctp_data_size virtio_net: avoid using netif_tx_disable() for serializing tx routine udp6: fix encap return code for resubmitting mlxsw: core: Fix use-after-free when flashing firmware during init sctp: not free the new asoc when sctp_wait_for_connect returns err sctp: fix race on sctp_id2asoc r8169: re-enable MSI-X on RTL8168g net: bpfilter: use get_pid_task instead of pid_task ptp: fix Spectre v1 vulnerability net: qla3xxx: Remove overflowing shift statement geneve, vxlan: Don't set exceptions if skb->len < mtu geneve, vxlan: Don't check skb_dst() twice sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead ...
2018-10-19locking/lockdep: Fix debug_locks off performance problemWaiman Long1-2/+2
It was found that when debug_locks was turned off because of a problem found by the lockdep code, the system performance could drop quite significantly when the lock_stat code was also configured into the kernel. For instance, parallel kernel build time on a 4-socket x86-64 server nearly doubled. Further analysis into the cause of the slowdown traced back to the frequent call to debug_locks_off() from the __lock_acquired() function probably due to some inconsistent lockdep states with debug_locks off. The debug_locks_off() function did an unconditional atomic xchg to write a 0 value into debug_locks which had already been set to 0. This led to severe cacheline contention in the cacheline that held debug_locks. As debug_locks is being referenced in quite a few different places in the kernel, this greatly slow down the system performance. To prevent that trashing of debug_locks cacheline, lock_acquired() and lock_contended() now checks the state of debug_locks before proceeding. The debug_locks_off() function is also modified to check debug_locks before calling __debug_locks_off(). Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-18Merge branches 'acpi-pm' and 'pm-sleep'Rafael J. Wysocki1-1/+1
* acpi-pm: ACPI / PM: LPIT: Register sysfs attributes based on FADT * pm-sleep: x86-32, hibernate: Adjust in_suspend after resumed on 32bit system x86-32, hibernate: Set up temporary text mapping for 32bit system x86-32, hibernate: Switch to relocated restore code during resume on 32bit system x86-32, hibernate: Switch to original page table after resumed x86-32, hibernate: Use the page size macro instead of constant value x86-32, hibernate: Use temp_pgt as the temporary page table x86, hibernate: Rename temp_level4_pgt to temp_pgt x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system x86, hibernate: Extract the common code of 64/32 bit system x86-32/asm/power: Create stack frames in hibernate_asm_32.S PM / hibernate: Check the success of generating md5 digest before hibernation x86, hibernate: Fix nosave_regions setup for hibernation PM / sleep: Show freezing tasks that caused a suspend abort PM / hibernate: Documentation: fix image_size default value
2018-10-18Merge tag 'trace-v4.19-rc8' of ↵Greg Kroah-Hartman2-21/+13
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: Two fixes for 4.19 This fixes two bugs: - Fix size mismatch of tracepoint array - Have preemptirq test module use same clock source of the selftest" * tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c tracepoint: Fix tracepoint array element size mismatch
2018-10-17tracing: Use trace_clock_local() for looping in preemptirq_delay_test.cSteven Rostedt (VMware)1-5/+5
The preemptirq_delay_test module is used for the ftrace selftest code that tests the latency tracers. The problem is that it uses ktime for the delay loop, and then checks the tracer to see if the delay loop is caught, but the tracer uses trace_clock_local() which uses various different other clocks to measure the latency. As ktime uses the clock cycles, and the code then converts that to nanoseconds, it causes rounding errors, and the preemptirq latency tests are failing due to being off by 1 (it expects to see a delay of 500000 us, but the delay is only 499999 us). This is happening due to a rounding error in the ktime (which is totally legit). The purpose of the test is to see if it can catch the delay, not to test the accuracy between trace_clock_local() and ktime_get(). Best to use apples to apples, and have the delay loop use the same clock as the latency tracer does. Cc: stable@vger.kernel.org Fixes: f96e8577da102 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-17tracepoint: Fix tracepoint array element size mismatchMathieu Desnoyers1-16/+8
commit 46e0c9be206f ("kernel: tracepoints: add support for relative references") changes the layout of the __tracepoint_ptrs section on architectures supporting relative references. However, it does so without turning struct tracepoint * const into const int elsewhere in the tracepoint code, which has the following side-effect: Setting mod->num_tracepoints is done in by module.c: mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs", sizeof(*mod->tracepoints_ptrs), &mod->num_tracepoints); Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size (rather than sizeof(int)), num_tracepoints is erroneously set to half the size it should be on 64-bit arch. So a module with an odd number of tracepoints misses the last tracepoint due to effect of integer division. So in the module going notifier: for_each_tracepoint_range(mod->tracepoints_ptrs, mod->tracepoints_ptrs + mod->num_tracepoints, tp_module_going_check_quiescent, NULL); the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually evaluates to something within the bounds of the array, but miss the last tracepoint if the number of tracepoints is odd on 64-bit arch. Fix this by introducing a new typedef: tracepoint_ptr_t, which is either "const int" on architectures that have PREL32 relocations, or "struct tracepoint * const" on architectures that does not have this feature. Also provide a new tracepoint_ptr_defer() static inline to encapsulate deferencing this type rather than duplicate code and ugly idefs within the for_each_tracepoint_range() implementation. This issue appears in 4.19-rc kernels, and should ideally be fixed before the end of the rc cycle. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jessica Yu <jeyu@kernel.org> Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-17locking/pvqspinlock: Extend node size when pvqspinlock is configuredWaiman Long2-11/+27
The qspinlock code supports up to 4 levels of slowpath nesting using four per-CPU mcs_spinlock structures. For 64-bit architectures, they fit nicely in one 64-byte cacheline. For para-virtualized (PV) qspinlocks it needs to store more information in the per-CPU node structure than there is space for. It uses a trick to use a second cacheline to hold the extra information that it needs. So PV qspinlock needs to access two extra cachelines for its information whereas the native qspinlock code only needs one extra cacheline. Freshly added counter profiling of the qspinlock code, however, revealed that it was very rare to use more than two levels of slowpath nesting. So it doesn't make sense to penalize PV qspinlock code in order to have four mcs_spinlock structures in the same cacheline to optimize for a case in the native qspinlock code that rarely happens. Extend the per-CPU node structure to have two more long words when PV qspinlock locks are configured to hold the extra data that it needs. As a result, the PV qspinlock code will enjoy the same benefit of using just one extra cacheline like the native counterpart, for most cases. [ mingo: Minor changelog edits. ] Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539697507-28084-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-17locking/qspinlock_stat: Count instances of nested lock slowpathsWaiman Long2-0/+11
Queued spinlock supports up to 4 levels of lock slowpath nesting - user context, soft IRQ, hard IRQ and NMI. However, we are not sure how often the nesting happens. So add 3 more per-CPU stat counters to track the number of instances where nesting index goes to 1, 2 and 3 respectively. On a dual-socket 64-core 128-thread Zen server, the following were the new stat counter values under different circumstances: State slowpath index1 index2 index3 ----- -------- ------ ------ ------- After bootup 1,012,150 82 0 0 After parallel build + perf-top 125,195,009 82 0 0 So the chance of having more than 2 levels of nesting is extremely low. [ mingo: Minor changelog edits. ] Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539697507-28084-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16locking/qspinlock, x86: Provide liveness guaranteePeter Zijlstra1-1/+15
On x86 we cannot do fetch_or() with a single instruction and thus end up using a cmpxchg loop, this reduces determinism. Replace the fetch_or() with a composite operation: tas-pending + load. Using two instructions of course opens a window we previously did not have. Consider the scenario: CPU0 CPU1 CPU2 1) lock trylock -> (0,0,1) 2) lock trylock /* fail */ 3) unlock -> (0,0,0) 4) lock trylock -> (0,0,1) 5) tas-pending -> (0,1,1) load-val <- (0,1,0) from 3 6) clear-pending-set-locked -> (0,0,1) FAIL: _2_ owners where 5) is our new composite operation. When we consider each part of the qspinlock state as a separate variable (as we can when _Q_PENDING_BITS == 8) then the above is entirely possible, because tas-pending will only RmW the pending byte, so the later load is able to observe prior tail and lock state (but not earlier than its own trylock, which operates on the whole word, due to coherence). To avoid this we need 2 things: - the load must come after the tas-pending (obviously, otherwise it can trivially observe prior state). - the tas-pending must be a full word RmW instruction, it cannot be an XCHGB for example, such that we cannot observe other state prior to setting pending. On x86 we can realize this by using "LOCK BTS m32, r32" for tas-pending followed by a regular load. Note that observing later state is not a problem: - if we fail to observe a later unlock, we'll simply spin-wait for that store to become visible. - if we observe a later xchg_tail(), there is no difference from that xchg_tail() having taken place before the tas-pending. Suggested-by: Will Deacon <will.deacon@arm.com> Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Fixes: 59fb586b4a07 ("locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath") Link: https://lkml.kernel.org/r/20181003130957.183726335@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16locking/qspinlock: Rework some commentsPeter Zijlstra1-10/+26
While working my way through the code again; I felt the comments could use help. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Link: https://lkml.kernel.org/r/20181003130257.156322446@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16locking/qspinlock: Re-order codePeter Zijlstra1-29/+27
Flip the branch condition after atomic_fetch_or_acquire(_Q_PENDING_VAL) such that we loose the indent. This also result in a more natural code flow IMO. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrea.parri@amarulasolutions.com Cc: longman@redhat.com Link: https://lkml.kernel.org/r/20181003130257.156322446@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16Merge branch 'x86/build' into locking/core, to pick up dependent patches and ↵Ingo Molnar17-120/+180
unify jump-label work Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16sched/fair: Fix the min_vruntime update logic in dequeue_entity()Song Muchun1-1/+1
The comment and the code around the update_min_vruntime() call in dequeue_entity() are not in agreement. From commit: b60205c7c558 ("sched/fair: Fix min_vruntime tracking") I think that we want to update min_vruntime when a task is sleeping/migrating. So, the check is inverted there - fix it. Signed-off-by: Song Muchun <smuchun@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b60205c7c558 ("sched/fair: Fix min_vruntime tracking") Link: http://lkml.kernel.org/r/20181014112612.2614-1-smuchun@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-16locking/lockdep: Remove duplicated 'lock_class_ops' percpu arrayWaiman Long1-1/+0
Remove the duplicated 'lock_class_ops' percpu array that is not used anywhere. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: 8ca2b56cd7da ("locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y") Link: http://lkml.kernel.org/r/1539380547-16726-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-8/+2
Daniel Borkmann says: ==================== pull-request: bpf 2018-10-14 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix xsk map update and delete operation to not call synchronize_net() but to piggy back on SOCK_RCU_FREE for sockets instead as we are not allowed to sleep under RCU, from Björn. 2) Do not change RLIMIT_MEMLOCK in reuseport_bpf selftest if the process already has unlimited RLIMIT_MEMLOCK, from Eric. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge branch 'for-linus' of ↵Greg Kroah-Hartman1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Dmitry writes: "Input updates for v4.19-rc7 - we added a few scheduling points into various input interfaces to ensure that large writes will not cause RCU stalls - fixed configuring PS/2 keyboards as wakeup devices on newer platforms - added a new Xbox gamepad ID." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: uinput - add a schedule point in uinput_inject_events() Input: evdev - add a schedule point in evdev_write() Input: mousedev - add a schedule point in mousedev_write() Input: i8042 - enable keyboard wakeups by default when s2idle is used Input: xpad - add support for Xbox1 PDP Camo series gamepad
2018-10-11Merge branch 'for-4.19-fixes' of ↵Greg Kroah-Hartman1-9/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Tejun writes: "cgroup fixes for v4.19-rc7 One cgroup2 threaded mode fix for v4.19-rc7. While threaded mode isn't used widely (yet) and the bug requires somewhat convoluted sequence of operations, it causes a userland visible malfunction - EINVAL on a valid attempt to enable threaded mode. This pull request contains the fix" * 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix dom_cgrp propagation when enabling threaded mode
2018-10-11sched/fair: Fix throttle_list starvation with low CFS quotaPhil Auld2-3/+21
With a very low cpu.cfs_quota_us setting, such as the minimum of 1000, distribute_cfs_runtime may not empty the throttled_list before it runs out of runtime to distribute. In that case, due to the change from c06f04c7048 to put throttled entries at the head of the list, later entries on the list will starve. Essentially, the same X processes will get pulled off the list, given CPU time and then, when expired, get put back on the head of the list where distribute_cfs_runtime will give runtime to the same set of processes leaving the rest. Fix the issue by setting a bit in struct cfs_bandwidth when distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can decide to put the throttled entry on the tail or the head of the list. The bit is set/cleared by the callers of distribute_cfs_runtime while they hold cfs_bandwidth->lock. This is easy to reproduce with a handful of CPU consumers. I use 'crash' on the live system. In some cases you can simply look at the throttled list and see the later entries are not changing: crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3 1 ffff90b56cb2d200 -976050 2 ffff90b56cb2cc00 -484925 3 ffff90b56cb2bc00 -658814 4 ffff90b56cb2ba00 -275365 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3 1 ffff90b56cb2d200 -994147 2 ffff90b56cb2cc00 -306051 3 ffff90b56cb2bc00 -961321 4 ffff90b56cb2ba00 -24490 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 Sometimes it is easier to see by finding a process getting starved and looking at the sched_info: crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, Signed-off-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-11xsk: do not call synchronize_net() under RCU read lockBjörn Töpel1-8/+2
The XSKMAP update and delete functions called synchronize_net(), which can sleep. It is not allowed to sleep during an RCU read section. Instead we need to make sure that the sock sk_destruct (xsk_destruct) function is asynchronously called after an RCU grace period. Setting the SOCK_RCU_FREE flag for XDP sockets takes care of this. Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-09resource: Clean it up a bitBorislav Petkov1-29/+26
- Drop BUG_ON()s and do normal error handling instead, in find_next_iomem_res(). - Align function arguments on opening braces. - Get rid of local var sibling_only in find_next_iomem_res(). - Shorten unnecessarily long first_level_children_only arg name. Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Bjorn Helgaas <bhelgaas@google.com> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com Link: <new submission>
2018-10-09resource: Fix find_next_iomem_res() iteration issueBjorn Helgaas1-54/+42
Previously find_next_iomem_res() used "*res" as both an input parameter for the range to search and the type of resource to search for, and an output parameter for the resource we found, which makes the interface confusing. The current callers use find_next_iomem_res() incorrectly because they allocate a single struct resource and use it for repeated calls to find_next_iomem_res(). When find_next_iomem_res() returns a resource, it overwrites the start, end, flags, and desc members of the struct. If we call find_next_iomem_res() again, we must update or restore these fields. The previous code restored res.start and res.end, but not res.flags or res.desc. Since the callers did not restore res.flags, if they searched for flags IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would incorrectly skip resources unless they were also marked as IORESOURCE_SYSRAM. Fix this by restructuring the interface so it takes explicit "start, end, flags" parameters and uses "*res" only as an output parameter. Based on a patch by Lianbo Jiang <lijiang@redhat.com>. [ bp: While at it: - make comments kernel-doc style. - Originally-by: http://lore.kernel.org/lkml/20180921073211.20097-2-lijiang@redhat.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812916.1157.177580438135143788.stgit@bhelgaas-glaptop.roam.corp.google.com
2018-10-09resource: Include resource end in walk_*() interfacesBjorn Helgaas1-2/+2
find_next_iomem_res() finds an iomem resource that covers part of a range described by "start, end". All callers expect that range to be inclusive, i.e., both start and end are included, but find_next_iomem_res() doesn't handle the end address correctly. If it finds an iomem resource that contains exactly the end address, it skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip it: find_next_iomem_res(...) { start = 0x0; end = 0x10000; for (p = next_resource(...)) { # p->start = 0x10000; # p->end = 0x10000; # we *should* return this resource, but this condition is false: if ((p->end >= start) && (p->start < end)) break; Adjust find_next_iomem_res() so it allows a resource that includes the single byte at the end of the range. This is a corner case that we probably don't see in practice. Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.com
2018-10-09smp,cpumask: introduce on_each_cpu_cond_maskRik van Riel2-7/+24
Introduce a variant of on_each_cpu_cond that iterates only over the CPUs in a cpumask, in order to avoid making callbacks for every single CPU in the system when we only need to test a subset. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-5-riel@surriel.com
2018-10-09smp: use __cpumask_set_cpu in on_each_cpu_condRik van Riel1-1/+1
The code in on_each_cpu_cond sets CPUs in a locally allocated bitmask, which should never be used by other CPUs simultaneously. There is no need to use locked memory accesses to set the bits in this bitmap. Switch to __cpumask_set_cpu. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-4-riel@surriel.com
2018-10-09dma-direct: respect DMA_ATTR_NO_WARNChristoph Hellwig1-0/+3
Respect the DMA_ATTR_NO_WARN flags for allocations in dma-direct. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Robin Murphy <robin.murphy@arm.com>
2018-10-09futex: Replace spin_is_locked() with lockdepLance Roy1-2/+2
lockdep_assert_held() is better suited for checking locking requirements, since it won't get confused when the lock is held by some other task. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Link: https://lkml.kernel.org/r/20181003053902.6910-12-ldr709@gmail.com
2018-10-09locking/lockdep: Make class->ops a percpu counter and move it under ↵Waiman Long3-4/+36
CONFIG_DEBUG_LOCKDEP=y A sizable portion of the CPU cycles spent on the __lock_acquire() is used up by the atomic increment of the class->ops stat counter. By taking it out from the lock_class structure and changing it to a per-cpu per-lock-class counter, we can reduce the amount of cacheline contention on the class structure when multiple CPUs are trying to acquire locks of the same class simultaneously. To limit the increase in memory consumption because of the percpu nature of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP config option. So the memory consumption increase will only occur if CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however, is reduced in size by 16 bytes on 64-bit archs after ops removal and a minor restructuring of the fields. This patch also fixes a bug in the increment code as the counter is of the 'unsigned long' type, but atomic_inc() was used to increment it. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/d66681f3-8781-9793-1dcf-2436a284550b@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09dma-direct: document the zone selection logicChristoph Hellwig1-1/+8
What we are doing here isn't quite obvious, so add a comment explaining it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-08dma-debug: Check for drivers mapping invalid addresses in dma_map_single()Stephen Boyd1-0/+16
I recently debugged a DMA mapping oops where a driver was trying to map a buffer returned from request_firmware() with dma_map_single(). Memory returned from request_firmware() is mapped into the vmalloc region and this isn't a valid region to map with dma_map_single() per the DMA documentation's "What memory is DMA'able?" section. Unfortunately, we don't really check that in the DMA debugging code, so enabling DMA debugging doesn't help catch this problem. Let's add a new DMA debug function to check for a vmalloc address or an invalid virtual address and print a warning if this happens. This makes it a little easier to debug these sorts of problems, instead of seeing odd behavior or crashes when drivers attempt to map the vmalloc space for DMA. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-06Merge branch 'core/core' into x86/build, to prevent conflictsIngo Molnar2-55/+54
Signed-off-by: Ingo Molnar <mingo@kernel.org>