summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2018-10-31linux/bitmap.h: fix type of nbits in bitmap_shift_right()Rasmus Villemoes1-1/+1
Most other bitmap API, including the OOL version __bitmap_shift_right, take unsigned nbits. This was accidentally left out from 2fbad29917c98. Link: http://lkml.kernel.org/r/20180818131623.8755-5-linux@rasmusvillemoes.dk Fixes: 2fbad29917c98 ("lib: bitmap: change bitmap_shift_right to take unsigned parameters") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: Yury Norov <ynorov@caviumnetworks.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31linux/bitmap.h: remove redundant uses of small_const_nbits()Rasmus Villemoes1-18/+6
In the _zero, _fill and _copy functions, the small_const_nbits branch is redundant. If nbits is small and const, gcc knows full well that BITS_TO_LONGS(nbits) is 1, so len is also a compile-time constant (sizeof(long)), and calling memset or memcpy with a length argument of sizeof(long) makes gcc generate the expected code anyway: #include <string.h> void a(unsigned long *x) { memset(x, 0, 8); } void b(unsigned long *x) { memset(x, 0xff, 8); } void c(unsigned long *x, const unsigned long *y) { memcpy(x, y, 8); } turns into 0000000000000000 <a>: 0: 48 c7 07 00 00 00 00 movq $0x0,(%rdi) 7: c3 retq 8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) f: 00 0000000000000010 <b>: 10: 48 c7 07 ff ff ff ff movq $0xffffffffffffffff,(%rdi) 17: c3 retq 18: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 1f: 00 0000000000000020 <c>: 20: 48 8b 06 mov (%rsi),%rax 23: 48 89 07 mov %rax,(%rdi) 26: c3 retq Link: http://lkml.kernel.org/r/20180818131623.8755-4-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31linux/bitmap.h: handle constant zero-size bitmaps correctlyRasmus Villemoes1-1/+6
The static inlines in bitmap.h do not handle a compile-time constant nbits==0 correctly (they dereference the passed src or dst pointers, despite only 0 words being valid to access). I had the 0-day buildbot chew on a patch [1] that would cause build failures for such cases without complaining, suggesting that we don't have any such users currently, at least for the 70 .config/arch combinations that was built. Should any turn up, make sure they use the out-of-line versions, which do handle nbits==0 correctly. This is of course not the most efficient, but it's much less churn than teaching all the static inlines an "if (zero_const_nbits())", and since we don't have any current instances, this doesn't affect existing code at all. [1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31mm/hmm: use a structure for update callback parametersJérôme Glisse1-9/+22
Use a structure to gather all the parameters for the update callback. This make it easier when adding new parameters by avoiding having to update all callback function signature. The hmm_update structure is always associated with a mmu_notifier callbacks so we are not planing on grouping multiple updates together. Nor do we care about page size for the range as range will over fully cover the page being invalidated (this is a mmu_notifier property). Link: http://lkml.kernel.org/r/20181019160442.18723-6-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31mm/hmm: fix utf8 ...Jérôme Glisse1-1/+1
Patch series "HMM updates, improvements and fixes", v2 Few fixes that only affect HMM users. Improve the synchronization call back so that we match was other mmu_notifier listener do and add proper support to the new blockable flags in the process. For curious folks here are branches to leverage HMM in various existing device drivers: https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-nouveau-v01 https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-radeon-v00 https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-intel-v00 More to come (amd gpu, Mellanox, ...) I expect more of the preparatory work for nouveau will be merge in 4.20 (like we have been doing since 4.16) and i will wait until this patchset is upstream before pushing the patches that actualy make use of HMM (to avoid complex tree inter-dependency). This patch (of 6): Somehow utf=8 must have been broken. Link: http://lkml.kernel.org/r/20181019160442.18723-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31platform/x86: asus-wmi: export function for evaluating WMI methodsDaniel Drake1-0/+101
Export asus_wmi_evaluate_method() and related headers for use by other drivers. hid-asus is going to use this to avoid advertising that it has a keyboard backlight when the keyboard backlight is controlled via WMI. Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-10-30Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linuxLinus Torvalds3-12/+20
Pull nfsd updates from Bruce Fields: "Olga added support for the NFSv4.2 asynchronous copy protocol. We already supported COPY, by copying a limited amount of data and then returning a short result, letting the client resend. The asynchronous protocol should offer better performance at the expense of some complexity. The other highlight is Trond's work to convert the duplicate reply cache to a red-black tree, and to move it and some other server caches to RCU. (Previously these have meant taking global spinlocks on every RPC) Otherwise, some RDMA work and miscellaneous bugfixes" * tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits) lockd: fix access beyond unterminated strings in prints nfsd: Fix an Oops in free_session() nfsd: correctly decrement odstate refcount in error path svcrdma: Increase the default connection credit limit svcrdma: Remove try_module_get from backchannel svcrdma: Remove ->release_rqst call in bc reply handler svcrdma: Reduce max_send_sges nfsd: fix fall-through annotations knfsd: Improve lookup performance in the duplicate reply cache using an rbtree knfsd: Further simplify the cache lookup knfsd: Simplify NFS duplicate replay cache knfsd: Remove dead code from nfsd_cache_lookup SUNRPC: Simplify TCP receive code SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock SUNRPC: Remove non-RCU protected lookup NFS: Fix up a typo in nfs_dns_ent_put NFS: Lockless DNS lookups knfsd: Lockless lookup of NFSv4 identities. SUNRPC: Lockless server RPCSEC_GSS context lookup knfsd: Allow lockless lookups of the exports ...
2018-10-30Merge tag 'trace-v4.20' of ↵Linus Torvalds2-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The biggest change here is the updates to kprobes Back in January I posted patches to create function based events. These were the events that you suggested I make to allow developers to easily create events in code where no trace event exists. After posting those changes for review, it was suggested that we implement this instead with kprobes. The problem with kprobes is that the interface is too complex and needs to be simplified. Masami Hiramatsu posted patches in March and I've been playing with them a bit. There's been a bit of clean up in the kprobe code that was inspired by the function based event patches, and a couple of enhancements to the kprobe event interface. - If the arch supports it (we added support for x86), you can place a kprobe event at the start of a function and use $arg1, $arg2, etc to reference the arguments of a function. (Before you needed to know what register or where on the stack the argument was). - The second is a way to see array of events. For example, if you reference a mac address, you can add: echo 'p:mac ip_rcv perm_addr=+574($arg2):x8[6]' > kprobe_events And this will produce: mac: (ip_rcv+0x0/0x140) perm_addr={0x52,0x54,0x0,0xc0,0x76,0xec} Other changes include - Exporting trace_dump_stack to modules - Have the stack tracer trace the entire stack (stop trying to remove tracing itself, as we keep removing too much). - Added support for SDT in uprobes" [ SDT - "Statically Defined Tracing" are userspace markers for tracing. Let's not use random TLA's in explanations unless they are fairly well-established as generic (at least for kernel people) - Linus ] * tag 'trace-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits) tracing: Have stack tracer trace full stack tracing: Export trace_dump_stack to modules tracing: probeevent: Fix uninitialized used of offset in parse args tracing/kprobes: Allow kprobe-events to record module symbol tracing/kprobes: Check the probe on unloaded module correctly tracing/uprobes: Fix to return -EFAULT if copy_from_user failed tracing: probeevent: Add $argN for accessing function args x86: ptrace: Add function argument access API tracing: probeevent: Add array type support tracing: probeevent: Add symbol type tracing: probeevent: Unify fetch_insn processing common part tracing: probeevent: Append traceprobe_ for exported function tracing: probeevent: Return consumed bytes of dynamic area tracing: probeevent: Unify fetch type tables tracing: probeevent: Introduce new argument fetching code tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions tracing: probeevent: Cleanup argument field definition tracing: probeevent: Cleanup print argument functions trace_uprobe: support reference counter in fd-based uprobe perf probe: Support SDT markers having reference counter (semaphore) ...
2018-10-30Merge tag 'pm-4.20-rc1-2' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These remove a questionable heuristic from the menu cpuidle governor, fix a recent build regression in the intel_pstate driver, clean up ARM big-Little support in cpufreq and fix up hung task watchdog's interaction with system-wide power management transitions. Specifics: - Fix build regression in the intel_pstate driver that doesn't build without CONFIG_ACPI after recent changes (Dominik Brodowski). - One of the heuristics in the menu cpuidle governor is based on a function returning 0 most of the time, so drop it and clean up the scheduler code related to it (Daniel Lezcano). - Prevent the arm_big_little cpufreq driver from being used on ARM64 which is not suitable for it and drop the arm_big_little_dt driver that is not used any more (Sudeep Holla). - Prevent the hung task watchdog from triggering during resume from system-wide sleep states by disabling it before freezing tasks and enabling it again after they have been thawed (Vitaly Kuznetsov)" * tag 'pm-4.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: kernel: hung_task.c: disable on suspend cpufreq: remove unused arm_big_little_dt driver cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64 cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI cpuidle: menu: Remove get_loadavg() from the performance multiplier sched: Factor out nr_iowait and nr_iowait_cpu
2018-10-30ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12Hangbin Liu1-1/+3
Similiar with ipv6 mcast commit 89225d1ce6af3 ("net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.") i) RFC3376 8.12. Older Version Querier Present Timeout says: The Older Version Querier Interval is the time-out for transitioning a host back to IGMPv3 mode once an older version query is heard. When an older version query is received, hosts set their Older Version Querier Present Timer to Older Version Querier Interval. This value MUST be ((the Robustness Variable) times (the Query Interval in the last Query received)) plus (one Query Response Interval). Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT. Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response Interval) in struct in_device. Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri. We need update these values when receive IGMPv3 queries. Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-30Merge tag 'rproc-v4.20' of git://github.com/andersson/remoteprocLinus Torvalds1-3/+44
Pull remoteproc updates from Bjorn Andersson: "This contains a series of patches that reworks the memory carveout handling in remoteproc, in order to allow this to be reused for statically allocated memory regions to be used for e.g. firmware. It adds support for audio DSP (both TZ-assisted and non-TZ assisted) and compute DSP on Qualcomm SDM845, TZ-assisted audio DSP, compute DSP and WiFi processor on Qualcomm QCS404 and through some renaming of the drivers cleans up the naming situation. Finally support for custom coreudmp segment handlers is added and is used in the Qualcomm modem remoteproc driver to gather memory dumps of the firmware" * tag 'rproc-v4.20' of git://github.com/andersson/remoteproc: (36 commits) remoteproc: qcom: q6v5-mss: Register segments/dumpfn for coredump remoteproc: qcom: q6v5-mss: Add custom dump function for modem remoteproc: qcom: q6v5-mss: Refactor mba load/unload sequence remoteproc: Add mechanism for custom dump function assignment remoteproc: Introduce custom dump function for each remoteproc segment remoteproc: modify vring allocation to rely on centralized carveout allocator remoteproc: qcom: q6v5: shore up resource probe handling remoteproc: qcom: qcom_q6v5_adsp: Fix some return value check remoteproc: modify rproc_handle_carveout to support pre-registered region remoteproc: add helper function to check carveout device address remoteproc: add helper function to allocate rproc_mem_entry from reserved memory remoteproc: add alloc ops in rproc_mem_entry struct remoteproc: introduce rproc_find_carveout_by_name function remoteproc: introduce rproc_add_carveout function remoteproc: add helper function to allocate and init rproc_mem_entry struct remoteproc: add name in rproc_mem_entry struct remoteproc: add release ops in rproc_mem_entry struct remoteproc: add rproc_va_to_pa function remoteproc: configure IOMMU only if device address requested remoteproc: qcom: q6v5-mss: add SCM probe dependency ...
2018-10-30vfs: hide file range comparison functionDarrick J. Wong1-3/+0
There are no callers of vfs_dedupe_file_range_compare, so we might as well make it a static helper and remove the export. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: enable remap callers that can handle short operationsDarrick J. Wong1-2/+3
Plumb in a remap flag that enables the filesystem remap handler to shorten remapping requests for callers that can handle it. Now copy_file_range can report partial success (in case we run up against alignment problems, resource limits, etc.). We also enable CAN_SHORTEN for fideduperange to maintain existing userspace-visible behavior where xfs/btrfs shorten the dedupe range to avoid stale post-eof data exposure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: plumb remap flags through the vfs dedupe functionsDarrick J. Wong1-1/+1
Plumb a remap_flags argument through the vfs_dedupe_file_range_one functions so that dedupe can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: plumb remap flags through the vfs clone functionsDarrick J. Wong1-2/+2
Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: make remap_file_range functions take and return bytes completedDarrick J. Wong1-12/+15
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource limits in a graceful manner. A subsequent patch will enable copy_file_range to signal to the ->clone_file_range implementation that it can handle a short length, which will be returned in the function's return value. For now the short return is not implemented anywhere so the behavior won't change -- either copy_file_range manages to clone the entire range or it tries an alternative. Neither clone ioctl can take advantage of this, alas. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: pass remap flags to generic_remap_checksDarrick J. Wong1-1/+1
Pass the same remap flags to generic_remap_checks for consistency. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: pass remap flags to generic_remap_file_range_prepDarrick J. Wong1-1/+1
Plumb the remap flags through the filesystem from the vfs function dispatcher all the way to the prep function to prepare for behavior changes in subsequent patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: combine the clone and dedupe into a single remap_file_rangeDarrick J. Wong1-4/+21
Combine the clone_file_range and dedupe_file_range operations into a single remap_file_range file operation dispatch since they're fundamentally the same operation. The differences between the two can be made in the prep functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: rename vfs_clone_file_prep to be more descriptiveDarrick J. Wong1-3/+3
The vfs_clone_file_prep is a generic function to be called by filesystem implementations only. Rename the prefix to generic_ and make it more clear that it applies to remap operations, not just clones. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: check file ranges before cloning filesDarrick J. Wong1-3/+6
Move the file range checks from vfs_clone_file_prep into a separate generic_remap_checks function so that all the checks are collected in a central location. This forms the basis for adding more checks from generic_write_checks that will make cloning's input checking more consistent with write input checking. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30Merge tag 'armsoc-soc' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform updates from Arnd Bergmann: "A couple of platforms change hands in the MAINTAINERS file: - Linus Walleij lists himself for the ARM Reference platforms: versatile, vexpress, integrator and realview. He has been the main contributor for these for a while, and makes it official now. - Vladimir Zapolskiy takes over the LPC18xx platform from Joachim Eastwood - Manivannan Sadhasivam becomes a secondary maintainer for the Actions Semi machines - Nicolas Ferre lists updates the MAINTAINER listing for the AT91 platform: Ludovic Desroches is now a co-maintainer for the platform, and several other people (Claudiu Beznea, Cristian Birsan, Eugen Hristev, Codrin Ciubotariu) take over individual device drivers. Thanks everyone for working on this, and welcome to the new maintainers! The "virt" platform on qemy or kvm can now be used in big-endian mode without additional tricks, thanks to Jason Donenfeld. Once again, we gain support for another NXP i.MX6 variant, this time it's the i.MX 6ULZ 32-bit single-core version. On arm64, we add support for two SoCs from Renesas: RZ/G2E (r8a774c0) and RZ/G2M (r8a774a1). These are described as microcontrollers on the manufacturer website, but appear to be rather powerful. The RZ/G2M is used on the reference board for the CIP Super Long Term Support (SLTS) Linux Kernels" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (54 commits) MAINTAINERS: Assign myself as a maintainer of ARM/LPC18XX architecture arm64: exynos: Enable generic power domain support MAINTAINERS: remove non-exsiting email address of Baoyou MAINTAINERS: fix pattern in ARM/Synaptics berlin SoC section MAINTAINERS: Drop dt-bindings/genpd/k2g.h ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms arm64: actions: Enable PINCTRL in platforms Kconfig MAINTAINERS: Add entry for Actions Semi Owl SoCs DMA driver MAINTAINERS: Add entry for Actions Semiconductor Owl I2C driver MAINTAINERS: Update clock binding entry for Actions Semi Owl SoCs ARM: imx: add i.mx6ulz msl support ARM: Assume maintainership of ARM reference designs ARM: support big-endian for the virt architecture MAINTAINERS: sdhci: move the Microchip entry to proper location MAINTAINERS: move former ATMEL entries to proper MICROCHIP location MAINTAINERS: remove the / ATMEL string from MICROCHIP entries MAINTAINERS: iio: add co-maintainer to SAMA5D2-compatible ADC driver MAINTAINERS: pwm: add entry for Microchip pwm driver MAINTAINERS: dmaengine: add files to Microchip dma entry MAINTAINERS: USB: change maintainer for Microchip USBA gadget driver ...
2018-10-30Merge tag 'armsoc-drivers' of ↵Linus Torvalds12-4/+1036
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "The most noteworthy SoC driver changes this time include: - The TEE subsystem gains an in-kernel interface to access the TEE from device drivers. - The reset controller subsystem gains a driver for the Qualcomm Snapdragon 845 Power Domain Controller. - The Xilinx Zynq platform now has a firmware interface for its platform management unit. This contains a firmware "ioctl" interface that was a little controversial at first, but the version we merged solved that by not exposing arbitrary firmware calls to user space. - The Amlogic Meson platform gains a "canvas" driver that is used for video processing and shared between different high-level drivers. The rest is more of the usual, mostly related to SoC specific power management support and core drivers in drivers/soc: - Several Renesas SoCs (RZ/G1N, RZ/G2M, R-Car V3M, RZ/A2M) gain new features related to power and reset control. - The Mediatek mt8183 and mt6765 SoC platforms gain support for their respective power management chips. - A new driver for NXP i.MX8, which need a firmware interface for power management. - The SCPI firmware interface now contains support estimating power usage of performance states - The NVIDIA Tegra "pmc" driver gains a few new features, in particular a pinctrl interface for configuring the pads. - Lots of small changes for Qualcomm, in particular the "smem" device driver. - Some cleanups for the TI OMAP series related to their sysc controller. Additional cleanups and bugfixes in SoC specific drivers include the Meson, Keystone, NXP, AT91, Sunxi, Actions, and Tegra platforms" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (129 commits) firmware: tegra: bpmp: Implement suspend/resume support drivers: clk: Add ZynqMP clock driver dt-bindings: clock: Add bindings for ZynqMP clock driver firmware: xilinx: Add zynqmp IOCTL API for device control Documentation: xilinx: Add documentation for eemi APIs MAINTAINERS: imx: include drivers/firmware/imx path firmware: imx: add misc svc support firmware: imx: add SCU firmware driver support reset: Fix potential use-after-free in __of_reset_control_get() dt-bindings: arm: fsl: add scu binding doc soc: fsl: qbman: add interrupt coalesce changing APIs soc: fsl: bman_portals: defer probe after bman's probe soc: fsl: qbman: Use last response to determine valid bit soc: fsl: qbman: Add 64 bit DMA addressing requirement to QBMan soc: fsl: qbman: replace CPU 0 with any online CPU in hotplug handlers soc: fsl: qbman: Check if CPU is offline when initializing portals reset: qcom: PDC Global (Power Domain Controller) reset controller dt-bindings: reset: Add PDC Global binding for SDM845 SoCs reset: Grammar s/more then once/more than once/ bus: ti-sysc: Just use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS ...
2018-10-30Merge tag 'media/v4.20-1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - new dvb frontend driver: lnbh29 - new sensor drivers: imx319 and imx 355 - some old soc_camera driver renames to avoid conflict with new drivers - new i.MX Pixel Pipeline (PXP) mem-to-mem platform driver - a new V4L2 frontend for the FWHT codec - several other improvements, bug fixes, code cleanups, etc * tag 'media/v4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (289 commits) media: rename soc_camera I2C drivers media: cec: forgot to cancel delayed work media: vivid: Support 480p for webcam capture media: v4l2-tpg: fix kernel oops when enabling HFLIP and OSD media: vivid: Add 16-bit bayer to format list media: v4l2-tpg-core: Add 16-bit bayer media: pvrusb2: replace `printk` with `pr_*` media: venus: vdec: fix decoded data size media: cx231xx: fix potential sign-extension overflow on large shift media: dt-bindings: media: rcar_vin: add device tree support for r8a7744 media: isif: fix a NULL pointer dereference bug media: exynos4-is: make const array config_ids static media: cx23885: make const array addr_list static media: ivtv: make const array addr_list static media: bttv-input: make const array addr_list static media: cx18: Don't check for address of video_dev media: dw9807-vcm: Fix probe error handling media: dw9714: Remove useless error message media: dw9714: Fix error handling in probe function media: cec: name for RC passthrough device does not need 'RC for' ...
2018-10-29svcrdma: Increase the default connection credit limitChuck Lever1-6/+7
Reduce queuing on clients by allowing more credits by default. 64 is the default NFSv4.1 slot table size on Linux clients. This size prevents the credit limit from putting RPC requests to sleep again after they have already slept waiting for a session slot. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Replace the cache_detail->hash_lock with a regular spinlockTrond Myklebust1-1/+1
Now that the reader functions are all RCU protected, use a regular spinlock rather than a reader/writer lock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Remove non-RCU protected lookupTrond Myklebust1-6/+0
Clean up the cache code by removing the non-RCU protected lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlockTrond Myklebust1-0/+12
Instead of the reader/writer spinlock, allow cache lookups to use RCU for looking up entries. This is more efficient since modifications can occur while other entries are being looked up. Note that for now, we keep the reader/writer spinlock until all users have been converted to use RCU-safe freeing of their cache entries. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29Merge tag 'tty-4.20-rc1' of ↵Linus Torvalds2-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the big tty and serial pull request for 4.20-rc1 Lots of little things here, including a merge from the SPI tree in order to keep things simpler for everyone to sync around for one platform. Major stuff is: - tty buffer clearing after use - atmel_serial fixes and additions - xilinx uart driver updates and of course, lots of tiny fixes and additions to individual serial drivers. All of these have been in linux-next with no reported issues for a while" * tag 'tty-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits) of: base: Change logic in of_alias_get_alias_list() of: base: Fix english spelling in of_alias_get_alias_list() serial: sh-sci: do not warn if DMA transfers are not supported serial: uartps: Do not allow use aliases >= MAX_UART_INSTANCES tty: check name length in tty_find_polling_driver() serial: sh-sci: Add r8a77990 support tty: wipe buffer if not echoing data tty: wipe buffer. serial: fsl_lpuart: Remove the alias node dependence TTY: sn_console: Replace spin_is_locked() with spin_trylock() Revert "serial:serial_core: Allow use of CTS for PPS line discipline" serial: 8250_uniphier: add auto-flow-control support serial: 8250_uniphier: flatten probe function serial: 8250_uniphier: remove unused "fifo-size" property dt-bindings: serial: sh-sci: Document r8a7744 bindings serial: uartps: Fix missing unlock on error in cdns_get_id() tty/serial: atmel: add ISO7816 support tty/serial_core: add ISO7816 infrastructure serial:serial_core: Allow use of CTS for PPS line discipline serial: docs: Fix filename for serial reference implementation ...
2018-10-29Merge tag 'for_v4.20-rc1' of ↵Linus Torvalds3-14/+97
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Amir's patches to implement superblock fanotify watches, Xiaoming's patch to enable reporting of thread IDs in fanotify events instead of TGIDs (sadly the patch got mis-attributed to Amir and I've noticed only now), and a fix of possible oops on umount caused by fsnotify infrastructure" * tag 'for_v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Fix busy inodes during unmount fs: group frequently accessed fields of struct super_block together fanotify: support reporting thread id instead of process id fanotify: add BUILD_BUG_ON() to count the bits of fanotify constants fsnotify: convert runtime BUG_ON() to BUILD_BUG_ON() fanotify: deprecate uapi FAN_ALL_* constants fanotify: simplify handling of FAN_ONDIR fsnotify: generalize handling of extra event flags fanotify: fix collision of internal and uapi mark flags fanotify: store fanotify_init() flags in group's fanotify_data fanotify: add API to attach/detach super block mark fsnotify: send path type events to group with super block marks fsnotify: add super block object type
2018-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+1
Pull networking fixes from David Miller: 1) GRO overflow entries are not unlinked properly, resulting in list poison pointers being dereferenced. 2) Fix bridge build with ipv6 disabled, from Nikolay Aleksandrov. 3) Direct packet access and other fixes in BPF from Daniel Borkmann. 4) gred_change_table_def() gets passed the wrong pointer, a pointer to a set of unparsed attributes instead of the attribute itself. From Jakub Kicinski. 5) Allow macsec device to be brought up even if it's lowerdev is down, from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: diag: document swapped src/dst in udp_dump_one. macsec: let the administrator set UP state even if lowerdev is down macsec: update operstate when lower device changes net: sched: gred: pass the right attribute to gred_change_table_def() ptp: drop redundant kasprintf() to create worker name net: bridge: remove ipv6 zero address check in mcast queries net: Properly unlink GRO packets on overflow. bpf: fix wrong helper enablement in cgroup local storage bpf: add bpf_jit_limit knob to restrict unpriv allocations bpf: make direct packet write unclone more robust bpf: fix leaking uninitialized memory on pop/peek helpers bpf: fix direct packet write into pop/peek helpers bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data bpf: fix direct packet access for flow dissector progs bpf: disallow direct packet access for unpriv in cg_skb bpf: fix test suite to enable all unpriv program types bpf, btf: fix a missing check bug in btf_parse selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS bpf: devmap: fix wrong interface selection in notifier_call
2018-10-29Merge tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drmLinus Torvalds5-16/+32
Pull drm updates from Dave Airlie: "This is going to rebuild more than drm as it adds a new helper to list.h for doing bulk updates. Seemed like a reasonable addition to me. Otherwise the usual merge window stuff lots of i915 and amdgpu, not so much nouveau, and piles of everything else. Core: - Adds a new list.h helper for doing bulk list updates for TTM. - Don't leak fb address in smem_start to userspace (comes with EXPORT workaround for people using mali out of tree hacks) - udmabuf device to turn memfd regions into dma-buf - Per-plane blend mode property - ref/unref replacements with get/put - fbdev conflicting framebuffers code cleaned up - host-endian format variants - panel orientation quirk for Acer One 10 bridge: - TI SN65DSI86 chip support vkms: - GEM support. - Cursor support amdgpu: - Merge amdkfd and amdgpu into one module - CEC over DP AUX support - Picasso APU support + VCN dynamic powergating - Raven2 APU support - Vega20 enablement + kfd support - ACP powergating improvements - ABGR/XBGR display support - VCN jpeg support - xGMI support - DC i2c/aux cleanup - Ycbcr 4:2:0 support - GPUVM improvements - Powerplay and powerplay endian fixes - Display underflow fixes vmwgfx: - Move vmwgfx specific TTM code to vmwgfx - Split out vmwgfx buffer/resource validation code - Atomic operation rework bochs: - use more helpers - format/byteorder improvements qxl: - use more helpers i915: - GGTT coherency getparam - Turn off resource streamer API - More Icelake enablement + DMC firmware - Full PPGTT for Ivybridge, Haswell and Valleyview - DDB distribution based on resolution - Limited range DP display support nouveau: - CEC over DP AUX support - Initial HDMI 2.0 support virtio-gpu: - vmap support for PRIME objects tegra: - Initial Tegra194 support - DMA/IOMMU integration fixes msm: - a6xx perf improvements + clock prefix - GPU preemption optimisations - a6xx devfreq support - cursor support rockchip: - PX30 support - rgb output interface support mediatek: - HDMI output support on mt2701 and mt7623 rcar-du: - Interlaced modes on Gen3 - LVDS on R8A77980 - D3 and E3 SoC support hisilicon: - misc fixes mxsfb: - runtime pm support sun4i: - R40 TCON support - Allwinner A64 support - R40 HDMI support omapdrm: - Driver rework changing display pipeline ordering to use common code - DMM memory barrier and irq fixes - Errata workarounds exynos: - out-bridge support for LVDS bridge driver - Samsung 16x16 tiled format support - Plane alpha and pixel blend mode support tilcdc: - suspend/resume update mali-dp: - misc updates" * tag 'drm-next-2018-10-24' of git://anongit.freedesktop.org/drm/drm: (1382 commits) firmware/dmc/icl: Add missing MODULE_FIRMWARE() for Icelake. drm/i915/icl: Fix signal_levels drm/i915/icl: Fix DDI/TC port clk_off bits drm/i915/icl: create function to identify combophy port drm/i915/gen9+: Fix initial readout for Y tiled framebuffers drm/i915: Large page offsets for pread/pwrite drm/i915/selftests: Disable shrinker across mmap-exhaustion drm/i915/dp: Link train Fallback on eDP only if fallback link BW can fit panel's native mode drm/i915: Fix intel_dp_mst_best_encoder() drm/i915: Skip vcpi allocation for MSTB ports that are gone drm/i915: Don't unset intel_connector->mst_port drm/i915: Only reset seqno if actually idle drm/i915: Use the correct crtc when sanitizing plane mapping drm/i915: Restore vblank interrupts earlier drm/i915: Check fb stride against plane max stride drm/amdgpu/vcn:Fix uninitialized symbol error drm: panel-orientation-quirks: Add quirk for Acer One 10 (S1003) drm/amd/amdgpu: Fix debugfs error handling drm/amdgpu: Update gc_9_0 golden settings. drm/amd/powerplay: update PPtable with DC BTC and Tvr SocLimit fields ...
2018-10-28Merge tag 'vla-v4.20-rc1' of ↵Linus Torvalds1-16/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull VLA removal from Kees Cook: "Globally warn on VLA use. This turns on "-Wvla" globally now that the last few trees with their VLA removals have landed (crypto, block, net, and powerpc). Arnd mentioned that there may be a couple more VLAs hiding in hard-to-find randconfigs, but nothing big has shaken out in the last month or so in linux-next. We should be basically VLA-free now! Wheee. :) Summary: - Remove unused fallback for BUILD_BUG_ON (which technically contains a VLA) - Lift -Wvla to the top-level Makefile" * tag 'vla-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: Makefile: Globally enable VLA warning compiler.h: give up __compiletime_assert_fallback()
2018-10-28Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds8-221/+1390
Pull XArray conversion from Matthew Wilcox: "The XArray provides an improved interface to the radix tree data structure, providing locking as part of the API, specifying GFP flags at allocation time, eliminating preloading, less re-walking the tree, more efficient iterations and not exposing RCU-protected pointers to its users. This patch set 1. Introduces the XArray implementation 2. Converts the pagecache to use it 3. Converts memremap to use it The page cache is the most complex and important user of the radix tree, so converting it was most important. Converting the memremap code removes the only other user of the multiorder code, which allows us to remove the radix tree code that supported it. I have 40+ followup patches to convert many other users of the radix tree over to the XArray, but I'd like to get this part in first. The other conversions haven't been in linux-next and aren't suitable for applying yet, but you can see them in the xarray-conv branch if you're interested" * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits) radix tree: Remove multiorder support radix tree test: Convert multiorder tests to XArray radix tree tests: Convert item_delete_rcu to XArray radix tree tests: Convert item_kill_tree to XArray radix tree tests: Move item_insert_order radix tree test suite: Remove multiorder benchmarking radix tree test suite: Remove __item_insert memremap: Convert to XArray xarray: Add range store functionality xarray: Move multiorder_check to in-kernel tests xarray: Move multiorder_shrink to kernel tests xarray: Move multiorder account test in-kernel radix tree test suite: Convert iteration test to XArray radix tree test suite: Convert tag_tagged_items to XArray radix tree: Remove radix_tree_clear_tags radix tree: Remove radix_tree_maybe_preload_order radix tree: Remove split/join code radix tree: Remove radix_tree_update_node_t page cache: Finish XArray conversion dax: Convert page fault handlers to XArray ...
2018-10-27Merge tag 'rtc-4.20' of ↵Linus Torvalds1-5/+16
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "This cycle, there were mostly non urgent fixes in drivers. I also finally unexported the non managed registration. Subsystem: - non devm managed registration is now removed from the driver API - all the unnecessary rtc_valid_tm() calls have been removed Drivers: - abx80X: watchdog support - cmos: fix non ACPI support - sc27xx: fix alarm support - Remove a possible sysfs race condition for ab8500, ds1307, ds1685, isl1208 - Fix a possible race condition where an irq handler may be called before the rtc_device struct is allocated for mt6397, pl030, menelaus, armada38x" * tag 'rtc-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (54 commits) rtc: sc27xx: Always read normal alarm when registering RTC device rtc: sc27xx: Add check to see if need to enable the alarm interrupt rtc: sc27xx: Remove interrupts disable and clear in probe() rtc: sc27xx: Clear SPG value update interrupt status rtc: sc27xx: Set wakeup capability before registering rtc device rtc: s35390a: Change buf's type to u8 in s35390a_init rtc: ds1307: fix ds1339 wakealarm support rtc: ds1685: simplify getting .driver_data rtc: m41t80: mark expected switch fall-through rtc: tegra: Propagate errors from platform_get_irq() rtc: cmos: Remove the `use_acpi_alarm' module parameter for !ACPI rtc: cmos: Fix non-ACPI undefined reference to `hpet_rtc_interrupt' rtc: mv: let the core handle invalid alarms rtc: vr41xx: switch to rtc_time64_to_tm/rtc_tm_to_time64 rtc: ab8500: remove useless check rtc: ab8500: let the core handle range rtc: ab8500: use rtc_add_group rtc: rs5c348: report error when time is invalid rtc: rs5c348: remove forward declaration rtc: rs5c348: remove useless label ...
2018-10-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-0/+1
Daniel Borkmann says: ==================== pull-request: bpf 2018-10-27 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix toctou race in BTF header validation, from Martin and Wenwen. 2) Fix devmap interface comparison in notifier call which was neglecting netns, from Taehee. 3) Several fixes in various places, for example, correcting direct packet access and helper function availability, from Daniel. 4) Fix BPF kselftest config fragment to include af_xdp and sockmap, from Naresh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds21-114/+326
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) hugetlbfs: dirty pages as they are added to pagecache mm: export add_swap_extent() mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t mm/gup: cache dev_pagemap while pinning pages Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" mm: return zero_resv_unavail optimization mm: zero remaining unavailable struct pages tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option tools/testing/selftests/vm/gup_benchmark.c: allow user specified file tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage mm/gup_benchmark.c: add additional pinning methods mm/gup_benchmark.c: time put_page() mm: don't raise MEMCG_OOM event due to failed high-order allocation mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock ...
2018-10-27mm: split SWP_FILE into SWP_ACTIVATED and SWP_FSOmar Sandoval1-6/+7
The SWP_FILE flag serves two purposes: to make swap_{read,write}page() go through the filesystem, and to make swapoff() call ->swap_deactivate(). For Btrfs, we want the latter but not the former, so split this flag into two. This makes us always call ->swap_deactivate() if ->swap_activate() succeeded, not just if it didn't add any swap extents itself. This also resolves the issue of the very misleading name of SWP_FILE, which is only used for swap files over NFS. Link: http://lkml.kernel.org/r/6d63d8668c4287a4f6d203d65696e96f80abdfc7.1536704650.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm/gup: cache dev_pagemap while pinning pagesKeith Busch2-14/+6
Getting pages from ZONE_DEVICE memory needs to check the backing device's live-ness, which is tracked in the device's dev_pagemap metadata. This metadata is stored in a radix tree and looking it up adds measurable software overhead. This patch avoids repeating this relatively costly operation when dev_pagemap is used by caching the last dev_pagemap while getting user pages. The gup_benchmark kernel self test reports this reduces time to get user pages to as low as 1/3 of the previous time. Link: http://lkml.kernel.org/r/20181012173040.15669-1-keith.busch@intel.com Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: zero remaining unavailable struct pagesNaoya Horiguchi1-15/+0
Patch series "mm: Fix for movable_node boot option", v3. This patch series contains a fix for the movable_node boot option issue which was introduced by commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"). The commit breaks the option because it changed the memory gap range to reserved memblock. So, the node is marked as Normal zone even if the SRAT has Hot pluggable affinity. First and second patch fix the original issue which the commit tried to fix, then revert the commit. This patch (of 3): There is a kernel panic that is triggered when reading /proc/kpageflags on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]': BUG: unable to handle kernel paging request at fffffffffffffffe PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 RIP: 0010:stable_page_flags+0x27/0x3c0 Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7 RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202 RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0 RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001 R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0 R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10 FS: 00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0 Call Trace: kpageflags_read+0xc7/0x120 proc_reg_read+0x3c/0x60 __vfs_read+0x36/0x170 vfs_read+0x89/0x130 ksys_pread64+0x71/0x90 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7efc42e75e23 Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24 According to kernel bisection, this problem became visible due to commit f7f99100d8d9 which changes how struct pages are initialized. Memblock layout affects the pfn ranges covered by node/zone. Consider that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the default (no memmap= given) memblock layout is like below: MEMBLOCK configuration: memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000 memory.cnt = 0x4 memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0 memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0 memory[0x3] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0 ... If you give memmap=1G!4G (so it just covers memory[0x2]), the range [0x100000000-0x13fffffff] is gone: MEMBLOCK configuration: memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000 memory.cnt = 0x3 memory[0x0] [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0 memory[0x1] [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0 ... This causes shrinking node 0's pfn range because it is calculated by the address range of memblock.memory. So some of struct pages in the gap range are left uninitialized. We have a function zero_resv_unavail() which does zeroing the struct pages outside memblock.memory, but currently it covers only the reserved unavailable range (i.e. memblock.memory && !memblock.reserved). This patch extends it to cover all unavailable range, which fixes the reported issue. Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Tested-by: Oscar Salvador <osalvador@suse.de> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm/memcontrol.c: convert mem_cgroup_id::ref to refcount_t typeKirill Tkhai1-1/+1
This will allow to use generic refcount_t interfaces to check counters overflow instead of currently existing VM_BUG_ON(). The only difference after the patch is VM_BUG_ON() may cause BUG(), while refcount_t fires with WARN(). But this seems not to be significant here, since such the problems are usually caught by syzbot with panic-on-warn enabled. Link: http://lkml.kernel.org/r/153910718919.7006.13400779039257185427.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: dax: add comment for PFN_SPECIALYang Shi1-0/+2
The comment for PFN_SPECIAL is missed in pfn_t.h. Add comment to get consistent with other pfn flags. Link: http://lkml.kernel.org/r/1538086549-100536-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: mremap: downgrade mmap_sem to read when shrinkingYang Shi1-0/+2
Other than munmap, mremap might be used to shrink memory mapping too. So, it may hold write mmap_sem for long time when shrinking large mapping, as what commit ("mm: mmap: zap pages with read mmap_sem in munmap") described. The mremap() will not manipulate vmas anymore after __do_munmap() call for the mapping shrink use case, so it is safe to downgrade to read mmap_sem. So, the same optimization, which downgrades mmap_sem to read for zapping pages, is also feasible and reasonable to this case. The period of holding exclusive mmap_sem for shrinking large mapping would be reduced significantly with this optimization. MREMAP_FIXED and MREMAP_MAYMOVE are more complicated to adopt this optimization since they need manipulate vmas after do_munmap(), downgrading mmap_sem may create race window. Simple mapping shrink is the low hanging fruit, and it may cover the most cases of unmap with munmap together. [akpm@linux-foundation.org: tweak comment] [yang.shi@linux.alibaba.com: fix unsigned compare against 0 issue] Link: http://lkml.kernel.org/r/1538687672-17795-2-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1538067582-60038-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: defer ZONE_DEVICE page initialization to the point where we init pgmapAlexander Duyck1-0/+2
The ZONE_DEVICE pages were being initialized in two locations. One was with the memory_hotplug lock held and another was outside of that lock. The problem with this is that it was nearly doubling the memory initialization time. Instead of doing this twice, once while holding a global lock and once without, I am opting to defer the initialization to the one outside of the lock. This allows us to avoid serializing the overhead for memory init and we can instead focus on per-node init times. One issue I encountered is that devm_memremap_pages and hmm_devmmem_pages_create were initializing only the pgmap field the same way. One wasn't initializing hmm_data, and the other was initializing it to a poison value. Since this is something that is exposed to the driver in the case of hmm I am opting for a third option and just initializing hmm_data to 0 since this is going to be exposed to unknown third party drivers. [alexander.h.duyck@linux.intel.com: fix reference count for pgmap in devm_memremap_pages] Link: http://lkml.kernel.org/r/20181008233404.1909.37302.stgit@localhost.localdomain Link: http://lkml.kernel.org/r/20180925202053.3576.66039.stgit@localhost.localdomain Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: create non-atomic version of SetPageReserved for init useAlexander Duyck1-0/+1
It doesn't make much sense to use the atomic SetPageReserved at init time when we are using memset to clear the memory and manipulating the page flags via simple "&=" and "|=" operations in __init_single_page. This patch adds a non-atomic version __SetPageReserved that can be used during page init and shows about a 10% improvement in initialization times on the systems I have available for testing. On those systems I saw initialization times drop from around 35 seconds to around 32 seconds to initialize a 3TB block of persistent memory. I believe the main advantage of this is that it allows for more compiler optimization as the __set_bit operation can be reordered whereas the atomic version cannot. I tried adding a bit of documentation based on f1dd2cd13c4 ("mm, memory_hotplug: do not associate hotadded memory to zones until online"). Ideally the reserved flag should be set earlier since there is a brief window where the page is initialization via __init_single_page and we have not set the PG_Reserved flag. I'm leaving that for a future patch set as that will require a more significant refactor. Link: http://lkml.kernel.org/r/20180925202018.3576.11607.stgit@localhost.localdomain Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: provide kernel parameter to allow disabling page init poisoningAlexander Duyck1-0/+8
Patch series "Address issues slowing persistent memory initialization", v5. The main thing this patch set achieves is that it allows us to initialize each node worth of persistent memory independently. As a result we reduce page init time by about 2 minutes because instead of taking 30 to 40 seconds per node and going through each node one at a time, we process all 4 nodes in parallel in the case of a 12TB persistent memory setup spread evenly over 4 nodes. This patch (of 3): On systems with a large amount of memory it can take a significant amount of time to initialize all of the page structs with the PAGE_POISON_PATTERN value. I have seen it take over 2 minutes to initialize a system with over 12TB of RAM. In order to work around the issue I had to disable CONFIG_DEBUG_VM and then the boot time returned to something much more reasonable as the arch_add_memory call completed in milliseconds versus seconds. However in doing that I had to disable all of the other VM debugging on the system. In order to work around a kernel that might have CONFIG_DEBUG_VM enabled on a system that has a large amount of memory I have added a new kernel parameter named "vm_debug" that can be set to "-" in order to disable it. Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomain Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27memcg: remove memcg_kmem_skip_accountShakeel Butt1-3/+0
The flag memcg_kmem_skip_account was added during the era of opt-out kmem accounting. There is no need for such flag in the opt-in world as there aren't any __GFP_ACCOUNT allocations within memcg_create_cache_enqueue(). Link: http://lkml.kernel.org/r/20180919004501.178023-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27mm: workingset: add vmstat counter for shadow nodesJohannes Weiner1-0/+1
Make it easier to catch bugs in the shadow node shrinker by adding a counter for the shadow nodes in circulation. [akpm@linux-foundation.org: assert that irqs are disabled, for __inc_lruvec_page_state()] [akpm@linux-foundation.org: s/WARN_ON_ONCE/VM_WARN_ON_ONCE/, per Johannes] Link: http://lkml.kernel.org/r/20181009184732.762-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27psi: cgroup supportJohannes Weiner3-0/+44
On a system that executes multiple cgrouped jobs and independent workloads, we don't just care about the health of the overall system, but also that of individual jobs, so that we can ensure individual job health, fairness between jobs, or prioritize some jobs over others. This patch implements pressure stall tracking for cgroups. In kernels with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure, and io.pressure files that track aggregate pressure stall times for only the tasks inside the cgroup. Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-27psi: pressure stall information for CPU, memory, and IOJohannes Weiner3-0/+130
When systems are overcommitted and resources become contended, it's hard to tell exactly the impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines work multiple jobs concurrently, the impact of overcommit in terms of latency and throughput on the individual job can be enormous. In order to maximize hardware utilization without sacrificing individual job health or risk complete machine lockups, this patch implements a way to quantify resource pressure in the system. A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that expose the percentage of time the system is stalled on CPU, memory, or IO, respectively. Stall states are aggregate versions of the per-task delay accounting delays: cpu: some tasks are runnable but not executing on a CPU memory: tasks are reclaiming, or waiting for swapin or thrashing cache io: tasks are waiting for io completions These percentages of walltime can be thought of as pressure percentages, and they give a general sense of system health and productivity loss incurred by resource overcommit. They can also indicate when the system is approaching lockup scenarios and OOMs. To do this, psi keeps track of the task states associated with each CPU and samples the time they spend in stall states. Every 2 seconds, the samples are averaged across CPUs - weighted by the CPUs' non-idle time to eliminate artifacts from unused CPUs - and translated into percentages of walltime. A running average of those percentages is maintained over 10s, 1m, and 5m periods (similar to the loadaverage). [hannes@cmpxchg.org: doc fixlet, per Randy] Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org [hannes@cmpxchg.org: code optimization] Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org [hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter] Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org [hannes@cmpxchg.org: fix build] Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>