summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2019-09-27Merge tag 'drm-next-2019-09-27' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-8/+8
Pull drm fixes from Dave Airlie: "Fixes built up over the past 1.5 weeks or so, it's two weeks of amdgpu, some core cleanups and some panfrost fixes. I also finally figured out why my desktop was slow to do a bunch of stuff (someone gave it an IPv6 address which can't reach anything!). core: - Some cleanups and fixes in the self-refresh helpers - Some cleanups and fixes in the atomic helpers amdgpu: - Fix a 64 bit divide - Prevent a memory leak in a failure case in dc - Load proper gfx firmware on navi14 variants - Add more navi12 and navi14 PCI ids - Misc fixes for renoir - Fix bandwidth issues with multiple displays on vega20 - Support for Dali - Fix a possible oops with KFD on hawaii - Fix for backlight level after resume on some APUs - Other misc fixes panfrost: - Multiple panfrost fixes for regulator support and page fault handling" * tag 'drm-next-2019-09-27' of git://anongit.freedesktop.org/drm/drm: (34 commits) drm/amd/display: prevent memory leak drm/amdgpu/gfx10: add support for wks firmware loading drm/amdgpu/display: include slab.h in dcn21_resource.c drm/amdgpu/display: fix 64 bit divide drm/panfrost: Prevent race when handling page fault drm/panfrost: Remove NULL checks for regulator drm/panfrost: Fix regulator_get_optional() misuse drm: Measure Self Refresh Entry/Exit times to avoid thrashing drm: Fix kerneldoc and remove unused struct member in self_refresh helper drm/atomic: Rename crtc_state->pageflip_flags to async_flip drm/atomic: Reject FLIP_ASYNC unconditionally drm/atomic: Take the atomic toys away from X drm/amdgpu: flag navi12 and 14 as experimental for 5.4 drm/kms: Duct-tape for mode object lifetime checks drm/amdgpu: add navi12 pci id drm/amdgpu: add navi14 PCI ID for work station SKU drm/amdkfd: Swap trap temporary registers in gfx10 trap handler drm/amd/powerplay: implement sysfs for getting dpm clock drm/amd/display: Restore backlight brightness after system resume drm/amd/display: Implement voltage limitation for dali ...
2019-09-26Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds6-19/+82
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - Dequeue the request from the receive queue while we're re-encoding # v4.20+ - Fix buffer handling of GSS MIC without slack # 5.1 Features: - Increase xprtrdma maximum transport header and slot table sizes - Add support for nfs4_call_sync() calls using a custom rpc_task_struct - Optimize the default readahead size - Enable pNFS filelayout LAYOUTGET on OPEN Other bugfixes and cleanups: - Fix possible null-pointer dereferences and memory leaks - Various NFS over RDMA cleanups - Various NFS over RDMA comment updates - Don't receive TCP data into a reset request buffer - Don't try to parse incomplete RPC messages - Fix congestion window race with disconnect - Clean up pNFS return-on-close error handling - Fixes for NFS4ERR_OLD_STATEID handling" * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits) pNFS/filelayout: enable LAYOUTGET on OPEN NFS: Optimise the default readahead size NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE NFSv4: Fix OPEN_DOWNGRADE error handling pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid NFSv4: Add a helper to increment stateid seqids NFSv4: Handle RPC level errors in LAYOUTRETURN NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close NFSv4: Clean up pNFS return-on-close error handling pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors NFS: remove unused check for negative dentry NFSv3: use nfs_add_or_obtain() to create and reference inodes NFS: Refactor nfs_instantiate() for dentry referencing callers SUNRPC: Fix congestion window race with disconnect SUNRPC: Don't try to parse incomplete RPC messages SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic SUNRPC: Fix buffer handling of GSS MIC without slack SUNRPC: RPC level errors should always set task->tk_rpc_status SUNRPC: Don't receive TCP data into a request buffer that has been reset ...
2019-09-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds17-124/+171
Merge more updates from Andrew Morton: - almost all of the rest of -mm - various other subsystems Subsystems affected by this patch series: memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork, cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise, cleanups, pagemap * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (77 commits) arch/sparc/include/asm/pgtable_64.h: fix build mm: treewide: clarify pgtable_page_{ctor,dtor}() naming ntfs: remove (un)?likely() from IS_ERR() conditions IB/hfi1: remove unlikely() from IS_ERR*() condition xfs: remove unlikely() from WARN_ON() condition wimax/i2400m: remove unlikely() from WARN*() condition fs: remove unlikely() from WARN_ON() condition xen/events: remove unlikely() from WARN() condition checkpatch: check for nested (un)?likely() calls hexagon: drop empty and unused free_initrd_mem mm: factor out common parts between MADV_COLD and MADV_PAGEOUT mm: introduce MADV_PAGEOUT mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM mm: introduce MADV_COLD mm: untag user pointers in mmap/munmap/mremap/brk vfio/type1: untag user pointers in vaddr_get_pfn tee/shm: untag user pointers in tee_shm_register media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get drm/radeon: untag user pointers in radeon_gem_userptr_ioctl drm/amdgpu: untag user pointers ...
2019-09-26mm: treewide: clarify pgtable_page_{ctor,dtor}() namingMark Rutland2-6/+6
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26mm: introduce MADV_PAGEOUTMinchan Kim2-0/+2
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ [minchan@kernel.org: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26mm: introduce MADV_COLDMinchan Kim2-0/+3
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26bug: move WARN_ON() "cut here" into exception handlerKees Cook1-5/+3
The original clean up of "cut here" missed the WARN_ON() case (that does not have a printk message), which was fixed recently by adding an explicit printk of "cut here". This had the downside of adding a printk() to every WARN_ON() caller, which reduces the utility of using an instruction exception to streamline the resulting code. By making this a new BUGFLAG, all of these can be removed and "cut here" can be handled by the exception handler. This was very pronounced on PowerPC, but the effect can be seen on x86 as well. The resulting text size of a defconfig build shows some small savings from this patch: text data bss dec hex filename 19691167 5134320 1646664 26472151 193eed7 vmlinux.before 19676362 5134260 1663048 26473670 193f4c6 vmlinux.after This change also opens the door for creating something like BUG_MSG(), where a custom printk() before issuing BUG(), without confusing the "cut here" line. Link: http://lkml.kernel.org/r/201908200943.601DD59DCE@keescook Fixes: 6b15f678fb7d ("include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26bug: consolidate __WARN_FLAGS usageKees Cook1-11/+7
Instead of having separate tests for __WARN_FLAGS, merge the two #ifdef blocks and replace the synonym WANT_WARN_ON_SLOWPATH macro. Link: http://lkml.kernel.org/r/20190819234111.9019-7-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26bug: clean up helper macros to remove __WARN_TAINT()Kees Cook1-10/+11
In preparation for cleaning up "cut here" even more, this removes the __WARN_*TAINT() helpers, as they limit the ability to add new BUGFLAG_* flags to call sites. They are removed by expanding them into full __WARN_FLAGS() calls. Link: http://lkml.kernel.org/r/20190819234111.9019-6-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26bug: consolidate warn_slowpath_fmt() usageKees Cook1-2/+1
Instead of having a separate helper for no printk output, just consolidate the logic into warn_slowpath_fmt(). Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26bug: rename __WARN_printf_taint() to __WARN_printf()Kees Cook1-4/+4
This just renames the helper to improve readability. Link: http://lkml.kernel.org/r/20190819234111.9019-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26bug: refactor away warn_slowpath_fmt_taint()Kees Cook1-9/+4
Patch series "Clean up WARN() "cut here" handling", v2. Christophe Leroy noticed that the fix for missing "cut here" in the WARN() case was adding explicit printk() calls instead of teaching the exception handler to add it. This refactors the bug/warn infrastructure to pass this information as a new BUGFLAG. Longer details repeated from the last patch in the series: bug: move WARN_ON() "cut here" into exception handler The original cleanup of "cut here" missed the WARN_ON() case (that does not have a printk message), which was fixed recently by adding an explicit printk of "cut here". This had the downside of adding a printk() to every WARN_ON() caller, which reduces the utility of using an instruction exception to streamline the resulting code. By making this a new BUGFLAG, all of these can be removed and "cut here" can be handled by the exception handler. This was very pronounced on PowerPC, but the effect can be seen on x86 as well. The resulting text size of a defconfig build shows some small savings from this patch: text data bss dec hex filename 19691167 5134320 1646664 26472151 193eed7 vmlinux.before 19676362 5134260 1663048 26473670 193f4c6 vmlinux.after This change also opens the door for creating something like BUG_MSG(), where a custom printk() before issuing BUG(), without confusing the "cut here" line. This patch (of 7): There's no reason to have specialized helpers for passing the warn taint down to __warn(). Consolidate and refactor helper macros, removing __WARN_printf() and warn_slowpath_fmt_taint(). Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Drew Davenport <ddavenport@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Feng Tang <feng.tang@intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26kgdb: don't use a notifier to enter kgdb at panic; call directlyDouglas Anderson1-0/+2
Right now kgdb/kdb hooks up to debug panics by registering for the panic notifier. This works OK except that it means that kgdb/kdb gets called _after_ the CPUs in the system are taken offline. That means that if anything important was happening on those CPUs (like something that might have contributed to the panic) you can't debug them. Specifically I ran into a case where I got a panic because a task was "blocked for more than 120 seconds" which was detected on CPU 2. I nicely got shown stack traces in the kernel log for all CPUs including CPU 0, which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of __mmc_switch(). I then ended up at the kdb prompt where switched over to kgdb to try to look at local variables of the process on CPU 0. I found that I couldn't. Digging more, I found that I had no info on any tasks running on CPUs other than CPU 2 and that asking kdb for help showed me "Error: no saved data for this cpu". This was because all the CPUs were offline. Let's move the entry of kdb/kgdb to a direct call from panic() and stop using the generic notifier. Putting a direct call in allows us to order things more properly and it also doesn't seem like we're breaking any abstractions by calling into the debugger from the panic function. Daniel said: : This patch changes the way kdump and kgdb interact with each other. : However it would seem rather odd to have both tools simultaneously armed : and, even if they were, the user still has the option to use panic_timeout : to force a kdump to happen. Thus I think the change of order is : acceptable. Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Feng Tang <feng.tang@intel.com> Cc: YueHaibing <yuehaibing@huawei.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26uaccess: add missing __must_check attributesKees Cook2-11/+12
The usercopy implementation comments describe that callers of the copy_*_user() family of functions must always have their return values checked. This can be enforced at compile time with __must_check, so add it where needed. Link: http://lkml.kernel.org/r/201908251609.ADAD5CAAC1@keescook Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26kexec: restore arch_kexec_kernel_image_probe declarationVasily Gorbik1-0/+2
arch_kexec_kernel_image_probe function declaration has been removed by commit 9ec4ecef0af7 ("kexec_file,x86,powerpc: factor out kexec_file_ops functions"). Still this function is overridden by couple of architectures and proper prototype declaration is therefore important, so bring it back. This fixes the following sparse warning on s390: arch/s390/kernel/machine_kexec_file.c:333:5: warning: symbol 'arch_kexec_kernel_image_probe' was not declared. Should it be static? Link: http://lkml.kernel.org/r/patch.git-ff1c9045ebdc.your-ad-here.call-01564402297-ext-5690@work.hours Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26cpumask: nicer for_each_cpumask_and() signatureAlexey Dobriyan1-7/+7
Mask arguments can be swapped without changing anything. Make arguments names reflect that: #define for_each_cpu_and(cpu, mask1, mask2) Link: http://lkml.kernel.org/r/20190724183350.GA15041@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26fork: improve error message for corrupted page tablesSai Praneeth Prakhya1-0/+4
When a user process exits, the kernel cleans up the mm_struct of the user process and during cleanup, check_mm() checks the page tables of the user process for corruption (E.g: unexpected page flags set/cleared). For corrupted page tables, the error message printed by check_mm() isn't very clear as it prints the loop index instead of page table type (E.g: Resident file mapping pages vs Resident shared memory pages). The loop index in check_mm() is used to index rss_stat[] which represents individual memory type stats. Hence, instead of printing index, print memory type, thereby improving error message. Without patch: -------------- [ 204.836425] mm/pgtable-generic.c:29: bad p4d 0000000089eb4e92(800000025f941467) [ 204.836544] BUG: Bad rss-counter state mm:00000000f75895ea idx:0 val:2 [ 204.836615] BUG: Bad rss-counter state mm:00000000f75895ea idx:1 val:5 [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 With patch: ----------- [ 69.815453] mm/pgtable-generic.c:29: bad p4d 0000000084653642(800000025ca37467) [ 69.815872] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_FILEPAGES val:2 [ 69.815962] BUG: Bad rss-counter state mm:00000000014a6c03 type:MM_ANONPAGES val:5 [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so that it matches the other print statement. Link: http://lkml.kernel.org/r/da75b5153f617f4c5739c08ee6ebeb3d19db0fbc.1565123758.git.sai.praneeth.prakhya@intel.com Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26lib/hexdump: make print_hex_dump_bytes() a nop on !DEBUG buildsStephen Boyd1-7/+15
I'm seeing a bunch of debug prints from a user of print_hex_dump_bytes() in my kernel logs, but I don't have CONFIG_DYNAMIC_DEBUG enabled nor do I have DEBUG defined in my build. The problem is that print_hex_dump_bytes() calls a wrapper function in lib/hexdump.c that calls print_hex_dump() with KERN_DEBUG level. There are three cases to consider here 1. CONFIG_DYNAMIC_DEBUG=y --> call dynamic_hex_dum() 2. CONFIG_DYNAMIC_DEBUG=n && DEBUG --> call print_hex_dump() 3. CONFIG_DYNAMIC_DEBUG=n && !DEBUG --> stub it out Right now, that last case isn't detected and we still call print_hex_dump() from the stub wrapper. Let's make print_hex_dump_bytes() only call print_hex_dump_debug() so that it works properly in all cases. Case #1, print_hex_dump_debug() calls dynamic_hex_dump() and we get same behavior. Case #2, print_hex_dump_debug() calls print_hex_dump() with KERN_DEBUG and we get the same behavior. Case #3, print_hex_dump_debug() is a nop, changing behavior to what we want, i.e. print nothing. Link: http://lkml.kernel.org/r/20190816235624.115280-1-swboyd@chromium.org Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26include/trace/events/writeback.h: fix -Wstringop-truncation warningsQian Cai1-18/+20
There are many of those warnings. In file included from ./arch/powerpc/include/asm/paca.h:15, from ./arch/powerpc/include/asm/current.h:13, from ./include/linux/thread_info.h:21, from ./include/asm-generic/preempt.h:5, from ./arch/powerpc/include/generated/asm/preempt.h:1, from ./include/linux/preempt.h:78, from ./include/linux/spinlock.h:51, from fs/fs-writeback.c:19: In function 'strncpy', inlined from 'perf_trace_writeback_page_template' at ./include/trace/events/writeback.h:56:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by using the new strscpy_pad() which was introduced in "lib/string: Add strscpy_pad() function" and will always be NUL-terminated instead of strncpy(). Also, change strlcpy() to use strscpy_pad() in this file for consistency. Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw Fixes: 455b2864686d ("writeback: Initial tracing support") Fixes: 028c2dd184c0 ("writeback: Add tracing to balance_dirty_pages") Fixes: e84d0a4f8e39 ("writeback: trace event writeback_queue_io") Fixes: b48c104d2211 ("writeback: trace event bdi_dirty_ratelimit") Fixes: cc1676d917f3 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()") Fixes: 9fb0a7da0c52 ("writeback: add more tracepoints") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nitin Gote <nitin.r.gote@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Stephen Kitt <steve@sk2.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26kernel-doc: core-api: include string.h into core-apiJoe Perches1-2/+3
core-api should show all the various string functions including the newly added stracpy and stracpy_pad. Miscellanea: o Update the Returns: value for strscpy o fix a defect with %NUL) [joe@perches.com: correct return of -E2BIG descriptions] Link: http://lkml.kernel.org/r/29f998b4c1a9d69fbeae70500ba0daa4b340c546.1563889130.git.joe@perches.com Link: http://lkml.kernel.org/r/224a6ebf39955f4107c0c376d66155d970e46733.1563841972.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Stephen Kitt <steve@sk2.org> Cc: Nitin Gote <nitin.r.gote@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26augmented rbtree: rework the RB_DECLARE_CALLBACKS macro definitionMichel Lespinasse1-12/+12
Change the definition of the RBCOMPUTE function. The propagate callback repeatedly calls RBCOMPUTE as it moves from leaf to root. it wants to stop recomputing once the augmented subtree information doesn't change. This was previously checked using the == operator, but that only works when the augmented subtree information is a scalar field. This commit modifies the RBCOMPUTE function so that it now sets the augmented subtree information instead of returning it, and returns a boolean value indicating if the propagate callback should stop. The motivation for this change is that I want to introduce augmented rbtree uses where the augmented data for the subtree is a struct instead of a scalar. Link: http://lkml.kernel.org/r/20190703040156.56953-4-walken@google.com Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macroMichel Lespinasse2-21/+37
Add RB_DECLARE_CALLBACKS_MAX, which generates augmented rbtree callbacks for the case where the augmented value is a scalar whose definition follows a max(f(node)) pattern. This actually covers all present uses of RB_DECLARE_CALLBACKS, and saves some (source) code duplication in the various RBCOMPUTE function definitions. [walken@google.com: fix mm/vmalloc.c] Link: http://lkml.kernel.org/r/CANN689FXgK13wDYNh1zKxdipeTuALG4eKvKpsdZqKFJ-rvtGiQ@mail.gmail.com [walken@google.com: re-add check to check_augmented()] Link: http://lkml.kernel.org/r/20190727022027.GA86863@google.com Link: http://lkml.kernel.org/r/20190703040156.56953-3-walken@google.com Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26augmented rbtree: add comments for RB_DECLARE_CALLBACKS macroMichel Lespinasse1-21/+33
Patch series "make RB_DECLARE_CALLBACKS more generic", v3. These changes are intended to make the RB_DECLARE_CALLBACKS macro more generic (allowing the aubmented subtree information to be a struct instead of a scalar). I have verified the compiled lib/interval_tree.o and mm/mmap.o files to check that they didn't change. This held as expected for interval_tree.o; mmap.o did have some changes which could be reverted by marking __vma_link_rb as noinline. I did not add such a change to the patchset; I felt it was reasonable enough to leave the inlining decision up to the compiler. This patch (of 3): Add a short comment summarizing the arguments to RB_DECLARE_CALLBACKS. The arguments are also now capitalized. This copies the style of the INTERVAL_TREE_DEFINE macro. No functional changes in this commit, only comments and capitalization. Link: http://lkml.kernel.org/r/20190703040156.56953-2-walken@google.com Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26linux/coff.h: add include guardMasahiro Yamada1-0/+5
Add a header include guard just in case. My motivation is to allow Kbuild to detect missing include guard: https://patchwork.kernel.org/patch/11063011/ Before I enable this checker I want to fix as many headers as possible. Link: http://lkml.kernel.org/r/20190728154728.11126-1-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25Merge tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds4-0/+5
Pull ceph updates from Ilya Dryomov: "The highlights are: - automatic recovery of a blacklisted filesystem session (Zheng Yan). This is disabled by default and can be enabled by mounting with the new "recover_session=clean" option. - serialize buffered reads and O_DIRECT writes (Jeff Layton). Care is taken to avoid serializing O_DIRECT reads and writes with each other, this is based on the exclusion scheme from NFS. - handle large osdmaps better in the face of fragmented memory (myself) - don't limit what security.* xattrs can be get or set (Jeff Layton). We were overly restrictive here, unnecessarily preventing things like file capability sets stored in security.capability from working. - allow copy_file_range() within the same inode and across different filesystems within the same cluster (Luis Henriques)" * tag 'ceph-for-5.4-rc1' of git://github.com/ceph/ceph-client: (41 commits) ceph: call ceph_mdsc_destroy from destroy_fs_client libceph: use ceph_kvmalloc() for osdmap arrays libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc() ceph: allow object copies across different filesystems in the same cluster ceph: include ceph_debug.h in cache.c ceph: move static keyword to the front of declarations rbd: pull rbd_img_request_create() dout out into the callers ceph: reconnect connection if session hang in opening state libceph: drop unused con parameter of calc_target() ceph: use release_pages() directly rbd: fix response length parameter for encoded strings ceph: allow arbitrary security.* xattrs ceph: only set CEPH_I_SEC_INITED if we got a MAC label ceph: turn ceph_security_invalidate_secctx into static inline ceph: add buffered/direct exclusionary locking for reads and writes libceph: handle OSD op ceph_pagelist_append() errors ceph: don't return a value from void function ceph: don't freeze during write page faults ceph: update the mtime when truncating up ceph: fix indentation in __get_snap_name() ...
2019-09-25Merge tag 'fuse-update-5.4' of ↵Linus Torvalds2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Continue separating the transport (user/kernel communication) and the filesystem layers of fuse. Getting rid of most layering violations will allow for easier cleanup and optimization later on. - Prepare for the addition of the virtio-fs filesystem. The actual filesystem will be introduced by a separate pull request. - Convert to new mount API. - Various fixes, optimizations and cleanups. * tag 'fuse-update-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (55 commits) fuse: Make fuse_args_to_req static fuse: fix memleak in cuse_channel_open fuse: fix beyond-end-of-page access in fuse_parse_cache() fuse: unexport fuse_put_request fuse: kmemcg account fs data fuse: on 64-bit store time in d_fsdata directly fuse: fix missing unlock_page in fuse_writepage() fuse: reserve byteswapped init opcodes fuse: allow skipping control interface and forced unmount fuse: dissociate DESTROY from fuseblk fuse: delete dentry if timeout is zero fuse: separate fuse device allocation and installation in fuse_conn fuse: add fuse_iqueue_ops callbacks fuse: extract fuse_fill_super_common() fuse: export fuse_dequeue_forget() function fuse: export fuse_get_unique() fuse: export fuse_send_init_request() fuse: export fuse_len_args() fuse: export fuse_end_request() fuse: fix request limit ...
2019-09-25Merge tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-3/+7
Pull iomap updates from Darrick Wong: "After last week's failed pull request attempt, I scuttled everything in the branch except for the directio endio api changes, which were trivial. Everything else will simply have to wait for the next cycle. Summary: - Report both io errors and short io results to the directio endio handler. - Allow directio callers to pass an ops structure to iomap_dio_rw" * tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: move the iomap_dio_rw ->end_io callback into a structure iomap: split size and error for iomap_dio_rw ->end_io
2019-09-25Merge branch 'i2c/for-5.4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - new driver for ICY, an Amiga Zorro card :) - axxia driver gained slave mode support, NXP driver gained ACPI - the slave EEPROM backend gained 16 bit address support - and lots of regular driver updates and reworks * 'i2c/for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (52 commits) i2c: tegra: Move suspend handling to NOIRQ phase i2c: imx: ACPI support for NXP i2c controller i2c: uniphier(-f): remove all dev_dbg() i2c: uniphier(-f): use devm_platform_ioremap_resource() i2c: slave-eeprom: Add comment about address handling i2c: exynos5: Remove IRQF_ONESHOT i2c: stm32f7: Make structure stm32f7_i2c_algo constant i2c: cht-wc: drop check because i2c_unregister_device() is NULL safe i2c-eeprom_slave: Add support for more eeprom models i2c: fsi: Add of_put_node() before break i2c: synquacer: Make synquacer_i2c_ops constant i2c: hix5hd2: Remove IRQF_ONESHOT i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond watchdog: iTCO: Add support for Cannon Lake PCH iTCO i2c: iproc: Make bcm_iproc_i2c_quirks constant i2c: iproc: Add full name of devicetree node to adapter name i2c: piix4: Add ACPI support i2c: piix4: Fix probing of reserved ports on AMD Family 16h Model 30h i2c: ocores: use request_any_context_irq() to register IRQ handler i2c: designware: Fix optional reset error handling ...
2019-09-25Merge tag 'for-5.4/io_uring-2019-09-24' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+2
Pull more io_uring updates from Jens Axboe: "A collection of later fixes and additions, that weren't quite ready for pushing out with the initial pull request. This contains: - Fix potential use-after-free of shadow requests (Jackie) - Fix potential OOM crash in request allocation (Jackie) - kmalloc+memcpy -> kmemdup cleanup (Jackie) - Fix poll crash regression (me) - Fix SQ thread not being nice and giving up CPU for !PREEMPT (me) - Add support for timeouts, making it easier to do epoll_wait() conversions, for instance (me) - Ensure io_uring works without f_ops->read_iter() and f_ops->write_iter() (me)" * tag 'for-5.4/io_uring-2019-09-24' of git://git.kernel.dk/linux-block: io_uring: correctly handle non ->{read,write}_iter() file_operations io_uring: IORING_OP_TIMEOUT support io_uring: use cond_resched() in sqthread io_uring: fix potential crash issue due to io_get_req failure io_uring: ensure poll commands clear ->sqe io_uring: fix use-after-free of shadow_req io_uring: use kmemdup instead of kmalloc and memcpy
2019-09-25Merge tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-blockLinus Torvalds2-14/+4
Pull more block updates from Jens Axboe: "Some later additions that weren't quite done for the first pull request, and also a few fixes that have arrived since. This contains: - Kill silly pktcdvd warning on attempting to register a non-scsi passthrough device (me) - Use symbolic constants for the block t10 protection types, and switch to handling it in core rather than in the drivers (Max) - libahci platform missing node put fix (Nishka) - Small series of fixes for BFQ (Paolo) - Fix possible nbd crash (Xiubo)" * tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block: block: drop device references in bsg_queue_rq() block: t10-pi: fix -Wswitch warning pktcdvd: remove warning on attempting to register non-passthrough dev ata: libahci_platform: Add of_node_put() before loop exit nbd: fix possible page fault for nbd disk nbd: rename the runtime flags as NBD_RT_ prefixed block, bfq: push up injection only after setting service time block, bfq: increase update frequency of inject limit block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1 block, bfq: update inject limit only after injection occurred block: centralize PI remapping logic to the block layer block: use symbolic constants for t10_pi type
2019-09-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds20-198/+172
Merge updates from Andrew Morton: - a few hot fixes - ocfs2 updates - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan, cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug, sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy, oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap, zsmalloc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) mm/zsmalloc.c: fix a -Wunused-function warning zswap: do not map same object twice zswap: use movable memory if zpool support allocate movable memory zpool: add malloc_support_movable to zpool_driver shmem: fix obsolete comment in shmem_getpage_gfp() mm/madvise: reduce code duplication in error handling paths mm: mmap: increase sockets maximum memory size pgoff for 32bits mm/mmap.c: refine find_vma_prev() with rb_last() riscv: make mmap allocation top-down by default mips: use generic mmap top-down layout and brk randomization mips: replace arch specific way to determine 32bit task with generic version mips: adjust brk randomization offset to fit generic version mips: use STACK_TOP when computing mmap base address mips: properly account for stack randomization and stack guard gap arm: use generic mmap top-down layout and brk randomization arm: use STACK_TOP when computing mmap base address arm: properly account for stack randomization and stack guard gap arm64, mm: make randomization selected by generic topdown mmap layout arm64, mm: move generic mmap layout functions to mm arm64: consider stack randomization for mmap base only when necessary ...
2019-09-25zpool: add malloc_support_movable to zpool_driverHui Zhu1-0/+3
As a zpool_driver, zsmalloc can allocate movable memory because it support migate pages. But zbud and z3fold cannot allocate movable memory. Add malloc_support_movable to zpool_driver. If a zpool_driver support allocate movable memory, set it to true. And add zpool_malloc_support_movable check malloc_support_movable to make sure if a zpool support allocate movable memory. Link: http://lkml.kernel.org/r/20190605100630.13293-1-teawaterz@linux.alibaba.com Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitalywool@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm, fs: move randomize_stack_top from fs to mmAlexandre Ghiti1-0/+2
Patch series "Provide generic top-down mmap layout functions", v6. This series introduces generic functions to make top-down mmap layout easily accessible to architectures, in particular riscv which was the initial goal of this series. The generic implementation was taken from arm64 and used successively by arm, mips and finally riscv. Note that in addition the series fixes 2 issues: - stack randomization was taken into account even if not necessary. - [1] fixed an issue with mmap base which did not take into account randomization but did not report it to arm and mips, so by moving arm64 into a generic library, this problem is now fixed for both architectures. This work is an effort to factorize architecture functions to avoid code duplication and oversights as in [1]. [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html This patch (of 14): This preparatory commit moves this function so that further introduction of generic topdown mmap layout is contained only in mm/util.c. Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25khugepaged: enable collapse pmd for pte-mapped THPSong Liu1-0/+12
khugepaged needs exclusive mmap_sem to access page table. When it fails to lock mmap_sem, the page will fault in as pte-mapped THP. As the page is already a THP, khugepaged will not handle this pmd again. This patch enables the khugepaged to retry collapse the page table. struct mm_slot (in khugepaged.c) is extended with an array, containing addresses of pte-mapped THPs. We use array here for simplicity. We can easily replace it with more advanced data structures when needed. In khugepaged_scan_mm_slot(), if the mm contains pte-mapped THP, we try to collapse the page table. Since collapse may happen at an later time, some pages may already fault in. collapse_pte_mapped_thp() is added to properly handle these pages. collapse_pte_mapped_thp() also double checks whether all ptes in this pmd are mapping to the same THP. This is necessary because some subpage of the THP may be replaced, for example by uprobe. In such cases, it is not possible to collapse the pmd. [kirill.shutemov@linux.intel.com: add comments for retract_page_tables()] Link: http://lkml.kernel.org/r/20190816145443.6ard3iilytc6jlgv@box Link: http://lkml.kernel.org/r/20190815164525.1848545-6-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm, thp: introduce FOLL_SPLIT_PMDSong Liu1-0/+1
Introduce a new foll_flag: FOLL_SPLIT_PMD. As the name says FOLL_SPLIT_PMD splits huge pmd for given mm_struct, the underlining huge page stays as-is. FOLL_SPLIT_PMD is useful for cases where we need to use regular pages, but would switch back to huge page and huge pmd on. One of such example is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe. Link: http://lkml.kernel.org/r/20190815164525.1848545-4-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: move memcmp_pages() and pages_identical()Song Liu1-0/+7
Patch series "THP aware uprobe", v13. This patchset makes uprobe aware of THPs. Currently, when uprobe is attached to text on THP, the page is split by FOLL_SPLIT. As a result, uprobe eliminates the performance benefit of THP. This set makes uprobe THP-aware. Instead of FOLL_SPLIT, we introduces FOLL_SPLIT_PMD, which only split PMD for uprobe. After all uprobes within the THP are removed, the PTE-mapped pages are regrouped as huge PMD. This set (plus a few THP patches) is also available at https://github.com/liu-song-6/linux/tree/uprobe-thp This patch (of 6): Move memcmp_pages() to mm/util.c and pages_identical() to mm.h, so that we can use them in other files. Link: http://lkml.kernel.org/r/20190815164525.1848545-2-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <matthew.wilcox@oracle.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: thp: make deferred split shrinker memcg awareYang Shi3-0/+14
Currently THP deferred split shrinker is not memcg aware, this may cause premature OOM with some configuration. For example the below test would run into premature OOM easily: $ cgcreate -g memory:thp $ echo 4G > /sys/fs/cgroup/memory/thp/memory/limit_in_bytes $ cgexec -g memory:thp transhuge-stress 4000 transhuge-stress comes from kernel selftest. It is easy to hit OOM, but there are still a lot THP on the deferred split queue, memcg direct reclaim can't touch them since the deferred split shrinker is not memcg aware. Convert deferred split shrinker memcg aware by introducing per memcg deferred split queue. The THP should be on either per node or per memcg deferred split queue if it belongs to a memcg. When the page is immigrated to the other memcg, it will be immigrated to the target memcg's deferred split queue too. Reuse the second tail page's deferred_list for per memcg list since the same THP can't be on multiple deferred split queues. [yang.shi@linux.alibaba.com: simplify deferred split queue dereference per Kirill Tkhai] Link: http://lkml.kernel.org/r/1566496227-84952-5-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1565144277-36240-5-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: shrinker: make shrinker not depend on memcg kmemYang Shi2-9/+17
Currently shrinker is just allocated and can work when memcg kmem is enabled. But, THP deferred split shrinker is not slab shrinker, it doesn't make too much sense to have such shrinker depend on memcg kmem. It should be able to reclaim THP even though memcg kmem is disabled. Introduce a new shrinker flag, SHRINKER_NONSLAB, for non-slab shrinker. When memcg kmem is disabled, just such shrinkers can be called in shrinking memcg slab. [yang.shi@linux.alibaba.com: add comment] Link: http://lkml.kernel.org/r/1566496227-84952-4-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1565144277-36240-4-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: thp: extract split_queue_* into a structYang Shi1-3/+9
Patch series "Make deferred split shrinker memcg aware", v6. Currently THP deferred split shrinker is not memcg aware, this may cause premature OOM with some configuration. For example the below test would run into premature OOM easily: $ cgcreate -g memory:thp $ echo 4G > /sys/fs/cgroup/memory/thp/memory/limit_in_bytes $ cgexec -g memory:thp transhuge-stress 4000 transhuge-stress comes from kernel selftest. It is easy to hit OOM, but there are still a lot THP on the deferred split queue, memcg direct reclaim can't touch them since the deferred split shrinker is not memcg aware. Convert deferred split shrinker memcg aware by introducing per memcg deferred split queue. The THP should be on either per node or per memcg deferred split queue if it belongs to a memcg. When the page is immigrated to the other memcg, it will be immigrated to the target memcg's deferred split queue too. Reuse the second tail page's deferred_list for per memcg list since the same THP can't be on multiple deferred split queues. Make deferred split shrinker not depend on memcg kmem since it is not slab. It doesn't make sense to not shrink THP even though memcg kmem is disabled. With the above change the test demonstrated above doesn't trigger OOM even though with cgroup.memory=nokmem. This patch (of 4): Put split_queue, split_queue_lock and split_queue_len into a struct in order to reduce code duplication when we convert deferred_split to memcg aware in the later patches. Link: http://lkml.kernel.org/r/1565144277-36240-2-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Suggested-by: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm,thp: avoid writes to file with THP in pagecacheSong Liu1-0/+32
In previous patch, an application could put part of its text section in THP via madvise(). These THPs will be protected from writes when the application is still running (TXTBSY). However, after the application exits, the file is available for writes. This patch avoids writes to file THP by dropping page cache for the file when the file is open for write. A new counter nr_thps is added to struct address_space. In do_dentry_open(), if the file is open for write and nr_thps is non-zero, we drop page cache for the whole file. Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm,thp: stats for file backed THPSong Liu1-0/+2
In preparation for non-shmem THP, this patch adds a few stats and exposes them in /proc/meminfo, /sys/bus/node/devices/<node>/meminfo, and /proc/<pid>/task/<tid>/smaps. This patch is mostly a rewrite of Kirill A. Shutemov's earlier version: https://lkml.kernel.org/r/20170126115819.58875-5-kirill.shutemov@linux.intel.com/ Link: http://lkml.kernel.org/r/20190801184244.3169074-5-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25thp: update split_huge_page_pmd() commentKefeng Wang1-3/+2
According to 78ddc5347341 ("thp: rename split_huge_page_pmd() to split_huge_pmd()"), update related comment. Link: http://lkml.kernel.org/r/20190731033406.185285-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm, compaction: raise compaction priority after it withdrawnsVlastimil Babka1-5/+17
Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours when should_compact_retry() would return true more often then it should. Specifically, this was in the case where compact_result was COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being made." The problem is that the compaction_withdrawn() test in should_compact_retry() includes compaction outcomes that are only possible on low compaction priority, and results in a retry without increasing the priority. This may result in furter reclaim, and more incomplete compaction attempts. With this patch, compaction priority is raised when possible, or should_compact_retry() returns false. The COMPACT_SKIPPED result doesn't really fit together with the other outcomes in compaction_withdrawn(), as that's a result caused by insufficient order-0 pages, not due to low compaction priority. With this patch, it is moved to a new compaction_needs_reclaim() function, and for that outcome we keep the current logic of retrying if it looks like reclaim will be able to help. Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm/vmalloc: modify struct vmap_area to reduce its sizePengfei Li1-7/+13
Objective --------- The current implementation of struct vmap_area wasted space. After applying this commit, sizeof(struct vmap_area) has been reduced from 11 words to 8 words. Description ----------- 1) Pack "subtree_max_size", "vm" and "purge_list". This is no problem because A) "subtree_max_size" is only used when vmap_area is in "free" tree B) "vm" is only used when vmap_area is in "busy" tree C) "purge_list" is only used when vmap_area is in vmap_purge_list 2) Eliminate "flags". ;Since only one flag VM_VM_AREA is being used, and the same thing can be done by judging whether "vm" is NULL, then the "flags" can be eliminated. Link: http://lkml.kernel.org/r/20190716152656.12255-3-lpf.vector@gmail.com Signed-off-by: Pengfei Li <lpf.vector@gmail.com> Suggested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Roman Gushchin <guro@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25drivers/base/memory.c: don't store end_section_nr in memory blocksDavid Hildenbrand1-1/+0
Each memory block spans the same amount of sections/pages/bytes. The size is determined before the first memory block is created. No need to store what we can easily calculate - and the calculations even look simpler now. Michal brought up the idea of variable-sized memory blocks. However, if we ever implement something like this, we will need an API compatibility switch and reworks at various places (most code assumes a fixed memory block size). So let's cleanup what we have right now. While at it, fix the variable naming in register_mem_sect_under_node() - we no longer talk about a single section. Link: http://lkml.kernel.org/r/20190809110200.2746-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25driver/base/memory.c: validate memory block size earlyDavid Hildenbrand1-3/+3
Let's validate the memory block size early, when initializing the memory device infrastructure. Fail hard in case the value is not suitable. As nobody checks the return value of memory_dev_init(), turn it into a void function and fail with a panic in all scenarios instead. Otherwise, we'll crash later during boot when core/drivers expect that the memory device infrastructure (including memory_block_size_bytes()) works as expected. I think long term, we should move the whole memory block size configuration (set_memory_block_size_order() and memory_block_size_bytes()) into drivers/base/memory.c. Link: http://lkml.kernel.org/r/20190806090142.22709-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: consolidate pgtable_cache_init() and pgd_cache_init()Mike Rapoport1-1/+1
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem cache for page table allocations on several architectures that do not use PAGE_SIZE tables for one or more levels of the page table hierarchy. Most architectures do not implement these functions and use __weak default NOP implementation of pgd_cache_init(). Since there is no such default for pgtable_cache_init(), its empty stub is duplicated among most architectures. Rename the definitions of pgd_cache_init() to pgtable_cache_init() and drop empty stubs of pgtable_cache_init(). Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: remove quicklist page table cachesNicholas Piggin2-99/+0
Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm/gup: add make_dirty arg to put_user_pages_dirty_lock()akpm@linux-foundation.org1-2/+3
[11~From: John Hubbard <jhubbard@nvidia.com> Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock() Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()", v3. There are about 50+ patches in my tree [2], and I'll be sending out the remaining ones in a few more groups: * The block/bio related changes (Jerome mostly wrote those, but I've had to move stuff around extensively, and add a little code) * mm/ changes * other subsystem patches * an RFC that shows the current state of the tracking patch set. That can only be applied after all call sites are converted, but it's good to get an early look at it. This is part a tree-wide conversion, as described in fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). This patch (of 3): Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following: 1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call. 2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain. This leads to the following API choices: * put_user_pages_dirty_lock(page, npages, make_dirty) * There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required. [jhubbard@nvidia.com: remove unused variable in siw_free_plist()] Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: page cache: store only head pages in i_pagesMatthew Wilcox (Oracle)1-0/+10
Transparent Huge Pages are currently stored in i_pages as pointers to consecutive subpages. This patch changes that to storing consecutive pointers to the head page in preparation for storing huge pages more efficiently in i_pages. Large parts of this are "inspired" by Kirill's patch https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/ Kirill and Huang Ying contributed several fixes. [willy@infradead.org: use compound_nr, squish uninit-var warning] Link: http://lkml.kernel.org/r/20190731210400.7419-1-willy@infradead.org Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Kirill Shutemov <kirill@shutemov.name> Reviewed-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Tested-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Tested-by: Qian Cai <cai@lca.pw> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>