summaryrefslogtreecommitdiff
path: root/mm/slub.c
AgeCommit message (Collapse)AuthorFilesLines
2022-10-04mm: kmsan: call KMSAN hooks from SLUB codeAlexander Potapenko1-0/+17
In order to report uninitialized memory coming from heap allocations KMSAN has to poison them unless they're created with __GFP_ZERO. It's handy that we need KMSAN hooks in the places where init_on_alloc/init_on_free initialization is performed. In addition, we apply __no_kmsan_checks to get_freepointer_safe() to suppress reports when accessing freelist pointers that reside in freed objects. Link: https://lkml.kernel.org/r/20220915150417.722975-16-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-12kfence: add sysfs interface to disable kfence for selected slabs.Imran Khan1-0/+26
By default kfence allocation can happen for any slab object, whose size is up to PAGE_SIZE, as long as that allocation is the first allocation after expiration of kfence sample interval. But in certain debugging scenarios we may be interested in debugging corruptions involving some specific slub objects like dentry or ext4_* etc. In such cases limiting kfence for allocations involving only specific slub objects will increase the probablity of catching the issue since kfence pool will not be consumed by other slab objects. This patch introduces a sysfs interface '/sys/kernel/slab/<name>/skip_kfence' to disable kfence for specific slabs. Having the interface work in this way does not impact current/default behavior of kfence and allows us to use kfence for specific slabs (when needed) as well. The decision to skip/use kfence is taken depending on whether kmem_cache.flags has (newly introduced) SLAB_SKIP_KFENCE flag set or not. Link: https://lkml.kernel.org/r/20220814195353.2540848-1-imran.f.khan@oracle.com Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-20mm/sl[au]b: use own bulk free function when bulk alloc failedHyeonggon Yoo1-2/+2
There is no benefit to call generic bulk free function when kmem_cache_alloc_bulk() failed. Use own kmem_cache_free_bulk() instead of generic function. Note that if kmem_cache_alloc_bulk() fails to allocate first object in SLUB, size is zero. So allow passing size == 0 to kmem_cache_free_bulk() like SLAB's. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04mm: slab: optimize memcg_slab_free_hook()Muchun Song1-44/+22
Most callers of memcg_slab_free_hook() already know the slab, which could be passed to memcg_slab_free_hook() directly to reduce the overhead of an another call of virt_to_slab(). For bulk freeing of objects, the call of slab_objcgs() in the loop in memcg_slab_free_hook() is redundant as well. Rework memcg_slab_free_hook() and build_detached_freelist() to reduce those unnecessary overhead and make memcg_slab_free_hook() can handle bulk freeing in slab_free(). Move the calling site of memcg_slab_free_hook() from do_slab_free() to slab_free() for slub to make the code clearer since the logic is weird (e.g. the caller need to judge whether it needs to call memcg_slab_free_hook()). It is easy to make mistakes like missing calling of memcg_slab_free_hook() like fixes of: commit d1b2cf6cb84a ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()") commit ae085d7f9365 ("mm: kfence: fix missing objcg housekeeping for SLAB") This optimization is mainly for bulk objects freeing. The following numbers is shown for 16-object freeing. before after kmem_cache_free_bulk: ~430 ns ~400 ns The overhead is reduced by about 7% for 16-object freeing. Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/20220429123044.37885-1-songmuchun@bytedance.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04mm/tracing: add 'accounted' entry into output of allocation tracepointsVasily Averin1-10/+10
Slab caches marked with SLAB_ACCOUNT force accounting for every allocation from this cache even if __GFP_ACCOUNT flag is not passed. Unfortunately, at the moment this flag is not visible in ftrace output, and this makes it difficult to analyze the accounted allocations. This patch adds boolean "accounted" entry into trace output, and set it to 'true' for calls used __GFP_ACCOUNT flag and for allocations from caches marked with SLAB_ACCOUNT. Set it to 'false' if accounting is disabled in configs. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-07-04mm/slub: Simplify __kmem_cache_alias()Xiongwei Song1-5/+3
There is no need to do anything if sysfs_slab_alias() return nonzero value after getting a mergeable cache. Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Link: https://lore.kernel.org/all/e5ebc952-af17-321f-5343-bc914d47c931@suse.cz/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-06-13mm/slub: add missing TID updates on slab deactivationJann Horn1-0/+2
The fastpath in slab_alloc_node() assumes that c->slab is stable as long as the TID stays the same. However, two places in __slab_alloc() currently don't update the TID when deactivating the CPU slab. If multiple operations race the right way, this could lead to an object getting lost; or, in an even more unlikely situation, it could even lead to an object being freed onto the wrong slab's freelist, messing up the `inuse` counter and eventually causing a page to be freed to the page allocator while it still contains slab objects. (I haven't actually tested these cases though, this is just based on looking at the code. Writing testcases for this stuff seems like it'd be a pain...) The race leading to state inconsistency is (all operations on the same CPU and kmem_cache): - task A: begin do_slab_free(): - read TID - read pcpu freelist (==NULL) - check `slab == c->slab` (true) - [PREEMPT A->B] - task B: begin slab_alloc_node(): - fastpath fails (`c->freelist` is NULL) - enter __slab_alloc() - slub_get_cpu_ptr() (disables preemption) - enter ___slab_alloc() - take local_lock_irqsave() - read c->freelist as NULL - get_freelist() returns NULL - write `c->slab = NULL` - drop local_unlock_irqrestore() - goto new_slab - slub_percpu_partial() is NULL - get_partial() returns NULL - slub_put_cpu_ptr() (enables preemption) - [PREEMPT B->A] - task A: finish do_slab_free(): - this_cpu_cmpxchg_double() succeeds() - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL] From there, the object on c->freelist will get lost if task B is allowed to continue from here: It will proceed to the retry_load_slab label, set c->slab, then jump to load_freelist, which clobbers c->freelist. But if we instead continue as follows, we get worse corruption: - task A: run __slab_free() on object from other struct slab: - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial) - task A: run slab_alloc_node() with NUMA node constraint: - fastpath fails (c->slab is NULL) - call __slab_alloc() - slub_get_cpu_ptr() (disables preemption) - enter ___slab_alloc() - c->slab is NULL: goto new_slab - slub_percpu_partial() is non-NULL - set c->slab to slub_percpu_partial(c) - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects from slab-2] - goto redo - node_match() fails - goto deactivate_slab - existing c->freelist is passed into deactivate_slab() - inuse count of slab-1 is decremented to account for object from slab-2 At this point, the inuse count of slab-1 is 1 lower than it should be. This means that if we free all allocated objects in slab-1 except for one, SLUB will think that slab-1 is completely unused, and may free its page, leading to use-after-free. Fixes: c17dda40a6a4e ("slub: Separate out kmem_cache_cpu processing from deactivate_slab") Fixes: 03e404af26dc2 ("slub: fast release on full slab") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
2022-06-13mm/slub: Move the stackdepot related allocation out of IRQ-off section.Sebastian Andrzej Siewior1-7/+34
The set_track() invocation in free_debug_processing() is invoked with acquired slab_lock(). The lock disables interrupts on PREEMPT_RT and this forbids to allocate memory which is done in stack_depot_save(). Split set_track() into two parts: set_track_prepare() which allocate memory and set_track_update() which only performs the assignment of the trace data structure. Use set_track_prepare() before disabling interrupts. [ vbabka@suse.cz: make set_track() call set_track_update() instead of open-coded assignments ] Fixes: 5cf909c553e9e ("mm/slub: use stackdepot to save stack trace in objects") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/Yp9sqoUi4fVa5ExF@linutronix.de
2022-05-25Merge tag 'slab-for-5.19' of ↵Linus Torvalds1-66/+108
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Conversion of slub_debug stack traces to stackdepot, allowing more useful debugfs-based inspection for e.g. memory leak debugging. Allocation and free debugfs info now includes full traces and is sorted by the unique trace frequency. The stackdepot conversion was already attempted last year but reverted by ae14c63a9f20. The memory overhead (while not actually enabled on boot) has been meanwhile solved by making the large stackdepot allocation dynamic. The xfstest issues haven't been reproduced on current kernel locally nor in -next, so the slab cache layout changes that originally made that bug manifest were probably not the root cause. - Refactoring of dma-kmalloc caches creation. - Trivial cleanups such as removal of unused parameters, fixes and clarifications of comments. - Hyeonggon Yoo joins as a reviewer. * tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: MAINTAINERS: add myself as reviewer for slab mm/slub: remove unused kmem_cache_order_objects max mm: slab: fix comment for __assume_kmalloc_alignment mm: slab: fix comment for ARCH_KMALLOC_MINALIGN mm/slub: remove unneeded return value of slab_pad_check mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache() mm/slub: remove meaningless node check in ___slab_alloc() mm/slub: remove duplicate flag in allocate_slab() mm/slub: remove unused parameter in setup_object*() mm/slab.c: fix comments slab, documentation: add description of debugfs files for SLUB caches mm/slub: sort debugfs output by frequency of stack traces mm/slub: distinguish and print stack traces in debugfs files mm/slub: use stackdepot to save stack trace in objects mm/slub: move struct track init out of set_track() lib/stackdepot: allow requesting early initialization dynamically mm/slub, kunit: Make slub_kunit unaffected by user specified flags mm/slab: remove some unused functions
2022-05-23Merge branches 'slab/for-5.19/stackdepot' and 'slab/for-5.19/refactor' into ↵Vlastimil Babka1-46/+91
slab/for-linus
2022-05-02mm/slub: remove unused kmem_cache_order_objects maxMiaohe Lin1-2/+0
max field holds the largest slab order that was ever used for a slab cache. But it's unused now. Remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220429090545.33413-1-linmiaohe@huawei.com
2022-04-20mm/slub: remove unneeded return value of slab_pad_checkMiaohe Lin1-7/+5
The return value of slab_pad_check is never used. So we can make it return void now. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220419120352.37825-1-linmiaohe@huawei.com
2022-04-16mm, kfence: support kmem_dump_obj() for KFENCE objectsMarco Elver1-1/+1
Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-13mm/slub: remove meaningless node check in ___slab_alloc()JaeSang Yoo1-1/+0
node_match() with node=NUMA_NO_NODE always returns 1. Duplicate check by goto statement is meaningless. Remove it. Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220409144239.2649257-1-jsyoo5b@gmail.com
2022-04-13mm/slub: remove duplicate flag in allocate_slab()Jiyoup Kim1-1/+1
In allocate_slab(), __GFP_NOFAIL flag is removed twice when trying higher-order allocation. Remove it. Signed-off-by: Jiyoup Kim <lakroforce@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220409150538.1264-1-lakroforce@gmail.com
2022-04-13mm/slub: remove unused parameter in setup_object*()JaeSang Yoo1-11/+8
setup_object_debug() and setup_object() has unused parameter, "struct slab *slab". Remove it. By the commit 3ec0974210fe ("SLUB: Simplify debug code"), setup_object_debug() were introduced to refactor previous code blocks in the setup_object(). Previous code used SlabDebug() to init_object() and init_tracking(). As the SlabDebug() takes "struct page *page" as argument, the setup_object_debug() checks flag of "struct kmem_cache *s" which doesn't require "struct page *page". As the struct page were changed into struct slab by commit bb192ed9aa719 ("mm/slub: Convert most struct page to struct slab by spatch"), but it's still unused parameter. Suggested-by: Ohhoon Kwon <ohkwon1043@gmail.com> Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220411072534.3372768-1-jsyoo5b@gmail.com
2022-04-06mm/slub: sort debugfs output by frequency of stack tracesOliver Glitta1-0/+16
Sort the output of debugfs alloc_traces and free_traces by the frequency of allocation/freeing stack traces. Most frequently used stack traces will be printed first, e.g. for easier memory leak debugging. Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com>
2022-04-06mm/slub: distinguish and print stack traces in debugfs filesOliver Glitta1-2/+26
Aggregate objects in slub cache by unique stack trace in addition to caller address when producing contents of debugfs files alloc_traces and free_traces in debugfs. Also add the stack traces to the debugfs output. This makes it much more useful to e.g. debug memory leaks. Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-04-06mm/slub: use stackdepot to save stack trace in objectsOliver Glitta1-31/+40
Many stack traces are similar so there are many similar arrays. Stackdepot saves each unique stack only once. Replace field addrs in struct track with depot_stack_handle_t handle. Use stackdepot to save stack trace. The benefits are smaller memory overhead and possibility to aggregate per-cache statistics in the following patch using the stackdepot handle instead of matching stacks manually. [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ] This was initially merged as commit 788691464c29 and reverted by commit ae14c63a9f20 due to several issues, that should now be fixed. The problem of unconditional memory overhead by stackdepot has been addressed by commit 2dba5eb1c73b ("lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()"), so the dependency on stackdepot will result in extra memory usage only when a slab cache tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds. The build failures on some architectures were also addressed, and the reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this patch. Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
2022-04-06mm/slub: move struct track init out of set_track()Vlastimil Babka1-17/+15
set_track() either zeroes out the struct track or fills it, depending on the addr parameter. This is unnecessary as there's only one place that calls it for the initialization - init_tracking(). We can simply do the zeroing there, with a single memset() that covers both TRACK_ALLOC and TRACK_FREE as they are adjacent. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com>
2022-04-06mm/slub, kunit: Make slub_kunit unaffected by user specified flagsHyeonggon Yoo1-0/+3
slub_kunit does not expect other debugging flags to be set when running tests. When SLAB_RED_ZONE flag is set globally, test fails because the flag affects number of errors reported. To make slub_kunit unaffected by user specified debugging flags, introduce SLAB_NO_USER_FLAGS to ignore them. With this flag, only flags specified in the code are used and others are ignored. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/Yk0sY9yoJhFEXWOg@hyeyoo
2022-03-23Merge tag 'slab-for-5.18' of ↵Linus Torvalds1-78/+52
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - A few non-trivial SLUB code cleanups, most notably a refactoring of deactivate_slab(). - A bunch of trivial changes, such as removal of unused parameters, making stuff static, and employing helper functions. * tag 'slab-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm: slub: Delete useless parameter of alloc_slab_page() mm: slab: Delete unused SLAB_DEACTIVATED flag mm/slub: remove forced_order parameter in calculate_sizes mm/slub: refactor deactivate_slab() mm/slub: limit number of node partial slabs only in cache creation mm/slub: use helper macro __ATTR_XX_MODE for SLAB_ATTR(_RO) mm/slab_common: use helper function is_power_of_2() mm/slob: make kmem_cache_boot static
2022-03-23mm: introduce kmem_cache_alloc_lruMuchun Song1-14/+28
We currently allocate scope for every memcg to be able to tracked on every superblock instantiated in the system, regardless of whether that superblock is even accessible to that memcg. These huge memcg counts come from container hosts where memcgs are confined to just a small subset of the total number of superblocks that instantiated at any given point in time. For these systems with huge container counts, list_lru does not need the capability of tracking every memcg on every superblock. What it comes down to is that adding the memcg to the list_lru at the first insert. So introduce kmem_cache_alloc_lru to allocate objects and its list_lru. In the later patch, we will convert all inode and dentry allocation from kmem_cache_alloc to kmem_cache_alloc_lru. Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-21Merge branch 'slab/for-5.18/cleanups' into slab/for-linusVlastimil Babka1-63/+42
Non-trivial SLUB code cleanups, notably refactoring of deactivate_slab().
2022-03-10mm: slub: Delete useless parameter of alloc_slab_page()Xiongwei Song1-4/+4
The parameter @s is useless for alloc_slab_page(). It was added in 2014 by commit 5dfb41750992 ("sl[au]b: charge slabs to kmemcg explicitly"). The need for it was removed in 2020 by commit 1f3147b49d75 ("mm: slub: call account_slab_page() after slab page initialization"). Let's delete it. [willy@infradead.org: Added detailed history of @s] Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220310140701.87908-3-sxwjean@me.com
2022-03-09mm/slub: remove forced_order parameter in calculate_sizesMiaohe Lin1-7/+4
Since commit 32a6f409b693 ("mm, slub: remove runtime allocation order changes"), forced_order is always -1. Remove this unneeded parameter to simplify the code. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220309092036.50844-1-linmiaohe@huawei.com
2022-03-09mm/slub: refactor deactivate_slab()Hyeonggon Yoo1-52/+39
Simplify deactivate_slab() by unlocking n->list_lock and retrying cmpxchg_double() when cmpxchg_double() fails, and perform add_{partial,full} only when it succeed. Releasing and taking n->list_lock again here is not harmful as SLUB avoids deactivating slabs as much as possible. [ vbabka@suse.cz: perform add_{partial,full} when cmpxchg_double() succeed. count deactivating full slabs even if debugging flag is not set. ] Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220307074057.902222-3-42.hyeyoo@gmail.com
2022-03-09mm/slub: limit number of node partial slabs only in cache creationHyeonggon Yoo1-11/+3
SLUB sets number of minimum partial slabs for node (min_partial) using set_min_partial(). SLUB holds at least min_partial slabs even if they're empty to avoid excessive use of page allocator. set_min_partial() limits value of min_partial limits value of min_partial MIN_PARTIAL and MAX_PARTIAL. As set_min_partial() can be called by min_partial_store() too, Only limit value of min_partial in kmem_cache_open() so that it can be changed to value that a user wants. [ rientjes@google.com: Fold set_min_partial() into its callers ] Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220307074057.902222-2-42.hyeyoo@gmail.com
2022-03-07mm/slub: use helper macro __ATTR_XX_MODE for SLAB_ATTR(_RO)Lianjie Zhang1-4/+2
This allows more concise code, and VERIFY_OCTAL_PERMISSIONS() can help validate any future change. Signed-off-by: Lianjie Zhang <zhanglianjie@uniontech.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220306073818.15089-1-zhanglianjie@uniontech.com
2022-01-06mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when enabledVlastimil Babka1-2/+6
The fields 'next' and 'slabs' are only used when CONFIG_SLUB_CPU_PARTIAL is enabled. We can put their definition to #ifdef to prevent accidental use when disabled. Currenlty show_slab_objects() and slabs_cpu_partial_show() contain code accessing the slabs field that's effectively dead with CONFIG_SLUB_CPU_PARTIAL=n through the wrappers slub_percpu_partial() and slub_percpu_partial_read_once(), but to prevent a compile error, we need to hide all this code behind #ifdef. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm/kasan: Convert to struct folio and struct slabMatthew Wilcox (Oracle)1-1/+1
KASAN accesses some slab related struct page fields so we need to convert it to struct slab. Some places are a bit simplified thanks to kasan_addr_to_slab() encapsulating the PageSlab flag check through virt_to_slab(). When resolving object address to either a real slab or a large kmalloc, use struct folio as the intermediate type for testing the slab flag to avoid unnecessary implicit compound_head(). [ vbabka@suse.cz: use struct folio, adjust to differences in previous patches ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Tested-by: Hyeongogn Yoo <42.hyeyoo@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <kasan-dev@googlegroups.com>
2022-01-06mm: Convert struct page to struct slab in functions used by other subsystemsVlastimil Babka1-1/+1
KASAN, KFENCE and memcg interact with SLAB or SLUB internals through functions nearest_obj(), obj_to_index() and objs_per_slab() that use struct page as parameter. This patch converts it to struct slab including all callers, through a coccinelle semantic patch. // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c // Note: needs coccinelle 1.1.1 to avoid breaking whitespace @@ @@ -objs_per_slab_page( +objs_per_slab( ... ) { ... } @@ @@ -objs_per_slab_page( +objs_per_slab( ... ) @@ identifier fn =~ "obj_to_index|objs_per_slab"; @@ fn(..., - const struct page *page + const struct slab *slab ,...) { <... ( - page_address(page) + slab_address(slab) | - page + slab ) ...> } @@ identifier fn =~ "nearest_obj"; @@ fn(..., - struct page *page + const struct slab *slab ,...) { <... ( - page_address(page) + slab_address(slab) | - page + slab ) ...> } @@ identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab"; expression E; @@ fn(..., ( - slab_page(E) + E | - virt_to_page(E) + virt_to_slab(E) | - virt_to_head_page(E) + virt_to_slab(E) | - page + page_slab(page) ) ,...) Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <kasan-dev@googlegroups.com> Cc: <cgroups@vger.kernel.org>
2022-01-06mm/slub: Finish struct page to struct slab conversionVlastimil Babka1-53/+52
Update comments mentioning pages to mention slabs where appropriate. Also some goto labels. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Convert most struct page to struct slab by spatchVlastimil Babka1-436/+436
The majority of conversion from struct page to struct slab in SLUB internals can be delegated to a coccinelle semantic patch. This includes renaming of variables with 'page' in name to 'slab', and similar. Big thanks to Julia Lawall and Luis Chamberlain for help with coccinelle. // Options: --include-headers --no-includes --smpl-spacing include/linux/slub_def.h mm/slub.c // Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the // embedded script // build list of functions to exclude from applying the next rule @initialize:ocaml@ @@ let ok_function p = not (List.mem (List.hd p).current_element ["nearest_obj";"obj_to_index";"objs_per_slab_page";"__slab_lock";"__slab_unlock";"free_nonslab_page";"kmalloc_large_node"]) // convert the type from struct page to struct page in all functions except the // list from previous rule // this also affects struct kmem_cache_cpu, but that's ok @@ position p : script:ocaml() { ok_function p }; @@ - struct page@p + struct slab // in struct kmem_cache_cpu, change the name from page to slab // the type was already converted by the previous rule @@ @@ struct kmem_cache_cpu { ... -struct slab *page; +struct slab *slab; ... } // there are many places that use c->page which is now c->slab after the // previous rule @@ struct kmem_cache_cpu *c; @@ -c->page +c->slab @@ @@ struct kmem_cache { ... - unsigned int cpu_partial_pages; + unsigned int cpu_partial_slabs; ... } @@ struct kmem_cache *s; @@ - s->cpu_partial_pages + s->cpu_partial_slabs @@ @@ static void - setup_page_debug( + setup_slab_debug( ...) {...} @@ @@ - setup_page_debug( + setup_slab_debug( ...); // for all functions (with exceptions), change any "struct slab *page" // parameter to "struct slab *slab" in the signature, and generally all // occurences of "page" to "slab" in the body - with some special cases. @@ identifier fn !~ "free_nonslab_page|obj_to_index|objs_per_slab_page|nearest_obj"; @@ fn(..., - struct slab *page + struct slab *slab ,...) { <... - page + slab ...> } // similar to previous but the param is called partial_page @@ identifier fn; @@ fn(..., - struct slab *partial_page + struct slab *partial_slab ,...) { <... - partial_page + partial_slab ...> } // similar to previous but for functions that take pointer to struct page ptr @@ identifier fn; @@ fn(..., - struct slab **ret_page + struct slab **ret_slab ,...) { <... - ret_page + ret_slab ...> } // functions converted by previous rules that were temporarily called using // slab_page(E) so we want to remove the wrapper now that they accept struct // slab ptr directly @@ identifier fn =~ "slab_free|do_slab_free"; expression E; @@ fn(..., - slab_page(E) + E ,...) // similar to previous but for another pattern @@ identifier fn =~ "slab_pad_check|check_object"; @@ fn(..., - folio_page(folio, 0) + slab ,...) // functions that were returning struct page ptr and now will return struct // slab ptr, including slab_page() wrapper removal @@ identifier fn =~ "allocate_slab|new_slab"; expression E; @@ static -struct slab * +struct slab * fn(...) { <... - slab_page(E) + E ...> } // rename any former struct page * declarations @@ @@ struct slab * ( - page + slab | - partial_page + partial_slab | - oldpage + oldslab ) ; // this has to be separate from previous rule as page and page2 appear at the // same line @@ @@ struct slab * -page2 +slab2 ; // similar but with initial assignment @@ expression E; @@ struct slab * ( - page + slab | - flush_page + flush_slab | - discard_page + slab_to_discard | - page_to_unfreeze + slab_to_unfreeze ) = E; // convert most of struct page to struct slab usage inside functions (with // exceptions), including specific variable renames @@ identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node"; expression E; @@ fn(...) { <... ( - int pages; + int slabs; | - int pages = E; + int slabs = E; | - page + slab | - flush_page + flush_slab | - partial_page + partial_slab | - oldpage->pages + oldslab->slabs | - oldpage + oldslab | - unsigned int nr_pages; + unsigned int nr_slabs; | - nr_pages + nr_slabs | - unsigned int partial_pages = E; + unsigned int partial_slabs = E; | - partial_pages + partial_slabs ) ...> } // this has to be split out from the previous rule so that lines containing // multiple matching changes will be fully converted @@ identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node"; @@ fn(...) { <... ( - slab->pages + slab->slabs | - pages + slabs | - page2 + slab2 | - discard_page + slab_to_discard | - page_to_unfreeze + slab_to_unfreeze ) ...> } // after we simply changed all occurences of page to slab, some usages need // adjustment for slab-specific functions, or use slab_page() wrapper @@ identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node"; @@ fn(...) { <... ( - page_slab(slab) + slab | - kasan_poison_slab(slab) + kasan_poison_slab(slab_page(slab)) | - page_address(slab) + slab_address(slab) | - page_size(slab) + slab_size(slab) | - PageSlab(slab) + folio_test_slab(slab_folio(slab)) | - page_to_nid(slab) + slab_nid(slab) | - compound_order(slab) + slab_order(slab) ) ...> } Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Luis Chamberlain <mcgrof@kernel.org>
2022-01-06mm/slub: Convert pfmemalloc_match() to take a struct slabMatthew Wilcox (Oracle)1-19/+6
Preparatory for mass conversion. Use the new slab_test_pfmemalloc() helper. As it doesn't do VM_BUG_ON(!PageSlab()) we no longer need the pfmemalloc_match_unsafe() variant. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm/slub: Convert __free_slab() to use struct slabVlastimil Babka1-14/+13
__free_slab() is on the boundary of distinguishing struct slab and struct page so start with struct slab but convert to folio for working with flags and folio_page() to call functions that require struct page. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm/slub: Convert alloc_slab_page() to return a struct slabVlastimil Babka1-10/+16
Preparatory, callers convert back to struct page for now. Also move setting page flags to alloc_slab_page() where we still operate on a struct page. This means the page->slab_cache pointer is now set later than the PageSlab flag, which could theoretically confuse some pfn walker assuming PageSlab means there would be a valid cache pointer. But as the code had no barriers and used __set_bit() anyway, it could have happened already, so there shouldn't be such a walker. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm/slub: Convert print_page_info() to print_slab_info()Matthew Wilcox (Oracle)1-6/+7
Improve the type safety and prepare for further conversion. For flags access, convert to folio internally. [ vbabka@suse.cz: access flags via folio_flags() ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06mm/slub: Convert __slab_lock() and __slab_unlock() to struct slabVlastimil Babka1-7/+11
These functions operate on the PG_locked page flag, but make them accept struct slab to encapsulate this implementation detail. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Convert kfree() to use a struct slabMatthew Wilcox (Oracle)1-13/+16
Convert kfree(), kmem_cache_free() and ___cache_free() to resolve object addresses to struct slab, using folio as intermediate step where needed. Keep passing the result as struct page for now in preparation for mass conversion of internal functions. [ vbabka@suse.cz: Use folio as intermediate step when checking for large kmalloc pages, and when freeing them - rename free_nonslab_page() to free_large_kmalloc() that takes struct folio ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Convert detached_freelist to use a struct slabMatthew Wilcox (Oracle)1-14/+17
This gives us a little bit of extra typesafety as we know that nobody called virt_to_page() instead of virt_to_head_page(). [ vbabka@suse.cz: Use folio as intermediate step when filtering out large kmalloc pages ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert check_heap_object() to use struct slabMatthew Wilcox (Oracle)1-5/+5
Ensure that we're not seeing a tail page inside __check_heap_object() by converting to a slab instead of a page. Take the opportunity to mark the slab as const since we're not modifying it. Also move the declaration of __check_heap_object() to mm/slab.h so it's not available to the wider kernel. [ vbabka@suse.cz: in check_heap_object() only convert to struct slab for actual PageSlab pages; use folio as intermediate step instead of page ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Use struct slab in kmem_obj_info()Matthew Wilcox (Oracle)1-6/+7
All three implementations of slab support kmem_obj_info() which reports details of an object allocated from the slab allocator. By using the slab type instead of the page type, we make it obvious that this can only be called for slabs. [ vbabka@suse.cz: also convert the related kmem_valid_obj() to folios ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert __ksize() to struct slabMatthew Wilcox (Oracle)1-7/+5
In SLUB, use folios, and struct slab to access slab_cache field. In SLOB, use folios to properly resolve pointers beyond PAGE_SIZE offset of the object. [ vbabka@suse.cz: use folios, and only convert folio_test_slab() == true folios to struct slab ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Convert [un]account_slab_page() to struct slabMatthew Wilcox (Oracle)1-2/+2
Convert the parameter of these functions to struct slab instead of struct page and drop _page from the names. For now their callers just convert page to slab. [ vbabka@suse.cz: replace existing functions instead of calling them ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm: Split slab into its own typeMatthew Wilcox (Oracle)1-4/+4
Make struct slab independent of struct page. It still uses the underlying memory in struct page for storing slab-specific data, but slab and slub can now be weaned off using struct page directly. Some of the wrapper functions (slab_address() and slab_order()) still need to cast to struct folio, but this is a significant disentanglement. [ vbabka@suse.cz: Rebase on folios, use folio instead of page where possible. Do not duplicate flags field in struct slab, instead make the related accessors go through slab_folio(). For testing pfmemalloc use the folio_*_active flag accessors directly so the PageSlabPfmemalloc wrappers can be removed later. Make folio_slab() expect only folio_test_slab() == true folios and virt_to_slab() return NULL when folio_test_slab() == false. Move struct slab to mm/slab.h. Don't represent with struct slab pages that are not true slab pages, but just a compound page obtained directly rom page allocator (with large kmalloc() for SLUB and SLOB). ] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06mm/slub: Make object_err() staticVlastimil Babka1-15/+15
There are no callers outside of mm/slub.c anymore. Move freelist_corrupted() that calls object_err() to avoid a need for forward declaration. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Roman Gushchin <guro@fb.com>
2021-12-11mm/slub: fix endianness bug for alloc/free_traces attributesGerald Schaefer1-6/+9
On big-endian s390, the alloc/free_traces attributes produce endless output, because of always 0 idx in slab_debugfs_show(). idx is de-referenced from *v, which points to a loff_t value, with unsigned int idx = *(unsigned int *)v; This will only give the upper 32 bits on big-endian, which remain 0. Instead of only fixing this de-reference, during discussion it seemed more appropriate to change the seq_ops so that they use an explicit iterator in private loc_track struct. This patch adds idx to loc_track, which will also fix the endianness bug. Link: https://lore.kernel.org/r/20211117193932.4049412-1-gerald.schaefer@linux.ibm.com Link: https://lkml.kernel.org/r/20211126171848.17534-1-gerald.schaefer@linux.ibm.com Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reported-by: Steffen Maier <maier@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-20mm: emit the "free" trace report before freeing memory in kmem_cache_free()Yunfeng Ye1-1/+1
After the memory is freed, it can be immediately allocated by other CPUs, before the "free" trace report has been emitted. This causes inaccurate traces. For example, if the following sequence of events occurs: CPU 0 CPU 1 (1) alloc xxxxxx (2) free xxxxxx (3) alloc xxxxxx (4) free xxxxxx Then they will be inaccurately reported via tracing, so that they appear to have happened in this order: CPU 0 CPU 1 (1) alloc xxxxxx (2) alloc xxxxxx (3) free xxxxxx (4) free xxxxxx This makes it look like CPU 1 somehow managed to allocate memory that CPU 0 still had allocated for itself. In order to avoid this, emit the "free xxxxxx" tracing report just before the actual call to free the memory, instead of just after it. Link: https://lkml.kernel.org/r/374eb75d-7404-8721-4e1e-65b0e5b17279@huawei.com Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-46/+63
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...