summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-12-11slub, kasan: improve interaction of KASAN and slub_debug poisoningAndrey Konovalov1-15/+26
When both KASAN and slub_debug are enabled, when a free object is being prepared in setup_object, slub_debug poisons the object data before KASAN initializes its per-object metadata. Right now, in setup_object, KASAN only initializes the alloc metadata, which is always stored outside of the object. slub_debug is aware of this and it skips poisoning and checking that memory area. However, with the following patch in this series, KASAN also starts initializing its free medata in setup_object. As this metadata might be stored within the object, this initialization might overwrite the slub_debug poisoning. This leads to slub_debug reports. Thus, skip checking slub_debug poisoning of the object data area that overlaps with the in-object KASAN free metadata. Also make slub_debug poisoning of tail kmalloc redzones more precise when KASAN is enabled: slub_debug can still poison and check the tail kmalloc allocation area that comes after the KASAN free metadata. Link: https://lkml.kernel.org/r/20231122231202.121277-1-andrey.konovalov@linux.dev Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kasan: use stack_depot_put for tag-based modesAndrey Konovalov1-2/+8
Make tag-based KASAN modes evict stack traces from the stack depot once they are evicted from the stack ring. Internally, pass STACK_DEPOT_FLAG_GET to stack_depot_save_flags (via kasan_save_stack) to increment the refcount when saving a new entry to stack ring and call stack_depot_put when removing an entry from stack ring. Link: https://lkml.kernel.org/r/b4773e5c1b0b9df6826ec0b65c1923feadfa78e5.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kasan: check object_size in kasan_complete_mode_report_infoAndrey Konovalov1-1/+3
Check the object size when looking up entries in the stack ring. If the size of the object for which a report is being printed does not match the size of the object for which a stack trace has been saved in the stack ring, the saved stack trace is irrelevant. Link: https://lkml.kernel.org/r/68c6948175aadd7e7e7deea61725103d64a4528f.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kasan: remove atomic accesses to stack ring entriesAndrey Konovalov2-26/+12
Remove the atomic accesses to entry fields in save_stack_info and kasan_complete_mode_report_info for tag-based KASAN modes. These atomics are not required, as the read/write lock prevents the entries from being read (in kasan_complete_mode_report_info) while being written (in save_stack_info) and the try_cmpxchg prevents the same entry from being rewritten (in save_stack_info) in the unlikely case of wrapping during writing. Link: https://lkml.kernel.org/r/29f59126d9845c5257b6c29cd7ad113b16f19f47.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: allow users to evict stack tracesAndrey Konovalov2-1/+50
Add stack_depot_put, a function that decrements the reference counter on a stack record and removes it from the stack depot once the counter reaches 0. Internally, when removing a stack record, the function unlinks it from the hash table bucket and returns to the freelist. With this change, the users of stack depot can call stack_depot_put when keeping a stack trace in the stack depot is not needed anymore. This allows avoiding polluting the stack depot with irrelevant stack traces and thus have more space to store the relevant ones before the stack depot reaches its capacity. Link: https://lkml.kernel.org/r/1d1ad5692ee43d4fc2b3fd9d221331d30b36123f.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: add refcount for recordsAndrey Konovalov2-5/+20
Add a reference counter for how many times a stack records has been added to stack depot. Add a new STACK_DEPOT_FLAG_GET flag to stack_depot_save_flags that instructs the stack depot to increment the refcount. Do not yet decrement the refcount; this is implemented in one of the following patches. Do not yet enable any users to use the flag to avoid overflowing the refcount. This is preparatory patch for implementing the eviction of stack records from the stack depot. Link: https://lkml.kernel.org/r/a3fc14a2359d019d2a008d4ff8b46a665371ffee.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot, kasan: add flags to __stack_depot_save and renameAndrey Konovalov6-25/+48
Change the bool can_alloc argument of __stack_depot_save to a u32 argument that accepts a set of flags. The following patch will add another flag to stack_depot_save_flags besides the existing STACK_DEPOT_FLAG_CAN_ALLOC. Also rename the function to stack_depot_save_flags, as __stack_depot_save is a cryptic name, Link: https://lkml.kernel.org/r/645fa15239621eebbd3a10331e5864b718839512.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kmsan: use stack_depot_save instead of __stack_depot_saveAndrey Konovalov1-4/+3
Make KMSAN use stack_depot_save instead of __stack_depot_save, as it always passes true to __stack_depot_save as the last argument. Link: https://lkml.kernel.org/r/18092240699efdc6acd78b51e41ea782953e6c8d.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: use list_head for stack record linksAndrey Konovalov1-37/+50
Switch stack_record to use list_head for links in the hash table and in the freelist. This will allow removing entries from the hash table buckets. This is preparatory patch for implementing the eviction of stack records from the stack depot. Link: https://lkml.kernel.org/r/4787d9a584cd33433d9ee1846b17fa3d3e1987ad.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: use read/write lockAndrey Konovalov1-41/+46
Currently, stack depot uses the following locking scheme: 1. Lock-free accesses when looking up a stack record, which allows to have multiple users to look up records in parallel; 2. Spinlock for protecting the stack depot pools and the hash table when adding a new record. For implementing the eviction of stack traces from stack depot, the lock-free approach is not going to work anymore, as we will need to be able to also remove records from the hash table. Convert the spinlock into a read/write lock, and drop the atomic accesses, as they are no longer required. Looking up stack traces is now protected by the read lock and adding new records - by the write lock. One of the following patches will add a new function for evicting stack records, which will be protected by the write lock as well. With this change, multiple users can still look up records in parallel. This is preparatory patch for implementing the eviction of stack records from the stack depot. Link: https://lkml.kernel.org/r/9f81ffcc4bb422ebb6326a65a770bf1918634cbb.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: store free stack records in a freelistAndrey Konovalov1-49/+82
Instead of using the global pool_offset variable to find a free slot when storing a new stack record, mainlain a freelist of free slots within the allocated stack pools. A global next_stack variable is used as the head of the freelist, and the next field in the stack_record struct is reused as freelist link (when the record is not in the freelist, this field is used as a link in the hash table). This is preparatory patch for implementing the eviction of stack records from the stack depot. Link: https://lkml.kernel.org/r/b9e4c79955c2121b69301778643b203d3fb09ccc.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: store next pool pointer in new_poolAndrey Konovalov1-1/+5
Instead of using the last pointer in stack_pools for storing the pointer to a new pool (which does not yet store any stack records), use a new new_pool variable. This a purely code readability change: it seems more logical to store the pointer to a pool with a special meaning in a dedicated variable. Link: https://lkml.kernel.org/r/448bc18296c16bef95cb3167697be6583dcc8ce3.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: rename next_pool_required to new_pool_requiredAndrey Konovalov1-25/+24
Rename next_pool_required to new_pool_required. This a purely code readability change: the following patch will change stack depot to store the pointer to the new pool in a separate variable, and "new" seems like a more logical name. Link: https://lkml.kernel.org/r/fd7cd6c6eb250c13ec5d2009d75bb4ddd1470db9.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: rework helpers for depot_alloc_stackAndrey Konovalov1-37/+49
Split code in depot_alloc_stack and depot_init_pool into 3 functions: 1. depot_keep_next_pool that keeps preallocated memory for the next pool if required. 2. depot_update_pools that moves on to the next pool if there's no space left in the current pool, uses preallocated memory for the new current pool if required, and calls depot_keep_next_pool otherwise. 3. depot_alloc_stack that calls depot_update_pools and then allocates a stack record as before. This makes it somewhat easier to follow the logic of depot_alloc_stack and also serves as a preparation for implementing the eviction of stack records from the stack depot. Link: https://lkml.kernel.org/r/71fb144d42b701fcb46708d7f4be6801a4a8270e.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: fix and clean-up atomic annotationsAndrey Konovalov1-15/+14
Drop smp_load_acquire from next_pool_required in depot_init_pool, as both depot_init_pool and the all smp_store_release's to this variable are executed under the stack depot lock. Also simplify and clean up comments accompanying the use of atomic accesses in the stack depot code. Link: https://lkml.kernel.org/r/c118ef044d8db80248d9e1f14592c72e8429e9d9.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: use fixed-sized slots for stack recordsAndrey Konovalov2-4/+19
Instead of storing stack records in stack depot pools one right after another, use fixed-sized slots. Add a new Kconfig option STACKDEPOT_MAX_FRAMES that allows to select the size of the slot in frames. Use 64 as the default value, which is the maximum stack trace size both KASAN and KMSAN use right now. Also add descriptions for other stack depot Kconfig options. This is preparatory patch for implementing the eviction of stack records from the stack depot. Link: https://lkml.kernel.org/r/dce7d030a99ff61022509665187fac45b0827298.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: add depot_fetch_stack helperAndrey Konovalov1-17/+28
Add a helper depot_fetch_stack function that fetches the pointer to a stack record. With this change, all static depot_* functions now operate on stack pools and the exported stack_depot_* functions operate on the hash table. Link: https://lkml.kernel.org/r/170d8c202f29dc8e3d5491ee074d1e9e029a46db.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: drop valid bit from handlesAndrey Konovalov1-5/+2
Stack depot doesn't use the valid bit in handles in any way, so drop it. Link: https://lkml.kernel.org/r/34969bba2ca6e012c6ad071767197dee64dc5723.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: simplify __stack_depot_saveAndrey Konovalov1-5/+4
The retval local variable in __stack_depot_save has the union type handle_parts, but the function never uses anything but the union's handle field. Define retval simply as depot_stack_handle_t to simplify the code. Link: https://lkml.kernel.org/r/3b0763c8057a1cf2f200ff250a5f9580ee36a28c.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: check disabled flag when fetchingAndrey Konovalov1-1/+1
Do not try fetching a stack trace from the stack depot if the stack_depot_disabled flag is enabled. Link: https://lkml.kernel.org/r/c3bfa3b7ab00b2e48ab75a3fbb9c67555777cb08.1700502145.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11lib/stackdepot: print disabled message only if truly disabledAndrey Konovalov1-9/+15
Patch series "stackdepot: allow evicting stack traces", v4. Currently, the stack depot grows indefinitely until it reaches its capacity. Once that happens, the stack depot stops saving new stack traces. This creates a problem for using the stack depot for in-field testing and in production. For such uses, an ideal stack trace storage should: 1. Allow saving fresh stack traces on systems with a large uptime while limiting the amount of memory used to store the traces; 2. Have a low performance impact. Implementing #1 in the stack depot is impossible with the current keep-forever approach. This series targets to address that. Issue #2 is left to be addressed in a future series. This series changes the stack depot implementation to allow evicting unneeded stack traces from the stack depot. The users of the stack depot can do that via new stack_depot_save_flags(STACK_DEPOT_FLAG_GET) and stack_depot_put APIs. Internal changes to the stack depot code include: 1. Storing stack traces in fixed-frame-sized slots (vs precisely-sized slots in the current implementation); the slot size is controlled via CONFIG_STACKDEPOT_MAX_FRAMES (default: 64 frames); 2. Keeping available slots in a freelist (vs keeping an offset to the next free slot); 3. Using a read/write lock for synchronization (vs a lock-free approach combined with a spinlock). This series also integrates the eviction functionality into KASAN: the tag-based modes evict stack traces when the corresponding entry leaves the stack ring, and Generic KASAN evicts stack traces for objects once those leave the quarantine. With KASAN, despite wasting some space on rounding up the size of each stack record, the total memory consumed by stack depot gets saturated due to the eviction of irrelevant stack traces from the stack depot. With the tag-based KASAN modes, the average total amount of memory used for stack traces becomes ~0.5 MB (with the current default stack ring size of 32k entries and the default CONFIG_STACKDEPOT_MAX_FRAMES of 64). With Generic KASAN, the stack traces take up ~1 MB per 1 GB of RAM (as the quarantine's size depends on the amount of RAM). However, with KMSAN, the stack depot ends up using ~4x more memory per a stack trace than before. Thus, for KMSAN, the stack depot capacity is increased accordingly. KMSAN uses a lot of RAM for shadow memory anyway, so the increased stack depot memory usage will not make a significant difference. Other users of the stack depot do not save stack traces as often as KASAN and KMSAN. Thus, the increased memory usage is taken as an acceptable trade-off. In the future, these other users can take advantage of the eviction API to limit the memory waste. There is no measurable boot time performance impact of these changes for KASAN on x86-64. I haven't done any tests for arm64 modes (the stack depot without performance optimizations is not suitable for intended use of those anyway), but I expect a similar result. Obtaining and copying stack trace frames when saving them into stack depot is what takes the most time. This series does not yet provide a way to configure the maximum size of the stack depot externally (e.g. via a command-line parameter). This will be added in a separate series, possibly together with the performance improvement changes. This patch (of 22): Currently, if stack_depot_disable=off is passed to the kernel command-line after stack_depot_disable=on, stack depot prints a message that it is disabled, while it is actually enabled. Fix this by moving printing the disabled message to stack_depot_early_init. Place it before the __stack_depot_early_init_requested check, so that the message is printed even if early stack depot init has not been requested. Also drop the stack_table = NULL assignment from disable_stack_depot, as stack_table is NULL by default. Link: https://lkml.kernel.org/r/cover.1700502145.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/73a25c5fff29f3357cd7a9330e85e09bc8da2cbe.1700502145.git.andreyknvl@google.com Fixes: e1fdc403349c ("lib: stackdepot: add support to disable stack depot") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kmemleak: add checksum to backtrace reportJim Cromie1-1/+1
Change /sys/kernel/debug/kmemleak report format slightly, adding "(extra info)" to the backtrace header: from: " backtrace:" to: " backtrace (crc <cksum>):" The <cksum> allows a user to see recurring backtraces without detailed/careful reading of multiline stacks. So after cycling kmemleak-test a few times, I know some leaks are repeating. bash-5.2# grep backtrace /sys/kernel/debug/kmemleak | wc 62 186 1792 bash-5.2# grep backtrace /sys/kernel/debug/kmemleak | sort -u | wc 37 111 1067 syzkaller parses kmemleak for "unreferenced object" only, so is unaffected by this change. Other github repos are moribund. Link: https://lkml.kernel.org/r/20231116224318.124209-3-jim.cromie@gmail.com Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kmemleak: drop (age <increasing>) from leak recordJim Cromie1-4/+2
Patch series "tweak kmemleak report format". These 2 patches make minor changes to the report: 1st strips "age <increasing>" from output. This makes the output idempotent; unchanging until a new leak is reported. 2nd adds the backtrace.checksum to the "backtrace:" line. This lets a user see repeats without actually reading the whole backtrace. So now the backtrace line looks like this: backtrace (crc 603070071): I surveyed for un-wanted effects upon users: Syzkaller parses kmemleak in executor/common_linux.h: static void check_leaks(char** frames, int nframes) It just counts occurrences of "unreferenced object", specifically it does not look for "age", nor would it choke on "crc" being added. github has 3 repos with "kmemleak" mentioned, all are moribund. gitlab has 0 hits on "kmemleak". This patch (of 2): Displaying age is pretty, but counter-productive; it changes with current-time, so it surrenders idempotency of the output, which breaks simple hash-based cataloging of the records by the user. The trouble: sequential reads, wo new leaks, get new results: :#> sum /sys/kernel/debug/kmemleak 53439 74 /sys/kernel/debug/kmemleak :#> sum /sys/kernel/debug/kmemleak 59066 74 /sys/kernel/debug/kmemleak and age is why (nothing else changes): :#> grep -v age /sys/kernel/debug/kmemleak | sum 58894 67 :#> grep -v age /sys/kernel/debug/kmemleak | sum 58894 67 Since jiffies is already printed in the "comm" line, age adds nothing. Notably, syzkaller reads kmemleak only for "unreferenced object", and won't care about this reform of age-ism. A few moribund github repos mention it, but don't compile. Link: https://lkml.kernel.org/r/20231116224318.124209-1-jim.cromie@gmail.com Link: https://lkml.kernel.org/r/20231116224318.124209-2-jim.cromie@gmail.com Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11fs: convert error_remove_page to error_remove_folioMatthew Wilcox (Oracle)23-44/+44
There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11memory-failure: convert truncate_error_page to truncate_error_folioMatthew Wilcox (Oracle)1-5/+4
Both callers now have a folio, so pass it in. Nothing downstream was expecting a tail page; that's asserted in generic_error_remove_page(), for example. Link: https://lkml.kernel.org/r/20231117161447.2461643-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11memory-failure: use a folio in me_huge_page()Matthew Wilcox (Oracle)1-6/+6
This function was already explicitly calling compound_head(); unfortunately the compiler can't know that and elide the redundant calls to compound_head() buried in page_mapping(), unlock_page(), etc. Switch to using a folio, which does let us elide these calls. Link: https://lkml.kernel.org/r/20231117161447.2461643-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11memory-failure: convert delete_from_lru_cache() to take a folioMatthew Wilcox (Oracle)1-11/+11
All three callers now have a folio; pass it in instead of the page. Saves five calls to compound_head(). Link: https://lkml.kernel.org/r/20231117161447.2461643-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11memory-failure: use a folio in me_pagecache_dirty()Matthew Wilcox (Oracle)1-3/+4
Replaces three hidden calls to compound_head() with one visible one. Link: https://lkml.kernel.org/r/20231117161447.2461643-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11memory-failure: use a folio in me_pagecache_clean()Matthew Wilcox (Oracle)1-7/+6
Patch series "Convert aops->error_remove_page to ->error_remove_folio". This is a memory-failure patch series which converts a lot of uses of page APIs into folio APIs with the usual benefits. This patch (of 6): Replaces three hidden calls to compound_head() with one visible one. Fix up a few comments while I'm modifying this function. Link: https://lkml.kernel.org/r/20231117161447.2461643-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231117161447.2461643-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11zram: tweak writeback config helpSergey Senozhatsky1-2/+2
Writeback is for incompressible and idle zram pages. Link: https://lkml.kernel.org/r/20231115024223.4133148-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11zram: split memory-tracking and ac-time trackingSergey Senozhatsky4-17/+25
ZRAM_MEMORY_TRACKING enables two features: - per-entry ac-time tracking - debugfs interface The latter one is the reason why memory-tracking depends on DEBUG_FS, while the former one is used far beyond debugging these days. Namely ac-time is used for fine grained writeback of idle entries (pages). Move ac-time tracking under its own config option so that it can be enabled (along with writeback) on systems without DEBUG_FS. [senozhatsky@chromium.org: ifdef fixup, per Dmytro] Link: https://lkml.kernel.org/r/20231117013543.540280-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20231115024223.4133148-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm/page_owner: record and dump free_pid and free_tgidBarry Song1-1/+8
While investigating some complex memory allocation and free bugs especially in multi-processes and multi-threads cases, from time to time, I feel the free stack isn't sufficient as a page can be freed by processes or threads other than the one allocating it. And other processes and threads which free the page often have the exactly same free stack with the one allocating the page. We can't know who free the page only through the free stack though the current page_owner does tell us the pid and tgid of the one allocating the page. This makes the bug investigation often hard. So this patch adds free pid and tgid in page_owner, so that we can easily figure out if the freeing is crossing processes or threads. Link: https://lkml.kernel.org/r/20231114034202.73098-1-v-songbaohua@oppo.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Cc: Audra Mitchell <audra@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kassey Li <quic_yingangl@quicinc.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11Documentation/mm: drop pte_bad() descriptions from arch page table helpersAnshuman Khandual1-2/+0
pte_bad() never existed unlike similar helpers at PMU, PUD, and PGD level. This was added erroneously and hence should be dropped instead. Link: https://lkml.kernel.org/r/20231114063456.339652-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11kasan: default to inline instrumentationPaul Heidekrüger1-1/+1
KASan inline instrumentation can yield up to a 2x performance gain at the cost of a larger binary. Make inline instrumentation the default, as suggested in the bug report below. When an architecture does not support inline instrumentation, it should set ARCH_DISABLE_KASAN_INLINE, as done by PowerPC, for instance. Link: https://lkml.kernel.org/r/20231109155101.186028-1-paul.heidekrueger@tum.de Signed-off-by: Paul Heidekrüger <paul.heidekrueger@tum.de> Reported-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=203495 Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: fix process_vm_rw page countsYork Jasper Niebuhr1-7/+8
1. There is a "-1" missing in the page number calculation in process_vm_rw_core. While this can't break anything, it can cause unnecessary allocations in certain cases: Consider handling an iovec ranging over PVM_MAX_PP_ARRAY_COUNT pages that is also aligned to a page boundary. While pp_stack could hold references to such an amount of pinned pages, nr_pages yields (PVM_MAX_PP_ARRAY + 1) in process_vm_rw_core. Consequently, a larger buffer is allocated with kmalloc for no reason. For any page boundary aligned iovec that is a multiple of PAGE_SIZE and larger than PVM_MAX_PP_ARRAY_COUNT pages, nr_pages will be too big by 1 and thus kmalloc allocates excess space for one more pointer. 2. max_pages_per_loop is constant and there is no reason to have it as a variable. A macro does the job just fine and saves memory. 3. Replaced "sizeof(struct pages *)" with "sizeof(struct page *)" to have matching types for allocation and prevent confusion. Link: https://lkml.kernel.org/r/20231111184859.44264-1-yjnworkstation@gmail.com Signed-off-by: York Jasper Niebuhr <yjnworkstation@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mmap: remove the IA64-specific vma expansion implementationLukas Bulwahn1-36/+1
With commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"), there is no need to keep the IA64-specific vma expansion. Clean up the IA64-specific vma expansion implementation. Link: https://lkml.kernel.org/r/20231113124728.3974-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11gfp: include __GFP_NOWARN in GFP_NOWAITMatthew Wilcox (Oracle)1-2/+3
GFP_NOWAIT callers are always prepared for their allocations to fail because they fail so frequently. Forcing the callers to remember to add __GFP_NOWARN is just annoying and leads to an endless stream of patches for the places where we forgot to add it. We can now remove __GFP_NOWARN from all the callers which specify GFP_NOWAIT, but I'd rather wait a cycle and send patches to each maintainer instead of creating a big pile of merge conflicts. Link: https://lkml.kernel.org/r/20231109211507.2262419-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm/page_alloc: dedupe some memcg uncharging logicBrendan Jackman1-8/+4
The duplication makes it seem like some work is required before uncharging in the !PageHWPoison case. But it isn't, so we can simplify the code a little. Note the PageMemcgKmem check is redundant, but I've left it in as it avoids an unnecessary function call. Link: https://lkml.kernel.org/r/20231108164920.3401565-1-jackmanb@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: remove invalidate_inode_page()Matthew Wilcox (Oracle)2-10/+2
All callers are now converted to call mapping_evict_folio(). Link: https://lkml.kernel.org/r/20231108182809.602073-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: convert isolate_page() to mf_isolate_folio()Matthew Wilcox (Oracle)1-14/+14
The only caller now has a folio, so pass it in and operate on it. Saves many page->folio conversions and introduces only one folio->page conversion when calling isolate_movable_page(). Link: https://lkml.kernel.org/r/20231108182809.602073-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: convert soft_offline_in_use_page() to use a folioMatthew Wilcox (Oracle)1-12/+12
Replace the existing head-page logic with folio logic. Link: https://lkml.kernel.org/r/20231108182809.602073-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: use mapping_evict_folio() in truncate_error_page()Matthew Wilcox (Oracle)1-2/+2
We already have the folio and the mapping, so replace the call to invalidate_inode_page() with mapping_evict_folio(). Link: https://lkml.kernel.org/r/20231108182809.602073-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: convert __do_fault() to use a folioMatthew Wilcox (Oracle)1-10/+10
Convert vmf->page to a folio as soon as we're going to use it. This fixes a bug if the fault handler returns a tail page with hardware poison; tail pages have an invalid page->index, so we would fail to unmap the page from the page tables. We actually have to unmap the entire folio (or mapping_evict_folio() will fail), so use unmap_mapping_folio() instead. This also saves various calls to compound_head() hidden in lock_page(), put_page(), etc. Link: https://lkml.kernel.org/r/20231108182809.602073-3-willy@infradead.org Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: make mapping_evict_folio() the preferred way to evict clean foliosMatthew Wilcox (Oracle)2-17/+17
Patch series "Fix fault handler's handling of poisoned tail pages". Since introducing the ability to have large folios in the page cache, it's been possible to have a hwpoisoned tail page returned from the fault handler. We handle this situation poorly; failing to remove the affected page from use. This isn't a minimal patch to fix it, it's a full conversion of all the code surrounding it. This patch (of 6): invalidate_inode_page() does very little beyond calling mapping_evict_folio(). Move the check for mapping being NULL into mapping_evict_folio() and make it available to the rest of the MM for use in the next few patches. Link: https://lkml.kernel.org/r/20231108182809.602073-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231108182809.602073-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: return void from folio_start_writeback() and related functionsMatthew Wilcox (Oracle)3-33/+29
Nobody now checks the return value from any of these functions, so add an assertion at the beginning of the function and return void. Link: https://lkml.kernel.org/r/20231108204605.745109-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: David Howells <dhowells@redhat.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11smb: do not test the return value of folio_start_writeback()Matthew Wilcox (Oracle)1-4/+2
In preparation for removing the return value entirely, stop testing it in smb. Link: https://lkml.kernel.org/r/20231108204605.745109-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11afs: do not test the return value of folio_start_writeback()Matthew Wilcox (Oracle)1-4/+2
In preparation for removing the return value entirely, stop testing it in afs. Link: https://lkml.kernel.org/r/20231108204605.745109-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: remove test_set_page_writeback()Matthew Wilcox (Oracle)1-5/+0
Patch series "Make folio_start_writeback return void". Most of the folio flag-setting functions return void. folio_start_writeback is gratuitously different; the only two filesystems that do anything with the return value emit debug messages if it's already set, and we can (and should) do that internally without bothering the filesystem to do it. This patch (of 4): There are no more callers of this wrapper. Link: https://lkml.kernel.org/r/20231108204605.745109-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231108204605.745109-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11gfs2: convert stuffed_readpage() to stuffed_read_folio()Matthew Wilcox (Oracle)1-20/+17
Use folio_fill_tail() to implement the unstuffing and folio_end_read() to simultaneously mark the folio uptodate and unlock it. Unifies a couple of code paths. Link: https://lkml.kernel.org/r/20231107212643.3490372-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11mm: add folio_fill_tail() and use it in iomapMatthew Wilcox (Oracle)2-12/+40
The iomap code was limited to PAGE_SIZE bytes; generalise it to cover an arbitrary-sized folio, and move it to be a common helper. [akpm@linux-foundation.org: fix folio_fill_tail(), per Andreas Gruenbacher] Link: https://lkml.kernel.org/r/20231107212643.3490372-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>