summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2024-05-30lib/test_hmm.c: handle src_pfns and dst_pfns allocation failureDuoming Zhou1-4/+4
[ Upstream commit c2af060d1c18beaec56351cf9c9bcbbc5af341a3 ] The kcalloc() in dmirror_device_evict_chunk() will return null if the physical memory has run out. As a result, if src_pfns or dst_pfns is dereferenced, the null pointer dereference bug will happen. Moreover, the device is going away. If the kcalloc() fails, the pages mapping a chunk could not be evicted. So add a __GFP_NOFAIL flag in kcalloc(). Finally, as there is no need to have physically contiguous memory, Switch kcalloc() to kvcalloc() in order to avoid failing allocations. Link: https://lkml.kernel.org/r/20240312005905.9939-1-duoming@zju.edu.cn Fixes: b2ef9f5a5cb3 ("mm/hmm/test: add selftest driver for HMM") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Cc: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30overflow: Change DEFINE_FLEX to take __counted_by memberKees Cook1-0/+19
[ Upstream commit d8e45f2929b94099913eb66c3ebb18b5063e9421 ] The norm should be flexible array structures with __counted_by annotations, so DEFINE_FLEX() is updated to expect that. Rename the non-annotated version to DEFINE_RAW_FLEX(), and update the few existing users. Additionally add selftests for the macros. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: e77f43d531af ("Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30kunit: bail out early in __kunit_test_suites_init() if there are no suites ↵Scott Mayhew1-0/+3
to test [ Upstream commit 5496b9b77d7420652202b73cf036e69760be5deb ] Commit c72a870926c2 added a mutex to prevent kunit tests from running concurrently. Unfortunately that mutex gets locked during module load regardless of whether the module actually has any kunit tests. This causes a problem for kunit tests that might need to load other kernel modules (e.g. gss_krb5_test loading the camellia module). So check to see if there are actually any tests to run before locking the kunit_run_lock mutex. Fixes: c72a870926c2 ("kunit: add ability to run tests after boot using debugfs") Reported-by: Nico Pache <npache@redhat.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30kunit: unregister the device on errorWander Lairson Costa1-1/+1
[ Upstream commit fabd480b721eb30aa4e2c89507b53933069f9f6e ] kunit_init_device() should unregister the device on bus register error, but mistakenly it tries to unregister the bus. Unregister the device instead of the bus. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30kunit: Fix kthread referenceMickaël Salaün1-3/+6
[ Upstream commit f8aa1b98ce40184521ed95ec26cc115a255183b2 ] There is a race condition when a kthread finishes after the deadline and before the call to kthread_stop(), which may lead to use after free. Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Fixes: adf505457032 ("kunit: fix UAF when run kfence test case test_gfpzero") Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Rae Moar <rmoar@google.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20240408074625.65017-3-mic@digikod.net Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30kunit/fortify: Fix mismatched kvalloc()/vfree() usageKees Cook1-8/+8
[ Upstream commit 998b18072ceb0613629c256b409f4d299829c7ec ] The kv*() family of tests were accidentally freeing with vfree() instead of kvfree(). Use kvfree() instead. Fixes: 9124a2640148 ("kunit/fortify: Validate __alloc_size attribute results") Link: https://lore.kernel.org/r/20240425230619.work.299-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30maple_tree: Add mtree_alloc_cyclic()Chuck Lever1-0/+93
[ Upstream commit 9b6713cc75229f25552c643083cbdbfb771e5bca ] I need a cyclic allocator for the simple_offset implementation in fs/libfs.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170820144179.6328.12838600511394432325.stgit@91.116.238.104.host.secureserver.net Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: 23cdd0eed3f1 ("libfs: Fix simple_offset_rename_exchange()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30mm/slub, kunit: Use inverted data to corrupt kmem cacheGuenter Roeck1-1/+1
[ Upstream commit b1080c667b3b2c8c38a7fa83ca5567124887abae ] Two failure patterns are seen randomly when running slub_kunit tests with CONFIG_SLAB_FREELIST_RANDOM and CONFIG_SLAB_FREELIST_HARDENED enabled. Pattern 1: # test_clobber_zone: pass:1 fail:0 skip:0 total:1 ok 1 test_clobber_zone # test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72 Expected 3 == slab_errors, but slab_errors == 0 (0x0) # test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:84 Expected 2 == slab_errors, but slab_errors == 0 (0x0) # test_next_pointer: pass:0 fail:1 skip:0 total:1 not ok 2 test_next_pointer In this case, test_next_pointer() overwrites p[s->offset], but the data at p[s->offset] is already 0x12. Pattern 2: ok 1 test_clobber_zone # test_next_pointer: EXPECTATION FAILED at lib/slub_kunit.c:72 Expected 3 == slab_errors, but slab_errors == 2 (0x2) # test_next_pointer: pass:0 fail:1 skip:0 total:1 not ok 2 test_next_pointer In this case, p[s->offset] has a value other than 0x12, but one of the expected failures is nevertheless missing. Invert data instead of writing a fixed value to corrupt the cache data structures to fix the problem. Fixes: 1f9f78b1b376 ("mm/slub, kunit: add a KUnit test for SLUB debugging functionality") Cc: Oliver Glitta <glittao@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> CC: Daniel Latypov <dlatypov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-17dyndbg: fix old BUG_ON in >control parserJim Cromie1-1/+5
commit 00e7d3bea2ce7dac7bee1cf501fb071fd0ea8f6c upstream. Fix a BUG_ON from 2009. Even if it looks "unreachable" (I didn't really look), lets make sure by removing it, doing pr_err and return -EINVAL instead. Cc: stable <stable@kernel.org> Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20240429193145.66543-2-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-17maple_tree: fix mas_empty_area_rev() null pointer dereferenceLiam R. Howlett1-8/+8
commit 955a923d2809803980ff574270f81510112be9cf upstream. Currently the code calls mas_start() followed by mas_data_end() if the maple state is MA_START, but mas_start() may return with the maple state node == NULL. This will lead to a null pointer dereference when checking information in the NULL node, which is done in mas_data_end(). Avoid setting the offset if there is no node by waiting until after the maple state is checked for an empty or single entry state. A user could trigger the events to cause a kernel oops by unmapping all vmas to produce an empty maple tree, then mapping a vma that would cause the scenario described above. Link: https://lkml.kernel.org/r/20240422203349.2418465-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Marius Fleischer <fleischermarius@gmail.com> Closes: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW2-R6w@mail.gmail.com/ Tested-by: Marius Fleischer <fleischermarius@gmail.com> Tested-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-17Fix a potential infinite loop in extract_user_to_sg()David Howells1-1/+1
[ Upstream commit 6a30653b604aaad1bf0f2e74b068ceb8b6fc7aea ] Fix extract_user_to_sg() so that it will break out of the loop if iov_iter_extract_pages() returns 0 rather than looping around forever. [Note that I've included two fixes lines as the function got moved to a different file and renamed] Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Fixes: f5f82cd18732 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: netfs@lists.linux.dev Link: https://lore.kernel.org/r/1967121.1714034372@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-17bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definitionAndrii Nakryiko1-2/+3
[ Upstream commit 229087f6f1dc2d0c38feba805770f28529980ec0 ] Turns out that due to CONFIG_DEBUG_INFO_BTF_MODULES not having an explicitly specified "menu item name" in Kconfig, it's basically impossible to turn it off (see [0]). This patch fixes the issue by defining menu name for CONFIG_DEBUG_INFO_BTF_MODULES, which makes it actually adjustable and independent of CONFIG_DEBUG_INFO_BTF, in the sense that one can have DEBUG_INFO_BTF=y and DEBUG_INFO_BTF_MODULES=n. We still keep it as defaulting to Y, of course. Fixes: 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it") Reported-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/CAK3+h2xiFfzQ9UXf56nrRRP=p1+iUxGoEP5B+aq9MDT5jLXDSg@mail.gmail.com [0] Link: https://lore.kernel.org/bpf/20240404220344.3879270-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-02stackdepot: respect __GFP_NOLOCKDEP allocation flagAndrey Ryabinin1-2/+2
commit 6fe60465e1d53ea321ee909be26d97529e8f746c upstream. If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ #49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Xiubo Li <xiubli@redhat.com> Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/ Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/ Suggested-by: Dave Chinner <david@fromorbit.com> Tested-by: Xiubo Li <xiubli@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27bootconfig: use memblock_free_late to free xbc memory to buddyQiang Zhang1-8/+11
commit 89f9a1e876b5a7ad884918c03a46831af202c8a0 upstream. On the time to free xbc memory in xbc_exit(), memblock may has handed over memory to buddy allocator. So it doesn't make sense to free memory back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86. Following KASAN logs shows this case. This patch fixes the xbc memory free problem by calling memblock_free() in early xbc init error rewind path and calling memblock_free_late() in xbc exit path to free memory to buddy allocator. [ 9.410890] ================================================================== [ 9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260 [ 9.426850] Read of size 8 at addr ffff88845dd30000 by task swapper/0/1 [ 9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 6.9.0-rc3-00208-g586b5dfb51b9 #5 [ 9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D-00000000 Dec 28 2023 [ 9.460789] Call Trace: [ 9.463518] <TASK> [ 9.465859] dump_stack_lvl+0x53/0x70 [ 9.469949] print_report+0xce/0x610 [ 9.473944] ? __virt_addr_valid+0xf5/0x1b0 [ 9.478619] ? memblock_isolate_range+0x12d/0x260 [ 9.483877] kasan_report+0xc6/0x100 [ 9.487870] ? memblock_isolate_range+0x12d/0x260 [ 9.493125] memblock_isolate_range+0x12d/0x260 [ 9.498187] memblock_phys_free+0xb4/0x160 [ 9.502762] ? __pfx_memblock_phys_free+0x10/0x10 [ 9.508021] ? mutex_unlock+0x7e/0xd0 [ 9.512111] ? __pfx_mutex_unlock+0x10/0x10 [ 9.516786] ? kernel_init_freeable+0x2d4/0x430 [ 9.521850] ? __pfx_kernel_init+0x10/0x10 [ 9.526426] xbc_exit+0x17/0x70 [ 9.529935] kernel_init+0x38/0x1e0 [ 9.533829] ? _raw_spin_unlock_irq+0xd/0x30 [ 9.538601] ret_from_fork+0x2c/0x50 [ 9.542596] ? __pfx_kernel_init+0x10/0x10 [ 9.547170] ret_from_fork_asm+0x1a/0x30 [ 9.551552] </TASK> [ 9.555649] The buggy address belongs to the physical page: [ 9.561875] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x45dd30 [ 9.570821] flags: 0x200000000000000(node=0|zone=2) [ 9.576271] page_type: 0xffffffff() [ 9.580167] raw: 0200000000000000 ffffea0011774c48 ffffea0012ba1848 0000000000000000 [ 9.588823] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 [ 9.597476] page dumped because: kasan: bad access detected [ 9.605362] Memory state around the buggy address: [ 9.610714] ffff88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 9.618786] ffff88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 9.626857] >ffff88845dd30000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.634930] ^ [ 9.638534] ffff88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.646605] ffff88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.654675] ================================================================== Link: https://lore.kernel.org/all/20240414114944.1012359-1-qiang4.zhang@linux.intel.com/ Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Stable@vger.kernel.org Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-17lib: checksum: hide unused expected_csum_ipv6_magic[]Arnd Bergmann1-2/+3
[ Upstream commit e9d47b7b31563a6524b9f64ea70ed0289cc4d9c4 ] When CONFIG_NET is disabled, an extra warning shows up for this unused variable: lib/checksum_kunit.c:218:18: error: 'expected_csum_ipv6_magic' defined but not used [-Werror=unused-const-variable=] Replace the #ifdef with an IS_ENABLED() check that makes the compiler's dead-code-elimination take care of the link failure. Fixes: f24a70106dc1 ("lib: checksum: Fix build with CONFIG_NET=n") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13dump_stack: Do not get cpu_sync for panic CPUJohn Ogness1-3/+13
[ Upstream commit 7412dc6d55eed6b76180e40ac3601412ebde29bd ] dump_stack() is called in panic(). If for some reason another CPU is holding the printk_cpu_sync and is unable to release it, the panic CPU will be unable to continue and print the stacktrace. Since non-panic CPUs are not allowed to store new printk messages anyway, there is no need to synchronize the stacktrace output in a panic situation. For the panic CPU, do not get the printk_cpu_sync because it is not needed and avoids a potential deadlock scenario in panic(). Link: https://lore.kernel.org/lkml/ZcIGKU8sxti38Kok@alley Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-15-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-10stackdepot: rename pool_index to pool_index_plus_1Peter Collingbourne1-2/+2
[ Upstream commit a6c1d9cb9a68bfa4512248419c4f4d880d19fe90 ] Commit 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") changed the meaning of the pool_index field to mean "the pool index plus 1". This made the code accessing this field less self-documenting, as well as causing debuggers such as drgn to not be able to easily remain compatible with both old and new kernels, because they typically do that by testing for presence of the new field. Because stackdepot is a debugging tool, we should make sure that it is debugger friendly. Therefore, give the field a different name to improve readability as well as enabling debugger backwards compatibility. This is needed in 6.9, which would otherwise become an odd release with the new semantics and old name so debuggers wouldn't recognize the new semantics there. Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle") Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88 Signed-off-by: Peter Collingbourne <pcc@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-10lib/stackdepot: move stack_record struct definition into the headerOscar Salvador1-43/+0
[ Upstream commit 8151c7a35d8bd8a12e93538ef7963ea209b6ab41 ] In order to move the heavy lifting into page_owner code, this one needs to have access to the stack_record structure, which right now sits in lib/stackdepot.c. Move it to the stackdepot.h header so page_owner can access stack_record's struct fields. Link: https://lkml.kernel.org/r/20240215215907.20121-3-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: a6c1d9cb9a68 ("stackdepot: rename pool_index to pool_index_plus_1") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03pci_iounmap(): Fix MMIO mapping leakPhilipp Stanner1-1/+1
[ Upstream commit 7626913652cc786c238e2dd7d8740b17d41b2637 ] The #ifdef ARCH_HAS_GENERIC_IOPORT_MAP accidentally also guards iounmap(), which means MMIO mappings are leaked. Move the guard so we call iounmap() for MMIO mappings. Fixes: 316e8d79a095 ("pci_iounmap'2: Electric Boogaloo: try to make sense of it all") Link: https://lore.kernel.org/r/20240131090023.12331-2-pstanner@redhat.com Reported-by: Danilo Krummrich <dakr@redhat.com> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> # v5.15+ Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27lib/stackdepot: off by one in depot_fetch_stack()Dan Carpenter1-1/+1
[ Upstream commit dc24559472a682eb124e869cb110e7a2fd857322 ] The stack_pools[] array has DEPOT_MAX_POOLS. The "pools_num" tracks the number of pools which are initialized. See depot_init_pool() for more details. If pool_index == pools_num_cached, this will read one element beyond what we want. If not all the pools are initialized, then the pool will be NULL, triggering a WARN(), and if they are all initialized it will read one element beyond the end of the array. Link: https://lkml.kernel.org/r/361ac881-60b7-471f-91e5-5bf8fe8042b2@moroto.mountain Fixes: b29d31885814 ("lib/stackdepot: store free stack records in a freelist") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27lib/stackdepot: fix first entry having a 0-handleOscar Salvador1-7/+9
[ Upstream commit 3ee34eabac2abb6b1b6fcdebffe18870719ad000 ] Patch series "page_owner: print stacks and their outstanding allocations", v10. page_owner is a great debug functionality tool that lets us know about all pages that have been allocated/freed and their specific stacktrace. This comes very handy when debugging memory leaks, since with some scripting we can see the outstanding allocations, which might point to a memory leak. In my experience, that is one of the most useful cases, but it can get really tedious to screen through all pages and try to reconstruct the stack <-> allocated/freed relationship, becoming most of the time a daunting and slow process when we have tons of allocation/free operations. This patchset aims to ease that by adding a new functionality into page_owner. This functionality creates a new directory called 'page_owner_stacks' under 'sys/kernel//debug' with a read-only file called 'show_stacks', which prints out all the stacks followed by their outstanding number of allocations (being that the times the stacktrace has allocated but not freed yet). This gives us a clear and a quick overview of stacks <-> allocated/free. We take advantage of the new refcount_f field that stack_record struct gained, and increment/decrement the stack refcount on every __set_page_owner() (alloc operation) and __reset_page_owner (free operation) call. Unfortunately, we cannot use the new stackdepot api STACK_DEPOT_FLAG_GET because it does not fulfill page_owner needs, meaning we would have to special case things, at which point makes more sense for page_owner to do its own {dec,inc}rementing of the stacks. E.g: Using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0, such stack gets evicted, so page_owner would lose information. This patchset also creates a new file called 'set_threshold' within 'page_owner_stacks' directory, and by writing a value to it, the stacks which refcount is below such value will be filtered out. A PoC can be found below: # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt # head -40 page_owner_full_stacks.txt prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 page_cache_ra_unbounded+0x96/0x180 filemap_get_pages+0xfd/0x590 filemap_read+0xcc/0x330 blkdev_read_iter+0xb8/0x150 vfs_read+0x285/0x320 ksys_read+0xa5/0xe0 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 521 prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 __filemap_get_folio+0x14a/0x490 ext4_write_begin+0xbd/0x4b0 [ext4] generic_perform_write+0xc1/0x1e0 ext4_buffered_write_iter+0x68/0xe0 [ext4] ext4_file_write_iter+0x70/0x740 [ext4] vfs_write+0x33d/0x420 ksys_write+0xa5/0xe0 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 4609 ... ... # echo 5000 > /sys/kernel/debug/page_owner_stacks/set_threshold # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt # head -40 page_owner_full_stacks_5000.txt prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 __filemap_get_folio+0x14a/0x490 ext4_write_begin+0xbd/0x4b0 [ext4] generic_perform_write+0xc1/0x1e0 ext4_buffered_write_iter+0x68/0xe0 [ext4] ext4_file_write_iter+0x70/0x740 [ext4] vfs_write+0x33d/0x420 ksys_pwrite64+0x75/0x90 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 6781 prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 pcpu_populate_chunk+0xec/0x350 pcpu_balance_workfn+0x2d1/0x4a0 process_scheduled_works+0x84/0x380 worker_thread+0x12a/0x2a0 kthread+0xe3/0x110 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 stack_count: 8641 This patch (of 7): The very first entry of stack_record gets a handle of 0, but this is wrong because stackdepot treats a 0-handle as a non-valid one. E.g: See the check in stack_depot_fetch() Fix this by adding and offset of 1. This bug has been lurking since the very beginning of stackdepot, but no one really cared as it seems. Because of that I am not adding a Fixes tag. Link: https://lkml.kernel.org/r/20240215215907.20121-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20240215215907.20121-2-osalvador@suse.de Co-developed-by: Marco Elver <elver@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: dc24559472a6 ("lib/stackdepot: off by one in depot_fetch_stack()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27net: blackhole_dev: fix build warning for ethh set but not usedBreno Leitao1-2/+1
[ Upstream commit 843a8851e89e2e85db04caaf88d8554818319047 ] lib/test_blackhole_dev.c sets a variable that is never read, causing this following building warning: lib/test_blackhole_dev.c:32:17: warning: variable 'ethh' set but not used [-Wunused-but-set-variable] Remove the variable struct ethhdr *ethh, which is unused. Fixes: 509e56b37cc3 ("blackhole_dev: add a selftest") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27lib: memcpy_kunit: Fix an invalid format specifier in an assertion msgDavid Gow1-2/+2
[ Upstream commit 0a549ed22c3c7cc6da5c5f5918efd019944489a5 ] The 'i' passed as an assertion message is a size_t, so should use '%zu', not '%d'. This was found by annotating the _MSG() variants of KUnit's assertions to let gcc validate the format strings. Fixes: bb95ebbe89a7 ("lib: Introduce CONFIG_MEMCPY_KUNIT_TEST") Signed-off-by: David Gow <davidgow@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27lib/cmdline: Fix an invalid format specifier in an assertion msgDavid Gow1-1/+1
[ Upstream commit d2733a026fc7247ba42d7a8e1b737cf14bf1df21 ] The correct format specifier for p - n (both p and n are pointers) is %td, as the type should be ptrdiff_t. This was discovered by annotating KUnit assertion macros with gcc's printf specifier, but note that gcc incorrectly suggested a %d or %ld specifier (depending on the pointer size of the architecture being built). Fixes: 0ea09083116d ("lib/cmdline: Allow get_options() to take 0 to validate the input") Signed-off-by: David Gow <davidgow@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Daniel Latypov <dlatypov@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27kunit: test: Log the correct filter string in executor_testDavid Gow1-1/+1
[ Upstream commit 6f2f793fba78eb4a0d5a34a71bc781118ed923d3 ] KUnit's executor_test logs the filter string in KUNIT_ASSERT_EQ_MSG(), but passed a random character from the filter, rather than the whole string. This was found by annotating KUNIT_ASSERT_EQ_MSG() to let gcc validate the format string. Fixes: 76066f93f1df ("kunit: add tests for filtering attributes") Signed-off-by: David Gow <davidgow@google.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27kunit: Setup DMA masks on the kunit deviceMaxime Ripard1-0/+4
[ Upstream commit c5215d54dc10e801a6cefef62445a00a4c28a515 ] Commit d393acce7b3f ("drm/tests: Switch to kunit devices") switched the DRM device creation helpers from an ad-hoc implementation to the new kunit device creation helpers introduced in commit d03c720e03bd ("kunit: Add APIs for managing devices"). However, while the DRM helpers were using a platform_device, the kunit helpers are using a dedicated bus and device type. That situation creates small differences in the initialisation, and one of them is that the kunit devices do not have the DMA masks setup. In turn, this means that we can't do any kind of DMA buffer allocation anymore, which creates a regression on some (downstream for now) tests. Let's set up a default DMA mask that should work on any platform to fix it. Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Signed-off-by: Maxime Ripard <mripard@kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06iov_iter: get rid of 'copy_mc' flagLinus Torvalds1-23/+0
This flag is only set by one single user: the magical core dumping code that looks up user pages one by one, and then writes them out using their kernel addresses (by using a BVEC_ITER). That actually ends up being a huge problem, because while we do use copy_mc_to_kernel() for this case and it is able to handle the possible machine checks involved, nothing else is really ready to handle the failures caused by the machine check. In particular, as reported by Tong Tiangen, we don't actually support fault_in_iov_iter_readable() on a machine check area. As a result, the usual logic for writing things to a file under a filesystem lock, which involves doing a copy with page faults disabled and then if that fails trying to fault pages in without holding the locks with fault_in_iov_iter_readable() does not work at all. We could decide to always just make the MC copy "succeed" (and filling the destination with zeroes), and that would then create a core dump file that just ignores any machine checks. But honestly, this single special case has been problematic before, and means that all the normal iov_iter code ends up slightly more complex and slower. See for example commit c9eec08bac96 ("iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()") where David Howells re-organized the code just to avoid having to check the 'copy_mc' flags inside the inner iov_iter loops. So considering that we have exactly one user, and that one user is a non-critical special case that doesn't actually ever trigger in real life (Tong found this with manual error injection), the sane solution is to just decide that the onus on handling the machine check lines on that user instead. Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying the user data to a stable kernel page before writing it out. Fixes: f1982740f5e7 ("iov_iter: Convert iterate*() to inline funcs") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/ Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reported-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds2-8/+13
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29kunit: Fix again checksum tests on big endian CPUsChristophe Leroy1-8/+9
Commit b38460bc463c ("kunit: Fix checksum tests on big endian CPUs") fixed endianness issues with kunit checksum tests, but then commit 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") introduced new issues on big endian CPUs. Those issues are once again reflected by the warnings reported by sparse. So, fix them with the same approach, perform proper conversion in order to support both little and big endian CPUs. Once the conversions are properly done and the right types used, the sparse warnings are cleared as well. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-24stackdepot: use variable size records for non-evictable entriesMarco Elver1-123/+127
With the introduction of stack depot evictions, each stack record is now fixed size, so that future reuse after an eviction can safely store differently sized stack traces. In all cases that do not make use of evictions, this wastes lots of space. Fix it by re-introducing variable size stack records (up to the max allowed size) for entries that will never be evicted. We know if an entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided, since a later stack_depot_put() attempt is undefined behavior. With my current kernel config that enables KASAN and also SLUB owner tracking, I observe (after a kernel boot) a whopping reduction of 296 stack depot pools, which translates into 4736 KiB saved. The savings here are from SLUB owner tracking only, because KASAN generic mode still uses refcounting. Before: pools: 893 allocations: 29841 frees: 6524 in_use: 23317 freelist_size: 3454 After: pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008 [elver@google.com: fix -Wstringop-overflow warning] Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23netlink: add nla be16/32 types to minlen arrayFlorian Westphal1-0/+4
BUG: KMSAN: uninit-value in nla_validate_range_unsigned lib/nlattr.c:222 [inline] BUG: KMSAN: uninit-value in nla_validate_int_range lib/nlattr.c:336 [inline] BUG: KMSAN: uninit-value in validate_nla lib/nlattr.c:575 [inline] BUG: KMSAN: uninit-value in __nla_validate_parse+0x2e20/0x45c0 lib/nlattr.c:631 nla_validate_range_unsigned lib/nlattr.c:222 [inline] nla_validate_int_range lib/nlattr.c:336 [inline] validate_nla lib/nlattr.c:575 [inline] ... The message in question matches this policy: [NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255), but because NLA_BE32 size in minlen array is 0, the validation code will read past the malformed (too small) attribute. Note: Other attributes, e.g. BITFIELD32, SINT, UINT.. are also missing: those likely should be added too. Reported-by: syzbot+3f497b07aa3baf2fb4d0@syzkaller.appspotmail.com Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/all/CABOYnLzFYHSnvTyS6zGa-udNX55+izqkOt2sB9WDqUcEGW6n8w@mail.gmail.com/raw Fixes: ecaf75ffd5f5 ("netlink: introduce bigendian integer types") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240221172740.5092-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21lib/Kconfig.debug: TEST_IOV_ITER depends on MMUGuenter Roeck1-0/+1
Trying to run the iov_iter unit test on a nommu system such as the qemu kc705-nommu emulation results in a crash. KTAP version 1 # Subtest: iov_iter # module: kunit_iov_iter 1..9 BUG: failure at mm/nommu.c:318/vmap()! Kernel panic - not syncing: BUG! The test calls vmap() directly, but vmap() is not supported on nommu systems, causing the crash. TEST_IOV_ITER therefore needs to depend on MMU. Link: https://lkml.kernel.org/r/20240208153010.1439753-1-linux@roeck-us.net Fixes: 2d71340ff1d4 ("iov_iter: Kunit tests for copying to/from an iterator") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-17Merge tag 'driver-core-6.8-rc5' of ↵Linus Torvalds1-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core fixes, a kobject fix, and a documentation update for 6.8-rc5. In detail these changes are: - devlink fixes for reported issues with 6.8-rc1 - topology scheduling regression fix that has been reported by many - kobject loosening of checks change in -rc1 is now reverted as some codepaths seemed to need the checks - documentation update for the CVE process. Has been reviewed by many, the last minute change to the document was to bring the .rst format back into the the new style rules, the contents did not change. All of these, except for the documentation update, have been in linux-next for over a week. The documentation update has been reviewed for weeks by a group of developers, and in public for a week and the wording has stabilized for now. If future changes are needed, we can do so before 6.8-final is out (or anytime after that)" * tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation: Document the Linux Kernel CVE process Revert "kobject: Remove redundant checks for whether ktype is NULL" driver core: fw_devlink: Improve logs for cycle detection driver core: fw_devlink: Improve detection of overlapping cycles driver core: Fix device_link_flag_is_sync_state_only() topology: Set capacity_freq_ref in all cases
2024-02-16Merge tag 'trace-v6.8-rc4' of ↵Linus Torvalds1-19/+30
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the #ifndef that didn't have the 'CONFIG_' prefix on HAVE_DYNAMIC_FTRACE_WITH_REGS The fix to have dynamic trampolines work with x86 broke arm64 as the config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the previous fix was to fix. - Fix tracing_on state The code to test if "tracing_on" is set incorrectly used ring_buffer_record_is_on() which returns false if the ring buffer isn't able to be written to. But the ring buffer disable has several bits that disable it. One is internal disabling which is used for resizing and other modifications of the ring buffer. But the "tracing_on" user space visible flag should only report if tracing is actually on and not internally disabled, as this can cause confusion as writing "1" when it is disabled will not enable it. Instead use ring_buffer_record_is_set_on() which shows the user space visible settings. - Fix a false positive kmemleak on saved cmdlines Now that the saved_cmdlines structure is allocated via alloc_page() and not via kmalloc() it has become invisible to kmemleak. The allocation done to one of its pointers was flagged as a dangling allocation leak. Make kmemleak aware of this allocation and free. - Fix synthetic event dynamic strings An update that cleaned up the synthetic event code removed the return value of trace_string(), and had it return zero instead of the length, causing dynamic strings in the synthetic event to always have zero size. - Clean up documentation and header files for seq_buf * tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: seq_buf: Fix kernel documentation seq_buf: Don't use "proxy" headers tracing/synthetic: Fix trace_string() return value tracing: Inform kmemleak of saved_cmdlines allocation tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on() tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
2024-02-15seq_buf: Fix kernel documentationAndy Shevchenko1-17/+18
There are plenty of issues with the kernel documentation here: - misspelled word "sequence" - different style of returned value descriptions - missed Return sections - unaligned style of ASCII / NUL-terminated / etc - wrong function references Fix all these. Link: https://lkml.kernel.org/r/20240215152506.598340-1-andriy.shevchenko@linux.intel.com Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-15seq_buf: Don't use "proxy" headersAndy Shevchenko1-2/+12
Update header inclusions to follow IWYU (Include What You Use) principle. Link: https://lkml.kernel.org/r/20240215142255.400264-1-andriy.shevchenko@linux.intel.com Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-15Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of ↵Linus Torvalds3-0/+19
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fix from Shuah Khan: "One important fix to unregister kunit_bus when KUnit module is unloaded. Not doing so causes an error when KUnit module tries to re-register the bus when it gets reloaded" * tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: device: Unregister the kunit_bus on shutdown
2024-02-08Revert "kobject: Remove redundant checks for whether ktype is NULL"Greg Kroah-Hartman1-10/+14
This reverts commit 1b28cb81dab7c1eedc6034206f4e8d644046ad31. It is reported to cause problems, so revert it for now until the root cause can be found. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 1b28cb81dab7 ("kobject: Remove redundant checks for whether ktype is NULL") Cc: Zhen Lei <thunder.leizhen@huawei.com> Closes: https://lore.kernel.org/oe-lkp/202402071403.e302e33a-oliver.sang@intel.com Link: https://lore.kernel.org/r/2024020849-consensus-length-6264@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-07kunit: device: Unregister the kunit_bus on shutdownDavid Gow3-0/+19
If KUnit is built as a module, and it's unloaded, the kunit_bus is not unregistered. This causes an error if it's then re-loaded later, as we try to re-register the bus. Unregister the bus and root_device on shutdown, if it looks valid. In addition, be more specific about the value of kunit_bus_device. It is: - a valid struct device* if the kunit_bus initialised correctly. - an ERR_PTR if it failed to initialise. - NULL before initialisation and after shutdown. Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-31Merge tag 'linux_kselftest-kunit-fixes-6.8-rc3' of ↵Linus Torvalds4-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fixes from Shuah Khan: "NULL vs IS_ERR() bug fixes, documentation update, MAINTAINERS file update to add Rae Moar as a reviewer, and a fix to run test suites only after module initialization completes" * tag 'linux_kselftest-kunit-fixes-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: Documentation: KUnit: Update the instructions on how to test static functions kunit: run test suites only after module initialization completes MAINTAINERS: kunit: Add Rae Moar as a reviewer kunit: device: Fix a NULL vs IS_ERR() check in init() kunit: Fix a NULL vs IS_ERR() bug
2024-01-26stackdepot: make fast paths lock-less againMarco Elver1-111/+211
With the introduction of the pool_rwlock (reader-writer lock), several fast paths end up taking the pool_rwlock as readers. Furthermore, stack_depot_put() unconditionally takes the pool_rwlock as a writer. Despite allowing readers to make forward-progress concurrently, reader-writer locks have inherent cache contention issues, which does not scale well on systems with large CPU counts. Rework the synchronization story of stack depot to again avoid taking any locks in the fast paths. This is done by relying on RCU-protected list traversal, and the NMI-safe subset of RCU to delay reuse of freed stack records. See code comments for more details. Along with the performance issues, this also fixes incorrect nesting of rwlock within a raw_spinlock, given that stack depot should still be usable from anywhere: | [ BUG: Invalid wait context ] | ----------------------------- | swapper/0/1 is trying to lock: | ffffffff89869be8 (pool_rwlock){..--}-{3:3}, at: stack_depot_save_flags | other info that might help us debug this: | context-{5:5} | 2 locks held by swapper/0/1: | #0: ffffffff89632440 (rcu_read_lock){....}-{1:3}, at: __queue_work | #1: ffff888100092018 (&pool->lock){-.-.}-{2:2}, at: __queue_work <-- raw_spin_lock Stack depot usage stats are similar to the previous version after a KASAN kernel boot: $ cat /sys/kernel/debug/stackdepot/stats pools: 838 allocations: 29865 frees: 6604 in_use: 23261 freelist_size: 1879 The number of pools is the same as previously. The freelist size is minimally larger, but this may also be due to variance across system boots. This shows that even though we do not eagerly wait for the next RCU grace period (such as with synchronize_rcu() or call_rcu()) after freeing a stack record - requiring depot_pop_free() to "poll" if an entry may be used - new allocations are very likely to happen in later RCU grace periods. Link: https://lkml.kernel.org/r/20240118110216.2539519-2-elver@google.com Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-26stackdepot: add stats counters exported via debugfsMarco Elver1-0/+53
Add a few basic stats counters for stack depot that can be used to derive if stack depot is working as intended. This is a snapshot of the new stats after booting a system with a KASAN-enabled kernel: $ cat /sys/kernel/debug/stackdepot/stats pools: 838 allocations: 29861 frees: 6561 in_use: 23300 freelist_size: 1840 Generally, "pools" should be well below the max; once the system is booted, "in_use" should remain relatively steady. Link: https://lkml.kernel.org/r/20240118110216.2539519-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-22kunit: run test suites only after module initialization completesMarco Pagani2-3/+15
Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()") fixed a wild-memory-access bug that could have happened during the loading phase of test suites built and executed as loadable modules. However, it also introduced a problematic side effect that causes test suites modules to crash when they attempt to register fake devices. When a module is loaded, it traverses the MODULE_STATE_UNFORMED and MODULE_STATE_COMING states before reaching the normal operating state MODULE_STATE_LIVE. Finally, when the module is removed, it moves to MODULE_STATE_GOING before being released. However, if the loading function load_module() fails between complete_formation() and do_init_module(), the module goes directly from MODULE_STATE_COMING to MODULE_STATE_GOING without passing through MODULE_STATE_LIVE. This behavior was causing kunit_module_exit() to be called without having first executed kunit_module_init(). Since kunit_module_exit() is responsible for freeing the memory allocated by kunit_module_init() through kunit_filter_suites(), this behavior was resulting in a wild-memory-access bug. Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()") fixed this issue by running the tests when the module is still in MODULE_STATE_COMING. However, modules in that state are not fully initialized, lacking sysfs kobjects. Therefore, if a test module attempts to register a fake device, it will inevitably crash. This patch proposes a different approach to fix the original wild-memory-access bug while restoring the normal module execution flow by making kunit_module_exit() able to detect if kunit_module_init() has previously initialized the tests suite set. In this way, test modules can once again register fake devices without crashing. This behavior is achieved by checking whether mod->kunit_suites is a virtual or direct mapping address. If it is a virtual address, then kunit_module_init() has allocated the suite_set in kunit_filter_suites() using kmalloc_array(). On the contrary, if mod->kunit_suites is still pointing to the original address that was set when looking up the .kunit_test_suites section of the module, then the loading phase has failed and there's no memory to be freed. v4: - rebased on 6.8 - noted that kunit_filter_suites() must return a virtual address v3: - add a comment to clarify why the start address is checked v2: - add include <linux/mm.h> Fixes: 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()") Reviewed-by: David Gow <davidgow@google.com> Tested-by: Rae Moar <rmoar@google.com> Tested-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Marco Pagani <marpagan@redhat.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-22kunit: device: Fix a NULL vs IS_ERR() check in init()Dan Carpenter1-2/+2
The root_device_register() function does not return NULL, it returns error pointers. Fix the check to match. Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-22kunit: Fix a NULL vs IS_ERR() bugDan Carpenter1-1/+1
The kunit_device_register() function doesn't return NULL, it returns error pointers. Change the KUNIT_ASSERT_NOT_NULL() to check for ERR_OR_NULL(). Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-01-20Merge tag 'riscv-for-linus-6.8-mw4' of ↵Linus Torvalds2-6/+292
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for tuning for systems with fast misaligned accesses. - Support for SBI-based suspend. - Support for the new SBI debug console extension. - The T-Head CMOs now use PA-based flushes. - Support for enabling the V extension in kernel code. - Optimized IP checksum routines. - Various ftrace improvements. - Support for archrandom, which depends on the Zkr extension. - The build is no longer broken under NET=n, KUNIT=y for ports that don't define their own ipv6 checksum. * tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits) lib: checksum: Fix build with CONFIG_NET=n riscv: lib: Check if output in asm goto supported riscv: Fix build error on rv32 + XIP riscv: optimize ELF relocation function in riscv RISC-V: Implement archrandom when Zkr is available riscv: Optimize hweight API with Zbb extension riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support riscv: ftrace: Make function graph use ftrace directly riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name riscv: Restrict DWARF5 when building with LLVM to known working versions riscv: Hoist linker relaxation disabling logic into Kconfig kunit: Add tests for csum_ipv6_magic and ip_fast_csum riscv: Add checksum library riscv: Add checksum header riscv: Add static key for misaligned accesses asm-generic: Improve csum_fold RISC-V: selftests: cbo: Ensure asm operands match constraints ...
2024-01-20Merge tag 'strlcpy-removal-v6.8-rc1' of ↵Linus Torvalds4-26/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull strlcpy removal from Kees Cook: "As promised, this is 'part 2' of the hardening tree, late in -rc1 now that all the other trees with strlcpy() removals have landed. One new user appeared (in bcachefs) but was a trivial refactor. The kernel is now free of the strlcpy() API! - Remove of the final (very recent) user of strlcpy() (in bcachefs) - Remove the strlcpy() API. Long live strscpy()" * tag 'strlcpy-removal-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: string: Remove strlcpy() bcachefs: Replace strlcpy() with strscpy()
2024-01-19string: Remove strlcpy()Kees Cook4-26/+1
With all the users of strlcpy() removed[1] from the kernel, remove the API, self-tests, and other references. Leave mentions in Documentation (about its deprecation), and in checkpatch.pl (to help migrate host-only tools/ usage). Long live strscpy(). Link: https://github.com/KSPP/linux/issues/89 [1] Cc: Azeem Shaikh <azeemshaikh38@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: linux-hardening@vger.kernel.org Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-01-19lib: checksum: Fix build with CONFIG_NET=nPalmer Dabbelt1-0/+2
The generic ipv6 checksums are only defined with CONFIG_NET=y, so gate the test as well. Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401192143.jLdjbIy3-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202401192357.WU4nPRdN-lkp@intel.com/ Reviewed-By: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240119145600.3093-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-19Merge tag 'for-6.8/block-2024-01-18' of git://git.kernel.dk/linuxLinus Torvalds1-5/+0
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - tcp, fc, and rdma target fixes (Maurizio, Daniel, Hannes, Christoph) - discard fixes and improvements (Christoph) - timeout debug improvements (Keith, Max) - various cleanups (Daniel, Max, Giuxen) - trace event string fixes (Arnd) - shadow doorbell setup on reset fix (William) - a write zeroes quirk for SK Hynix (Jim) - MD pull request via Song: - Sparse warning since v6.0 (Bart) - /proc/mdstat regression since v6.7 (Yu Kuai) - Use symbolic error value (Christian) - IO Priority documentation update (Christian) - Fix for accessing queue limits without having entered the queue (Christoph, me) - Fix for loop dio support (Christoph) - Move null_blk off deprecated ida interface (Christophe) - Ensure nbd initializes full msghdr (Eric) - Fix for a regression with the folio conversion, which is now easier to hit because of an unrelated change (Matthew) - Remove redundant check in virtio-blk (Li) - Fix for a potential hang in sbitmap (Ming) - Fix for partial zone appending (Damien) - Misc changes and fixes (Bart, me, Kemeng, Dmitry) * tag 'for-6.8/block-2024-01-18' of git://git.kernel.dk/linux: (45 commits) Documentation: block: ioprio: Update schedulers loop: fix the the direct I/O support check when used on top of block devices blk-mq: Remove the hctx 'run' debugfs attribute nbd: always initialize struct msghdr completely block: Fix iterating over an empty bio with bio_for_each_folio_all block: bio-integrity: fix kcalloc() arguments order virtio_blk: remove duplicate check if queue is broken in virtblk_done sbitmap: remove stale comment in sbq_calc_wake_batch block: Correct a documentation comment in blk-cgroup.c null_blk: Remove usage of the deprecated ida_simple_xx() API block: ensure we hold a queue reference when using queue limits blk-mq: rename blk_mq_can_use_cached_rq block: print symbolic error name instead of error code blk-mq: fix IO hang from sbitmap wakeup race nvmet-rdma: avoid circular locking dependency on install_queue() nvmet-tcp: avoid circular locking dependency on install_queue() nvme-pci: set doorbell config before unquiescing block: fix partial zone append completion handling in req_bio_endio() block/iocost: silence warning on 'last_period' potentially being unused md/raid1: Use blk_opf_t for read and write operations ...