summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2023-02-14Merge tag 'mm-hotfixes-stable-2023-02-13-13-50' of ↵Linus Torvalds1-19/+20
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Twelve hotfixes, mostly against mm/. Five of these fixes are cc:stable" * tag 'mm-hotfixes-stable-2023-02-13-13-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem scripts/gdb: fix 'lx-current' for x86 lib: parser: optimize match_NUMBER apis to use local array mm: shrinkers: fix deadlock in shrinker debugfs mm: hwpoison: support recovery from ksm_might_need_to_copy() kasan: fix Oops due to missing calls to kasan_arch_is_ready() revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" fsdax: dax_unshare_iter() should return a valid length mm/gup: add folio to list when folio_isolate_lru() succeed aio: fix mremap after fork null-deref mailmap: add entry for Alexander Mikhalitsyn mm: extend max struct page size for kmsan
2023-02-10lib: parser: optimize match_NUMBER apis to use local arrayLi Lingfeng1-19/+20
Memory will be allocated to store substring_t in match_strdup(), which means the caller of match_strdup() may need to be scheduled out to wait for reclaiming memory. smatch complains that this can cuase sleeping in an atoic context. Using local array to store substring_t to remove the restriction. Link: https://lkml.kernel.org/r/20230120032352.242767-1-lilingfeng3@huawei.com Link: https://lore.kernel.org/all/20221104023938.2346986-5-yukuai1@huaweicloud.com/ Link: https://lkml.kernel.org/r/20230120032352.242767-1-lilingfeng3@huawei.com Fixes: 2c0647988433 ("blk-iocost: don't release 'ioc->lock' while updating params") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reported-by: Yu Kuai <yukuai1@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: BingJing Chang <bingjingc@synology.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Hou Tao <houtao1@huawei.com> Cc: James Smart <james.smart@broadcom.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: yangerkun <yangerkun@huawei.com> Cc: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-05Merge tag 'perf_urgent_for_v6.2_rc7' of ↵Linus Torvalds1-0/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Lock the proper critical section when dealing with perf event context * tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix perf_event_pmu_context serialization
2023-02-03Merge tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of ↵Linus Torvalds3-12/+102
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "25 hotfixes, mainly for MM. 13 are cc:stable" * tag 'mm-hotfixes-stable-2023-02-02-19-24-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() Kconfig.debug: fix the help description in SCHED_DEBUG mm/swapfile: add cond_resched() in get_swap_pages() mm: use stack_depot_early_init for kmemleak Squashfs: fix handling and sanity checking of xattr_ids count sh: define RUNTIME_DISCARD_EXIT highmem: round down the address passed to kunmap_flush_on_unmap() migrate: hugetlb: check for hugetlb shared PMD in node migration mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups Revert "mm: kmemleak: alloc gray object for reserved region with direct map" freevxfs: Kconfig: fix spelling maple_tree: should get pivots boundary by type .mailmap: update e-mail address for Eugen Hristev mm, mremap: fix mremap() expanding for vma's with vm_ops->close() squashfs: harden sanity check in squashfs_read_xattr_id_table ia64: fix build error due to switch case label appearing next to declaration mm: multi-gen LRU: fix crash during cgroup migration Revert "mm: add nodes= arg to memory.reclaim" zsmalloc: fix a race with deferred_handles storing ...
2023-02-03Merge tag 'linux-kselftest-kunit-fixes-6.2-rc7' of ↵Linus Torvalds2-15/+26
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "Three fixes to bugs that cause kernel crash, link error during build, and a third to fix kunit_test_init_section_suites() extra indirection issue" * tag 'linux-kselftest-kunit-fixes-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: fix kunit_test_init_section_suites(...) kunit: fix bug in KUNIT_EXPECT_MEMEQ kunit: Export kunit_running()
2023-02-01Kconfig.debug: fix the help description in SCHED_DEBUGye xingchen1-1/+1
The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched. Link: https://lkml.kernel.org/r/202301291013573466558@zte.com.cn Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-01mm: use stack_depot_early_init for kmemleakZhaoyang Huang1-0/+1
Mirsad report the below error which is caused by stack_depot_init() failure in kvcalloc. Solve this by having stackdepot use stack_depot_early_init(). On 1/4/23 17:08, Mirsad Goran Todorovac wrote: I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak: [root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff951c118568b0 (size 16): comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s) hex dump (first 16 bytes): 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0....... backtrace: [root@pc-mtodorov ~]# Apparently, backtrace of called functions on the stack is no longer printed with the list of memory leaks. This appeared on Lenovo desktop 10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and 6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from Mr. Torvalds' tree. I don't know if this is deliberate feature for some reason or a bug. Please find attached the config, lshw and kmemleak output. [vbabka@suse.cz: remove stack_depot_init() call] Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/ Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com Fixes: 56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: ke.wang <ke.wang@unisoc.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-01maple_tree: should get pivots boundary by typeWei Yang1-2/+3
We should get pivots boundary by type. Fixes a potential overindexing of mt_pivots[]. Link: https://lkml.kernel.org/r/20221112234308.23823-1-richard.weiyang@gmail.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-01maple_tree: fix mas_empty_area_rev() lower bound validationLiam Howlett2-9/+97
mas_empty_area_rev() was not correctly validating the start of a gap against the lower limit. This could lead to the range starting lower than the requested minimum. Fix the issue by better validating a gap once one is found. This commit also adds tests to the maple tree test suite for this issue and tests the mas_empty_area() function for similar bound checking. Link: https://lkml.kernel.org/r/20230111200136.1851322-1-Liam.Howlett@oracle.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216911 Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: <amanieu@gmail.com> Link: https://lore.kernel.org/linux-mm/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/ Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31perf: Fix perf_event_pmu_context serializationJames Clark1-0/+31
Syzkaller triggered a WARN in put_pmu_ctx(). WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278 This is because there is no locking around the access of "if (!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in put_pmu_ctx(). The decrement of the reference count in put_pmu_ctx() also happens outside of the spinlock, leading to the possibility of this order of events, and the context being cleared in put_pmu_ctx(), after its refcount is non zero: CPU0 CPU1 find_get_pmu_context() if (!epc->ctx) == false put_pmu_ctx() atomic_dec_and_test(&epc->refcount) == true epc->refcount == 0 atomic_inc(&epc->refcount); epc->refcount == 1 list_del_init(&epc->pmu_ctx_entry); epc->ctx = NULL; Another issue is that WARN_ON for no active PMU events in put_pmu_ctx() is outside of the lock. If the perf_event_pmu_context is an embedded one, even after clearing it, it won't be deleted and can be re-used. So the warning can trigger. For this reason it also needs to be moved inside the lock. The above warning is very quick to trigger on Arm by running these two commands at the same time: while true; do perf record -- ls; done while true; do perf record -- ls; done [peterz: atomic_dec_and_raw_lock*()] Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: syzbot+697196bc0265049822bd@syzkaller.appspotmail.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20230127143141.1782804-2-james.clark@arm.com
2023-01-31kunit: fix bug in KUNIT_EXPECT_MEMEQRae Moar1-15/+25
In KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ, add check if one of the inputs is NULL and fail if this is the case. Currently, the kernel crashes if one of the inputs is NULL. Instead, fail the test and add an appropriate error message. Fixes: b8a926bea8b1 ("kunit: Introduce KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ macros") This was found by the kernel test robot: https://lore.kernel.org/all/202212191448.D6EDPdOh-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-01-28Merge tag 'hardening-v6.2-rc6' of ↵Linus Torvalds2-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST - Reorganize gcc-plugin includes for GCC 13 - Silence bcache memcpy run-time false positive warnings * tag 'hardening-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: bcache: Silence memcpy() run-time false positive warnings gcc-plugins: Reorganize gimple includes for GCC 13 kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST
2023-01-28Merge tag 'trace-v6.2-rc5' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix filter memory leak by calling ftrace_free_filter() - Initialize trace_printk() earlier so that ftrace_dump_on_oops shows data on early crashes. - Update the outdated instructions in scripts/tracing/ftrace-bisect.sh - Add lockdep_is_held() to fix lockdep warning - Add allocation failure check in create_hist_field() - Don't initialize pointer that gets set right away in enabled_monitors_write() - Update MAINTAINER entries - Fix help messages in Kconfigs - Fix kernel-doc header for update_preds() * tag 'trace-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Update MAINTAINERS file to add tree and mailing list rv: remove redundant initialization of pointer ptr ftrace: Maintain samples/ftrace tracing/filter: fix kernel-doc warnings lib: Kconfig: fix spellos trace_events_hist: add check for return value of 'create_hist_field' tracing/osnoise: Use built-in RCU list checking tracing: Kconfig: Fix spelling/grammar/punctuation ftrace/scripts: Update the instructions for ftrace-bisect.sh tracing: Make sure trace_printk() can output as soon as it can be used ftrace: Export ftrace_free_filter() to modules
2023-01-26Merge tag 'net-6.2-rc6' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - sched: sch_taprio: do not schedule in taprio_reset() Previous releases - regressions: - core: fix UaF in netns ops registration error path - ipv4: prevent potential spectre v1 gadgets - ipv6: fix reachability confirmation with proxy_ndp - netfilter: fix for the set rbtree - eth: fec: use page_pool_put_full_page when freeing rx buffers - eth: iavf: fix temporary deadlock and failure to set MAC address Previous releases - always broken: - netlink: prevent potential spectre v1 gadgets - netfilter: fixes for SCTP connection tracking - mctp: struct sock lifetime fixes - eth: ravb: fix possible hang if RIS2_QFF1 happen - eth: tg3: resolve deadlock in tg3_reset_task() during EEH Misc: - Mat stepped out as MPTCP co-maintainer" * tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits) net: mdio-mux-meson-g12a: force internal PHY off on mux switch docs: networking: Fix bridge documentation URL tsnep: Fix TX queue stop/wake for multiple queues net/tg3: resolve deadlock in tg3_reset_task() during EEH net: mctp: mark socks as dead on unhash, prevent re-add net: mctp: hold key reference when looking up a general key net: mctp: move expiry timer delete to unhash net: mctp: add an explicit reference from a mctp_sk_key to sock net: ravb: Fix possible hang if RIS2_QFF1 happen net: ravb: Fix lack of register setting after system resumed for Gen3 net/x25: Fix to not accept on connected socket ice: move devlink port creation/deletion sctp: fail if no bound addresses can be used for a given scope net/sched: sch_taprio: do not schedule in taprio_reset() Revert "Merge branch 'ethtool-mac-merge'" netrom: Fix use-after-free of a listening socket. netfilter: conntrack: unify established states for SCTP paths Revert "netfilter: conntrack: add sctp DATA_SENT state" netfilter: conntrack: fix bug in for_each_sctp_chunk netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE ...
2023-01-25kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TESTKees Cook2-0/+11
Since the long memcpy tests may stall a system for tens of seconds in virtualized architecture environments, split those tests off under CONFIG_MEMCPY_SLOW_KUNIT_TEST so they can be separately disabled. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/20221226195206.GA2626419@roeck-us.net Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: David Gow <davidgow@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-01-25lib: Kconfig: fix spellosRandy Dunlap2-2/+2
Fix spelling in lib/ Kconfig files. (reported by codespell) Link: https://lkml.kernel.org/r/20230124181655.16269-1-rdunlap@infradead.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-01-21netlink: prevent potential spectre v1 gadgetsEric Dumazet1-0/+3
Most netlink attributes are parsed and validated from __nla_validate_parse() or validate_nla() u16 type = nla_type(nla); if (type == 0 || type > maxtype) { /* error or continue */ } @type is then used as an array index and can be used as a Spectre v1 gadget. array_index_nospec() can be used to prevent leaking content of kernel memory to malicious users. This should take care of vast majority of netlink uses, but an audit is needed to take care of others where validation is not yet centralized in core netlink functions. Fixes: bfa83a9e03cf ("[NETLINK]: Type-safe netlink messages/attributes interface") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230119110150.2678537-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-10/+15
Pull rdma fixes from Jason Gunthorpe: - Several hfi1 patches fixing some long standing driver bugs - Overflow when working with sg lists with elements greater than 4G - An rxe regression with object numbering after the mrs reach their limit - A theoretical problem with the scatterlist merging code * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: lib/scatterlist: Fix to calculate the last_pg properly IB/hfi1: Remove user expected buffer invalidate race IB/hfi1: Immediately remove invalid memory from hardware IB/hfi1: Fix expected receive setup error exit issues IB/hfi1: Reserve user expected TIDs IB/hfi1: Reject a zero-length user expected buffer RDMA/core: Fix ib block iterator counter overflow RDMA/rxe: Prevent faulty rkey generation RDMA/rxe: Fix inaccurate constants in rxe_type_info
2023-01-20kunit: Export kunit_running()Arnd Bergmann1-0/+1
Using kunit_fail_current_test() in a loadable module causes a link error like: ERROR: modpost: "kunit_running" [drivers/gpu/drm/vc4/vc4.ko] undefined! Export the symbol to allow using it from modules. Fixes: da43ff045c3f ("drm/vc4: tests: Fail the current test if we access a register") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-01-19Sync with v6.2-rc4Andrew Morton1-1/+0
Merge branch 'master' into mm-hotfixes-stable
2023-01-17Merge tag 'mm-hotfixes-stable-2023-01-16-15-23' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "21 hotfixes. Thirteen of these address pre-6.1 issues and hence have the cc:stable tag" * tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) init/Kconfig: fix typo (usafe -> unsafe) nommu: fix split_vma() map_count error nommu: fix do_munmap() error path nommu: fix memory leak in do_mmap() error path MAINTAINERS: update Robert Foss' email address proc: fix PIE proc-empty-vm, proc-pid-vm tests mm: update mmap_sem comments to refer to mmap_lock include/linux/mm: fix release_pages_arg kernel doc comment lib/win_minmax: use /* notation for regular comments kasan: mark kasan_kunit_executing as static nilfs2: fix general protection fault in nilfs_btree_insert() Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects hugetlb: unshare some PMDs when splitting VMAs mm: fix vma->anon_name memory leak for anonymous shmem VMAs mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection() ...
2023-01-16lib/scatterlist: Fix to calculate the last_pg properlyYishai Hadas1-10/+15
The last_pg is wrong, it is actually the first page of the last scatterlist element. To get the last page of the last scatterlist element we have to add prv->length. So it is checking mergability against the wrong page, Further, a SG element is not guaranteed to end on a page boundary, so we have to check the sub page location also for merge eligibility. Fix the above by checking physical contiguity based on PFNs, compute the actual last page and then call pages_are_mergable(). Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages") Link: https://lore.kernel.org/r/20230111101054.188136-1-yishaih@nvidia.com Reported-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-13lockref: stop doing cpu_relax in the cmpxchg loopMateusz Guzik1-1/+0
On the x86-64 architecture even a failing cmpxchg grants exclusive access to the cacheline, making it preferable to retry the failed op immediately instead of stalling with the pause instruction. To illustrate the impact, below are benchmark results obtained by running various will-it-scale tests on top of the 6.2-rc3 kernel and Cascade Lake (2 sockets * 24 cores * 2 threads) CPU. All results in ops/s. Note there is some variance in re-runs, but the code is consistently faster when contention is present. open3 ("Same file open/close"): proc stock no-pause 1 805603 814942 (+%1) 2 1054980 1054781 (-0%) 8 1544802 1822858 (+18%) 24 1191064 2199665 (+84%) 48 851582 1469860 (+72%) 96 609481 1427170 (+134%) fstat2 ("Same file fstat"): proc stock no-pause 1 3013872 3047636 (+1%) 2 4284687 4400421 (+2%) 8 3257721 5530156 (+69%) 24 2239819 5466127 (+144%) 48 1701072 5256609 (+209%) 96 1269157 6649326 (+423%) Additionally, a kernel with a private patch to help access() scalability: access2 ("Same file access"): proc stock patched patched +nopause 24 2378041 2005501 5370335 (-15% / +125%) That is, fixing the problems in access itself *reduces* scalability after the cacheline ping-pong only happens in lockref with the pause instruction. Note that fstat and access benchmarks are not currently integrated into will-it-scale, but interested parties can find them in pull requests to said project. Code at hand has a rather tortured history. First modification showed up in commit d472d9d98b46 ("lockref: Relax in cmpxchg loop"), written with Itanium in mind. Later it got patched up to use an arch-dependent macro to stop doing it on s390 where it caused a significant regression. Said macro had undergone revisions and was ultimately eliminated later, going back to cpu_relax. While I intended to only remove cpu_relax for x86-64, I got the following comment from Linus: I would actually prefer just removing it entirely and see if somebody else hollers. You have the numbers to prove it hurts on real hardware, and I don't think we have any numbers to the contrary. So I think it's better to trust the numbers and remove it as a failure, than say "let's just remove it on x86-64 and leave everybody else with the potentially broken code" Additionally, Will Deacon (maintainer of the arm64 port, one of the architectures previously benchmarked): So, from the arm64 side of the fence, I'm perfectly happy just removing the cpu_relax() calls from lockref. As such, come back full circle in history and whack it altogether. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/ Acked-by: Tony Luck <tony.luck@intel.com> # ia64 Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-12lib/win_minmax: use /* notation for regular commentsRandy Dunlap1-1/+1
Don't use kernel-doc "/**" notation for non-kernel-doc comments. Prevents a kernel-doc warning: lib/win_minmax.c:31: warning: expecting prototype for lib/minmax.c(). Prototype was for minmax_subwin_update() instead Link: https://lkml.kernel.org/r/20230102211614.26343-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+1
Pull rdma fixes from Jason Gunthorpe: "Most noticeable is that Yishai found a big data corruption regression due to a change in the scatterlist: - Do not wrongly combine non-contiguous pages in scatterlist - Fix compilation warnings on gcc 13 - Oops when using some mlx5 stats - Bad enforcement of atomic responder resources in mlx5" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: lib/scatterlist: Fix to merge contiguous pages into the last SG properly RDMA/mlx5: Fix validation of max_rd_atomic caps for DC RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device RDMA/srp: Move large values to a new enum for gcc13
2023-01-05lib/scatterlist: Fix to merge contiguous pages into the last SG properlyYishai Hadas1-1/+1
When sg_alloc_append_table_from_pages() calls to pages_are_mergeable() in its 'sgt_append->prv' flow to check whether it can merge contiguous pages into the last SG, it passes the page arguments in the wrong order. The first parameter should be the next candidate page to be merged to the last page and not the opposite. The current code leads to a corrupted SG which resulted in OOPs and unexpected errors when non-contiguous pages are merged wrongly. Fix to pass the page parameters in the right order. Fixes: 1567b49d1a40 ("lib/scatterlist: add check when merging zone device pages") Link: https://lore.kernel.org/r/20230105112339.107969-1-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-27kunit: alloc_string_stream_fragment error handling bug fixYoungJun.park1-1/+3
When it fails to allocate fragment, it does not free and return error. And check the pointer inappropriately. Fixed merge conflicts with commit 618887768bb7 ("kunit: update NULL vs IS_ERR() tests") Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: YoungJun.park <her0gyugyu@gmail.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-22test_maple_tree: add test for mas_spanning_rebalance() on insufficient dataLiam Howlett1-0/+23
Add a test to the maple tree test suite for the spanning rebalance insufficient node issue does not go undetected again. Link: https://lkml.kernel.org/r/20221219161922.2708732-3-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-22maple_tree: fix mas_spanning_rebalance() on insufficient dataLiam Howlett1-1/+3
Mike Rapoport contacted me off-list with a regression in running criu. Periodic tests fail with an RCU stall during execution. Although rare, it is possible to hit this with other uses so this patch should be backported to fix the regression. This patchset adds the fix and a test case to the maple tree test suite. This patch (of 2): An insufficient node was causing an out-of-bounds access on the node in mas_leaf_max_gap(). The cause was the faulty detection of the new node being a root node when overwriting many entries at the end of the tree. Fix the detection of a new root and ensure there is sufficient data prior to entering the spanning rebalance loop. Link: https://lkml.kernel.org/r/20221219161922.2708732-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20221219161922.2708732-2-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Mike Rapoport <rppt@kernel.org> Tested-by: Mike Rapoport <rppt@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-20Merge tag 'spdx-6.2-rc1' of ↵Linus Torvalds2-22/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX/License additions from Greg KH: "Here are two small updates for LICENSES and some kernel files that add the Copyleft-next license and use it in a SPDX tag as a dual-license for some kernel files. These have been discussed thoroughly in public on the linux-spdx mailing list, and have the needed acks on them, as well as having been in linux-next with no reported issues for quite some time" * tag 'spdx-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: testing: use the copyleft-next-0.3.1 SPDX tag LICENSES: Add the copyleft-next-0.3.1 license
2022-12-20Merge tag 'asm-generic-6.2-1' of ↵Linus Torvalds1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are only three fairly simple patches. The #include change to linux/swab.h addresses a userspace build issue, and the change to the mmio tracing logic helps provide more useful traces" * tag 'asm-generic-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: uapi: Add missing _UAPI prefix to <asm-generic/types.h> include guard asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info include/uapi/linux/swab: Fix potentially missing __always_inline
2022-12-19Merge tag 'kbuild-v6.2' of ↵Linus Torvalds1-2/+27
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support zstd-compressed debug info - Allow W=1 builds to detect objects shared among multiple modules - Add srcrpm-pkg target to generate a source RPM package - Make the -s option detection work for future GNU Make versions - Add -Werror to KBUILD_CPPFLAGS when CONFIG_WERROR=y - Allow W=1 builds to detect -Wundef warnings in any preprocessed files - Raise the minimum supported version of binutils to 2.25 - Use $(intcmp ...) to compare integers if GNU Make >= 4.4 is used - Use $(file ...) to read a file if GNU Make >= 4.2 is used - Print error if GNU Make older than 3.82 is used - Allow modpost to detect section mismatches with Clang LTO - Include vmlinuz.efi into kernel tarballs for arm64 CONFIG_EFI_ZBOOT=y * tag 'kbuild-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits) buildtar: fix tarballs with EFI_ZBOOT enabled modpost: Include '.text.*' in TEXT_SECTIONS padata: Mark padata_work_init() as __ref kbuild: ensure Make >= 3.82 is used kbuild: refactor the prerequisites of the modpost rule kbuild: change module.order to list *.o instead of *.ko kbuild: use .NOTINTERMEDIATE for future GNU Make versions kconfig: refactor Makefile to reduce process forks kbuild: add read-file macro kbuild: do not sort after reading modules.order kbuild: add test-{ge,gt,le,lt} macros Documentation: raise minimum supported version of binutils to 2.25 kbuild: add -Wundef to KBUILD_CPPFLAGS for W=1 builds kbuild: move -Werror from KBUILD_CFLAGS to KBUILD_CPPFLAGS kbuild: Port silent mode detection to future gnu make. init/version.c: remove #include <generated/utsrelease.h> firmware_loader: remove #include <generated/utsrelease.h> modpost: Mark uuid_le type to be suitable only for MEI kbuild: add ability to make source rpm buildable using koji kbuild: warn objects shared among multiple modules ...
2022-12-19Merge tag 'zstd-linus-v6.2' of https://github.com/terrelln/linuxLinus Torvalds38-2443/+6689
Pull zstd updates from Nick Terrell: "Update the kernel to upstream zstd v1.5.2 [0]. Specifically to the tag v1.5.2-kernel [1] which includes several cherrypicked fixes for the kernel on top of v1.5.2. Excepting the MAINTAINERS change, all the changes in this can be generated by: git clone https://github.com/facebook/zstd cd zstd/contrib/linux-kernel git checkout v1.5.2-kernel LINUX=/path/to/linux/repo make import Additionally, this includes several minor typo fixes, which have all been fixed upstream so they are maintained on the next import" Link: https://github.com/facebook/zstd/releases/tag/v1.5.2 [0] Link: https://github.com/facebook/zstd/tree/v1.5.2-kernel [1] Link: https://lore.kernel.org/lkml/20221024202606.404049-1-nickrterrell@gmail.com/ Link: https://github.com/torvalds/linux/commit/637a642f5ca5e850186bb64ac75ebb0f124b458d * tag 'zstd-linus-v6.2' of https://github.com/terrelln/linux: zstd: import usptream v1.5.2 zstd: Move zstd-common module exports to zstd_common_module.c lib: zstd: Fix comment typo lib: zstd: fix repeated words in comments MAINTAINERS: git://github -> https://github.com for terrelln lib: zstd: clean up double word in comment.
2022-12-19Merge tag 'mm-nonmm-stable-2022-12-17-20-32' of ↵Linus Torvalds2-9/+14
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull fault-injection updates from Andrew Morton: "Some fault-injection improvements from Wei Yongjun which enable stacktrace filtering on x86_64" * tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: fault-injection: make stacktrace filter works as expected fault-injection: make some stack filter attrs more readable fault-injection: skip stacktrace filtering by default fault-injection: allow stacktrace filter for x86-64
2022-12-19Merge tag 'mm-stable-2022-12-17-2' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more mm updates from Andrew Morton: - A few late-breaking minor fixups - Two minor feature patches which were awkwardly dependent on mm-nonmm. I need to set up a new branch to handle such things. * tag 'mm-stable-2022-12-17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: zram: zsmalloc: Add an additional co-maintainer mm/kmemleak: use %pK to display kernel pointers in backtrace mm: use stack_depot for recording kmemleak's backtrace maple_tree: update copyright dates for test code maple_tree: fix mas_find_rev() comment mm/gup_test: free memory allocated via kvcalloc() using kvfree()
2022-12-16Merge tag 'driver-core-6.2-rc1' of ↵Linus Torvalds1-11/+18
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.2-rc1. The "big" change in here is the addition of a new macro, container_of_const() that will preserve the "const-ness" of a pointer passed into it. The "problem" of the current container_of() macro is that if you pass in a "const *", out of it can comes a non-const pointer unless you specifically ask for it. For many usages, we want to preserve the "const" attribute by using the same call. For a specific example, this series changes the kobj_to_dev() macro to use it, allowing it to be used no matter what the const value is. This prevents every subsystem from having to declare 2 different individual macros (i.e. kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce the const value at build time, which having 2 macros would not do either. The driver for all of this have been discussions with the Rust kernel developers as to how to properly mark driver core, and kobject, objects as being "non-mutable". The changes to the kobject and driver core in this pull request are the result of that, as there are lots of paths where kobjects and device pointers are not modified at all, so marking them as "const" allows the compiler to enforce this. So, a nice side affect of the Rust development effort has been already to clean up the driver core code to be more obvious about object rules. All of this has been bike-shedded in quite a lot of detail on lkml with different names and implementations resulting in the tiny version we have in here, much better than my original proposal. Lots of subsystem maintainers have acked the changes as well. Other than this change, included in here are smaller stuff like: - kernfs fixes and updates to handle lock contention better - vmlinux.lds.h fixes and updates - sysfs and debugfs documentation updates - device property updates All of these have been in the linux-next tree for quite a while with no problems" * tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits) device property: Fix documentation for fwnode_get_next_parent() firmware_loader: fix up to_fw_sysfs() to preserve const usb.h: take advantage of container_of_const() device.h: move kobj_to_dev() to use container_of_const() container_of: add container_of_const() that preserves const-ness of the pointer driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion. driver core: fix up missed scsi/cxlflash class.devnode() conversion. driver core: fix up some missing class.devnode() conversions. driver core: make struct class.devnode() take a const * driver core: make struct class.dev_uevent() take a const * cacheinfo: Remove of_node_put() for fw_token device property: Add a blank line in Kconfig of tests device property: Rename goto label to be more precise device property: Move PROPERTY_ENTRY_BOOL() a bit down device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*() kernfs: fix all kernel-doc warnings and multiple typos driver core: pass a const * into of_device_uevent() kobject: kset_uevent_ops: make name() callback take a const * kobject: kset_uevent_ops: make filter() callback take a const * kobject: make kobject_namespace take a const * ...
2022-12-16Merge tag 'char-misc-6.2-rc1' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the large set of char/misc and other driver subsystem changes for 6.2-rc1. Nothing earth-shattering in here at all, just a lot of new driver development and minor fixes. Highlights include: - fastrpc driver updates - iio new drivers and updates - habanalabs driver updates for new hardware and features - slimbus driver updates - speakup module parameters added to aid in boot time configuration - i2c probe_new conversions for lots of different drivers - other small driver fixes and additions One semi-interesting change in here is the increase of the number of misc dynamic minors available to 1048448 to handle new huge-cpu systems. All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (521 commits) extcon: usbc-tusb320: Convert to i2c's .probe_new() extcon: rt8973: Convert to i2c's .probe_new() extcon: fsa9480: Convert to i2c's .probe_new() extcon: max77843: Replace irqchip mask_invert with unmask_base chardev: fix error handling in cdev_device_add() mcb: mcb-parse: fix error handing in chameleon_parse_gdd() drivers: mcb: fix resource leak in mcb_probe() coresight: etm4x: fix repeated words in comments coresight: cti: Fix null pointer error on CTI init before ETM coresight: trbe: remove cpuhp instance node before remove cpuhp state counter: stm32-lptimer-cnt: fix the check on arr and cmp registers update misc: fastrpc: Add dma_mask to fastrpc_channel_ctx misc: fastrpc: Add mmap request assigning for static PD pool misc: fastrpc: Safekeep mmaps on interrupted invoke misc: fastrpc: Add support for audiopd misc: fastrpc: Rework fastrpc_req_munmap misc: fastrpc: Use fastrpc_map_put in fastrpc_map_create on fail misc: fastrpc: Add fastrpc_remote_heap_alloc misc: fastrpc: Add reserved mem support misc: fastrpc: Rename audio protection domain to root ...
2022-12-16fault-injection: make stacktrace filter works as expectedWei Yongjun1-3/+9
stacktrace filter is checked after others, such as fail-nth, interval and probability. This make it doesn't work well as expected. Fix to running stacktrace filter before other filters. It will speed up fault inject testing for driver modules. Link: https://lkml.kernel.org/r/20220817080332.1052710-5-weiyongjun1@huawei.com Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Isabella Basso <isabbasso@riseup.net> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-16fault-injection: make some stack filter attrs more readableWei Yongjun1-4/+4
Attributes of stack filter are show as unsigned decimal, such as 'require-start', 'require-end'. This patch change to show them as unsigned hexadecimal for more readable. Before: $ echo 0xffffffffc0257000 > /sys/kernel/debug/failslab/require-start $ cat /sys/kernel/debug/failslab/require-start 18446744072638263296 After: $ echo 0xffffffffc0257000 > /sys/kernel/debug/failslab/require-start $ cat /sys/kernel/debug/failslab/require-start 0xffffffffc0257000 [wangyufen@huawei.com: use debugfs_create_xul() instead of debugfs_create_xl()] Link: https://lkml.kernel.org/r/1664331299-4976-1-git-send-email-wangyufen@huawei.com Link: https://lkml.kernel.org/r/20220817080332.1052710-4-weiyongjun1@huawei.com Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Isabella Basso <isabbasso@riseup.net> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-16fault-injection: skip stacktrace filtering by defaultWei Yongjun1-1/+1
If FAULT_INJECTION_STACKTRACE_FILTER is enabled, the depth is default to 32. This means fail_stacktrace() will iter each entry's stacktrace, even if filter is not configured. This patch changes to quick return from fail_stacktrace() if stacktrace filter is not set. Link: https://lkml.kernel.org/r/20220817080332.1052710-3-weiyongjun1@huawei.com Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Isabella Basso <isabbasso@riseup.net> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-16fault-injection: allow stacktrace filter for x86-64Wei Yongjun1-1/+0
This patchset allow fault injection to run on x86_64 and makes stacktrace filter work as expected. With this, we can test a device driver module with fault injection more easily. This patch (of 4): FAULT_INJECTION_STACKTRACE_FILTER option was apparently disallowed on x86_64 because of problems with the stack unwinder: commit 6d690dcac92a84f98fd774862628ff871b713660 Author: Akinobu Mita <akinobu.mita@gmail.com> Date: Sat May 12 10:36:53 2007 -0700 fault injection: disable stacktrace filter for x86-64 However, there is no problems whatsoever with this today. Let's allow it again. Link: https://lkml.kernel.org/r/20220817080332.1052710-1-weiyongjun1@huawei.com Link: https://lkml.kernel.org/r/20220817080332.1052710-2-weiyongjun1@huawei.com Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Isabella Basso <isabbasso@riseup.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-16mm: use stack_depot for recording kmemleak's backtraceZhaoyang Huang1-0/+1
Using stack_depot to record kmemleak's backtrace which has been implemented on slub for reducing redundant information. [akpm@linux-foundation.org: fix build - remove now-unused __save_stack_trace()] [zhaoyang.huang@unisoc.com: v3] Link: https://lkml.kernel.org/r/1667101354-4669-1-git-send-email-zhaoyang.huang@unisoc.com [akpm@linux-foundation.org: fix v3 layout oddities] [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/1666864224-27541-1-git-send-email-zhaoyang.huang@unisoc.com Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: ke.wang <ke.wang@unisoc.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhaoyang Huang <huangzhaoyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-16maple_tree: fix mas_find_rev() commentLiam Howlett1-1/+1
mas_find_rev() uses mas_prev_entry(), not mas_next_entry(), correct comment. Link: https://lkml.kernel.org/r/20221025173756.2719616-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-15Merge tag 'x86_core_for_v6.2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-12-14Merge tag 'v6.2-p1' of ↵Linus Torvalds4-0/+1177
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Optimise away self-test overhead when they are disabled - Support symmetric encryption via keyring keys in af_alg - Flip hwrng default_quality, the default is now maximum entropy Algorithms: - Add library version of aesgcm - CFI fixes for assembly code - Add arm/arm64 accelerated versions of sm3/sm4 Drivers: - Remove assumption on arm64 that kmalloc is DMA-aligned - Fix selftest failures in rockchip - Add support for RK3328/RK3399 in rockchip - Add deflate support in qat - Merge ux500 into stm32 - Add support for TEE for PCI ID 0x14CA in ccp - Add mt7986 support in mtk - Add MaxLinear platform support in inside-secure - Add NPCM8XX support in npcm" * tag 'v6.2-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (184 commits) crypto: ux500/cryp - delete driver crypto: stm32/cryp - enable for use with Ux500 crypto: stm32 - enable drivers to be used on Ux500 dt-bindings: crypto: Let STM32 define Ux500 CRYP hwrng: geode - Fix PCI device refcount leak hwrng: amd - Fix PCI device refcount leak crypto: qce - Set DMA alignment explicitly crypto: octeontx2 - Set DMA alignment explicitly crypto: octeontx - Set DMA alignment explicitly crypto: keembay - Set DMA alignment explicitly crypto: safexcel - Set DMA alignment explicitly crypto: hisilicon/hpre - Set DMA alignment explicitly crypto: chelsio - Set DMA alignment explicitly crypto: ccree - Set DMA alignment explicitly crypto: ccp - Set DMA alignment explicitly crypto: cavium - Set DMA alignment explicitly crypto: img-hash - Fix variable dereferenced before check 'hdev->req' crypto: arm64/ghash-ce - use frame_push/pop macros consistently crypto: arm64/crct10dif - use frame_push/pop macros consistently crypto: arm64/aes-modes - use frame_push/pop macros consistently ...
2022-12-14Merge tag 'hardening-v6.2-rc1' of ↵Linus Torvalds10-343/+1075
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: - Convert flexible array members, fix -Wstringop-overflow warnings, and fix KCFI function type mismatches that went ignored by maintainers (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook) - Remove the remaining side-effect users of ksize() by converting dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add more __alloc_size attributes, and introduce full testing of all allocator functions. Finally remove the ksize() side-effect so that each allocation-aware checker can finally behave without exceptions - Introduce oops_limit (default 10,000) and warn_limit (default off) to provide greater granularity of control for panic_on_oops and panic_on_warn (Jann Horn, Kees Cook) - Introduce overflows_type() and castable_to_type() helpers for cleaner overflow checking - Improve code generation for strscpy() and update str*() kern-doc - Convert strscpy and sigphash tests to KUnit, and expand memcpy tests - Always use a non-NULL argument for prepare_kernel_cred() - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell) - Adjust orphan linker section checking to respect CONFIG_WERROR (Xin Li) - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu) - Fix um vs FORTIFY warnings for always-NULL arguments * tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits) ksmbd: replace one-element arrays with flexible-array members hpet: Replace one-element array with flexible-array member um: virt-pci: Avoid GCC non-NULL warning signal: Initialize the info in ksignal lib: fortify_kunit: build without structleak plugin panic: Expose "warn_count" to sysfs panic: Introduce warn_limit panic: Consolidate open-coded panic_on_warn checks exit: Allow oops_limit to be disabled exit: Expose "oops_count" to sysfs exit: Put an upper limit on how often we can oops panic: Separate sysctl logic from CONFIG_SMP mm/pgtable: Fix multiple -Wstringop-overflow warnings mm: Make ksize() a reporting-only function kunit/fortify: Validate __alloc_size attribute results drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid() drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid() driver core: Add __alloc_size hint to devm allocators overflow: Introduce overflows_type() and castable_to_type() coredump: Proactively round up to kmalloc bucket size ...
2022-12-14Merge tag 'for-linus-iommufd' of ↵Linus Torvalds2-0/+136
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ...
2022-12-14Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds4-11/+34
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-14Merge branch 'zstd-next' into zstd-linusNick Terrell38-2443/+6689
2022-12-14Merge tag 'net-next-6.2' of ↵Linus Torvalds7-27/+18
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Allow live renaming when an interface is up - Add retpoline wrappers for tc, improving considerably the performances of complex queue discipline configurations - Add inet drop monitor support - A few GRO performance improvements - Add infrastructure for atomic dev stats, addressing long standing data races - De-duplicate common code between OVS and conntrack offloading infrastructure - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements - Netfilter: introduce packet parser for tunneled packets - Replace IPVS timer-based estimators with kthreads to scale up the workload with the number of available CPUs - Add the helper support for connection-tracking OVS offload BPF: - Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF - Make cgroup local storage available to non-cgroup attached BPF programs - Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers - A relevant bunch of BPF verifier fixes and improvements - Veristat tool improvements to support custom filtering, sorting, and replay of results - Add LLVM disassembler as default library for dumping JITed code - Lots of new BPF documentation for various BPF maps - Add bpf_rcu_read_{,un}lock() support for sleepable programs - Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs - Add support storing struct task_struct objects as kptrs in maps - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions Protocols: - TCP: implement Protective Load Balancing across switch links - TCP: allow dynamically disabling TCP-MD5 static key, reverting back to fast[er]-path - UDP: Introduce optional per-netns hash lookup table - IPv6: simplify and cleanup sockets disposal - Netlink: support different type policies for each generic netlink operation - MPTCP: add MSG_FASTOPEN and FastOpen listener side support - MPTCP: add netlink notification support for listener sockets events - SCTP: add VRF support, allowing sctp sockets binding to VRF devices - Add bridging MAC Authentication Bypass (MAB) support - Extensions for Ethernet VPN bridging implementation to better support multicast scenarios - More work for Wi-Fi 7 support, comprising conversion of all the existing drivers to internal TX queue usage - IPSec: introduce a new offload type (packet offload) allowing complete header processing and crypto offloading - IPSec: extended ack support for more descriptive XFRM error reporting - RXRPC: increase SACK table size and move processing into a per-local endpoint kernel thread, reducing considerably the required locking - IEEE 802154: synchronous send frame and extended filtering support, initial support for scanning available 15.4 networks - Tun: bump the link speed from 10Mbps to 10Gbps - Tun/VirtioNet: implement UDP segmentation offload support Driver API: - PHY/SFP: improve power level switching between standard level 1 and the higher power levels - New API for netdev <-> devlink_port linkage - PTP: convert existing drivers to new frequency adjustment implementation - DSA: add support for rx offloading - Autoload DSA tagging driver when dynamically changing protocol - Add new PCP and APPTRUST attributes to Data Center Bridging - Add configuration support for 800Gbps link speed - Add devlink port function attribute to enable/disable RoCE and migratable - Extend devlink-rate to support strict prioriry and weighted fair queuing - Add devlink support to directly reading from region memory - New device tree helper to fetch MAC address from nvmem - New big TCP helper to simplify temporary header stripping New hardware / drivers: - Ethernet: - Marvel Octeon CNF95N and CN10KB Ethernet Switches - Marvel Prestera AC5X Ethernet Switch - WangXun 10 Gigabit NIC - Motorcomm yt8521 Gigabit Ethernet - Microchip ksz9563 Gigabit Ethernet Switch - Microsoft Azure Network Adapter - Linux Automation 10Base-T1L adapter - PHY: - Aquantia AQR112 and AQR412 - Motorcomm YT8531S - PTP: - Orolia ART-CARD - WiFi: - MediaTek Wi-Fi 7 (802.11be) devices - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices - Bluetooth: - Broadcom BCM4377/4378/4387 Bluetooth chipsets - Realtek RTL8852BE and RTL8723DS - Cypress.CYW4373A0 WiFi + Bluetooth combo device Drivers: - CAN: - gs_usb: bus error reporting support - kvaser_usb: listen only and bus error reporting support - Ethernet NICs: - Intel (100G): - extend action skbedit to RX queue mapping - implement devlink-rate support - support direct read from memory - nVidia/Mellanox (mlx5): - SW steering improvements, increasing rules update rate - Support for enhanced events compression - extend H/W offload packet manipulation capabilities - implement IPSec packet offload mode - nVidia/Mellanox (mlx4): - better big TCP support - Netronome Ethernet NICs (nfp): - IPsec offload support - add support for multicast filter - Broadcom: - RSS and PTP support improvements - AMD/SolarFlare: - netlink extened ack improvements - add basic flower matches to offload, and related stats - Virtual NICs: - ibmvnic: introduce affinity hint support - small / embedded: - FreeScale fec: add initial XDP support - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood - TI am65-cpsw: add suspend/resume support - Mediatek MT7986: add RX wireless wthernet dispatch support - Realtek 8169: enable GRO software interrupt coalescing per default - Ethernet high-speed switches: - Microchip (sparx5): - add support for Sparx5 TC/flower H/W offload via VCAP - Mellanox mlxsw: - add 802.1X and MAC Authentication Bypass offload support - add ip6gre support - Embedded Ethernet switches: - Mediatek (mtk_eth_soc): - improve PCS implementation, add DSA untag support - enable flow offload support - Renesas: - add rswitch R-Car Gen4 gPTP support - Microchip (lan966x): - add full XDP support - add TC H/W offload via VCAP - enable PTP on bridge interfaces - Microchip (ksz8): - add MTU support for KSZ8 series - Qualcomm 802.11ax WiFi (ath11k): - support configuring channel dwell time during scan - MediaTek WiFi (mt76): - enable Wireless Ethernet Dispatch (WED) offload support - add ack signal support - enable coredump support - remain_on_channel support - Intel WiFi (iwlwifi): - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support - RealTek WiFi (rtw89): - new dynamic header firmware format support - wake-over-WLAN support" * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits) ipvs: fix type warning in do_div() on 32 bit net: lan966x: Remove a useless test in lan966x_ptp_add_trap() net: ipa: add IPA v4.7 support dt-bindings: net: qcom,ipa: Add SM6350 compatible bnxt: Use generic HBH removal helper in tx path IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver selftests: forwarding: Add bridge MDB test selftests: forwarding: Rename bridge_mdb test bridge: mcast: Support replacement of MDB port group entries bridge: mcast: Allow user space to specify MDB entry routing protocol bridge: mcast: Allow user space to add (*, G) with a source list and filter mode bridge: mcast: Add support for (*, G) with a source list and filter mode bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source bridge: mcast: Add a flag for user installed source entries bridge: mcast: Expose __br_multicast_del_group_src() bridge: mcast: Expose br_multicast_new_group_src() bridge: mcast: Add a centralized error path bridge: mcast: Place netlink policy before validation functions bridge: mcast: Split (*, G) and (S, G) addition into different functions bridge: mcast: Do not derive entry type from its filter mode ...