summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-12-13Merge tag 'pull-iov_iter' of ↵Linus Torvalds31-72/+72
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls
2022-12-13Merge tag 'pull-elfcore' of ↵Linus Torvalds2-221/+51
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull elf coredumping updates from Al Viro: "Unification of regset and non-regset sides of ELF coredump handling. Collecting per-thread register values is the only thing that needs to be ifdefed there..." * tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: [elf] get rid of get_note_info_size() [elf] unify regset and non-regset cases [elf][non-regset] use elf_core_copy_task_regs() for dumper as well [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) elf_core_copy_task_regs(): task_pt_regs is defined everywhere [elf][regset] simplify thread list handling in fill_note_info() [elf][regset] clean fill_note_info() a bit kill extern of vsyscall32_sysctl kill coredump_params->regs kill signal_pt_regs()
2022-12-13Merge tag 'mm-nonmm-stable-2022-12-12' of ↵Linus Torvalds37-120/+446
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - A ptrace API cleanup series from Sergey Shtylyov - Fixes and cleanups for kexec from ye xingchen - nilfs2 updates from Ryusuke Konishi - squashfs feature work from Xiaoming Ni: permit configuration of the filesystem's compression concurrency from the mount command line - A series from Akinobu Mita which addresses bound checking errors when writing to debugfs files - A series from Yang Yingliang to address rapidio memory leaks - A series from Zheng Yejian to address possible overflow errors in encode_comp_t() - And a whole shower of singleton patches all over the place * tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits) ipc: fix memory leak in init_mqueue_fs() hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount rapidio: devices: fix missing put_device in mport_cdev_open kcov: fix spelling typos in comments hfs: Fix OOB Write in hfs_asc2mac hfs: fix OOB Read in __hfs_brec_find relay: fix type mismatch when allocating memory in relay_create_buf() ocfs2: always read both high and low parts of dinode link count io-mapping: move some code within the include guarded section kernel: kcsan: kcsan_test: build without structleak plugin mailmap: update email for Iskren Chernev eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD rapidio: fix possible UAF when kfifo_alloc() fails relay: use strscpy() is more robust and safer cpumask: limit visibility of FORCE_NR_CPUS acct: fix potential integer overflow in encode_comp_t() acct: fix accuracy loss for input value of encode_comp_t() linux/init.h: include <linux/build_bug.h> and <linux/stringify.h> rapidio: rio: fix possible name leak in rio_register_mport() rapidio: fix possible name leaks when rio_add_device() fails ...
2022-12-13Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds14-34/+27
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12Merge tag 'printk-for-6.2' of ↵Linus Torvalds1-3/+18
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of this_cpu_inc() on strong load-store architectures. - Introduce new console_list_lock to synchronize a manipulation of the list of registered consoles and their flags. This is a first step in removing the big-kernel-lock-like behavior of console_lock(). This semaphore still serializes console->write() calbacks against: - each other. It primary prevents potential races between early and proper console drivers using the same device. - suspend()/resume() callbacks and init() operations in some drivers. - various other operations in the tty/vt and framebufer susbsystems. It is likely that console_lock() serializes even operations that are not directly conflicting with the console->write() callbacks here. This is the most complicated big-kernel-lock aspect of the console_lock() that will be hard to untangle. - Introduce new console_srcu lock that is used to safely iterate and access the registered console drivers under SRCU read lock. This is a prerequisite for introducing atomic console drivers and console kthreads. It will reduce the complexity of serialization against normal consoles and console_lock(). Also it should remove the risk of deadlock during critical situations, like Oops or panic, when only atomic consoles are registered. - Check whether the console is registered instead of enabled on many locations. It was a historical leftover. - Cleanly force a preferred console in xenfb code instead of a dirty hack. - A lot of code and comment clean ups and improvements. * tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits) printk: htmldocs: add missing description tty: serial: sh-sci: use setup() callback for early console printk: relieve console_lock of list synchronization duties tty: serial: kgdboc: use console_list_lock to trap exit tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console() tty: serial: kgdboc: use console_list_lock for list traversal tty: serial: kgdboc: use srcu console list iterator proc: consoles: use console_list_lock for list iteration tty: tty_io: use console_list_lock for list synchronization printk, xen: fbfront: create/use safe function for forcing preferred netconsole: avoid CON_ENABLED misuse to track registration usb: early: xhci-dbc: use console_is_registered() tty: serial: xilinx_uartps: use console_is_registered() tty: serial: samsung_tty: use console_is_registered() tty: serial: pic32_uart: use console_is_registered() tty: serial: earlycon: use console_is_registered() tty: hvc: use console_is_registered() efi: earlycon: use console_is_registered() tty: nfcon: use console_is_registered() serial_core: replace uart_console_enabled() with uart_console_registered() ...
2022-12-12Merge tag 'locks-v6.2' of ↵Linus Torvalds10-26/+52
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "The main change here is to add the new locks_inode_context helper, and convert all of the places that dereference inode->i_flctx directly to use that instead. There is a new helper to indicate whether any locks are held on an inode. This is mostly for Ceph but may be usable elsewhere too. Andi Kleen requested that we print the PID when the LOCK_MAND warning fires, to help track down applications trying to use it. Finally, we added some new warnings to some of the file locking functions that fire when the ->fl_file and filp arguments differ. This helped us find some long-standing bugs in lockd. Patches for those are in Chuck Lever's tree and should be in his v6.2 PR. After that patch, people using NFSv2/v3 locking may see some warnings fire until those go in. Happy Holidays!" * tag 'locks-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: Add process name and pid to locks warning nfsd: use locks_inode_context helper nfs: use locks_inode_context helper lockd: use locks_inode_context helper ksmbd: use locks_inode_context helper cifs: use locks_inode_context helper ceph: use locks_inode_context helper filelock: add a new locks_inode_context accessor function filelock: new helper: vfs_inode_has_locks filelock: WARN_ON_ONCE when ->fl_file and filp don't match
2022-12-12Merge tag 'execve-v6.2-rc1' of ↵Linus Torvalds4-36/+45
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: "Most are small refactorings and bug fixes, but three things stand out: switching timens (which got reverted before) looks solid now, FOLL_FORCE has been removed (no failures seen yet across several weeks in -next), and some whitespace cleanups (which are long overdue). - Add timens support (when switching mm). This version has survived in -next for the entire cycle (Andrei Vagin) - Various small bug fixes, refactoring, and readability improvements (Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin) - Remove FOLL_FORCE for stack setup (Kees Cook) - Whitespace cleanups (Rolf Eike Beer, Kees Cook)" * tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_misc: fix shift-out-of-bounds in check_special_flags binfmt: Fix error return code in load_elf_fdpic_binary() exec: Remove FOLL_FORCE for stack setup binfmt_elf: replace IS_ERR() with IS_ERR_VALUE() binfmt_elf: simplify error handling in load_elf_phdrs() binfmt_elf: fix documented return value for load_elf_phdrs() exec: simplify initial stack size expansion binfmt: Fix whitespace issues exec: Add comments on check_unsafe_exec() fs counting ELF uapi: add spaces before '{' selftests/timens: add a test for vfork+exit fs/exec: switch timens when a task gets a new mm
2022-12-12Merge tag 'pstore-v6.2-rc1' of ↵Linus Torvalds5-29/+160
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "A small collection of bug fixes, refactorings, and general improvements: - Reporting improvements and return path fixes (Guilherme G. Piccoli, Wang Yufen, Kees Cook) - Clean up kmsg_bytes module parameter usage (Guilherme G. Piccoli) - Add Guilherme to pstore MAINTAINERS entry - Choose friendlier allocation flags (Qiujun Huang, Stephen Boyd)" * tag 'pstore-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Avoid kcore oops by vmap()ing with VM_IOREMAP pstore/ram: Fix error return code in ramoops_probe() pstore: Alert on backend write error MAINTAINERS: Update pstore maintainers pstore/ram: Set freed addresses to NULL pstore/ram: Move internal definitions out of kernel-wide include pstore/ram: Move pmsg init earlier pstore/ram: Consolidate kfree() paths efi: pstore: Follow convention for the efi-pstore backend name pstore: Inform unregistered backend names as well pstore: Expose kmsg_bytes as a module parameter pstore: Improve error reporting in case of backend overlap pstore/zone: Use GFP_ATOMIC to allocate zone buffer
2022-12-12hfsplus: fix bug causing custom uid and gid being unable to be assigned with ↵Aditya Garg3-2/+8
mount Despite specifying UID and GID in mount command, the specified UID and GID were not being assigned. This patch fixes this issue. Link: https://lkml.kernel.org/r/C0264BF5-059C-45CF-B8DA-3A3BD2C803A2@live.com Signed-off-by: Aditya Garg <gargaditya08@live.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-12hfs: Fix OOB Write in hfs_asc2macZhangPeng1-1/+1
Syzbot reported a OOB Write bug: loop0: detected capacity change from 0 to 64 ================================================================== BUG: KASAN: slab-out-of-bounds in hfs_asc2mac+0x467/0x9a0 fs/hfs/trans.c:133 Write of size 1 at addr ffff88801848314e by task syz-executor391/3632 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report+0xcd/0x100 mm/kasan/report.c:495 hfs_asc2mac+0x467/0x9a0 fs/hfs/trans.c:133 hfs_cat_build_key+0x92/0x170 fs/hfs/catalog.c:28 hfs_lookup+0x1ab/0x2c0 fs/hfs/dir.c:31 lookup_open fs/namei.c:3391 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x10e6/0x2df0 fs/namei.c:3710 do_filp_open+0x264/0x4f0 fs/namei.c:3740 If in->len is much larger than HFS_NAMELEN(31) which is the maximum length of an HFS filename, a OOB write could occur in hfs_asc2mac(). In that case, when the dst reaches the boundary, the srclen is still greater than 0, which causes a OOB write. Fix this by adding a check on dstlen in while() before writing to dst address. Link: https://lkml.kernel.org/r/20221202030038.1391945-1-zhangpeng362@huawei.com Fixes: 328b92278650 ("[PATCH] hfs: NLS support") Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Reported-by: <syzbot+dc3b1cf9111ab5fe98e7@syzkaller.appspotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-12hfs: fix OOB Read in __hfs_brec_findZhangPeng1-0/+2
Syzbot reported a OOB read bug: ================================================================== BUG: KASAN: slab-out-of-bounds in hfs_strcmp+0x117/0x190 fs/hfs/string.c:84 Read of size 1 at addr ffff88807eb62c4e by task kworker/u4:1/11 CPU: 1 PID: 11 Comm: kworker/u4:1 Not tainted 6.1.0-rc6-syzkaller-00308-g644e9524388a #0 Workqueue: writeback wb_workfn (flush-7:0) Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report+0xcd/0x100 mm/kasan/report.c:495 hfs_strcmp+0x117/0x190 fs/hfs/string.c:84 __hfs_brec_find+0x213/0x5c0 fs/hfs/bfind.c:75 hfs_brec_find+0x276/0x520 fs/hfs/bfind.c:138 hfs_write_inode+0x34c/0xb40 fs/hfs/inode.c:462 write_inode fs/fs-writeback.c:1440 [inline] If the input inode of hfs_write_inode() is incorrect: struct inode struct hfs_inode_info struct hfs_cat_key struct hfs_name u8 len # len is greater than HFS_NAMELEN(31) which is the maximum length of an HFS filename OOB read occurred: hfs_write_inode() hfs_brec_find() __hfs_brec_find() hfs_cat_keycmp() hfs_strcmp() # OOB read occurred due to len is too large Fix this by adding a Check on len in hfs_write_inode() before calling hfs_brec_find(). Link: https://lkml.kernel.org/r/20221130065959.2168236-1-zhangpeng362@huawei.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Reported-by: <syzbot+e836ff7133ac02be825f@syzkaller.appspotmail.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-12ocfs2: always read both high and low parts of dinode link countAlexey Asemov1-2/+1
When filesystem is using indexed-dirs feature, maximum link count values can spill over to i_links_count_hi, up to OCFS2_DX_LINK_MAX links. ocfs2_read_links_count() checks for OCFS2_INDEXED_DIR_FL flag in dinode, but this flag is only valid for directories so for files the check causes high part of the link count not being read back from file dinodes resulting in wrong link count value when file has >65535 links. As ocfs2_set_links_count() always writes both high and low parts of link count, the flag check on reading may be removed. Link: https://lkml.kernel.org/r/cbfca02b-b39f-89de-e1a8-904a6c60407e@alex-at.net Signed-off-by: Alexey Asemov <alex@alex-at.net> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-07fscache: Fix oops due to race with cookie_lru and use_cookieDave Wysochanski1-0/+8
If a cookie expires from the LRU and the LRU_DISCARD flag is set, but the state machine has not run yet, it's possible another thread can call fscache_use_cookie and begin to use it. When the cookie_worker finally runs, it will see the LRU_DISCARD flag set, transition the cookie->state to LRU_DISCARDING, which will then withdraw the cookie. Once the cookie is withdrawn the object is removed the below oops will occur because the object associated with the cookie is now NULL. Fix the oops by clearing the LRU_DISCARD bit if another thread uses the cookie before the cookie_worker runs. BUG: kernel NULL pointer dereference, address: 0000000000000008 ... CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs] RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles] ... Call Trace: netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs] process_one_work+0x217/0x3e0 worker_thread+0x4a/0x3b0 kthread+0xd6/0x100 Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Reported-by: Daire Byrne <daire.byrne@gmail.com> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Daire Byrne <daire@dneg.com> Link: https://lore.kernel.org/r/20221117115023.1350181-1-dwysocha@redhat.com/ # v1 Link: https://lore.kernel.org/r/20221117142915.1366990-1-dwysocha@redhat.com/ # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-06pstore: Avoid kcore oops by vmap()ing with VM_IOREMAPStephen Boyd1-1/+5
An oops can be induced by running 'cat /proc/kcore > /dev/null' on devices using pstore with the ram backend because kmap_atomic() assumes lowmem pages are accessible with __va(). Unable to handle kernel paging request at virtual address ffffff807ff2b000 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000081d87000 [ffffff807ff2b000] pgd=180000017fe18003, p4d=180000017fe18003, pud=180000017fe18003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: dm_integrity CPU: 7 PID: 21179 Comm: perf Not tainted 5.15.67-10882-ge4eb2eb988cd #1 baa443fb8e8477896a370b31a821eb2009f9bfba Hardware name: Google Lazor (rev3 - 8) (DT) pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x110/0x260 lr : vread+0x194/0x294 sp : ffffffc013ee39d0 x29: ffffffc013ee39f0 x28: 0000000000001000 x27: ffffff807ff2b000 x26: 0000000000001000 x25: ffffffc0085a2000 x24: ffffff802d4b3000 x23: ffffff80f8a60000 x22: ffffff802d4b3000 x21: ffffffc0085a2000 x20: ffffff8080b7bc68 x19: 0000000000001000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffd3073f2e60 x14: ffffffffad588000 x13: 0000000000000000 x12: 0000000000000001 x11: 00000000000001a2 x10: 00680000fff2bf0b x9 : 03fffffff807ff2b x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffffff802d4b4000 x4 : ffffff807ff2c000 x3 : ffffffc013ee3a78 x2 : 0000000000001000 x1 : ffffff807ff2b000 x0 : ffffff802d4b3000 Call trace: __memcpy+0x110/0x260 read_kcore+0x584/0x778 proc_reg_read+0xb4/0xe4 During early boot, memblock reserves the pages for the ramoops reserved memory node in DT that would otherwise be part of the direct lowmem mapping. Pstore's ram backend reuses those reserved pages to change the memory type (writeback or non-cached) by passing the pages to vmap() (see pfn_to_page() usage in persistent_ram_vmap() for more details) with specific flags. When read_kcore() starts iterating over the vmalloc region, it runs over the virtual address that vmap() returned for ramoops. In aligned_vread() the virtual address is passed to vmalloc_to_page() which returns the page struct for the reserved lowmem area. That lowmem page is passed to kmap_atomic(), which effectively calls page_to_virt() that assumes a lowmem page struct must be directly accessible with __va() and friends. These pages are mapped via vmap() though, and the lowmem mapping was never made, so accessing them via the lowmem virtual address oopses like above. Let's side-step this problem by passing VM_IOREMAP to vmap(). This will tell vread() to not include the ramoops region in the kcore. Instead the area will look like a bunch of zeros. The alternative is to teach kmap() about vmalloc areas that intersect with lowmem. Presumably such a change isn't a one-liner, and there isn't much interest in inspecting the ramoops region in kcore files anyway, so the most expedient route is taken for now. Cc: Brian Geffon <bgeffon@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221205233136.3420802-1-swboyd@chromium.org
2022-12-03pstore/ram: Fix error return code in ramoops_probe()Wang Yufen1-0/+2
In the if (dev_of_node(dev) && !pdata) path, the "err" may be assigned a value of 0, so the error return code -EINVAL may be incorrectly set to 0. To fix set valid return code before calling to goto. Fixes: 35da60941e44 ("pstore/ram: add Device Tree bindings") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/1669969374-46582-1-git-send-email-wangyufen@huawei.com
2022-12-03binfmt_misc: fix shift-out-of-bounds in check_special_flagsLiu Shixin1-4/+4
UBSAN reported a shift-out-of-bounds warning: left shift of 1 by 31 places cannot be represented in type 'int' Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 ubsan_epilogue+0xa/0x44 lib/ubsan.c:151 __ubsan_handle_shift_out_of_bounds+0x1e7/0x208 lib/ubsan.c:322 check_special_flags fs/binfmt_misc.c:241 [inline] create_entry fs/binfmt_misc.c:456 [inline] bm_register_write+0x9d3/0xa20 fs/binfmt_misc.c:654 vfs_write+0x11e/0x580 fs/read_write.c:582 ksys_write+0xcf/0x120 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x4194e1 Since the type of Node's flags is unsigned long, we should define these macros with same type too. Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221102025123.1117184-1-liushixin2@huawei.com
2022-12-03Merge tag 'mm-hotfixes-stable-2022-12-02' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes, 11 marked cc:stable. Only three or four of the latter address post-6.0 issues, which is hopefully a sign that things are converging" * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible" Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: take the right locks for page table retraction mm: migrate: fix THP's mapcount on isolation mm: introduce arch_has_hw_nonleaf_pmd_young() mm: add dummy pmd_young() for architectures not having it mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes() tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing madvise: use zap_page_range_single for madvise dontneed mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02proc: consoles: use console_list_lock for list iterationJohn Ogness1-7/+12
The console_lock is used in part to guarantee safe list iteration. The console_list_lock should be used because list synchronization responsibility will be removed from the console_lock in a later change. Note, the console_lock is still needed to serialize the device() callback with other console operations. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20221116162152.193147-35-john.ogness@linutronix.de
2022-12-02proc: consoles: document console_lock usageJohn Ogness1-0/+9
The console_lock is held throughout the start/show/stop procedure to print out device/driver information about all registered consoles. Since the console_lock is being used for multiple reasons, explicitly document these reasons. This will be useful when the console_lock is split into fine-grained locking. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20221116162152.193147-11-john.ogness@linutronix.de
2022-12-02printk: Convert console_drivers list to hlistThomas Gleixner1-1/+2
Replace the open coded single linked list with a hlist so a conversion to SRCU protected list walks can reuse the existing primitives. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20221116162152.193147-3-john.ogness@linutronix.de
2022-12-02binfmt: Fix error return code in load_elf_fdpic_binary()Wang Yufen1-2/+3
Fix to return a negative error code from create_elf_fdpic_tables() instead of 0. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/1669945261-30271-1-git-send-email-wangyufen@huawei.com
2022-12-01fat (exportfs): fix some kernel-doc warningsBo Liu1-2/+2
Fix the following W=1 kernel build warning(s): fs/fat/nfs.c:21: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst fs/fat/nfs.c:139: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Link: https://lkml.kernel.org/r/20221111075648.4005-1-liubo03@inspur.com Signed-off-by: Bo Liu <liubo03@inspur.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01ocfs2: fix memory leak in ocfs2_mount_volume()Li Zetao3-2/+6
There is a memory leak reported by kmemleak: unreferenced object 0xffff88810cc65e60 (size 32): comm "mount.ocfs2", pid 23753, jiffies 4302528942 (age 34735.105s) hex dump (first 32 bytes): 10 00 00 00 00 00 00 00 00 01 01 01 01 01 01 01 ................ 01 01 01 01 01 01 01 01 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8170f73d>] __kmalloc+0x4d/0x150 [<ffffffffa0ac3f51>] ocfs2_compute_replay_slots+0x121/0x330 [ocfs2] [<ffffffffa0b65165>] ocfs2_check_volume+0x485/0x900 [ocfs2] [<ffffffffa0b68129>] ocfs2_mount_volume.isra.0+0x1e9/0x650 [ocfs2] [<ffffffffa0b7160b>] ocfs2_fill_super+0xe0b/0x1740 [ocfs2] [<ffffffff818e1fe2>] mount_bdev+0x312/0x400 [<ffffffff819a086d>] legacy_get_tree+0xed/0x1d0 [<ffffffff818de82d>] vfs_get_tree+0x7d/0x230 [<ffffffff81957f92>] path_mount+0xd62/0x1760 [<ffffffff81958a5a>] do_mount+0xca/0xe0 [<ffffffff81958d3c>] __x64_sys_mount+0x12c/0x1a0 [<ffffffff82f26f15>] do_syscall_64+0x35/0x80 [<ffffffff8300006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 This call stack is related to two problems. Firstly, the ocfs2 super uses "replay_map" to trace online/offline slots, in order to recover offline slots during recovery and mount. But when ocfs2_truncate_log_init() returns an error in ocfs2_mount_volume(), the memory of "replay_map" will not be freed in error handling path. Secondly, the memory of "replay_map" will not be freed if d_make_root() returns an error in ocfs2_fill_super(). But the memory of "replay_map" will be freed normally when completing recovery and mount in ocfs2_complete_mount_recovery(). Fix the first problem by adding error handling path to free "replay_map" when ocfs2_truncate_log_init() fails. And fix the second problem by calling ocfs2_free_replay_slots(osb) in the error handling path "out_dismount". In addition, since ocfs2_free_replay_slots() is static, it is necessary to remove its static attribute and declare it in header file. Link: https://lkml.kernel.org/r/20221109074627.2303950-1-lizetao1@huawei.com Fixes: 9140db04ef18 ("ocfs2: recover orphans in offline slots during recovery and mount") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01debugfs: fix error when writing negative value to atomic_t debugfs fileAkinobu Mita1-6/+22
The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value for a debugfs file created by debugfs_create_atomic_t(). This restores the previous behaviour by introducing DEFINE_DEBUGFS_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-4-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed valueAkinobu Mita1-3/+19
Patch series "fix error when writing negative value to simple attribute files". The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), but some attribute files want to accept a negative value. This patch (of 3): The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value. This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-01nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()ZhangPeng1-0/+7
Syzbot reported a null-ptr-deref bug: NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 1 PID: 3603 Comm: segctord Not tainted 6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0 fs/nilfs2/alloc.c:608 Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7 RSP: 0018:ffffc90003dff830 EFLAGS: 00010212 RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010 RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158 R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0 Call Trace: <TASK> nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline] nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193 nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236 nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940 nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline] nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline] nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088 nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337 nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568 nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018 nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline] nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045 nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline] nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> ... If DAT metadata file is corrupted on disk, there is a case where req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during a b-tree operation that cascadingly updates ancestor nodes of the b-tree, because nilfs_dat_commit_alloc() for a lower level block can initialize the blocknr on the same DAT entry between nilfs_dat_prepare_end() and nilfs_dat_commit_end(). If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free() without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and causes the NULL pointer dereference above in nilfs_palloc_commit_free_entry() function, which leads to a crash. Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free(). This also calls nilfs_error() in that case to notify that there is a fatal flaw in the filesystem metadata and prevent further operations. Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ebe05ee8e98f755f61d0@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30afs: Fix server->active leak in afs_put_serverMarc Dionne1-1/+1
The atomic_read was accidentally replaced with atomic_inc_return, which prevents the server from getting cleaned up and causes rmmod to hang with a warning: Can't purge s=00000001 Fixes: 2757a4dc1849 ("afs: Fix access after dec in put functions") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20221130174053.2665818-1-marc.dionne@auristor.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-30Add process name and pid to locks warningAndi Kleen1-1/+1
It's fairly useless to complain about using an obsolete feature without telling the user which process used it. My Fedora desktop randomly drops this message, but I would really need this patch to figure out what triggers is. [ jlayton: print pid as well as process name ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30nfsd: use locks_inode_context helperJeff Layton1-3/+3
nfsd currently doesn't access i_flctx safely everywhere. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30nfs: use locks_inode_context helperJeff Layton4-5/+5
nfs currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30lockd: use locks_inode_context helperJeff Layton1-2/+2
lockd currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30ksmbd: use locks_inode_context helperJeff Layton1-1/+1
ksmbd currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Cc: Steve French <sfrench@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30cifs: use locks_inode_context helperJeff Layton1-1/+1
cifs currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Cc: Steve French <smfrench@samba.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30ceph: use locks_inode_context helperJeff Layton1-2/+2
ceph currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30filelock: add a new locks_inode_context accessor functionJeff Layton1-12/+12
There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure. Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30filelock: new helper: vfs_inode_has_locksJeff Layton1-0/+23
Ceph has a need to know whether a particular inode has any locks set on it. It's currently tracking that by a num_locks field in its filp->private_data, but that's problematic as it tries to decrement this field when releasing locks and that can race with the file being torn down. Add a new vfs_inode_has_locks helper that just returns whether any locks are currently held on the inode. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-29afs: Fix fileserver probe RTT handlingDavid Howells1-2/+2
The fileserver probing code attempts to work out the best fileserver to use for a volume by retrieving the RTT calculated by AF_RXRPC for the probe call sent to each server and comparing them. Sometimes, however, no RTT estimate is available and rxrpc_kernel_get_srtt() returns false, leading good fileservers to be given an RTT of UINT_MAX and thus causing the rotation algorithm to ignore them. Fix afs_select_fileserver() to ignore rxrpc_kernel_get_srtt()'s return value and just take the estimated RTT it provides - which will be capped at 1 second. Fixes: 1d4adfaf6574 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/166965503999.3392585.13954054113218099395.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-29Merge tag 'fuse-fixes-6.1-rc8' of ↵Linus Torvalds1-21/+16
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fix from Miklos Szeredi: "Fix a regression introduced in -rc4" * tag 'fuse-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: lock inode unconditionally in fuse_fallocate()
2022-11-27Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds3-9/+20
Pull vfs fix from Al Viro: "Amir's copy_file_range() fix" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: fix copy_file_range() averts filesystem freeze protection
2022-11-27Merge tag '6.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-3/+5
Pull cifs fixes from Steve French: "Two small cifs/smb3 client fixes: - an unlock missing in an error path in copychunk_range found by xfstest 476 - a fix for a use after free in a debug code path" * tag '6.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix missing unlock in cifs_file_copychunk_range() cifs: Use after free in debug code
2022-11-26Merge tag 'nfsd-6.1-6' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix rare data corruption on READ operations * tag 'nfsd-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix reads with a non-zero offset that don't end on a page boundary
2022-11-26Merge tag 'zonefs-6.1-rc7' of ↵Linus Torvalds2-8/+21
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: - Fix a race between zonefs module initialization of sysfs attribute directory and mounting a drive (from Xiaoxu). - Fix active zone accounting in the rare case of an IO error due to a zone transition to offline or read-only state (from me). * tag 'zonefs-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Fix active zone accounting zonefs: Fix race between modprobe and mount
2022-11-26Merge tag 'for-6.1-rc6-tag' of ↵Linus Torvalds7-35/+132
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix a regression in nowait + buffered write - in zoned mode fix endianness when comparing super block generation - locking and lockdep fixes: - fix potential sleeping under spinlock when setting qgroup limit - lockdep warning fixes when btrfs_path is freed after copy_to_user - do not modify log tree while holding a leaf from fs tree locked - fix freeing of sysfs files of static features on error - use kv.alloc for zone map allocation as a fallback to avoid warnings due to high order allocation - send, avoid unaligned encoded writes when attempting to clone range * tag 'for-6.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: sysfs: normalize the error handling branch in btrfs_init_sysfs() btrfs: do not modify log tree while holding a leaf from fs tree locked btrfs: use kvcalloc in btrfs_get_dev_zone_info btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit() btrfs: send: avoid unaligned encoded writes when attempting to clone range btrfs: zoned: fix missing endianness conversion in sb_write_pointer btrfs: free btrfs_path before copying subvol info to userspace btrfs: free btrfs_path before copying fspath to userspace btrfs: free btrfs_path before copying inodes to userspace btrfs: free btrfs_path before copying root refs to userspace btrfs: fix assertion failure and blocking during nowait buffered write
2022-11-25Merge tag 'mm-hotfixes-stable-2022-11-24' of ↵Linus Torvalds2-1/+9
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "24 MM and non-MM hotfixes. 8 marked cc:stable and 16 for post-6.0 issues. There have been a lot of hotfixes this cycle, and this is quite a large batch given how far we are into the -rc cycle. Presumably a reflection of the unusually large amount of MM material which went into 6.1-rc1" * tag 'mm-hotfixes-stable-2022-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits) test_kprobes: fix implicit declaration error of test_kprobes nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1 mm: fix unexpected changes to {failslab|fail_page_alloc}.attr swapfile: fix soft lockup in scan_swap_map_slots hugetlb: fix __prep_compound_gigantic_page page flag setting kfence: fix stack trace pruning proc/meminfo: fix spacing in SecPageTables mm: multi-gen LRU: retry folios written back while isolated mailmap: update email address for Satya Priya mm/migrate_device: return number of migrating pages in args->cpages kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible MAINTAINERS: update Alex Hung's email address mailmap: update Alex Hung's email address mm: mmap: fix documentation for vma_mas_szero mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed mm/memory: return vm_fault_t result from migrate_to_ram() callback mm: correctly charge compressed memory to its memcg ipc/shm: call underlying open/close vm_ops gcov: clang: fix the buffer overflow issue ...
2022-11-25Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-2/+12
Pull vfs fixes from Al Viro: "A couple of fixes, one of them for this cycle regression..." * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: vfs_tmpfile: ensure O_EXCL flag is enforced fs: use acquire ordering in __fget_light()
2022-11-25use less confusing names for iov_iter direction initializersAl Viro31-72/+72
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25zonefs: Fix active zone accountingDamien Le Moal2-2/+15
If a file zone transitions to the offline or readonly state from an active state, we must clear the zone active flag and decrement the active seq file counter. Do so in zonefs_account_active() using the new zonefs inode flags ZONEFS_ZONE_OFFLINE and ZONEFS_ZONE_READONLY. These flags are set if necessary in zonefs_check_zone_condition() based on the result of report zones operation after an IO error. Fixes: 87c9ce3ffec9 ("zonefs: Add active seq file accounting") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2022-11-25vfs: fix copy_file_range() averts filesystem freeze protectionAmir Goldstein3-9/+20
Commit 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") removed fallback to generic_copy_file_range() for cross-fs cases inside vfs_copy_file_range(). To preserve behavior of nfsd and ksmbd server-side-copy, the fallback to generic_copy_file_range() was added in nfsd and ksmbd code, but that call is missing sb_start_write(), fsnotify hooks and more. Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that will take care of the fallback, but that code would be subtle and we got vfs_copy_file_range() logic wrong too many times already. Instead, add a flag to explicitly request vfs_copy_file_range() to perform only generic_copy_file_range() and let nfsd and ksmbd use this flag only in the fallback path. This choise keeps the logic changes to minimum in the non-nfsd/ksmbd code paths to reduce the risk of further regressions. Fixes: 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") Tested-by: Namjae Jeon <linkinjeon@kernel.org> Tested-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25[elf] get rid of get_note_info_size()Al Viro1-6/+1
it's trivial now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25[elf] unify regset and non-regset casesAl Viro1-192/+38
The only real difference is in filling per-thread notes - getting the values of registers. And this is the only part that is worth an ifdef - we don't need to duplicate the logics regarding gathering threads, filling other notes, etc. It would've been hard to do back when regset-based variant had been introduced, mostly due to sharing bits and pieces of helpers with aout coredumps. As the result, too much had been duplicated and the copies had drifted away since then. Now it can be done cleanly... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>