summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2024-04-26zram: add max_pages param to recompressionSergey Senozhatsky1-3/+28
Introduce "max_pages" param to recompress device attribute which sets an upper limit on the number of entries (pages) zram attempts to recompress (in this particular recompression call). S/W recompression can be quite expensive so limiting the number of pages recompress touches can be quite helpful. Link: https://lkml.kernel.org/r/20240329094050.2815699-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Brian Geffon <bgeffon@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-02nullblk: Fix cleanup order in null_add_dev() error pathDamien Le Moal1-2/+2
In null_add_dev(), if an error happen after initializing the resources for a zoned null block device, we must free these resources before exiting the function. To ensure this, move the out_cleanup_zone label after out_cleanup_disk as we jump to this latter label if an error happens after calling null_init_zoned_dev(). Fixes: e440626b1caf ("null_blk: pass queue_limits to blk_mq_alloc_disk") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240330005300.1503252-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-22Merge tag 'block-6.9-20240322' of git://git.kernel.dk/linuxLinus Torvalds1-1/+0
Pull more block updates from Jens Axboe: - NVMe pull request via Keith: - Make an informative message less ominous (Keith) - Enhanced trace decoding (Guixin) - TCP updates (Hannes, Li) - Fabrics connect deadlock fix (Chunguang) - Platform API migration update (Uwe) - A new device quirk (Jiawei) - Remove dead assignment in fd (Yufeng) * tag 'block-6.9-20240322' of git://git.kernel.dk/linux: nvmet-rdma: remove NVMET_RDMA_REQ_INVALIDATE_RKEY flag nvme: remove redundant BUILD_BUG_ON check floppy: remove duplicated code in redo_fd_request() nvme/tcp: Add wq_unbound modparam for nvme_tcp_wq nvme-tcp: Export the nvme_tcp_wq to sysfs drivers/nvme: Add quirks for device 126f:2262 nvme: parse format command's lbafu when tracing nvme: add tracing of reservation commands nvme: parse zns command's zsa and zrasf to string nvme: use nvme_disk_is_ns_head helper nvme: fix reconnection fail due to reserved tag allocation nvmet: add tracing of zns commands nvmet: add tracing of authentication commands nvme-apple: Convert to platform remove callback returning void nvmet-tcp: do not continue for invalid icreq nvme: change shutdown timeout setting message
2024-03-19floppy: remove duplicated code in redo_fd_request()Yufeng Wang1-1/+0
duplicated code in redo_fd_request(), unlock_fdc() function has the same code "do_floppy = NULL" inside. Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn> Suggested-by: Denis Efremov <efremov@linux.com> Link: https://lore.kernel.org/r/20240319014219.7812-1-wangyufeng@kylinos.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-15Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-15Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
2024-03-11Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linuxLinus Torvalds32-1017/+762
Pull block updates from Jens Axboe: - MD pull requests via Song: - Cleanup redundant checks (Yu Kuai) - Remove deprecated headers (Marc Zyngier, Song Liu) - Concurrency fixes (Li Lingfeng) - Memory leak fix (Li Nan) - Refactor raid1 read_balance (Yu Kuai, Paul Luse) - Clean up and fix for md_ioctl (Li Nan) - Other small fixes (Gui-Dong Han, Heming Zhao) - MD atomic limits (Christoph) - NVMe pull request via Keith: - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith) - Improvement and cleanup for cached request handling (Christoph) - Moving towards atomic queue limits. Core changes and driver bits so far (Christoph) - Fix UAF issues in aoeblk (Chun-Yi) - Zoned fix and cleanups (Damien) - s390 dasd cleanups and fixes (Jan, Miroslav) - Block issue timestamp caching (me) - noio scope guarding for zoned IO (Johannes) - block/nvme PI improvements (Kanchan) - Ability to terminate long running discard loop (Keith) - bdev revalidation fix (Li) - Get rid of old nr_queues hack for kdump kernels (Ming) - Support for async deletion of ublk (Ming) - Improve IRQ bio recycling (Pavel) - Factor in CPU capacity for remote vs local completion (Qais) - Add shared_tags configfs entry for null_blk (Shin'ichiro - Fix for a regression in page refcounts introduced by the folio unification (Tony) - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid, Ricardo, Roman, Tang, Uwe) * tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits) block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC block/swim: Convert to platform remove callback returning void cdrom: gdrom: Convert to platform remove callback returning void block: remove disk_stack_limits md: remove mddev->queue md: don't initialize queue limits md/raid10: use the atomic queue limit update APIs md/raid5: use the atomic queue limit update APIs md/raid1: use the atomic queue limit update APIs md/raid0: use the atomic queue limit update APIs md: add queue limit helpers md: add a mddev_is_dm helper md: add a mddev_add_trace_msg helper md: add a mddev_trace_remap helper bcache: move calculation of stripe_size and io_opt into bcache_device_init virtio_blk: Do not use disk_set_max_open/active_zones() aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts block: move capacity validation to blkpg_do_ioctl() block: prevent division by zero in blk_rq_stat_sum() drbd: atomically update queue limits in drbd_reconsider_queue_parameters ...
2024-03-11Merge tag 'vfs-6.9.super' of ↵Linus Torvalds10-117/+116
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull block handle updates from Christian Brauner: "Last cycle we changed opening of block devices, and opening a block device would return a bdev_handle. This allowed us to implement support for restricting and forbidding writes to mounted block devices. It was accompanied by converting and adding helpers to operate on bdev_handles instead of plain block devices. That was already a good step forward but ultimately it isn't necessary to have special purpose helpers for opening block devices internally that return a bdev_handle. Fundamentally, opening a block device internally should just be equivalent to opening files. So now all internal opens of block devices return files just as a userspace open would. Instead of introducing a separate indirection into bdev_open_by_*() via struct bdev_handle bdev_file_open_by_*() is made to just return a struct file. Opening and closing a block device just becomes equivalent to opening and closing a file. This all works well because internally we already have a pseudo fs for block devices and so opening block devices is simple. There's a few places where we needed to be careful such as during boot when the kernel is supposed to mount the rootfs directly without init doing it. Here we need to take care to ensure that we flush out any asynchronous file close. That's what we already do for opening, unpacking, and closing the initramfs. So nothing new here. The equivalence of opening and closing block devices to regular files is a win in and of itself. But it also has various other advantages. We can remove struct bdev_handle completely. Various low-level helpers are now private to the block layer. Other helpers were simply removable completely. A follow-up series that is already reviewed build on this and makes it possible to remove bdev->bd_inode and allows various clean ups of the buffer head code as well. All places where we stashed a bdev_handle now just stash a file and use simple accessors to get to the actual block device which was already the case for bdev_handle" * tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) block: remove bdev_handle completely block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access bdev: remove bdev pointer from struct bdev_handle bdev: make struct bdev_handle private to the block layer bdev: make bdev_{release, open_by_dev}() private to block layer bdev: remove bdev_open_by_path() reiserfs: port block device access to file ocfs2: port block device access to file nfs: port block device access to files jfs: port block device access to file f2fs: port block device access to files ext4: port block device access to file erofs: port device access to file btrfs: port device access to file bcachefs: port block device access to file target: port block device access to file s390: port block device access to file nvme: port block device access to file block2mtd: port device access to files bcache: port block device access to files ...
2024-03-08block/swim: Convert to platform remove callback returning voidUwe Kleine-König1-4/+2
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/a00aea8201ea85ae726411bb0fb015ea026ff40a.1709886922.git.u.kleine-koenig@pengutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-07Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"Ahelenia Ziemiańska1-1/+1
Found with git grep 'MODULE_AUTHOR(".*([^)]*@' Fixed with sed -i '/MODULE_AUTHOR(".*([^)]*@/{s/ (/ </g;s/)"/>"/;s/)and/> and/}' \ $(git grep -l 'MODULE_AUTHOR(".*([^)]*@') Also: in drivers/media/usb/siano/smsusb.c normalise ", INC" to ", Inc"; this is what every other MODULE_AUTHOR for this company says, and it's what the header says in drivers/sbus/char/openprom.c normalise a double-spaced separator; this is clearly copied from the copyright header, where the names are aligned on consecutive lines thusly: * Linux/SPARC PROM Configuration Driver * Copyright (C) 1996 Thomas K. Dyas (tdyas@noc.rutgers.edu) * Copyright (C) 1996 Eddie C. Dost (ecd@skynet.be) but the authorship branding is single-line Link: https://lkml.kernel.org/r/mk3geln4azm5binjjlfsgjepow4o73domjv6ajybws3tz22vb3@tarta.nabijaczleweli.xyz Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06virtio_blk: Do not use disk_set_max_open/active_zones()Damien Le Moal1-2/+2
In virtblk_read_zoned_limits(), setting a zoned block device maximum number of open and active zones using the functions disk_set_max_open_zones() and disk_set_max_active_zones() is incorrect as setting the limits for the request queue is now done atomically when the gendisk is created (with blk_mq_alloc_disk()). The value set by the disk_set_max_open/active_zones() functions will be overwritten. Fix this by setting the maximum number of open and active zones directly in the queue_limits structure passed to virtblk_read_zoned_limits(). Fixes: 8b837256560c ("virtio_blk: pass queue_limits to blk_mq_alloc_disk") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240301192639.410183-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06aoe: fix the potential use-after-free problem in aoecmd_cfg_pktsChun-Yi Lee2-6/+7
This patch is against CVE-2023-6270. The description of cve is: A flaw was found in the ATA over Ethernet (AoE) driver in the Linux kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on `struct net_device`, and a use-after-free can be triggered by racing between the free on the struct and the access through the `skbtxq` global queue. This could lead to a denial of service condition or potential code execution. In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial code is finished. But the net_device ifp will still be used in later tx()->dev_queue_xmit() in kthread. Which means that the dev_put(ifp) should NOT be called in the success path of skb initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into use-after-free because the net_device is freed. This patch removed the dev_put(ifp) in the success path in aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx(). Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270 Fixes: 7562f876cd93 ("[NET]: Rework dev_base via list_head (v3)") Signed-off-by: Chun-Yi Lee <jlee@suse.com> Link: https://lore.kernel.org/r/20240305082048.25526-1-jlee@suse.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: atomically update queue limits in drbd_reconsider_queue_parametersChristoph Hellwig1-73/+46
Switch drbd_reconsider_queue_parameters to set up the queue parameters in an on-stack queue_limits structure and apply the atomically. Remove various helpers that have become so trivial that they can be folded into drbd_reconsider_queue_parameters. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: split out a drbd_discard_supported helperChristoph Hellwig1-8/+17
Add a helper to check if discard is supported for a given connection / backing device combination. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-7-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: don't set max_write_zeroes_sectors in decide_on_discard_supportChristoph Hellwig1-1/+0
fixup_write_zeroes always overrides the max_write_zeroes_sectors value a little further down the callchain, so don't bother to setup a limit in decide_on_discard_support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-6-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parametersChristoph Hellwig1-34/+22
drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters and there is no really clear boundary of responsibilities between the two. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-5-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: refactor the backing dev max_segments calculationChristoph Hellwig1-8/+17
Factor out a drbd_backing_dev_max_segments helper that checks the backing device limitation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20240306140332.623759-4-philipp.reisner@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: refactor drbd_reconsider_queue_parametersChristoph Hellwig1-35/+49
Split out a drbd_max_peer_bio_size helper for the peer I/O size, and condense the various checks to a nested min3(..., max())) instead of using a lot of local variables. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-06drbd: pass the max_hw_sectors limit to blk_alloc_diskChristoph Hellwig1-4/+9
Pass a queue_limits structure with the max_hw_sectors limit to blk_alloc_disk instead of updating the limit on the allocated gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240305134041.137006-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-05zram: zcomp: remove zcomp_set_max_streams() declarationKefeng Wang1-1/+0
The zcomp_set_max_streams() is removed from commit 43209ea2d17a ("zram: remove max_comp_streams internals"), remove the declaration. Link: https://lkml.kernel.org/r/20240223035548.2591882-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-01nbd: use the atomic queue limits API in nbd_set_sizeChristoph Hellwig1-4/+11
Use queue_limits_start_update / queue_limits_commit_update to update all the limits in one go and with proper sanity checking. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240229143846.1047223-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01nbd: freeze the queue for queue limits updatesChristoph Hellwig1-1/+13
nbd currently updates the logical and physical block sizes as well as the discard_sectors on a live queue. Freeze the queue first to make sure there are not commands in flight that can see torn or inconsistent limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240229143846.1047223-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01nbd: don't clear discard_sectors in nbd_config_putChristoph Hellwig1-1/+2
nbd_config_put currently clears discard_sectors when unusing a device. This is pretty odd behavior and different from the sector size configuration which is simply left in places and then reconfigured when nbd_set_size is as part of configuring the device. Change nbd_set_size to clear discard_sectors if discard is not supported so that all the queue limits changes are handled in one place. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240229143846.1047223-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01pktcdvd: don't set max_hw_sectors on the underlying deviceChristoph Hellwig1-5/+6
pktcdvd sets max_hw_sectors on the queue of the underlying device that it doesn't own (and doesn't reset it ever) since the driver was merged. This can create all kinds of problems as the underlying driver doesn't even know about it changing the limit. As the state purpose is to not create I/Os larger than a single frame, and pktcdvd never builds bios larger than that, just set REQ_NOMERGE on the bios it submits so that largers I/Os never get built. Note: I don't have packet writing hardware, so this is compile tested only. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240229144408.1047967-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-29ublk: add UBLK_CMD_DEL_DEV_ASYNCMing Lei1-3/+6
The current command UBLK_CMD_DEL_DEV won't return until the device is released, this way looks more reliable, but makes userspace more difficult to implement, especially about orders: unmap command buffer(which holds one ublkc reference), ublkc close, io_uring_file_unregister, ublkb close. Add UBLK_CMD_DEL_DEV_ASYNC so that device deletion won't wait release, then userspace needn't worry about the above order. Actually both loop and nbd is deleted in this async way. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240223075539.89945-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-29ublk: improve getting & putting ublk deviceMing Lei1-5/+7
Firstly convert get_device() and put_device() into ublk_get_device() and ublk_put_device(). Secondly annotate ublk_get_device() & ublk_put_device() as noinline for trace, especially it is often to trigger device deletion hang when incorrect order is used on ublkc mmap, ublkc close, io_uring_sqe_unregister_file, ublkb close. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240223075539.89945-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27xen-blkfront: atomically update queue limitsChristoph Hellwig1-18/+23
Pass the initial queue limits to blk_mq_alloc_disk and use the blkif_set_queue_limits API to update the limits on reconnect. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20240221125845.3610668-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27xen-blkfront: don't redundantly set max_sements in blkif_recoverChristoph Hellwig1-5/+3
blkif_set_queue_limits already sets the max_sements limits, so don't do it a second time. Also remove a comment about a long fixe bug in blk_mq_update_nr_hw_queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20240221125845.3610668-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27xen-blkfront: rely on the default discard granularityChristoph Hellwig1-2/+2
The block layer now sets the discard granularity to the physical block size default. Take advantage of that in xen-blkfront and only set the discard granularity if explicitly specified. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20240221125845.3610668-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-27xen-blkfront: set max_discard/secure erase limits to UINT_MAXChristoph Hellwig1-4/+2
Currently xen-blkfront set the max discard limit to the capacity of the device, which is suboptimal when the capacity changes. Just set it to UINT_MAX, which has the same effect and is simpler. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20240221125845.3610668-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-25zram: port block device access to fileChristian Brauner2-14/+14
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-12-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25xen: port block device access to fileChristian Brauner3-23/+22
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-11-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25rnbd: port block device access to fileChristian Brauner2-15/+15
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-10-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25pktcdvd: port block device access to fileChristian Brauner1-34/+34
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-9-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25drbd: port block device access to fileChristian Brauner2-31/+31
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-8-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-22zram: do not allocate physically contiguous strm buffersBarry Song1-2/+3
Currently zram allocates 2 physically contiguous pages per-CPU's compression stream (we may have up to 4 streams per-CPU). Since those buffers are per-CPU we allocate them from CPU hotplug path, which may have higher risks of failed allocations on devices with fragmented memory. Switch to virtually contiguous allocations - crypto comp does not seem impose requirements on compression working buffers to be physically contiguous. Link: https://lkml.kernel.org/r/20240213065400.6561-1-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22zram: use copy_page for full page copyMark-PK Tsai1-1/+1
Some architectures, such as arm, have implemented optimized copy_page for full page copying. Replace the full page memcpy with copy_page to take advantage of the optimization. Link: https://lkml.kernel.org/r/20231007070554.8657-1-mark-pk.tsai@mediatek.com Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: YJ Chiang <yj.chiang@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22null_blk: Delete nullb.{queue_depth, nr_queues}John Garry2-13/+0
Since commit 8b631f9cf0b8 ("null_blk: remove the bio based I/O path"), struct nullb members queue_depth and nr_queues are only ever written, so delete them. With that, null_exit_hctx() can also be deleted. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240222083420.6026-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-22pktcdvd: set queue limits at disk allocation timeChristoph Hellwig1-11/+5
Remove pkt_init_queue and just pass the two parameters directly to blk_alloc_disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240222073647.3776769-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-22pktcdvd: stop setting q->queuedataChristoph Hellwig1-5/+4
The two users can get the private data from the gendisk with one less pointer dereference, and we can drop the useless q parameter from pkt_make_request_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240222073647.3776769-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20null_blk: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig3-31/+29
Pass the queue limits directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20null_blk: remove null_gendisk_registerChristoph Hellwig1-25/+16
null_gendisk_register isn't a very useful abstraction given that it doesn't even allocate the gendisk. Merge it into the only caller instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20null_blk: refactor tag_set setupChristoph Hellwig1-55/+51
Move the tagset initialization out of null_add_dev into a new null_setup_tagset helper, and move the shared vs local differences out of null_init_tag_set into the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20null_blk: initialize the tag_set timeout in null_init_tag_setChristoph Hellwig1-1/+1
Otherwise it will be reset to the always same value when initializing a device using the shared tag_set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20null_blk: remove the bio based I/O pathChristoph Hellwig4-328/+69
The bio based I/O path complicates null_blk and also make various data structures, including the per-command one way bigger than required for the main request based interface. As the bio-based path is mostly used by stacking drivers and simple memory based drivers, and brd is a good example driver for the latter there is no need to have a bio based path in null_blk. Remove the path to simplify the driver and make future block layer API changes simpler by not having to deal with the complex two API setup in null_blk. Note that the queue_mode field in struct nullb_device is kept as that is simpler than having two different places to check the value and fully open coding the debugfs helpers as the existing ones won't work without a named struct member. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20ublk: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig1-49/+41
Pass the limits ublk imposes directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240215070300.2200308-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20sunvdc: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig1-9/+9
Pass the few limits sunvdc imposes directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240215070300.2200308-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20rnbd-clt: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig1-39/+25
Pass the limits rnbd-clt imposes directly to blk_mq_alloc_disk instead of setting them one at a time. While at it don't set an explicit number of discard segments, as 1 is the default (which most drivers rely on). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20240215070300.2200308-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20rbd: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig1-14/+15
Pass the limits rbd imposes directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240215070300.2200308-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-20ps3disk: pass queue_limits to blk_mq_alloc_diskChristoph Hellwig1-8/+9
Pass the few limits ps3disk imposes directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240215070300.2200308-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>