summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2024-01-12Merge tag 'for-6.8/io_uring-2024-01-08' of git://git.kernel.dk/linuxLinus Torvalds1-1/+1
Pull io_uring updates from Jens Axboe: "Mostly just come fixes and cleanups, but one feature as well. In detail: - Harden the check for handling IOPOLL based on return (Pavel) - Various minor optimizations (Pavel) - Drop remnants of SCM_RIGHTS fd passing support, now that it's no longer supported since 6.7 (me) - Fix for a case where bytes_done wasn't initialized properly on a failure condition for read/write requests (me) - Move the register related code to a separate file (me) - Add support for returning the provided ring buffer head (me) - Add support for adding a direct descriptor to the normal file table (me, Christian Brauner) - Fix for ensuring pending task_work for a ring with DEFER_TASKRUN is run even if we timeout waiting (me)" * tag 'for-6.8/io_uring-2024-01-08' of git://git.kernel.dk/linux: io_uring: ensure local task_work is run on wait timeout io_uring/kbuf: add method for returning provided buffer ring head io_uring/rw: ensure io->bytes_done is always initialized io_uring: drop any code related to SCM_RIGHTS io_uring/unix: drop usage of io_uring socket io_uring/register: move io_uring_register(2) related code to register.c io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL io_uring/cmd: inline io_uring_cmd_get_task io_uring/cmd: inline io_uring_cmd_do_in_task_lazy io_uring: split out cmd api into a separate header io_uring: optimise ltimeout for inline execution io_uring: don't check iopoll if request completes
2024-01-12Merge tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linuxLinus Torvalds13-115/+84
Pull block updates from Jens Axboe: "Pretty quiet round this time around. This contains: - NVMe updates via Keith: - nvme fabrics spec updates (Guixin, Max) - nvme target udpates (Guixin, Evan) - nvme attribute refactoring (Daniel) - nvme-fc numa fix (Keith) - MD updates via Song: - Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai) - Fix raid5 hang issue (Junxiao Bi) - Add Yu Kuai as Reviewer of the md subsystem - Remove deprecated flavors (Song Liu) - raid1 read error check support (Li Nan) - Better handle events off-by-1 case (Alex Lyakas) - Efficiency improvements for passthrough (Kundan) - Support for mapping integrity data directly (Keith) - Zoned write fix (Damien) - rnbd fixes (Kees, Santosh, Supriti) - Default to a sane discard size granularity (Christoph) - Make the default max transfer size naming less confusing (Christoph) - Remove support for deprecated host aware zoned model (Christoph) - Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel, Bart, Christoph)" * tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits) block: Treat sequential write preferred zone type as invalid block: remove disk_clear_zoned sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics drivers/block/xen-blkback/common.h: Fix spelling typo in comment blk-cgroup: fix rcu lockdep warning in blkg_lookup() blk-cgroup: don't use removal safe list iterators block: floor the discard granularity to the physical block size mtd_blkdevs: use the default discard granularity bcache: use the default discard granularity zram: use the default discard granularity null_blk: use the default discard granularity nbd: use the default discard granularity ubd: use the default discard granularity block: default the discard granularity to sector size bcache: discard_granularity should not be smaller than a sector block: remove two comments in bio_split_discard block: rename and document BLK_DEF_MAX_SECTORS loop: don't abuse BLK_DEF_MAX_SECTORS aoe: don't abuse BLK_DEF_MAX_SECTORS null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS ...
2024-01-09Merge tag 'mm-stable-2024-01-08-15-31' of ↵Linus Torvalds4-34/+42
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-09mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDERKirill A. Shutemov1-1/+1
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08Merge tag 'vfs-6.8.iov_iter' of ↵Linus Torvalds1-7/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iov_iter cleanups from Christian Brauner: "This contains a minor cleanup. The patches drop an unused argument from import_single_range() allowing to replace import_single_range() with import_ubuf() and dropping import_single_range() completely" * tag 'vfs-6.8.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iov_iter: replace import_single_range() with import_ubuf() iov_iter: remove unused 'iov' argument from import_single_range()
2024-01-08Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-2/+0
Pull vfs rw updates from Christian Brauner: "This contains updates from Amir for read-write backing file helpers for stacking filesystems such as overlayfs: - Fanotify is currently in the process of introducing pre content events. Roughly, a new permission event will be added indicating that it is safe to write to the file being accessed. These events are used by hierarchical storage managers to e.g., fill the content of files on first access. During that work we noticed that our current permission checking is inconsistent in rw_verify_area() and remap_verify_area(). Especially in the splice code permission checking is done multiple times. For example, one time for the whole range and then again for partial ranges inside the iterator. In addition, we mostly do permission checking before we call file_start_write() except for a few places where we call it after. For pre-content events we need such permission checking to be done before file_start_write(). So this is a nice reason to clean this all up. After this series, all permission checking is done before file_start_write(). As part of this cleanup we also massaged the splice code a bit. We got rid of a few helpers because we are alredy drowning in special read-write helpers. We also cleaned up the return types for splice helpers. - Introduce generic read-write helpers for backing files. This lifts some overlayfs code to common code so it can be used by the FUSE passthrough work coming in over the next cycles. Make Amir and Miklos the maintainers for this new subsystem of the vfs" * tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: fix __sb_write_started() kerneldoc formatting fs: factor out backing_file_mmap() helper fs: factor out backing_file_splice_{read,write}() helpers fs: factor out backing_file_{read,write}_iter() helpers fs: prepare for stackable filesystems backing file helpers fsnotify: optionally pass access range in file permission hooks fsnotify: assert that file_start_write() is not held in permission hooks fsnotify: split fsnotify_perm() into two hooks fs: use splice_copy_file_range() inline helper splice: return type ssize_t from all helpers fs: use do_splice_direct() for nfsd/ksmbd server-side-copy fs: move file_start_write() into direct_splice_actor() fs: fork splice_file_range() from do_splice_direct() fs: create {sb,file}_write_not_started() helpers fs: create file_write_started() helper fs: create __sb_write_started() helper fs: move kiocb_start_write() into vfs_iocb_iter_write() fs: move permission hook out of do_iter_read() fs: move permission hook out of do_iter_write() fs: move file_start_write() into vfs_iter_write() ...
2024-01-05drivers/block/xen-blkback/common.h: Fix spelling typo in commentliyouhong1-1/+1
Fix spelling typo in comment. Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: liyouhong <liyouhong@kylinos.cn> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20231226095701.172080-1-liyouhong@kylinos.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-29zram: use the default discard granularityChristoph Hellwig1-1/+0
The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-29null_blk: use the default discard granularityChristoph Hellwig1-1/+0
The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-29nbd: use the default discard granularityChristoph Hellwig1-5/+1
The discard granularity now defaults to a single sector, so don't set that value explicitly. Also don't bother clearing it as a discard granularity without discard_sectors doesn't mean anything. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-27loop: don't abuse BLK_DEF_MAX_SECTORSChristoph Hellwig1-1/+2
BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors limits. Don't use it to initialize max_hw_setors, which is a hardware / driver capacility. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-27aoe: don't abuse BLK_DEF_MAX_SECTORSChristoph Hellwig1-1/+2
BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors limits. Don't use it to initialize max_hw_setors, which is a hardware / driver capacility. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-27null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORSChristoph Hellwig1-10/+2
null_blk has some rather odd capping of the max_hw_sectors value to BLK_DEF_MAX_SECTORS, which doesn't make sense - max_hw_sector is the hardware limit, and BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors field used for normal file system I/O. Remove all the capping, and simply leave it to the block layer or user to take up or not all of that for file system I/O. Fixes: ea17fd354ca8 ("null_blk: Allow controlling max_hw_sectors limit") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-27loop: don't update discard limits from loop_set_statusChristoph Hellwig1-2/+0
loop_set_status doesn't change anything relevant to the discard and write_zeroes setting, so don't bother calling loop_config_discard. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227082020.249427-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-22drbd: actlog: fix kernel-doc warnings and spellingRandy Dunlap1-6/+10
Fix all kernel-doc warnings in drbd_actlog.c: drbd_actlog.c:963: warning: No description found for return value of 'drbd_rs_begin_io' drbd_actlog.c:1015: warning: Function parameter or member 'peer_device' not described in 'drbd_try_rs_begin_io' drbd_actlog.c:1015: warning: Excess function parameter 'device' description in 'drbd_try_rs_begin_io' drbd_actlog.c:1015: warning: No description found for return value of 'drbd_try_rs_begin_io' drbd_actlog.c:1197: warning: No description found for return value of 'drbd_rs_del_all' Fix one spelling error (s/ore/or/). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <linux-block@vger.kernel.org> Link: https://lore.kernel.org/r/20231222061909.8791-1-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-20block: simplify disk_set_zonedChristoph Hellwig3-3/+3
Only use disk_set_zoned to actually enable zoned device support. For clearing it, call disk_clear_zoned, which is renamed from disk_clear_zone_settings and now directly clears the zoned flag as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-20block: remove support for the host aware zone modelChristoph Hellwig3-3/+3
When zones were first added the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed where non-conventional zones there is strict requirement to write at the write pointer, or else an error is returned - host aware where a write point is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to to the easily disappearing write pointer host software could never rely on the write pointer to actually be useful for say recovery. Fortunately only a few HDD prototypes shipped using this model which never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-20virtio_blk: remove the broken zone revalidation supportChristoph Hellwig1-26/+0
virtblk_revalidate_zones is called unconditionally from virtblk_config_changed_work from the virtio config_changed callback. virtblk_revalidate_zones is a bit odd in that it re-clears the zoned state for host aware or non-zoned devices, which isn't needed unless the zoned mode changed - but a zone mode change to a host managed model isn't handled at all, and virtio_blk also doesn't handle any other config change except for a capacity change is handled (and even if it was the upper layers above virtio_blk wouldn't handle it very well). But even the useful case of a size change that would add or remove zones isn't handled properly as blk_revalidate_disk_zones expects the device capacity to cover all zones, but the capacity is only updated after virtblk_revalidate_zones. As this code appears to be entirely untested and is getting in the way remove it for now, but it can be readded in a fixed version with proper test coverage if needed. Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices") Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-20virtio_blk: cleanup zoned device probingChristoph Hellwig1-28/+22
Move reading and checking the zoned model from virtblk_probe_zoned_device into the caller, leaving only the code to perform the actual setup for host managed zoned devices in virtblk_probe_zoned_device. This allows to share the model reading and sharing between builds with and without CONFIG_BLK_DEV_ZONED, and improve it for the !CONFIG_BLK_DEV_ZONED case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-13block/rnbd-srv: Check for unlikely string overflowKees Cook1-9/+10
Since "dev_search_path" can technically be as large as PATH_MAX, there was a risk of truncation when copying it and a second string into "full_path" since it was also PATH_MAX sized. The W=1 builds were reporting this warning: drivers/block/rnbd/rnbd-srv.c: In function 'process_msg_open.isra': drivers/block/rnbd/rnbd-srv.c:616:51: warning: '%s' directive output may be truncated writing up to 254 bytes into a region of size between 0 and 4095 [-Wformat-truncation=] 616 | snprintf(full_path, PATH_MAX, "%s/%s", | ^~ In function 'rnbd_srv_get_full_path', inlined from 'process_msg_open.isra' at drivers/block/rnbd/rnbd-srv.c:721:14: drivers/block/rnbd/rnbd-srv.c:616:17: note: 'snprintf' output between 2 and 4351 bytes into a destination of size 4096 616 | snprintf(full_path, PATH_MAX, "%s/%s", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 617 | dev_search_path, dev_name); | ~~~~~~~~~~~~~~~~~~~~~~~~~~ To fix this, unconditionally check for truncation (as was already done for the case where "%SESSNAME%" was present). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312100355.lHoJPgKy-lkp@intel.com/ Cc: Md. Haris Iqbal <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <linux-block@vger.kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231212214738.work.169-kees@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-12io_uring: split out cmd api into a separate headerPavel Begunkov1-1/+1
linux/io_uring.h is slowly becoming a rubbish bin where we put anything exposed to other subsystems. For instance, the task exit hooks and io_uring cmd infra are completely orthogonal and don't need each other's definitions. Start cleaning it up by splitting out all command bits into a new header file. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-11zram: use kmap_local_page()Sergey Senozhatsky1-15/+15
Use kmap_local_page() instead of kmap_atomic() which has been deprecated. Link: https://lkml.kernel.org/r/20231128083845.848008-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11zram: tweak writeback config helpSergey Senozhatsky1-2/+2
Writeback is for incompressible and idle zram pages. Link: https://lkml.kernel.org/r/20231115024223.4133148-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-11zram: split memory-tracking and ac-time trackingSergey Senozhatsky3-16/+24
ZRAM_MEMORY_TRACKING enables two features: - per-entry ac-time tracking - debugfs interface The latter one is the reason why memory-tracking depends on DEBUG_FS, while the former one is used far beyond debugging these days. Namely ac-time is used for fine grained writeback of idle entries (pages). Move ac-time tracking under its own config option so that it can be enabled (along with writeback) on systems without DEBUG_FS. [senozhatsky@chromium.org: ifdef fixup, per Dmytro] Link: https://lkml.kernel.org/r/20231117013543.540280-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20231115024223.4133148-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dmytro Maluka <dmaluka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-05iov_iter: replace import_single_range() with import_ubuf()Jens Axboe1-5/+2
With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-05iov_iter: remove unused 'iov' argument from import_single_range()Jens Axboe1-4/+2
It is entirely unused, just get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-04virtio_blk: fix snprintf truncation compiler warningStefan Hajnoczi1-4/+4
Commit 4e0400525691 ("virtio-blk: support polling I/O") triggers the following gcc 13 W=1 warnings: drivers/block/virtio_blk.c: In function ‘init_vq’: drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=] 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~ drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534] 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~~~~~~~~~~~~ drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16 1077 | snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a false positive because the lower bound -2147483648 is incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs < 65536. The code mixes int, unsigned short, and unsigned int types in addition to using "%d" for an unsigned value. Use unsigned short and "%u" consistently to solve the compiler warning. Cc: Suwan Kim <suwan.kim027@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/ Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20231204140743.1487843-1-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-27block/rnbd: use %pe to print errorsSupriti Singh2-13/+13
While printing error, replace %ld by %pe. %pe prints a string whereas %ld would print an error code. Signed-off-by: Supriti Singh <supriti.singh@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20231124213422.113449-3-haris.iqbal@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-27block/rnbd: add support for REQ_OP_WRITE_ZEROESSantosh Pradhan3-8/+18
Remove REQ_OP_WRITE_SAME in favour of REQ_OP_WRITE_ZEROES. Signed-off-by: Santosh Pradhan <santosh.pradhan@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20231124213422.113449-2-haris.iqbal@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-24fs: move file_start_write() into vfs_iter_write()Amir Goldstein1-2/+0
All the callers of vfs_iter_write() call file_start_write() just before calling vfs_iter_write() except for target_core_file's fd_do_rw(). Move file_start_write() from the callers into vfs_iter_write(). fd_do_rw() calls vfs_iter_write() with a non-regular file, so file_start_write() is a no-op. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-11-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-21nbd: pass nbd_sock to nbd_read_reply() instead of indexLi Nan1-13/+22
If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be krealloc in nbd_add_socket(), and a garbage request is received now, a UAF may occurs. T1 nbd_ioctl __nbd_ioctl nbd_add_socket blk_mq_freeze_queue T2 recv_work nbd_read_reply sock_xmit krealloc config->socks def config->socks Pass nbd_sock to nbd_read_reply(). And introduce a new function sock_xmit_recv(), which differs from sock_xmit only in the way it get socket. ================================================================== BUG: KASAN: use-after-free in sock_xmit+0x525/0x550 Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779 Workqueue: knbd4-recv recv_work Call Trace: __dump_stack dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 sock_xmit+0x525/0x550 nbd_read_reply+0xfe/0x2c0 recv_work+0x1c2/0x750 process_one_work+0x6b6/0xf10 worker_thread+0xdd/0xd80 kthread+0x30a/0x410 ret_from_fork+0x22/0x30 Allocated by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track set_alloc_info __kasan_kmalloc __kasan_kmalloc.constprop.0+0xf0/0x130 slab_post_alloc_hook slab_alloc_node slab_alloc __kmalloc_track_caller+0x157/0x550 __do_krealloc krealloc+0x37/0xb0 nbd_add_socket +0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Freed by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x40 __kasan_slab_free.part.0+0x13f/0x1b0 slab_free_hook slab_free_freelist_hook slab_free kfree+0xcb/0x6c0 krealloc+0x56/0xb0 nbd_add_socket+0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20block/null_blk: Fix double blk_mq_start_request() warningChengming Zhou1-12/+13
When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request, which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request(). Then null_queue_rqs() put these requests in the rqlist, return back to the block layer core, which would try to queue them individually again, so the warning in blk_mq_start_request() triggered. Fix it by splitting the null_queue_rq() into two parts: the first is the preparation of request, the second is the handling of request. We put the blk_mq_start_request() after the preparation part, which may fail and return back to the block layer core. The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). And change the return type of null_handle_cmd() to void, since it always return BLK_STS_OK now. Reported-by: <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/ Fixes: d78bfa1346ab ("block/null_blk: add queue_rqs() support") Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20nbd: fix null-ptr-dereference while accessing 'nbd->config'Li Nan1-1/+17
Memory reordering may occur in nbd_genl_connect(), causing config_refs to be set to 1 while nbd->config is still empty. Opening nbd at this time will cause null-ptr-dereference. T1 T2 nbd_open nbd_get_config_unlocked nbd_genl_connect nbd_alloc_and_init_config //memory reordered refcount_set(&nbd->config_refs, 1) // 2 nbd->config ->null point nbd->config = config // 1 Fix it by adding smp barrier to guarantee the execution sequence. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-4-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20nbd: factor out a helper to get nbd_config without holding 'config_lock'Li Nan1-8/+19
There are no functional changes, just to make code cleaner and prepare to fix null-ptr-dereference while accessing 'nbd->config'. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-3-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-20nbd: fold nbd config initialization into nbd_alloc_config()Li Nan1-22/+19
There are no functional changes, make the code cleaner and prepare to fix null-ptr-dereference while accessing 'nbd->config'. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-2-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-07nbd: fix uaf in nbd_openLi Lingfeng1-2/+9
Commit 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") cleans up disk by blk_cleanup_disk() and it won't set disk->private_data as NULL as before. UAF may be triggered in nbd_open() if someone tries to open nbd device right after nbd_put() since nbd has been free in nbd_dev_remove(). Fix this by implementing ->free_disk and free private data in it. Fixes: 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231107103435.2074904-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+3
Pull virtio updates from Michael Tsirkin: "vhost,virtio,vdpa: features, fixes, cleanups. vdpa/mlx5: - VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK - new maintainer vdpa: - support for vq descriptor mappings - decouple reset of iotlb mapping from device reset and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vdpa_sim: implement .reset_map support vdpa/mlx5: implement .reset_map driver op vhost-vdpa: clean iotlb map during reset for older userspace vdpa: introduce .compat_reset operation callback vhost-vdpa: introduce IOTLB_PERSIST backend feature bit vhost-vdpa: reset vendor specific mapping to initial state in .release vdpa: introduce .reset_map operation callback virtio_pci: add check for common cfg size virtio-blk: fix implicit overflow on virtio_max_dma_size virtio_pci: add build offset check for the new common cfg items virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit vduse: make vduse_class constant vhost-scsi: Spelling s/preceeding/preceding/g virtio: kdoc for struct virtio_pci_modern_device vdpa: Update sysfs ABI documentation MAINTAINERS: Add myself as mlx5_vdpa driver virtio-balloon: correct the comment of virtballoon_migratepage() mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa/mlx5: Update cvq iotlb mapping on ASID change vdpa/mlx5: Make iotlb helper functions more generic ...
2023-11-02Merge tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linuxLinus Torvalds4-132/+237
Pull block updates from Jens Axboe: - Improvements to the queue_rqs() support, and adding null_blk support for that as well (Chengming) - Series improving badblocks support (Coly) - Key store support for sed-opal (Greg) - IBM partition string handling improvements (Jan) - Make number of ublk devices supported configurable (Mike) - Cancelation improvements for ublk (Ming) - MD pull requests via Song: - Handle timeout in md-cluster, by Denis Plotnikov - Cleanup pers->prepare_suspend, by Yu Kuai - Rewrite mddev_suspend(), by Yu Kuai - Simplify md_seq_ops, by Yu Kuai - Reduce unnecessary locking array_state_store(), by Mariusz Tkaczyk - Make rdev add/remove independent from daemon thread, by Yu Kuai - Refactor code around quiesce() and mddev_suspend(), by Yu Kuai - NVMe pull request via Keith: - nvme-auth updates (Mark) - nvme-tcp tls (Hannes) - nvme-fc annotaions (Kees) - Misc cleanups and improvements (Jiapeng, Joel) * tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux: (95 commits) block: ublk_drv: Remove unused function md: cleanup pers->prepare_suspend() nvme-auth: allow mixing of secret and hash lengths nvme-auth: use transformed key size to create resp nvme-auth: alloc nvme_dhchap_key as single buffer nvmet-tcp: use 'spin_lock_bh' for state_lock() powerpc/pseries: PLPKS SED Opal keystore support block: sed-opal: keystore access for SED Opal keys block:sed-opal: SED Opal keystore ublk: simplify aborting request ublk: replace monitor with cancelable uring_cmd ublk: quiesce request queue when aborting queue ublk: rename mm_lock as lock ublk: move ublk_cancel_dev() out of ub->mutex ublk: make sure io cmd handled in submitter task context ublk: don't get ublk device reference in ublk_abort_queue() ublk: Make ublks_max configurable ublk: Limit dev_id/ub_number values md-cluster: check for timeout while a new disk adding nvme: rework NVME_AUTH Kconfig selection ...
2023-11-01virtio-blk: fix implicit overflow on virtio_max_dma_sizezhenwei pi1-1/+3
The following codes have an implicit conversion from size_t to u32: (u32)max_size = (size_t)virtio_max_dma_size(vdev); This may lead overflow, Ex (size_t)4G -> (u32)0. Once virtio_max_dma_size() has a larger size than U32_MAX, use U32_MAX instead. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20230904061045.510460-1-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-31Merge tag 'hardening-v6.7-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One of the more voluminous set of changes is for adding the new __counted_by annotation[1] to gain run-time bounds checking of dynamically sized arrays with UBSan. - Add LKDTM test for stuck CPUs (Mark Rutland) - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo) - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva) - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh) - Convert group_info.usage to refcount_t (Elena Reshetova) - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva) - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn) - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook) - Fix strtomem() compile-time check for small sources (Kees Cook)" * tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits) hwmon: (acpi_power_meter) replace open-coded kmemdup_nul reset: Annotate struct reset_control_array with __counted_by kexec: Annotate struct crash_mem with __counted_by virtio_console: Annotate struct port_buffer with __counted_by ima: Add __counted_by for struct modsig and use struct_size() MAINTAINERS: Include stackleak paths in hardening entry string: Adjust strtomem() logic to allow for smaller sources hardening: x86: drop reference to removed config AMD_IOMMU_V2 randstruct: Fix gcc-plugin performance mode to stay in group mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by irqchip/imx-intmux: Annotate struct intmux_data with __counted_by KVM: Annotate struct kvm_irq_routing_table with __counted_by virt: acrn: Annotate struct vm_memory_region_batch with __counted_by hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by sparc: Annotate struct cpuinfo_tree with __counted_by isdn: kcapi: replace deprecated strncpy with strscpy_pad isdn: replace deprecated strncpy with strscpy NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by ...
2023-10-28block: move bdev_mark_dead out of disk_check_media_changeChristoph Hellwig2-2/+6
disk_check_media_change is mostly called from ->open where it makes little sense to mark the file system on the device as dead, as we are just opening it. So instead of calling bdev_mark_dead from disk_check_media_change move it into the few callers that are not in an open instance. This avoid calling into bdev_mark_dead and thus taking s_umount with open_mutex held. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231017184823.1383356-4-hch@lst.de Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28zram: Convert to use bdev_open_by_dev()Jan Kara2-18/+15
Convert zram to use bdev_open_by_dev() and pass the handle around. CC: Minchan Kim <minchan@kernel.org> CC: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-8-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28xen/blkback: Convert to bdev_open_by_dev()Jan Kara3-23/+25
Convert xen/blkback to use bdev_open_by_dev() and pass the handle around. CC: xen-devel@lists.xenproject.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-7-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28rnbd-srv: Convert to use bdev_open_by_path()Jan Kara2-14/+15
Convert rnbd-srv to use bdev_open_by_path() and pass the handle around. CC: Jack Wang <jinpu.wang@ionos.com> CC: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Acked-by: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-6-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28pktcdvd: Convert to bdev_open_by_dev()Jan Kara1-35/+41
Convert pktcdvd to use bdev_open_by_dev(). Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-5-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28drdb: Convert to use bdev_open_by_path()Jan Kara2-33/+34
Convert drdb to use bdev_open_by_path(). CC: drbd-dev@lists.linbit.com Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-4-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-19block: ublk_drv: Remove unused functionJiapeng Chong1-9/+0
The function are defined in the ublk_drv.c file, but not called elsewhere, so delete the unused function. drivers/block/ublk_drv.c:1211:20: warning: unused function 'ublk_abort_io_cmds'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6938 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Fixes: b4e1353f4651 ("ublk: simplify aborting request") Reviewed-by: Ming Lei <ming.lei@rehdat.com> Link: https://lore.kernel.org/r/20231019030444.53680-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-17ublk: simplify aborting requestMing Lei1-30/+11
Now ublk_abort_queue() is run exclusively with ublk_queue_rq() and the ubq_daemon task, so simplify aborting request: - set UBLK_IO_FLAG_ABORTED in ublk_abort_queue() just for aborting this request - abort request in ublk_queue_rq() if ubq->canceling is set Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-17ublk: replace monitor with cancelable uring_cmdMing Lei1-89/+119
Monitor work actually introduces one extra context for handling abort, this way is easy to cause race, and also introduce extra delay when handling aborting. Now we start to support cancelable uring_cmd, so use it instead: 1) this cancel callback is either run from the uring cmd submission task context or called after the io_uring context is exit, so the callback is run exclusively with ublk_ch_uring_cmd() and __ublk_rq_task_work(). 2) the previous patch freezes request queue when calling ublk_abort_queue(), which is now completely exclusive with ublk_queue_rq() and ublk_ch_uring_cmd()/__ublk_rq_task_work(). 3) in timeout handler, if all IOs are in-flight, then all uring commands are completed, uring command canceling can't help us to provide forward progress any more, so call ublk_abort_requests() in timeout handler. This way simplifies aborting queue, and is helpful for adding new feature, such as, relax the limit of using single task for handling one queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-17ublk: quiesce request queue when aborting queueMing Lei1-9/+50
So far aborting queue ends request when the ubq daemon is exiting, and it can be run concurrently with ublk_queue_rq(), this way is fragile and we depend on the tricky usage of UBLK_IO_FLAG_ABORTED for avoiding such race. Quiesce queue when aborting queue, and the two code paths can be run completely exclusively, then it becomes easier to add new ublk feature, such as relaxing single same task limit for each queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>