path: root/include/linux/blk_types.h
Age | Commit message | Author | Files | Lines
2021-08-31 | Merge tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux | Linus Torvalds | 1 | -0/+1
Pull btrfs updates from David Sterba:

"The highlights of this round are integrations with fs-verity and idmapped mounts, the rest is usual mix of minor improvements, speedups and cleanups.

There are some patches outside of btrfs, namely updating some VFS interfaces, all straightforward and acked.

Features:
- fs-verity support, using standard ioctls, backward compatible with read-only limitation on inodes with previously enabled fs-verity
- idmapped mount support
- make mount with rescue=ibadroots more tolerant to partially damaged trees
- allow raid0 on a single device and raid10 on two devices, degenerate cases but might be useful as an intermediate step during conversion to other profiles
- zoned mode block group auto reclaim can be disabled via sysfs knob

Performance improvements:
- continue readahead of node siblings even if target node is in memory, could speed up full send (on sample test +11%)
- batching of delayed items can speed up creating many files
- fsync/tree-log speedups
  - avoid unnecessary work (gains +2% throughput, -2% run time on sample load)
  - reduced lock contention on renames (on dbench +4% throughput, up to -30% latency)

Fixes:
- various zoned mode fixes
- preemptive flushing threshold tuning, avoid excessive work on almost full filesystems

Core:
- continued subpage support, preparation for implementing remaining features like compression and defragmentation; with some limitations, write is now enabled on 64K page systems with 4K sectors, still considered experimental
  - no readahead on compressed reads
  - inline extents disabled
  - disabled raid56 profile conversion and mount
- improved flushing logic, fixing early ENOSPC on some workloads
- inode flags have been internally split to read-only and read-write incompat bit parts, used by fs-verity
- new tree items for fs-verity
  - descriptor item
  - Merkle tree item
- inode operations extended to be namespace-aware
- cleanups and refactoring

Generic code changes:
- fs: new export filemap_fdatawrite_wbc
- fs: removed sync_inode
- block: bio_trim argument type fixups
- vfs: add namespace-aware lookup"

* tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
  btrfs: reset replace target device to allocation state on close
  btrfs: zoned: fix ordered extent boundary calculation
  btrfs: do not do preemptive flushing if the majority is global rsv
  btrfs: reduce the preemptive flushing threshold to 90%
  btrfs: tree-log: check btrfs_lookup_data_extent return value
  btrfs: avoid unnecessarily logging directories that had no changes
  btrfs: allow idmapped mount
  btrfs: handle ACLs on idmapped mounts
  btrfs: allow idmapped INO_LOOKUP_USER ioctl
  btrfs: allow idmapped SUBVOL_SETFLAGS ioctl
  btrfs: allow idmapped SET_RECEIVED_SUBVOL ioctls
  btrfs: relax restrictions for SNAP_DESTROY_V2 with subvolids
  btrfs: allow idmapped SNAP_DESTROY ioctls
  btrfs: allow idmapped SNAP_CREATE/SUBVOL_CREATE ioctls
  btrfs: check whether fsgid/fsuid are mapped during subvolume creation
  btrfs: allow idmapped permission inode op
  btrfs: allow idmapped setattr inode op
  btrfs: allow idmapped tmpfile inode op
  btrfs: allow idmapped symlink inode op
  btrfs: allow idmapped mkdir inode op
  ...
2021-08-31 | Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -0/+1
Pull support for struct bio recycling from Jens Axboe:

"This adds bio recycling support for polled IO, allowing quick reuse of a bio for high IOPS scenarios via a percpu bio_set list. It's good for almost a 10% improvement in performance, bumping our per-core IO limit from ~3.2M IOPS to ~3.5M IOPS"

* tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block:
  bio: improve kerneldoc documentation for bio_alloc_kiocb()
  block: provide bio_clear_hipri() helper
  block: use the percpu bio cache in __blkdev_direct_IO
  io_uring: enable use of bio alloc cache
  block: clear BIO_PERCPU_CACHE flag if polling isn't supported
  bio: add allocation cache abstraction
  fs: add kiocb alloc cache flag
  bio: optimize initialization of a bio
2021-08-23 | bio: add allocation cache abstraction | Jens Axboe | 1 | -0/+1
Add a per-cpu bio_set cache for bio allocations, enabling us to quickly recycle them instead of going through the slab allocator. This cache isn't IRQ safe, and hence is only really suitable for polled IO. Very simple - keeps a count of bio's in the cache, and maintains a max of 512 with a slack of 64. If we get above max + slack, we drop slack number of bio's. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
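The max-plus-slack policy described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct obj`, `struct obj_cache`, and the `cache_*` helpers are hypothetical stand-ins (not the kernel's types or functions), there is a single cache rather than one per CPU, and `malloc()`/`free()` stand in for the slab allocator. Only the 512 and 64 figures come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_MAX   512   /* target upper bound, from the commit message */
#define CACHE_SLACK 64    /* tolerated overshoot before pruning */

/* Hypothetical stand-in for struct bio: only the free-list link matters. */
struct obj {
	struct obj *next;
};

/* Hypothetical stand-in for one CPU's cache. */
struct obj_cache {
	struct obj *free_list;
	unsigned int nr;
};

/* Free up to 'nr' cached objects and return how many were freed. */
static unsigned int cache_prune(struct obj_cache *c, unsigned int nr)
{
	unsigned int freed = 0;

	while (freed < nr && c->free_list) {
		struct obj *o = c->free_list;

		c->free_list = o->next;
		c->nr--;
		free(o);
		freed++;
	}
	return freed;
}

/* Recycle an object into the cache instead of freeing it. */
static void cache_put(struct obj_cache *c, struct obj *o)
{
	o->next = c->free_list;
	c->free_list = o;
	/* Above max + slack: drop 'slack' objects in one go. */
	if (++c->nr > CACHE_MAX + CACHE_SLACK)
		cache_prune(c, CACHE_SLACK);
}

/* Reuse a cached object if one is available, else allocate a new one. */
static struct obj *cache_get(struct obj_cache *c)
{
	struct obj *o = c->free_list;

	if (!o)
		return malloc(sizeof(*o));
	c->free_list = o->next;
	c->nr--;
	return o;
}
```

Pruning a batch of `CACHE_SLACK` objects at once, rather than one object per put, keeps the common put path to a counter increment and a comparison.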
2021-08-23 | block: fix argument type of bio_trim() | Chaitanya Kulkarni | 1 | -0/+1
The function bio_trim() has offset and size arguments that are declared as int. The callers of this function use the sector_t type when passing the offset and size, e.g. drivers/md/raid1.c:narrow_write_error(). Change the offset and size arguments to sector_t for bio_trim(). Also, add WARN_ON_ONCE() to catch their overflow. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
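A small userspace sketch of why the narrower argument type is a problem, and of the kind of range check the commit message mentions. The names `demo_trim_args_ok`, `demo_low_32_bits`, and `DEMO_MAX_SECTORS` are hypothetical, and the limit chosen here (the largest sector count whose byte length fits in 32 bits) is an assumption made for illustration, not a statement of what the kernel checks.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;   /* 64-bit sector count */

#define DEMO_SECTOR_SHIFT 9
/* Assumed limit: largest sector count whose byte length fits in 32 bits. */
#define DEMO_MAX_SECTORS  ((sector_t)(UINT_MAX >> DEMO_SECTOR_SHIFT))

/* Hypothetical check: true when both values are within the assumed limit. */
static bool demo_trim_args_ok(sector_t offset, sector_t size)
{
	return offset <= DEMO_MAX_SECTORS && size <= DEMO_MAX_SECTORS;
}

/* What a 32-bit parameter would silently keep of a 64-bit sector value. */
static uint32_t demo_low_32_bits(sector_t v)
{
	return (uint32_t)v;
}
```

With a 32-bit parameter, an offset of 2^32 + 8 sectors would arrive as 8, which is why widening the arguments and warning on out-of-range values go together.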
2021-08-09 | block: remove the bd_bdi in struct block_device | Christoph Hellwig | 1 | -1/+0
Just retrieve the bdi from the disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 | block: look up holders by bdev | Christoph Hellwig | 1 | -3/+0
Invert the way the holder relations are tracked. This very slightly reduces the memory overhead for partitioned devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210804094147.459763-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 | block: make the block holder code optional | Christoph Hellwig | 1 | -1/+1
Move the block holder code into a separate file as it is not in any way related to the other block_dev.c code, and add a new selectable config option for it so that we don't have to build it without any remapped drivers selected. The Kconfig symbol contains a _DEPRECATED suffix to match the comments added in commit 49731baa41df ("block: restore multiple bd_link_disk_holder() support"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-09 | Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -3/+0
Pull more block updates from Jens Axboe:

"A combination of changes that ended up depending on both the driver and core branch (and/or the IDE removal), and a few late arriving fixes. In detail:
- Fix io ticks wrap-around issue (Chunguang)
- nvme-tcp sock locking fix (Maurizio)
- s390-dasd fixes (Kees, Christoph)
- blk_execute_rq polling support (Keith)
- blk-cgroup RCU iteration fix (Yu)
- nbd backend ID addition (Prasanna)
- Partition deletion fix (Yufen)
- Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)
- Removal of now dead block request types due to IDE removal (Christoph)
- Loop probing and control device cleanups (Christoph)
- Device uevent fix (Christoph)
- Misc cleanups/fixes (Tetsuo, Christoph)"

* tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
  blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
  block: fix the problem of io_ticks becoming smaller
  nvme-tcp: can't set sk_user_data without write_lock
  loop: remove unused variable in loop_set_status()
  block: remove the bdgrab in blk_drop_partitions
  block: grab a device refcount in disk_uevent
  s390/dasd: Avoid field over-reading memcpy()
  dasd: unexport dasd_set_target_state
  block: check disk exist before trying to add partition
  ubd: remove dead code in ubd_setup_common
  nvme: use return value from blk_execute_rq()
  block: return errors from blk_execute_rq()
  nvme: use blk_execute_rq() for passthrough commands
  block: support polling through blk_execute_rq
  block: remove REQ_OP_SCSI_{IN,OUT}
  block: mark blk_mq_init_queue_data static
  loop: rewrite loop_exit using idr_for_each_entry
  loop: split loop_lookup
  loop: don't allow deleting an unspecified loop device
  loop: move loop_ctl_mutex locking into loop_add
  ...
2021-07-01 | Merge tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm | Linus Torvalds | 1 | -0/+1
Pull device mapper updates from Mike Snitzer:
- Various DM persistent-data library improvements and fixes that benefit both the DM thinp and cache targets.
- A few small DM kcopyd efficiency improvements.
- Significant zoned related block core, DM core and DM zoned target changes that culminate with adding zoned append emulation (which is required to properly fix DM crypt's zoned support).
- Various DM writecache target changes that improve efficiency. Adds an optional "metadata_only" feature that only promotes bios flagged with REQ_META. But the most significant improvement is writecache's ability to pause writeback, for a configurable time, if/when the working set is larger than the cache (and the cache is full) -- this ensures performance is no worse than the slower origin device.

* tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
  dm writecache: make writeback pause configurable
  dm writecache: pause writeback if cache full and origin being written directly
  dm io tracker: factor out IO tracker
  dm btree remove: assign new_root only when removal succeeds
  dm zone: fix dm_revalidate_zones() memory allocation
  dm ps io affinity: remove redundant continue statement
  dm writecache: add optional "metadata_only" parameter
  dm writecache: add "cleaner" and "max_age" to Documentation
  dm writecache: write at least 4k when committing
  dm writecache: flush origin device when writing and cache is full
  dm writecache: have ssd writeback wait if the kcopyd workqueue is busy
  dm writecache: use list_move instead of list_del/list_add in writecache_writeback()
  dm writecache: commit just one block, not a full page
  dm writecache: remove unused gfp_t argument from wc_add_block()
  dm crypt: Fix zoned block device support
  dm: introduce zone append emulation
  dm: rearrange core declarations for extended use from dm-zone.c
  block: introduce BIO_ZONE_WRITE_LOCKED bio flag
  block: introduce bio zone helpers
  block: improve handling of all zones reset operation
  ...
2021-07-01 | block: remove REQ_OP_SCSI_{IN,OUT} | Christoph Hellwig | 1 | -3/+0
With the legacy IDE driver gone drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-04 | block: introduce BIO_ZONE_WRITE_LOCKED bio flag | Damien Le Moal | 1 | -0/+1
Introduce the BIO flag BIO_ZONE_WRITE_LOCKED to indicate that a BIO owns the write lock of the zone it is targeting. This is the counterpart of the struct request flag RQF_ZONE_WRITE_LOCKED. This new BIO flag is reserved for now for zone write locking control for device mapper targets exposing a zoned block device. Since in this case, the lock flag must not be propagated to the struct request that will be used to process the BIO, a BIO private flag is used rather than changing the RQF_ZONE_WRITE_LOCKED request flag into a common REQ_XXX flag that could be used for both BIO and request. This avoids conflicts down the stack with the block IO scheduler zone write locking (in mq-deadline). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
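The distinction drawn above, a bio-private flag versus a flag shared with the request, can be sketched as follows. Everything here is hypothetical: the `demo_*` types and helpers are simplified stand-ins, the bit number is made up, and the idea that a request is built by copying only the operation flags is an assumption used to show why a private flag stays out of the request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number, chosen only for this sketch. */
#define DEMO_BIO_ZONE_WRITE_LOCKED 0

struct demo_bio {
	unsigned short bi_flags;  /* bio-private state flags */
	unsigned int bi_opf;      /* operation flags shared with requests */
};

struct demo_request {
	unsigned int cmd_flags;   /* derived from the bio's bi_opf only */
};

static void demo_bio_set_flag(struct demo_bio *bio, unsigned int bit)
{
	bio->bi_flags |= (unsigned short)(1U << bit);
}

static void demo_bio_clear_flag(struct demo_bio *bio, unsigned int bit)
{
	bio->bi_flags &= (unsigned short)~(1U << bit);
}

static bool demo_bio_flagged(const struct demo_bio *bio, unsigned int bit)
{
	return (bio->bi_flags & (1U << bit)) != 0;
}

/* The private flag never reaches the request: only bi_opf is copied. */
static struct demo_request demo_request_from_bio(const struct demo_bio *bio)
{
	struct demo_request rq = { .cmd_flags = bio->bi_opf };

	return rq;
}
```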
2021-06-01 | block: move bd_part_count to struct gendisk | Christoph Hellwig | 1 | -3/+0
The bd_part_count value only makes sense for whole devices, so move it to struct gendisk and give it a more descriptive name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210525061301.2242282-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01 | block: move bd_mutex to struct gendisk | Christoph Hellwig | 1 | -1/+0
Replace the per-block device bd_mutex with a per-gendisk open_mutex, thus simplifying locking wherever we deal with partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210525061301.2242282-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 | block: use bi_max_vecs to find the bvec pool | Christoph Hellwig | 1 | -28/+1
Instead of encoding of the bvec pool using magic bio flags, just use a helper to find the pool based on the max_vecs value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
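A sketch of what "find the pool based on the max_vecs value" can look like: the pool is derived from the vector count on demand instead of being stored in flag bits. The pool sizes, the names, and the inline threshold below are assumptions chosen for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed: bios with this many vecs or fewer need no pool at all. */
#define DEMO_INLINE_VECS 4

/* Hypothetical pool descriptor. */
struct demo_pool {
	unsigned short nr_vecs;   /* capacity of each entry in this pool */
	const char *name;
};

/* Assumed pool sizes, ordered from smallest to largest. */
static const struct demo_pool demo_pools[] = {
	{ 16,  "demo-vec-16"  },
	{ 64,  "demo-vec-64"  },
	{ 128, "demo-vec-128" },
	{ 256, "demo-vec-256" },
};

/*
 * Return the smallest pool that can hold max_vecs entries, or NULL when
 * no pool applies (inline vecs, or more than the largest pool holds).
 */
static const struct demo_pool *demo_pool_for(unsigned short max_vecs)
{
	size_t i;

	if (max_vecs <= DEMO_INLINE_VECS)
		return NULL;
	for (i = 0; i < sizeof(demo_pools) / sizeof(demo_pools[0]); i++) {
		if (max_vecs <= demo_pools[i].nr_vecs)
			return &demo_pools[i];
	}
	return NULL;
}
```

Because the lookup is a pure function of `max_vecs`, no pool index needs to be kept in the bio's flags.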
2021-01-25 | block: do not reassig ->bi_bdev when partition remapping | Christoph Hellwig | 1 | -0/+1
There is no good reason to reassign ->bi_bdev when remapping the partition-relative block number to the device-wide one, as all the information required by the drivers comes from the gendisk anyway. Keeping the original ->bi_bdev alive will greatly simplify the partition-aware I/O accounting. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
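The remapping described above amounts to adding the partition's start sector to the bio's sector while leaving the device pointer alone. The sketch below uses hypothetical `demo_*` types; in particular the `remapped` field is an assumption introduced here to keep the offset from being applied twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified block device: 0 start means the whole device. */
struct demo_bdev {
	sector_t bd_start_sect;   /* first sector of the partition on the disk */
};

/* Hypothetical, simplified bio. */
struct demo_bio {
	struct demo_bdev *bi_bdev;   /* stays pointed at the partition */
	sector_t bi_sector;          /* partition-relative until remapped */
	bool remapped;               /* assumed marker, for this sketch only */
};

/* Turn a partition-relative sector into a device-wide one, exactly once. */
static void demo_partition_remap(struct demo_bio *bio)
{
	if (bio->remapped)
		return;
	bio->bi_sector += bio->bi_bdev->bd_start_sect;
	bio->remapped = true;
	/* bi_bdev is deliberately not reassigned to the whole device. */
}
```

Since `bi_bdev` still names the partition after remapping, per-partition accounting can read it directly at completion time.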
2021-01-25 | block: store a block_device pointer in struct bio | Christoph Hellwig | 1 | -2/+1
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block_device. From that the gendisk can be trivially accessed with an extra indirection, and it also allows all information related to partition remapping to be looked up directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
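The "extra indirection" mentioned above can be pictured with a few simplified structures. The `demo_*` types and accessors are hypothetical and carry only the fields needed for the illustration.

```c
#include <assert.h>

/* Hypothetical, simplified disk. */
struct demo_gendisk {
	const char *disk_name;
};

/* Hypothetical, simplified block device (whole disk or partition). */
struct demo_block_device {
	struct demo_gendisk *bd_disk;   /* the disk this device lives on */
	unsigned char bd_partno;        /* 0 for the whole device */
};

/* The bio points at the block device rather than at the disk. */
struct demo_bio {
	struct demo_block_device *bi_bdev;
};

/* The disk is one extra pointer hop away. */
static struct demo_gendisk *demo_bio_disk(const struct demo_bio *bio)
{
	return bio->bi_bdev->bd_disk;
}

/* Partition information is available from the same pointer. */
static unsigned char demo_bio_partno(const struct demo_bio *bio)
{
	return bio->bi_bdev->bd_partno;
}
```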
2020-12-02 | block: merge struct block_device and struct hd_struct | Christoph Hellwig | 1 | -2/+6
Instead of having two structures that represent each block device with different lifetime rules, merge them into a single one. This also greatly simplifies the reference counting rules, as we can use the inode reference count as the main reference count for the new struct block_device, with the device model reference front ending it for device model interaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: allocate struct hd_struct as part of struct bdev_inode | Christoph Hellwig | 1 | -1/+1
Allocate hd_struct together with struct block_device to pre-load the lifetime rule changes in preparation of merging the two structures. Note that part0 was previously embedded into struct gendisk, but is a separate allocation now, and already points to the block_device instead of the hd_struct. The lifetime of struct gendisk is still controlled by the struct device embedded in the part0 hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: move the policy field to struct block_device | Christoph Hellwig | 1 | -0/+1
Move the policy field to struct block_device and rename it to the more descriptive bd_read_only. Also turn the field into a bool as it is used as such. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: move make_it_fail to struct block_device | Christoph Hellwig | 1 | -0/+3
Move the make_it_fail flag to struct block_device and turn it into a bool in preparation of killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: move holder_dir to struct block_device | Christoph Hellwig | 1 | -0/+1
Move the holder_dir field to struct block_device in preparation for killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: move the partition_meta_info to struct block_device | Christoph Hellwig | 1 | -0/+2
Move the partition_meta_info to struct block_device in preparation for killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: move the start_sect field to struct block_device | Christoph Hellwig | 1 | -0/+1
Move the start_sect field to struct block_device in preparation of killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: move disk stat accounting to struct block_device | Christoph Hellwig | 1 | -0/+2
Move the dkstats and stamp field to struct block_device in preparation of killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | block: remove ->bd_contains | Christoph Hellwig | 1 | -1/+3
Now that each hd_struct has a reference to the corresponding block_device, there is no need for the bd_contains pointer. Add a bdev_whole() helper to look up the whole device block_device structure instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
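The idea behind a helper like bdev_whole() is that the whole-device block_device can be reached through the disk, so a dedicated back-pointer is redundant. The sketch below is a guess at the shape of such a lookup, not the kernel's definition: the `demo_*` types are hypothetical, and the exact path from the disk to its whole-device entry (here a plain `part0` pointer) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_block_device;

/* Hypothetical disk: part0 is assumed to name the whole-device entry. */
struct demo_gendisk {
	struct demo_block_device *part0;
};

/* Hypothetical block device, whole disk or partition. */
struct demo_block_device {
	struct demo_gendisk *bd_disk;
	unsigned char bd_partno;   /* 0 for the whole device */
};

/* Look up the whole-device entry through the disk. */
static struct demo_block_device *
demo_bdev_whole(struct demo_block_device *bdev)
{
	return bdev->bd_disk->part0;
}

/* A device is a partition when it is not its own whole device. */
static bool demo_bdev_is_partition(struct demo_block_device *bdev)
{
	return demo_bdev_whole(bdev) != bdev;
}
```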
2020-12-02 | block: add a bdev_kobj helper | Christoph Hellwig | 1 | -0/+3
Add a little helper to find the kobject for a struct block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02 | fs: simplify freeze_bdev/thaw_bdev | Christoph Hellwig | 1 | -0/+1
Store the frozen superblock in struct block_device to avoid the awkward interface that can return a sb only used as a cookie, an ERR_PTR or NULL. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-14 | block: add zone specific block statuses | Keith Busch | 1 | -0/+18
A zoned device with limited resources to open or activate zones may return an error when the host exceeds those limits. The same command may be successful if retried later, but the host needs to wait for specific zone states before it should expect a retry to succeed. Have the block layer provide an appropriate status for these conditions so applications can distinguish this error for special handling. Cc: linux-api@vger.kernel.org Cc: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
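A sketch of the "special handling" a submitter could apply once zone resource errors are distinguishable from ordinary I/O errors. The status and action enums and both functions are hypothetical names invented for this illustration; the kernel's actual status constants and their values are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes standing in for block layer statuses. */
enum demo_status {
	DEMO_STS_OK,
	DEMO_STS_IOERR,
	DEMO_STS_ZONE_OPEN_RESOURCE,    /* open-zone limit exceeded */
	DEMO_STS_ZONE_ACTIVE_RESOURCE,  /* active-zone limit exceeded */
};

/* Hypothetical reactions a submitter might choose between. */
enum demo_action {
	DEMO_COMPLETE,
	DEMO_FAIL,
	DEMO_RETRY_AFTER_ZONE_CHANGE,   /* wait for a zone state change first */
};

static bool demo_is_zone_resource_error(enum demo_status sts)
{
	return sts == DEMO_STS_ZONE_OPEN_RESOURCE ||
	       sts == DEMO_STS_ZONE_ACTIVE_RESOURCE;
}

/* Zone resource errors are retryable; other errors are treated as final. */
static enum demo_action demo_handle_status(enum demo_status sts)
{
	if (sts == DEMO_STS_OK)
		return DEMO_COMPLETE;
	if (demo_is_zone_resource_error(sts))
		return DEMO_RETRY_AFTER_ZONE_CHANGE;
	return DEMO_FAIL;
}
```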
2020-10-13 | Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -4/+3
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, separating the block device from the backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-09-25 | block: remove unused BLK_QC_T_EAGAIN flag | Jeffle Xu | 1 | -2/+1
commit 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE") removed the REQ_NOWAIT_INLINE related code, but the diff wasn't applied to blk_types.h somehow. Then commit 2771cefeac49 ("block: remove the REQ_NOWAIT_INLINE flag") removed the REQ_NOWAIT_INLINE flag while the BLK_QC_T_EAGAIN flag still remains. Fixes: 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-23 | block: move the NEED_PART_SCAN flag to struct gendisk | Christoph Hellwig | 1 | -3/+1
We can only scan for partitions on the whole disk, so move the flag from struct block_device to struct gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 | block: rename bd_invalidated | Christoph Hellwig | 1 | -1/+3
Replace bd_invalidated with a new BDEV_NEED_PART_SCAN flag in a bd_flags variable to better describe the condition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 | block: remove an outdated comment on the bd_dev field | Christoph Hellwig | 1 | -1/+1
kdev_t is long gone, so we don't need a comment saying that a field isn't one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 | block: remove the BIO_USER_MAPPED flag | Christoph Hellwig | 1 | -1/+0
Just check if there is private data, in which case the bio must have originated from bio_copy_user_iov. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 | block: remove the BIO_NULL_MAPPED flag | Christoph Hellwig | 1 | -1/+0
We can simply use a boolean flag in the bio_map_data data structure instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02 | block: fix locking for struct block_device size updates | Christoph Hellwig | 1 | -0/+1
Two different callers use two different mutexes for updating the block device size, which obviously doesn't help to actually protect against concurrent updates from the different callers. In addition, one of the locks, bd_mutex, is rather prone to deadlocks with other parts of the block stack that use it for high level synchronization. Switch to using a new spinlock protecting just the size updates, as that is all we need, and make sure everyone does the update through the proper helper. This fixes a bug reported with the nvme revalidating disks during a hot removal operation, which can currently deadlock on bd_mutex. Reported-by: Xianting Tian <xianting_tian@126.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-17 | block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers | Coly Li | 1 | -4/+4
Currently REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL are defined as the even numbers 6 and 8, so zone reset bios are treated as READ bios by bio_data_dir(), which is obviously misleading.

The macro bio_data_dir() is defined in include/linux/bio.h as:

    #define bio_data_dir(bio) \
        (op_is_write(bio_op(bio)) ? WRITE : READ)

And op_is_write() is defined in include/linux/blk_types.h as:

    static inline bool op_is_write(unsigned int op)
    {
        return (op & 1);
    }

The convention of op_is_write() is that an op that writes data has an odd op code and is treated as a write op. bio_data_dir() reports READ if op_is_write() returns false, and WRITE if it returns true.

REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL don't transfer data, but because they are even numbers bio_data_dir() reports them as READ bios, which is misleading and might be wrong. These two commands reset the write pointers of the zones being reset, and all content after the reset write pointer becomes invalid and inaccessible, so they are not READ bios by any means.

This patch changes REQ_OP_ZONE_RESET from 6 to 15, and changes REQ_OP_ZONE_RESET_ALL from 8 to 17. Now bios with these two op codes are treated as WRITE by bio_data_dir(). Although they don't transfer data, this keeps them consistent with REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES, following the intuition that they change on-media content and should be WRITE requests.

Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@fb.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
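The effect of the renumbering can be checked with a few lines of userspace C. `op_is_write()` is copied from the commit message; `demo_data_dir()` is a simplified stand-in for bio_data_dir() that takes the op code directly, and the `DEMO_*` names are labels made up here for the old and new values quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_READ = 0, DEMO_WRITE = 1 };

/* Old and new op code values, as quoted in the commit message. */
enum {
	DEMO_OLD_ZONE_RESET     = 6,
	DEMO_OLD_ZONE_RESET_ALL = 8,
	DEMO_NEW_ZONE_RESET     = 15,
	DEMO_NEW_ZONE_RESET_ALL = 17,
};

/* As quoted in the commit message: the low bit marks a write op. */
static inline bool op_is_write(unsigned int op)
{
	return (op & 1);
}

/* Simplified stand-in for bio_data_dir(), operating on the op code. */
static inline int demo_data_dir(unsigned int op)
{
	return op_is_write(op) ? DEMO_WRITE : DEMO_READ;
}
```

With the old values both zone reset ops have a clear low bit and come out as reads; with 15 and 17 the low bit is set and they come out as writes.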
2020-07-01 | block: remove the all_bdevs list | Christoph Hellwig | 1 | -1/+0
Instead just iterate over the inodes for the block device superblock. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 | block: remove the unused bd_private field from struct block_device | Christoph Hellwig | 1 | -7/+0
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 | block: remove the bd_queue field from struct block_device | Christoph Hellwig | 1 | -1/+0
Just use bd_disk->queue instead. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 | block: remove the bd_block_size field from struct block_device | Christoph Hellwig | 1 | -1/+0
We can trivially calculate the block size from the inode's i_blkbits variable. Use that instead of keeping two redundant copies of the information in slightly different formats. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
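The calculation referred to above is a single shift: the block size in bytes is 2 raised to the number of block bits. A minimal sketch, with `struct demo_inode` and `demo_block_size()` as hypothetical stand-ins that carry only the one field involved:

```c
#include <assert.h>

/* Hypothetical, simplified inode: only the block-size exponent. */
struct demo_inode {
	unsigned char i_blkbits;   /* log2 of the block size in bytes */
};

/* Derive the block size instead of storing a second copy of it. */
static unsigned int demo_block_size(const struct demo_inode *inode)
{
	return 1U << inode->i_blkbits;
}
```

For example, 9 block bits correspond to 512-byte blocks and 12 block bits to 4096-byte blocks.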
2020-06-24 | block: move struct block_device to blk_types.h | Christoph Hellwig | 1 | -1/+38
Move the struct block_device definition together with most of the block layer definitions, as it has nothing to do with the rest of fs.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-16 | block: remove the REQ_NOWAIT_INLINE flag | Christoph Hellwig | 1 | -2/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-14 | block: Inline encryption support for blk-mq | Satya Tangirala | 1 | -0/+6
We must have some way of letting a storage device driver know what encryption context it should use for en/decrypting a request. However, it's the upper layers (like the filesystem/fscrypt) that know about and manage encryption contexts. As such, when the upper layer submits a bio to the block layer, and this bio eventually reaches a device driver with support for inline encryption, the device driver will need to have been told the encryption context for that bio.

We want to communicate the encryption context from the upper layer to the storage device along with the bio, when the bio is submitted to the block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can represent an encryption context (note that we can't use the bi_private field in struct bio to do this because that field does not function to pass information across layers in the storage stack). We also introduce various functions to manipulate the bio_crypt_ctx and make the bio/request merging logic aware of the bio_crypt_ctx.

We also make changes to blk-mq to make it handle bios with encryption contexts. blk-mq can merge many bios into the same request. These bios need to have contiguous data unit numbers (the necessary changes to blk-merge are also made to ensure this) - as such, it suffices to keep the data unit number of just the first bio, since that's all a storage driver needs to infer the data unit number to use for each data block in each bio in a request. blk-mq keeps track of the encryption context to be used for all the bios in a request with the request's rq_crypt_ctx. When the first bio is added to an empty request, blk-mq will program the encryption context of that bio into the request_queue's keyslot manager, and store the returned keyslot in the request's rq_crypt_ctx. All the functions to operate on encryption contexts are in blk-crypto.c.

Upper layers only need to call bio_crypt_set_ctx with the encryption key, algorithm and data_unit_num; they don't have to worry about getting a keyslot for each encryption context, as blk-mq/blk-crypto handles that. Blk-crypto also makes it possible for request-based layered devices like dm-rq to make use of inline encryption hardware by cloning the rq_crypt_ctx and programming a keyslot in the new request_queue when necessary.

Note that any user of the block layer can submit bios with an encryption context, such as filesystems, device-mapper targets, etc.

Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
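The contiguity requirement on data unit numbers (DUNs) can be illustrated with a simplified merge check. This is a sketch under several assumptions: the `demo_*` names are hypothetical, the DUN is modelled as a single 64-bit integer, the payload length is assumed to be a multiple of the data unit size, and the check that both bios use the same key is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a bio carrying an encryption context. */
struct demo_crypt_bio {
	uint64_t first_dun;   /* data unit number of the first data unit */
	uint32_t bytes;       /* payload length, a multiple of the unit size */
};

/*
 * True when 'next' may follow 'prev' in one request: its first DUN must
 * be the one right after the last data unit covered by 'prev'.
 */
static bool demo_dun_contiguous(const struct demo_crypt_bio *prev,
				const struct demo_crypt_bio *next,
				uint32_t data_unit_size)
{
	uint64_t units_in_prev = prev->bytes / data_unit_size;

	return next->first_dun == prev->first_dun + units_in_prev;
}
```

When every merged bio passes a check of this kind, the DUN of the first bio is enough to derive the DUN of any later data unit in the request, which is the property the commit message relies on.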
2020-05-13block: Introduce REQ_OP_ZONE_APPENDKeith Busch1-0/+14
Define REQ_OP_ZONE_APPEND to append-write sectors to a zone of a zoned block device. This is a no-merge write operation.

A zone append write BIO must:
* Target a zoned block device
* Have a sector position indicating the start sector of the target zone
* Target a sequential write zone
* Not cross a zone boundary
* Not be split, so that a single range of LBAs is written with a single command

Implement these checks in generic_make_request_checks() using the helper function blk_check_zone_append(). To avoid write append BIO splitting, introduce the new max_zone_append_sectors queue limit attribute and ensure that a BIO size is always lower than this limit. Export this new limit through sysfs and check these limits in bio_full().

Also, when a LLDD can't dispatch a request to a specific zone, it will return BLK_STS_ZONE_RESOURCE, indicating that this request needs to be delayed, e.g. because the zone it will be dispatched to is still write-locked. If this happens, set the request aside in a local list and continue trying to dispatch other requests, such as READ requests or WRITE/ZONE_APPEND requests targeting other zones. This way we can still keep a high queue depth without starving other requests even if one request can't be served due to zone write-locking.

Finally, make sure that the bio sector position indicates the actual write position as indicated by the device on completion.

Signed-off-by: Keith Busch <kbusch@kernel.org> [ jth: added zone-append specific add_page and merge_page helpers ] Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
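The zone append constraints listed in this commit message can be sketched as a single validation function. This is a userspace illustration only, not the kernel's blk_check_zone_append(): the toy_* names are hypothetical, all zones are assumed to have the same size, and the size limit is treated as inclusive, which is an assumption about how "lower than this limit" is meant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a device; not a kernel structure. */
struct toy_zoned_dev {
    bool is_zoned;
    uint64_t zone_sectors;             /* size of every zone, in sectors */
    uint32_t max_zone_append_sectors;  /* largest append the device accepts */
};

enum toy_zone_type { TOY_ZONE_CONVENTIONAL, TOY_ZONE_SEQ_WRITE };

/* Returns true if an append of nr_sectors at 'sector' passes every check. */
static bool toy_check_zone_append(const struct toy_zoned_dev *dev,
                                  enum toy_zone_type type,
                                  uint64_t sector, uint32_t nr_sectors)
{
    assert(dev != NULL);

    if (!dev->is_zoned || dev->zone_sectors == 0)
        return false;   /* must target a zoned block device */
    if (sector % dev->zone_sectors != 0)
        return false;   /* position must be the start sector of a zone */
    if (type != TOY_ZONE_SEQ_WRITE)
        return false;   /* target must be a sequential write zone */
    if (nr_sectors == 0 || nr_sectors > dev->zone_sectors)
        return false;   /* must not cross the zone boundary */
    if (nr_sectors > dev->max_zone_append_sectors)
        return false;   /* would have to be split into several commands */
    return true;
}
```

Since the sector position always names the zone start, the boundary check reduces to comparing the length against the zone size.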
2020-04-29  block: replace BIO_QUEUE_ENTERED with BIO_CGROUP_ACCT  Christoph Hellwig  1  -1/+1
BIO_QUEUE_ENTERED is only used for cgroup accounting now, so rename the flag and move setting it into the cgroup code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-18  blk_types: Replace zero-length array with flexible-array member  Gustavo A. R. Silva  1  -1/+1
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
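As a minimal userspace sketch of why allocations are unaffected, the struct foo from the message above can be allocated with a flexible array member as follows. The contents of struct boo and the helpers foo_alloc() and foo_sum() are made up for illustration; kernel code would typically use kmalloc() together with struct_size() rather than malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
    int value;              /* illustrative payload, not from the commit */
};

struct foo {
    int stuff;              /* here: number of elements in array[] */
    struct boo array[];     /* flexible array member, must come last */
};

/* Allocate a struct foo with room for n trailing elements, all zeroed. */
static struct foo *foo_alloc(size_t n)
{
    struct foo *f;

    /* Reject counts that do not fit 'stuff' or would overflow the size. */
    if (n > (size_t)INT32_MAX ||
        n > (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]))
        return NULL;

    /* sizeof(*f) covers only the fixed part; the array is added explicitly. */
    f = malloc(sizeof(*f) + n * sizeof(f->array[0]));
    if (!f)
        return NULL;

    f->stuff = (int)n;
    for (size_t i = 0; i < n; i++)
        f->array[i].value = 0;
    return f;
}

/* Sum all trailing elements. */
static long foo_sum(const struct foo *f)
{
    long sum = 0;

    assert(f != NULL);
    for (int i = 0; i < f->stuff; i++)
        sum += f->array[i].value;
    return sum;
}
```

The size computation has the same shape whether the last member is declared as array[0] or array[], which is why the conversion does not change allocation sites.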
2019-11-21  block: add iostat counters for flush requests  Konstantin Khlebnikov  1  -0/+1
Requests that trigger flushing of the volatile writeback cache to disk (barriers) have a significant effect on overall performance. The block layer has a sophisticated engine for combining several flush requests into one, but there are no statistics for the actual flushes executed by the disk. Requests which trigger flushes are usually barriers, i.e. zero-size writes. This patch adds two iostat counters to /sys/class/block/$dev/stat and /proc/diskstats: the count of completed flush requests and their total time. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07  block: add zone open, close and finish operations  Ajay Joshi  1  -0/+25
Zoned block devices (ZBC and ZAC devices) allow explicit control over the condition (state) of zones. The operations allowed are:
* Open a zone: Transition to open condition to indicate that a zone will actively be written
* Close a zone: Transition to closed condition to release the drive resources used for writing to a zone
* Finish a zone: Transition an open or closed zone to the full condition to prevent write operations

To enable this control for in-kernel zoned block device users, define the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH as well as the generic function blkdev_zone_mgmt() for submitting these operations on a range of zones. This results in the removal of blkdev_reset_zones() and its replacement with this new zone management function. Users of blkdev_reset_zones() (f2fs and dm-zoned) are updated accordingly.

Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.

Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
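The zone condition transitions described in this commit message can be modelled as a small state machine. This is a deliberately simplified toy, not the ZBC/ZAC state model and not the kernel's blkdev_zone_mgmt(): the toy_* names are hypothetical, only four conditions are modelled, and the rules for cases the message does not mention (for example closing an already closed zone) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_zone_cond { TOY_COND_EMPTY, TOY_COND_OPEN, TOY_COND_CLOSED, TOY_COND_FULL };
enum toy_zone_op   { TOY_OP_OPEN, TOY_OP_CLOSE, TOY_OP_FINISH, TOY_OP_RESET };

/*
 * Apply one management operation to a zone condition.
 * Returns false and leaves *cond unchanged if the toy model rejects it.
 */
static bool toy_zone_apply(enum toy_zone_cond *cond, enum toy_zone_op op)
{
    assert(cond != NULL);

    switch (op) {
    case TOY_OP_OPEN:
        /* A full zone cannot be opened for writing. */
        if (*cond == TOY_COND_FULL)
            return false;
        *cond = TOY_COND_OPEN;
        return true;
    case TOY_OP_CLOSE:
        /* Releases write resources; closing a closed zone is a no-op here. */
        if (*cond != TOY_COND_OPEN && *cond != TOY_COND_CLOSED)
            return false;
        *cond = TOY_COND_CLOSED;
        return true;
    case TOY_OP_FINISH:
        /* Only open or closed zones become full, as in the text above. */
        if (*cond != TOY_COND_OPEN && *cond != TOY_COND_CLOSED)
            return false;
        *cond = TOY_COND_FULL;
        return true;
    case TOY_OP_RESET:
        /* Reset returns any zone to the empty condition. */
        *cond = TOY_COND_EMPTY;
        return true;
    }
    return false;
}
```

Modelling all four operations through one entry point loosely mirrors how a single management function replaces the dedicated reset helper.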
2019-10-25  block: reorder bio::__bi_remaining for better packing  David Sterba  1  -1/+1
Simple reordering of __bi_remaining can reduce bio size by 8 bytes that are now wasted on padding (measured on x86_64):

struct bio {
        struct bio *               bi_next;              /*     0     8 */
        struct gendisk *           bi_disk;              /*     8     8 */
        unsigned int               bi_opf;               /*    16     4 */
        short unsigned int         bi_flags;             /*    20     2 */
        short unsigned int         bi_ioprio;            /*    22     2 */
        short unsigned int         bi_write_hint;        /*    24     2 */
        blk_status_t               bi_status;            /*    26     1 */
        u8                         bi_partno;            /*    27     1 */

        /* XXX 4 bytes hole, try to pack */

        struct bvec_iter           bi_iter;              /*    32    24 */

        /* XXX last struct has 4 bytes of padding */

        atomic_t                   __bi_remaining;       /*    56     4 */

        /* XXX 4 bytes hole, try to pack */

        [...]

        /* size: 104, cachelines: 2, members: 19 */
        /* sum members: 96, holes: 2, sum holes: 8 */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 40 bytes */
};

Now becomes:

struct bio {
        struct bio *               bi_next;              /*     0     8 */
        struct gendisk *           bi_disk;              /*     8     8 */
        unsigned int               bi_opf;               /*    16     4 */
        short unsigned int         bi_flags;             /*    20     2 */
        short unsigned int         bi_ioprio;            /*    22     2 */
        short unsigned int         bi_write_hint;        /*    24     2 */
        blk_status_t               bi_status;            /*    26     1 */
        u8                         bi_partno;            /*    27     1 */
        atomic_t                   __bi_remaining;       /*    28     4 */
        struct bvec_iter           bi_iter;              /*    32    24 */

        /* XXX last struct has 4 bytes of padding */

        [...]

        /* size: 96, cachelines: 2, members: 19 */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 32 bytes */
};

Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
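The effect of the reordering can be reproduced in userspace with two cut-down structures. These are not the kernel's struct bio: the toy_* types are hypothetical, they keep only the leading members shown in the pahole output plus one trailing pointer, and fixed-width types stand in for the kernel types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct bvec_iter: 20 bytes of members, 8-byte aligned on LP64. */
struct toy_iter {
    uint64_t sector;
    uint32_t size;
    uint32_t idx;
    uint32_t bvec_done;
};

/* Original order: the 4-byte counter sits after the iterator. */
struct toy_bio_holey {
    void *next;
    void *disk;
    uint32_t opf;
    uint16_t flags;
    uint16_t ioprio;
    uint16_t write_hint;
    uint8_t status;
    uint8_t partno;
    struct toy_iter iter;   /* aligned up, leaving a hole before it on LP64 */
    uint32_t remaining;     /* followed by another hole before the pointer */
    void *end_io;
};

/* Reordered: the counter fills the hole in front of the iterator. */
struct toy_bio_packed {
    void *next;
    void *disk;
    uint32_t opf;
    uint16_t flags;
    uint16_t ioprio;
    uint16_t write_hint;
    uint8_t status;
    uint8_t partno;
    uint32_t remaining;
    struct toy_iter iter;
    void *end_io;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t toy_bytes_saved(void)
{
    assert(sizeof(struct toy_bio_holey) >= sizeof(struct toy_bio_packed));
    return sizeof(struct toy_bio_holey) - sizeof(struct toy_bio_packed);
}
```

On an LP64 ABI such as x86_64 the two toy structures should come out at 72 and 64 bytes, the same 8-byte saving as in the commit; on ABIs with 4-byte pointers and 4-byte alignment for 64-bit integers there may be no difference at all.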