Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently, uring_cmd with UBLK_IO_FETCH_REQ or
UBLK_IO_COMMIT_AND_FETCH_REQ is always checked whether
userspace server has provided IO buffer even flag
UBLK_F_NEED_GET_DATA is configured.
This is a excessive check. If UBLK_F_NEED_GET_DATA is
configured, FETCH_RQ doesn't need to provide IO buffer;
COMMIT_AND_FETCH_REQ also doesn't need to do that if
the IO type is not READ.
Check ub_cmd->addr together with ublk_need_get_data()
and IO type in ublk_ch_uring_cmd().
With this fix, userspace server doesn't need to preserve
buffers for every ublk_io when flag UBLK_F_NEED_GET_DATA
is configured, in order to save memory.
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Fixes: c86019ff75c1 ("ublk_drv: add support for UBLK_IO_NEED_GET_DATA")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230210141356.112321-1-xiaodong.liu@intel.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
`dasd_reserve_req` is allocated before `dasd_vol_info_req`, and it
also needs to be freed before the error returns, just like the other
cases in this function.
Fixes: 9e12e54c7a8f ("s390/dasd: Handle out-of-space constraint")
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Link: https://lore.kernel.org/r/20221208133809.16796-1-linqiheng@huawei.com
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230210000253.1644903-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This does not fix a real bug, since virtual addresses
are currently indentical to physical ones.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20230210000253.1644903-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_acct_set was enabled for raid0/raid5 io accounting. bios that contain
md_io_acct are allocated in the i/o path. There isn't a good method to
monitor if these bios are all finished and freed. In the takeover process,
io_acct_set (which is used for bios with md_io_acct) need to be freed.
However, if some bios finish after io_acct_set is freed, it may trigger
the following panic:
[ 6973.767999] RIP: 0010:mempool_free+0x52/0x80
[ 6973.786098] Call Trace:
[ 6973.786549] md_end_io_acct+0x31/0x40
[ 6973.787227] blk_update_request+0x224/0x380
[ 6973.787994] blk_mq_end_request+0x1a/0x130
[ 6973.788739] blk_complete_reqs+0x35/0x50
[ 6973.789456] __do_softirq+0xd7/0x2c8
[ 6973.790114] ? sort_range+0x20/0x20
[ 6973.790763] run_ksoftirqd+0x2a/0x40
[ 6973.791400] smpboot_thread_fn+0xb5/0x150
[ 6973.792114] kthread+0x10b/0x130
[ 6973.792724] ? set_kthread_struct+0x50/0x50
[ 6973.793491] ret_from_fork+0x1f/0x40
Fix this by increasing and decreasing active_io for each bio with
md_io_acct so that mddev_suspend() will wait until all bios from
io_acct_set finish before freeing io_acct_set.
Reported-by: Fine Fan <ffan@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
Inside ublk_ctrl_del_dev(), when the device is removed, we wait
until the device number is freed with holding global lock of
ublk_ctl_mutex, this way isn't friendly from user viewpoint:
1) if device is in-use, the current delete command hangs in
ublk_ctrl_del_dev(), and user can't break from the handling
because wait_event() is used
2) global lock is held, so any new device can't be added and
other old devices can't be removed.
Improve the deleting handling by the following way, suggested by
Nadav:
1) wait without holding the global lock
2) replace wait_event() with wait_event_interruptible()
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Suggested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230207150700.545530-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
for-6.3/block
Pull NVMe updates from Christoph:
"nvme updates for Linux 6.3
- small improvements to the logging functionality (Amit Engel)
- authentication cleanups (Hannes Reinecke)
- cleanup and optimize the DMA mapping cod in the PCIe driver
(Keith Busch)
- work around the command effects for Format NVM (Keith Busch)
- misc cleanups (Keith Busch, Christoph Hellwig)"
* tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme:
nvme: mask CSE effects for security receive
nvme: always initialize known command effects
nvmet: for nvme admin set_features cmd, call nvmet_check_data_len_lte()
nvme-tcp: add additional info for nvme_tcp_timeout log
nvme: add nvme_opcode_str function for all nvme cmd types
nvme: remove nvme_execute_passthru_rq
nvme-pci: place descriptor addresses in iod
nvme-pci: use mapped entries for sgl decision
nvme-pci: remove SGL segment descriptors
nvme-auth: don't use NVMe status codes
nvme-fabrics: clarify AUTHREQ result handling
|
|
queuedata is not referenced in ublk_drv and we can use driver_data
instead. Pass NULL to blk_mq_alloc_disk() as queuedata while allocating
ublk's gendisk.
Signed-off-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230207070839.370817-4-ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
WRITE_ZEROES won't return bytes returned just like FLUSH and DISCARD,
and we can end it directly. Add missing comment for it in
ublk_complete_rq().
Signed-off-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230207070839.370817-3-ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bio_has_data() allows a NULL bio so the NULL check in
ublk_rq_has_data() is unnecessary.
Signed-off-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230207070839.370817-2-ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_page helper to initialize a bvec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_page helper to initialize bvecs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_virt helper to initialize the special_vec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_page helper to initialize the copy up bvec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_virt helper to initialize the special_vec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_page helper to initialize bvecs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_set_page helper to initialize bvecs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a helper for setting up the special_bvec instead of open coding it
in three place, and use the new bvec_set_page helper to initialize
special_vec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just replace magic numbers by MD_RESYNC_* enumerations.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
dm raid calls md_stop to stop the raid device. It needs to
free the writes_pending here.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
Now the type of active_io is atomic. It's used to count how many ios are
in the submitting process and it's added and decreased very time. But it
only needs to check if it's zero when suspending the raid. So we can
switch atomic to percpu to improve the performance.
After switching active_io to percpu type, we use the state of active_io
to judge if the raid device is suspended. And we don't need to wake up
->sb_wait in md_handle_request anymore. It's done in the callback function
which is registered when initing active_io. The argument mddev->suspended
is only used to count how many users are trying to set raid to suspend
state.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
This helper function will be used in next patch. It's easy for
understanding.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
Don't update recovery_cp when curr_resync is MD_RESYNC_ACTIVE, otherwise
md may skip the resync of the first 3 sectors if the resync procedure is
interrupted before the first calling of ->sync_request() as shown below:
md_do_sync thread control thread
// setup resync
mddev->recovery_cp = 0
j = 0
mddev->curr_resync = MD_RESYNC_ACTIVE
// e.g., set array as idle
set_bit(MD_RECOVERY_INTR, &&mddev_recovery)
// resync loop
// check INTR before calling sync_request
!test_bit(MD_RECOVERY_INTR, &mddev->recovery
// resync interrupted
// update recovery_cp from 0 to 3
// the resync of three 3 sectors will be skipped
mddev->recovery_cp = 3
Fixes: eac58d08d493 ("md: Use enum for overloaded magic numbers used by mddev->curr_resync")
Cc: stable@vger.kernel.org # 6.0+
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
|
|
Make the following minor changes which were reported by colleagues
while reviewing this code:
- Remove the parentheses from around the LOOP_DEFAULT_HW_Q_DEPTH
definition since these are superfluous.
- Accept other number formats than decimal, e.g. hexadecimal.
- Do not set hw_queue_depth to an out-of-range value, even if that value
won't be used.
- Use the LOOP_DEFAULT_HW_Q_DEPTH macro in the kernel module parameter
description to prevent that the description gets out of sync.
This patch has been tested as follows:
# modprobe -r loop
# modprobe loop hw_queue_depth=-1
modprobe: ERROR: could not insert 'loop': Invalid argument
# modprobe loop hw_queue_depth=0
modprobe: ERROR: could not insert 'loop': Invalid argument
# modprobe loop hw_queue_depth=1; cat /sys/module/loop/parameters/hw_queue_depth
1
# modprobe -r loop; modprobe loop; cat /sys/module/loop/parameters/hw_queue_depth hw_queue_depth=0x10
16
# modprobe -r loop; modprobe loop; cat /sys/module/loop/parameters/hw_queue_depth hw_queue_depth=128
128
# modprobe -r loop; modprobe loop hw_queue_depth=129; cat /sys/module/loop/parameters/hw_queue_depth
129
# modprobe -r loop; modprobe loop hw_queue_depth=$((1<<32))
modprobe: ERROR: could not insert 'loop': Numerical result out of range
See also commit ef44c50837ab ("loop: allow user to set the queue
depth").
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230130211347.832110-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The nvme driver will freeze the IO queues in response to an admin
command with CSE bits set. These bits notify the host that the command
that's about to be executed needs to be done exclusively, hence the
freeze.
The Security Receive command is often reported by multiple vendors with
CSE bits set. The reason for this is that the result depends on the
previous Security Send. This has nothing to do with IO queues, though,
so the driver is taking an overly cautious response to seeing this
passthrough command, while unable to fufill the intended admin queue
action.
Rather than freeze IO during this harmless command, mask off the
effects. This freezing is observed to cause IO latency spikes when host
software periodically validates the security state of the drives.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Instead of appending command effects flags per IO, set the known effects
flags the driver needs to react to just once during initial setup.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is due to the fact that the host is allowed to pass the controller
an sgl describing a buffer that is larger than the payload itself
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This provides additional details about the rq/cmd that is timed out
example log if CONFIG_NVME_VERBOSE_ERRORS is configured:
"nvme nvme0: queue 2 timeout cid 0xd058 type 4 opc Write (0x1)"
example log if CONFIG_NVME_VERBOSE_ERRORS is not configured:
"nvme nvme0: queue 2 timeout cid 0xd058 type 4 opc I/O Cmd (0x1)"
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
nvme_opcode_str will handle io/admin/fabrics ops
This improves NVMe errors logging
Signed-off-by: Amit Engel <Amit.Engel@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
After moving the nvme_passthru_end call to the callers of
nvme_execute_passthru_rq, this function has become quite pointless,
so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
|
The 'struct nvme_iod' space is appended at the end of the preallocated
'struct request', and padded to the cache line size. This leaves some
free memory (in most kernel configs) up for grabs.
Instead of appending the nvme data descriptor addresses after the
scatterlist, inline these for free within struct nvme_iod. There is now
enough space in the mempool for 128 possibe segments.
And without increasing the size of the preallocated requests, we can
hold up to 5 PRP descriptor elements, allowing the driver to increase
its max transfer size to 8MB.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The driver uses the dma entries for setting up its command's SGL/PRP
lists. The dma mapping might have fewer entries than the physical
segments, so check the dma mapped count to determine which nvme data
layout method is more optimal.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The max segments this driver can see is 127, well below the 256
threshold needed to add an nvme sgl segment descriptor. Remove all the
useless checks and dead code.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
NVMe status codes are part of the wire protocol, and shouldn't be
fabricated in the stack. So with this patch the authentication code
is switched over to use error codes; as a side effect authentication
failures due to internal error won't be retried anymore.
But that shouldn't have happened anyway.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The NVMe 2.0 spec defines the ATR and ASCR bits in the AUTHREQ
connect response field to be mutually exclusive. So to clarify the
handling here switch the AUTHREQ handling to use the bit definitions
and check for both bits.
And while we're at it, add a message to the user that secure
concatenation is not supported (yet).
Suggested-by: Mark Lehrer <mark.lehrer@wdc.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Owner of one unprivileged ublk device could be one evil user, which
can grant this disk's privilege to other users deliberately, and
this way could be like making one trap and waiting for other users
to be caught.
So only owner to open unprivileged disk even though the owner
grants disk privilege to other user. This way is reasonable too
given anyone can create ublk disk, and no need other's grant.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: 4093cb5a0634 ("ublk_drv: add mechanism for supporting unprivileged ublk device")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230131040446.214583-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
s390 iterates over the bio using bio_for_each_segment and doesn't need
any bio splitting.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20230123075356.60847-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ps3vram iterates over the bio one segment, that is page aligned and max
page sized chunk, a time. Because of that there is no point in
calling bio_split_to_limits, or explicitly setting the default limits
that are only used by bio_split_to_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Geoff Levand <geoff@infradead.org>
Link: https://lore.kernel.org/r/20230123074718.57951-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a generic bdev_zone_no() helper to calculate zone number for a
given sector in a block device. This helper internally uses disk_zone_no()
to find the zone number.
Use the helper bdev_zone_no() to calculate nr of zones. This lets us
make modifications to the math if needed in one place.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20230110143635.77300-4-p.raghav@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch sets REQ_ALLOC_CACHE flag for uring-passthru requests.
This is a prep-patch so that normal / IRQ-driven uring-passthru
I/Os can also leverage bio-cache.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20230117120638.72254-2-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
unprivileged ublk device is helpful for container use case, such
as: ublk device created in one unprivileged container can be controlled
and accessed by this container only.
Implement this feature by adding flag of UBLK_F_UNPRIVILEGED_DEV, and if
this flag isn't set, any control command has been run from privileged
user. Otherwise, any control command can be sent from any unprivileged
user, but the user has to be permitted to access the ublk char device
to be controlled.
In case of UBLK_F_UNPRIVILEGED_DEV:
1) for command UBLK_CMD_ADD_DEV, it is always allowed, and user needs
to provide owner's uid/gid in this command, so that udev can set correct
ownership for the created ublk device, since the device owner uid/gid
can be queried via command of UBLK_CMD_GET_DEV_INFO.
2) for other control commands, they can only be run successfully if the
current user is allowed to access the specified ublk char device, for
running the permission check, path of the ublk char device has to be
provided by these commands.
Also add one control of command UBLK_CMD_GET_DEV_INFO2 which always
include the char dev path in payload since userspace may not have
knowledge if this device is created in unprivileged mode.
For applying this mechanism, system administrator needs to take
the following policies:
1) chmod 0666 /dev/ublk-control
2) change ownership of ublkcN & ublkbN
- chown owner_uid:owner_gid /dev/ublkcN
- chown owner_uid:owner_gid /dev/ublkbN
Both can be done via one simple udev rule.
Userspace:
https://github.com/ming1/ubdsrv/tree/unprivileged-ublk
'ublk add -t $TYPE --un_privileged=1' is for creating one un-privileged
ublk device if the user is un-privileged.
Link: https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230106041711.914434-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Prepare for supporting unprivileged ublk device by limiting max number
ublk devices added. Otherwise too many ublk devices could be added by
un-trusted user, which can be thought as one DoS.
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230106041711.914434-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Userspace side only knows device ID, but the associated path of ublkc* and
ublkb* could be changed by udev, and that depends on userspace's policy, so
add parameter of UBLK_PARAM_TYPE_DEVT for retrieving major/minor of the
ublkc* and ublkb*, then user may figure out major/minor of the ublk disks
he/she owns. With major/minor, it is easy to find the device node path.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230106041711.914434-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It is annoying for each control command handler to get/put ublk
device and deal with failure.
Control command handler is simplified a lot by moving
ublk_get_device_from_id into ublk_ctrl_uring_cmd().
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230106041711.914434-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If any ubq daemon is unprivileged, the ublk char device is allowed
for unprivileged user actually, and we can't trust the current user,
so not probe partitions.
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230106041711.914434-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No one uses 'nr_aborted_queues' any more, so remove it.
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230106041711.914434-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20230113123538.144276-9-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Trying to remove an "empty" (just initialized, or "cleared") interval
from the tree, this results in an endless loop.
As we typically protect the tree with a spinlock_irq,
the result is a hung system.
Be nice to error cleanup code paths, ignore removal of empty intervals.
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20230113123538.144276-8-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This require_context attribute originated in a proposed sparse patch by
Philipp Reisner back in 2008. Johannes Berg had a different solution to
a similar problem, and that patch "won" in the end; so the require_context
thing never got merged. The whole history can be read at [0].
DRBD kept using these annotations anyway for a while. Nowadays, on a
modern unmodified sparse, they obviously do nothing, and they are hardly
used anymore anyway.
So, just remove the definitions of these macros.
[0] https://www.spinics.net/lists/linux-sparse/msg01150.html
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Joel Colledge <joel.colledge@linbit.com>
Link: https://lore.kernel.org/r/20230113123538.144276-6-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Joel Colledge <joel.colledge@linbit.com>
Link: https://lore.kernel.org/r/20230113123538.144276-5-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To be more similar to what we do in the out-of-tree module and ease the
upstreaming process.
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Joel Colledge <joel.colledge@linbit.com>
Link: https://lore.kernel.org/r/20230113123506.144082-4-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|