summaryrefslogtreecommitdiff
path: root/include/uapi/linux/ublk_cmd.h
AgeCommit message (Collapse)AuthorFilesLines
2023-08-21ublk: zoned: support REQ_OP_ZONE_RESET_ALLMing Lei1-0/+1
There isn't any reason to not support REQ_OP_ZONE_RESET_ALL given everything is actually handled in userspace, not mention it is pretty easy to support RESET_ALL. So enable REQ_OP_ZONE_RESET_ALL and let userspace handle it. Verified by 'tools/zbc_reset_zone -all /dev/ublkb0' in libzbc[1] with libublk-rs based ublk-zoned target prototype[2], follows command line for creating ublk-zoned: cargo run --example zoned -- add -1 1024 # add $dev_id $DEV_SIZE [1] https://github.com/westerndigitalcorporation/libzbc [2] https://github.com/ming1/libublk-rs/tree/zoned.v2 Cc: Niklas Cassel <Niklas.Cassel@wdc.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Andreas Hindborg <a.hindborg@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230810124326.321472-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09ublk: enable zoned storage supportAndreas Hindborg1-9/+54
Add zoned storage support to ublk: report_zones and operations: - REQ_OP_ZONE_OPEN - REQ_OP_ZONE_CLOSE - REQ_OP_ZONE_FINISH - REQ_OP_ZONE_RESET - REQ_OP_ZONE_APPEND The zone append feature uses the `addr` field of `struct ublksrv_io_cmd` to communicate ALBA back to the kernel. Therefore ublk must be used with the user copy feature (UBLK_F_USER_COPY) for zoned storage support to be available. Without this feature, ublk will not allow zoned storage support. Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230804114610.179530-4-nmi@metaspace.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-04ublk: add control command of UBLK_U_CMD_GET_FEATURESMing Lei1-0/+8
Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's feature set or capability. This way can simplify userspace for maintaining compatibility because userspace doesn't need to send command to one device for querying driver feature set any more. Such as, with the queried feature set, userspace can choose to use: - UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO, - UBLK_U_CMD_* or UBLK_CMD_* Userspace code: https://github.com/ming1/ubdsrv/commits/features-cmd Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230603040601.775227-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: support user copyMing Lei1-0/+3
Currently copy between io request buffer(pages) and userspace buffer is done inside ublk_map_io() or ublk_unmap_io(). This way performs very well in case of pre-allocated userspace io buffer. For dynamically allocated or external userspace backend io buffer, UBLK_F_NEED_GET_DATA is added for ublk server to provide buffer by one extra command communication for WRITE request. For READ, userspace simply provides buffer, but can't know when the buffer is done[1]. Add UBLK_F_USER_COPY by moving io data copy out of kernel by providing read()/write() on /dev/ublkcN, and simply let ublk server do the io data copy. This way makes both side cleaner, the cost is that one extra syscall for copy io data between request and backend buffer. With UBLK_F_USER_COPY, it actually becomes possible to run per-io zero copy now, such as, only do zero copy for big size IO, so it can be thought as one prep patch for supporting zero copy. Meantime zero copy still needs to expose read()/write() buffer for some corner case, such as passthrough IO. [1] READ buffer in UBLK_F_NEED_GET_DATA https://lore.kernel.org/linux-block/116d8a56-0881-56d3-9bcc-78ff3e1dc4e5@linux.alibaba.com/T/#m23bd4b8634c0a054e6797063167b469949a247bb ublksrv loop usercopy code: https://github.com/ming1/ubdsrv/commits/usercopy Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: add read()/write() support for ublk char deviceMing Lei1-1/+21
Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so data copy between io request buffer and userspace buffer can be moved to ublk server from ublk driver. Then UBLK_F_NEED_GET_DATA becomes not necessary, so ublk server can allocate buffer without one extra round uring command communication for userspace to provide buffer. IO buffer can be located by iocb->ki_pos which encodes buffer offset, io tag and queue id info, and type of iocb->ki_pos is u64, so it is big enough for holding reasonable queue depth, nr_queues and max io buffer size. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-19block: ublk: switch to ioctl command encodingMing Lei1-0/+43
All ublk commands(control, IO) should have taken ioctl command encoding from the beginning, because ioctl command encoding defines each code uniquely, so driver can figure out wrong command sent from userspace easily; 2) it might help security subsystem for audit uring cmd[1]. Unfortunately we didn't do that way, and it could be one lesson for ublk driver. So switch to ioctl command encoding now, we still support commands encoded in old way, but they become legacy definition. Any new command should take ioctl encoding. See ublksrv code for switching to ioctl command encoding in [2]. [1] https://lore.kernel.org/io-uring/CAHC9VhSVzujW9LOj5Km80AjU0EfAuukoLrxO6BEfnXeK_s6bAg@mail.gmail.com/ [2] https://github.com/ming1/ubdsrv/commits/ioctl_cmd_encoding Cc: Christoph Hellwig <hch@infradead.org> Cc: Ken Kurematsu <k.kurematsu@nskint.co.jp> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230418131810.855959-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30ublk_drv: add mechanism for supporting unprivileged ublk deviceMing Lei1-2/+34
unprivileged ublk device is helpful for container use case, such as: ublk device created in one unprivileged container can be controlled and accessed by this container only. Implement this feature by adding flag of UBLK_F_UNPRIVILEGED_DEV, and if this flag isn't set, any control command has been run from privileged user. Otherwise, any control command can be sent from any unprivileged user, but the user has to be permitted to access the ublk char device to be controlled. In case of UBLK_F_UNPRIVILEGED_DEV: 1) for command UBLK_CMD_ADD_DEV, it is always allowed, and user needs to provide owner's uid/gid in this command, so that udev can set correct ownership for the created ublk device, since the device owner uid/gid can be queried via command of UBLK_CMD_GET_DEV_INFO. 2) for other control commands, they can only be run successfully if the current user is allowed to access the specified ublk char device, for running the permission check, path of the ublk char device has to be provided by these commands. Also add one control of command UBLK_CMD_GET_DEV_INFO2 which always include the char dev path in payload since userspace may not have knowledge if this device is created in unprivileged mode. For applying this mechanism, system administrator needs to take the following policies: 1) chmod 0666 /dev/ublk-control 2) change ownership of ublkcN & ublkbN - chown owner_uid:owner_gid /dev/ublkcN - chown owner_uid:owner_gid /dev/ublkbN Both can be done via one simple udev rule. Userspace: https://github.com/ming1/ubdsrv/tree/unprivileged-ublk 'ublk add -t $TYPE --un_privileged=1' is for creating one un-privileged ublk device if the user is un-privileged. Link: https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/ Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230106041711.914434-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30ublk_drv: add device parameter UBLK_PARAM_TYPE_DEVTMing Lei1-0/+13
Userspace side only knows device ID, but the associated path of ublkc* and ublkb* could be changed by udev, and that depends on userspace's policy, so add parameter of UBLK_PARAM_TYPE_DEVT for retrieving major/minor of the ublkc* and ublkb*, then user may figure out major/minor of the ublk disks he/she owns. With major/minor, it is easy to find the device node path. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230106041711.914434-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-24ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY supportZiyangZhang1-1/+2
START_USER_RECOVERY and END_USER_RECOVERY are two new control commands to support user recovery feature. After a crash, user should send START_USER_RECOVERY, it will: (1) check if (a)current ublk_device is UBLK_S_DEV_QUIESCED which was set by quiesce_work and (b)chardev is released (2) reinit all ubqs, including: (a) put the task_struct and reset ->ubq_daemon to NULL. (b) reset all ublk_io. (3) reset ub->mm to NULL. Then, user should start a new process and send FETCH_REQ on each ubq_daemon. Finally, user should send END_USER_RECOVERY, it will: (1) wait for all new ubq_daemons getting ready. (2) update ublksrv_pid (3) unquiesce the request queue and expect incoming ublk_queue_rq() (4) convert ub's state to UBLK_S_DEV_LIVE Note: we can handle STOP_DEV between START_USER_RECOVERY and END_USER_RECOVERY. This is helpful to users who cannot start new process after sending START_USER_RECOVERY ctrl-cmd. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-7-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-24ublk_drv: support UBLK_F_USER_RECOVERY_REISSUEZiyangZhang1-0/+2
UBLK_F_USER_RECOVERY_REISSUE implies that: With a dying ubq_daemon, ublk_drv let monitor_work requeues rq issued to userspace(ublksrv) before the ubq_daemon is dying. UBLK_F_USER_RECOVERY_REISSUE is designed for backends which: (1) tolerate double-write since ublk_drv may issue the same rq twice. (2) does not let frontend users get I/O error, such as read-only FS and VM backend. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-6-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-24ublk_drv: define macros for recovery feature and check themZiyangZhang1-0/+3
Define some macros for recovery feature. UBLK_S_DEV_QUIESCED implies that ublk_device is quiesced and is ready for recovery. This state can be observed by userspace. UBLK_F_USER_RECOVERY implies that: (1) ublk_drv enables recovery feature. It won't let monitor_work to automatically abort rqs and release the device. (2) With a dying ubq_daemon, ublk_drv ends(aborts) rqs issued to userspace(ublksrv) before crash. (3) With a dying ubq_daemon, in task work and ublk_queue_rq(), ublk_drv requeues rqs. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-3-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03ublk_cmd.h: add one new ublk command: UBLK_IO_NEED_GET_DATAZiyangZhang1-0/+18
Add one new ublk command: UBLK_IO_NEED_GET_DATA. It is prepared for a new feature designed for a user application who wants to allocate IO buffer and set IO buffer address only after it receives an IO request from ublksrv. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Link: https://lore.kernel.org/r/c8a64b6b51c78340da7daa9e1054608695e79619.1659011443.git.ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03ublk_drv: cleanup ublksrv_ctrl_dev_infoMing Lei1-7/+8
Remove all block device related info from ublksrv_ctrl_dev_info, meantime reduce its size into 64 bytes because: 1) ublksrv_ctrl_dev_info becomes cleaner without including any block related info 2) generic set/get parameter command can be used to set block related setting easily and cleanly 3) generic set/get parameter command can be used for extending ublk without needing more info in ublksrv_ctrl_dev_info Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220730092750.1118167-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03ublk_drv: add SET_PARAMS/GET_PARAMS control commandMing Lei1-0/+47
Add two commands to set/get parameters generically. One important goal of ublk is to provide generic framework for making block device by userspace flexibly. As one generic block device, there are still lots of block parameters, such as max_sectors, write_cache/fua, discard related limits, zoned parameters, ...., so this patch starts to add generic mechanism for set/get device parameters. Both generic block parameters(all kinds of queue settings) and ublk feature parameters can be covered with this way, then it becomes quite easy to extend in future. Add two parameter types are used so far: basic(covers basic queue setting and misc settings which can't be grouped easily) and discard, basic type must be set, and discard type becomes optional now This way provides mechanism to simulate any kind of generic block device from userspace easily, from both block queue setting viewpoint or ublk feature viewpoint. The style of putting all parameters together is suggested by Christoph. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220730092750.1118167-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-22ublk_drv: make sure that correct flags(features) returned to userspaceMing Lei1-3/+4
Userspace may support more features or new added flags, but the driver side can be old, so make sure correct flags(features) returned to userpsace, then userspace can work as expected. Also mark the 2nd flags as reversed, just use the 1st one. When we run out of flags, the reserved one can be handled at that time. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220722103817.631258-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-21ublk: remove UBLK_IO_F_PREFLUSHChristoph Hellwig1-1/+0
REQ_PREFLUSH is turned into REQ_OP_FLUSH by the flush state machine and thus never seen by a blk-mq based driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220721130916.1869719-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-18ublk: remove UBLK_IO_F_INTEGRITYChristoph Hellwig1-1/+0
The ublk protocol has no mechanism to actually transfer the integrity metadata, so don't define this flag, which requires that an integrity payload is attached to a bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220718063013.335531-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14ublk_drv: support to complete io command via task_work_addMing Lei1-0/+6
Use task_work_add if it is available, since task_work_add can bring up better performance, especially batching signaling ->ubq_daemon can be done. It is observed that task_work_add() can boost iops by +4% on random 4k io test. Also except for completing io command, all other code paths are same with completing io command via io_uring_cmd_complete_in_task. Meantime add one flag of UBLK_F_URING_CMD_COMP_IN_TASK for comparing the mode easily. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220713140711.97356-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14ublk_drv: add io_uring based userspace block driverMing Lei1-0/+156
This is the driver part of userspace block driver(ublk driver), the other part is userspace daemon part(ublksrv)[1]. The two parts communicate by io_uring's IORING_OP_URING_CMD with one shared cmd buffer for storing io command, and the buffer is read only for ublksrv, each io command is indexed by io request tag directly, and is written by ublk driver. For example, when one READ io request is submitted to ublk block driver, ublk driver stores the io command into cmd buffer first, then completes one IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued to ublk driver beforehand by ublksrv for getting notification of any new io request, and each URING_CMD is associated with one io request by tag. After ublksrv gets the io command, it translates and handles the ublk io request, such as, for the ublk-loop target, ublksrv translates the request into same request on another file or disk, like the kernel loop block driver. In ublksrv's implementation, the io is still handled by io_uring, and share same ring with IORING_OP_URING_CMD command. When the target io request is done, the same IORING_OP_URING_CMD is issued to ublk driver for both committing io request result and getting future notification of new io request. Another thing done by ublk driver is to copy data between kernel io request and ublksrv's io buffer: 1) before ubsrv handles WRITE request, copy the request's data into ublksrv's userspace io buffer, so that ublksrv can handle the write request 2) after ubsrv handles READ request, copy ublksrv's userspace io buffer into this READ request, then ublk driver can complete the READ request Zero copy may be switched if mm is ready to support it. ublk driver doesn't handle any logic of the specific user space driver, so it is small/simple enough. [1] ublksrv https://github.com/ming1/ubdsrv Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220713140711.97356-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>